Python的链表O(1)插入/删除链表、Python

2023-09-10 23:56:16 作者:别把往事吵醒

我要寻找一个链表和相关算法实现的Python。每个人都问我只是建议使用内置的Python列表,但性能测量表明,清单插入和删除是我们的应用程序的瓶颈。这是微不足道的实施一个简单的链表,但我不知道是否有一个成熟的库,它包含了像排序,合并,拼接,搜索一些操作,下/上限,等等...

我知道这是一个欺骗,但搜索在任何搜索引擎的Python列表给出了predictably效果差,大多数人只是说,链表是不需要在Python(pfft!)。

PS:我需要插入,并从任何地方在列表中删除,而不仅仅是末端

OK,你问它: 予需要保持几十万条目的有序列表。我会遍历列表前锋(一个接一个),使用上的每个条目的访问者,从一开始或二进制搜索找到一个位置开始。当找到一个条目相匹配的predicate它被从列表中删除,然后,另一二进制搜索对从移除条目的previous位置开始列表的子集进行,直至一个位置统计确定预先。忽略错误条件时,修改后的项可以被用于创建另一个被拼接成通过第二二进制搜索中发现的新的位置链表。迭代从被删除的条目,其中位置继续。有时,几千个连续有序的条目可被添加到/从列表中的任何地方移走。有时,几千元不连续的条目必须进行搜索并删除增量。

python的列表是unnacceptable作为插入/移除的成本过高,和次要增益在速度为二分搜索是完全无关的总成本。我们的内部的测试证实了这一点的。

如果我忽略了任何一个细节也许我可以发邮件给你的我公司的非公开协议副本,我可以和你对此事私下对应。 sarcasm.end()的。

解决方案

下面是一个blog帖子分享你的痛苦。它包括一个链表的实现和性能比较。

也许 blist 会更好,但(从这里)?

  

使用情况下BList是   比Python的列表中稍微慢一点   如下(O(log n)的与O(1)):

        在大名单是永远不会改变的长度。   在一个大的名单,其中插入和删除是只在的结束   列表(LIFO)。         Python 链表入门 1

使用该声明的出路,   这里有一些使用情况下的   该BLists是大大快于   内置的清单:

        插入到或运出大名单(O(log n)的主场迎战为O(n))   在服用大名单的大型片(O(log n)的VS为O(n))   在大名单的制作浅拷贝(O(1)对为O(n​​))   更改大名单的大型片(O(日志N +登录K)与O(N + K))   乘以一个列表,使一个大的,稀疏的列表(为O(log k)的对比   O(KN))   

请注意,它实际上实现为B +树,允许所有的这些操作强大的性能。

I am looking for a linked list and related algorithms implementation for Python. Everyone I ask just recommends using built in Python lists, but performance measurements indicate that list insertion and removal is a bottleneck for our application. It's trivial to implement a simple linked list, but I wonder if there is a mature library which includes some operations like sort, merge, splice, search, lower/upper bound, etc...

I know this is a dupe, but searching for python list on any search engine gives predictably poor results, with most people just saying that linked lists are unneeded in python (pfft!).

PS: I need to insert and remove from anywhere in the list, not just the ends.

OK, you asked for it: I need to maintain an ordered list of several hundred thousand entries. I will iterate over the list forwards (one by one), using a visitor on each entry, starting from the beginning or a position found by a binary search. When an entry matching a predicate is found it is removed from the list, and then, another binary search is performed on a subset of the list beginning from the removed entry's previous position, until a position determined statistically beforehand. Ignoring the error condition, the modified entry may be used to create another linked list which is spliced into the new position found through the second binary search. Iteration is continued from the position where the entry was removed. On occasion, several thousand contiguous ordered entries may be added to/removed from any place in the list. Sometimes several thousand non-contiguous entries must be searched for and removed incrementally.

python's list is unnacceptable as the cost of insertion/removal is prohibitive, and the minor gains in speed for the binary search are totally irrelevant to the total cost. Our in house tests confirm this.

if I have neglected any detail perhaps I can e-mail you a copy of my company's non-disclosure agreement and I can privately correspond with you on the matter. sarcasm.end().

解决方案

Here's a blog post sharing your pain. It includes an implementation of a linked list and a performance comparison.

Perhaps blist would be better, though (from here)?

The use cases where the BList is slightly slower than Python's list are as follows (O(log n) vs. O(1)):

A large list that never changes length. A large lists where inserts and deletes are only at the end of the list (LIFO).

With that disclaimer out of the way, here are some of the use cases where the BLists is dramatically faster than the built-in list:

Insertion into or removal from a large list (O(log n) vs. O(n)) Taking large slices of large lists (O(log n) vs O(n)) Making shallow copies of large lists (O(1) vs. O(n)) Changing large slices of large lists (O(log n + log k) vs. O(n + k)) Multiplying a list to make a large, sparse list (O(log k) vs. O(kn))

Note that it's actually implemented as a B+ tree, allowing great performance for all those operations.