SortedList vs. SortedDictionary vs. Sort()

2023-09-02 23:50:22 Author: 勿歆

This is a continuation of questions like this one.

Are there any guidelines for tweaking the performance? I don't mean gains in big-O, just saving some linear time.

For example, how much does pre-sorting save on either SortedList or SortedDictionary?

Say I have a Person class with 3 properties to sort on, one of them being age in years. Should I bucket the objects on age first?

Should I first sort on one property, then use the resulting list/dictionary to sort on two properties and so on?

Any other optimizations that spring to mind?

Recommended answer

Well, it's an easy win on SortedList. Inserting an item requires a binary search, O(log(n)), to find the insertion point, then the equivalent of a List.Insert, O(n), to shift the later elements and make room for the item. The insert dominates, so populating the list from unsorted input takes O(n^2). If the input items are already sorted, each insert lands at the end and collapses to O(1), although the binary search still runs. Populating is then O(n log(n)). You don't need to worry about how big the constant factor is here: sorting first is always more efficient, assuming you can afford the doubled storage requirement.
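A minimal sketch of the pre-sorting idea described above. The class and method names (`SortedListLoading`, `LoadPresorted`) and the `int`/`string` key and value types are assumptions chosen for illustration; only `SortedList<TKey, TValue>`, `Array.Sort` and `ToArray` are real framework calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedListLoading
{
    // Hypothetical helper: sorts the pairs by key first, so that every Add
    // appends at the end of the list instead of shifting existing elements.
    public static SortedList<int, string> LoadPresorted(
        IEnumerable<KeyValuePair<int, string>> items)
    {
        // The temporary array is the "doubled storage" cost of pre-sorting.
        KeyValuePair<int, string>[] buffer = items.ToArray();
        Array.Sort(buffer, (a, b) => a.Key.CompareTo(b.Key));

        // Reserving the capacity up front avoids regrowing the backing storage.
        var list = new SortedList<int, string>(buffer.Length);
        foreach (KeyValuePair<int, string> pair in buffer)
        {
            // Add throws ArgumentException if the key is already present.
            list.Add(pair.Key, pair.Value);
        }
        return list;
    }
}
```

Whether the extra copy is worth it depends on the input size; for a handful of items the difference is unlikely to be measurable.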

SortedDictionary is different: it uses a red-black tree. Finding the insertion point takes O(log(n)). The tree may need rebalancing afterwards, which also takes O(log(n)). Populating the dictionary thus takes O(n log(n)). Sorted input does not change the effort to find the insertion point or to rebalance; it is still O(n log(n)). Now the constant factor matters, though: inserting sorted input forces the tree to rebalance itself constantly. It works better if the input is random, so you don't want sorted input.

So populating SortedList with sorted input and populating SortedDictionary with unsorted input are both O(n log(n)). Ignoring the cost of providing sorted input, the constant factor of SortedList is smaller than that of SortedDictionary. That's an implementation detail due to the way the list allocates memory: its backing storage only has to grow O(log(n)) times, while a red-black tree has to allocate a node O(n) times. It is a very small constant factor, by the way.
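If you want to check these claims against your own data, a rough timing harness along the following lines may help. It is a sketch, not a rigorous benchmark: the names `PopulateTiming`, `FillSortedList`, `FillSortedDictionary` and `Run` are hypothetical, the keys are plain integers, and a single `Stopwatch` measurement is noisy (JIT warm-up and garbage collection are not controlled for).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class PopulateTiming
{
    public static SortedList<int, int> FillSortedList(int[] keys)
    {
        var list = new SortedList<int, int>();
        foreach (int key in keys) list.Add(key, key);
        return list;
    }

    public static SortedDictionary<int, int> FillSortedDictionary(int[] keys)
    {
        var dictionary = new SortedDictionary<int, int>();
        foreach (int key in keys) dictionary.Add(key, key);
        return dictionary;
    }

    // Measures one invocation of the given action.
    public static TimeSpan Time(Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }

    // Populates both collections with the same n unique keys,
    // once in ascending order and once in shuffled order.
    public static void Run(int n)
    {
        int[] ascending = Enumerable.Range(0, n).ToArray();
        var random = new Random(12345); // fixed seed so runs are repeatable
        int[] shuffled = ascending.OrderBy(k => random.Next()).ToArray();

        Console.WriteLine("SortedList, sorted input:         " + Time(() => FillSortedList(ascending)));
        Console.WriteLine("SortedList, shuffled input:       " + Time(() => FillSortedList(shuffled)));
        Console.WriteLine("SortedDictionary, sorted input:   " + Time(() => FillSortedDictionary(ascending)));
        Console.WriteLine("SortedDictionary, shuffled input: " + Time(() => FillSortedDictionary(shuffled)));
    }
}
```

Call `PopulateTiming.Run(n)` from your own entry point and start with a moderate `n`: the shuffled SortedList case is the quadratic one and slows down quickly as `n` grows.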

Notably, neither one compares favorably with simply populating a List and then calling Sort(). That's also O(n log(n)). In fact, if the input happens to be sorted already, you can bypass the Sort() call and this collapses to O(n). The cost analysis then needs to move to the effort it takes to get the input sorted. It is hard to bypass the fundamental complexity of sorting, O(n log(n)). The cost might not be readily visible: you might get the input sorted by, say, a SQL query. The query will just take longer to complete.
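The populate-then-sort approach, including the already-sorted shortcut, could look roughly like this for the Person example from the question. The property names other than age (`LastName`, `FirstName`), the comparison order, and the helper names are assumptions made for illustration; the question does not name them. A single comparison function covers all three properties, so there is no need to sort on one property at a time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Person type: only "age in years" comes from the question.
public sealed class Person
{
    public Person(string lastName, string firstName, int age)
    {
        LastName = lastName;
        FirstName = firstName;
        Age = age;
    }

    public string LastName { get; }
    public string FirstName { get; }
    public int Age { get; }
}

public static class PersonSorting
{
    // Orders by Age, then LastName, then FirstName (ordinal string comparison).
    public static int Compare(Person x, Person y)
    {
        int byAge = x.Age.CompareTo(y.Age);
        if (byAge != 0) return byAge;

        int byLastName = string.CompareOrdinal(x.LastName, y.LastName);
        if (byLastName != 0) return byLastName;

        return string.CompareOrdinal(x.FirstName, y.FirstName);
    }

    // O(n) scan: true when no element is smaller than its predecessor.
    public static bool IsSorted(List<Person> people)
    {
        for (int i = 1; i < people.Count; i++)
        {
            if (Compare(people[i - 1], people[i]) > 0) return false;
        }
        return true;
    }

    // Skips the O(n log(n)) sort when the input is already in order.
    public static void SortIfNeeded(List<Person> people)
    {
        if (!IsSorted(people)) people.Sort(Compare);
    }
}
```

The extra scan costs O(n) comparisons on unsorted input, so it only pays off when already-sorted input is reasonably likely.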

The point of using either SortedList or SortedDictionary is to keep the collection sorted after inserts. If you only care about populating the collection and not about mutating it afterwards, then you shouldn't use those collections.