为什么SortedSet的< T> .GetViewBetween不是为O(log N)?不是、LT、SortedSet、GT

2023-09-02 11:53:45 作者:阑意

在.NET 4.0以上版本,一类的SortedSet< T> 有一个名为方法 GetViewBetween(L,R),它返回含有指定两者之间的所有值的树部分的接口视图。鉴于的SortedSet< T> 被实现为红黑树,我当然希望它在运行为O(log N)的时间。在C ++中类似的方法是的std ::设置:: LOWER_BOUND / UPPER_BOUND ,在Java中它的 TreeSet.headSet / tailSet ,它们是对数的。

然而,事实并非如此。下面code在32秒,而相当于 O(日志N)版本 GetViewBetween 足以令在1-2秒这个code运行。

变种S =新的SortedSet< INT>(); INT N = 100000; VAR兰特=新的随机(1000000007); INT总和= 0; 的for(int i = 0;我n种; ++ I){     s.Add(rand.Next());     如果(rand.Next()%2 == 0){         int类型l = rand.Next(int.MaxValue / 2 - 10);         INT R = L + rand.Next(int.MaxValue / 2 - 10);         变种吨= s.GetViewBetween(L,R);         总和+ = t.Min;     } } Console.WriteLine(总和);

我反编译System.dll中使用 dotPeek 和这里就是我得到:

公共TreeSubSet(SortedSet的< T>底层,T最小,T最大,布尔lowerBoundActive,布尔upperBoundActive)     :基地(Underlying.Comparer) {     this.underlying =标的;     this.min =最小;     this.max =最大;     this.lBoundActive = lowerBoundActive;     this.uBoundActive = upperBoundActive;     this.root = this.underlying.FindRange(this.min,this.max,this.lBoundActive,this.uBoundActive);     this.count = 0;     this.version = -1;     this.VersionCheckImpl(); } 内部的SortedSet< T> .Node FindRange(T从,T于布尔lowerBoundActive,布尔upperBoundActive) {   SortedSet的< T> .Node节点= this.root;   而(节点!= NULL)   {     如果(lowerBoundActive&安培;&安培; this.comparer.Compare(来自,node.Item)大于0)     {       节点= node.Right;     }     其他     {       如果(upperBoundActive || this.comparer.Compare(于node.Item)>!= 0)         返回节点;       节点= node.Left;     }   }   返回(SortedSet的< T> .Node)空; } 私人无效VersionCheckImpl() {     如果(this.version == this.underlying.version)       返回;     this.root = this.underlying.FindRange(this.min,this.max,this.lBoundActive,this.uBoundActive);     this.version = this.underlying.version;     this.count = 0;     base.InOrderTreeWalk((树遍历predicate< T>)(N =>     {       SortedSet的< T> .TreeSubSet temp_31 =这一点;       INT temp_34 = temp_31.count + 1;       temp_31.count = temp_34;       返回true;     })); } 利用Redis Sorted Set实现排行榜功能

因此​​, FindRange 很明显 O(日志N),但在那之后,我们称之为 VersionCheckImpl ...这确实找到子树的线性时间穿越只为述说着它的节点!

为什么你需要做的是遍历所有的时间? 为什么.NET不包含 O(日志N)方法分割基于密钥的树,像C ++或Java?它的真正的在很多情况下有帮助的。 解决方案

有关版本字段

UPDATE1:

在我的记忆中,有很多(可能吗?)藏品BCL拥有该领域版本

首先,关于的foreach

根据该 MSDN链接

  

foreach语句重复一组数组中的每个元素或对象集合嵌入语句。 foreach语句用于通过收集迭代以获得所需的信息,但不应该被用来改变集合的内容,以避免未predictable副作用

在许多其他的收藏品,版本被保护的数据时不会被修改的foreach

例如,的HashTable 的MoveNext()

 公共虚拟BOOL的MoveNext()
{
    如果(this.version!= this.hashtable.version)
    {
        抛出新的InvalidOperationException异常(Environment.GetResourceString(InvalidOperation_EnumFailedVersion));
    }
    ..........
}
 

但在在的SortedSet< T> 的MoveNext()方法:

 公共BOOL的MoveNext()
{
    this.tree.VersionCheck();
    如果(this.version!= this.tree.version)
    {
        ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
    }
    ....
}
 

UPDATE2:

但是,O(N)循环可能不仅对版本也为计数属性。

由于GetViewBetween 的 MSDN说:

  

该方法返回lowerValue和upperValue下降之间的元素范围的观点,通过比较器的定义.... 您可以更改这两个视图和底层的SortedSet(Of T)已在

因此​​,对于每个更新,应同步计数字段(key和value都已经相同)。为了确保计数正确

有两个策略来达到目标​​:

微软 在Mono的

First.MS的,在他们的code,他们牺牲了 GetViewBetween()的表现,赢得了计数属性的表现。

VersionCheckImpl()是同步的单程计数属性。

二,单声道。在单声道的code, GetViewBetween()速度较快,但在他们的 getCount将()方法:

 内部覆盖INT getCount将()
{
    诠释计数= 0;
    使用(VAR E = set.tree.GetSuffixEnumerator(下)){
        而(e.MoveNext()&安培;&安培; set.helper.Compare(上,e.Current)GT; = 0)
            ++计数;
    }
    返回计数;
}
 

这始终是一个O(N)操作!

In .NET 4.0+, a class SortedSet<T> has a method called GetViewBetween(l, r), which returns an interface view on a tree part containing all the values between the two specified. Given that SortedSet<T> is implemented as a red-black tree, I naturally expect it to run in O(log N) time. The similar method in C++ is std::set::lower_bound/upper_bound, in Java it's TreeSet.headSet/tailSet, and they are logarithmic.

However, that is not true. The following code runs in 32 sec, whereas the equivalent O(log N) version of GetViewBetween would make this code run in 1-2 sec.

var s = new SortedSet<int>();
int n = 100000;
var rand = new Random(1000000007);
int sum = 0;
for (int i = 0; i < n; ++i) {
    s.Add(rand.Next());
    if (rand.Next() % 2 == 0) {
        int l = rand.Next(int.MaxValue / 2 - 10);
        int r = l + rand.Next(int.MaxValue / 2 - 10);
        var t = s.GetViewBetween(l, r);
        sum += t.Min;
    }
}
Console.WriteLine(sum);

I decompiled System.dll using dotPeek and here's what I got:

public TreeSubSet(SortedSet<T> Underlying, T Min, T Max, bool lowerBoundActive, bool upperBoundActive)
    : base(Underlying.Comparer)
{
    this.underlying = Underlying;
    this.min = Min;
    this.max = Max;
    this.lBoundActive = lowerBoundActive;
    this.uBoundActive = upperBoundActive;
    this.root = this.underlying.FindRange(this.min, this.max, this.lBoundActive, this.uBoundActive);
    this.count = 0;
    this.version = -1;
    this.VersionCheckImpl();
}

internal SortedSet<T>.Node FindRange(T from, T to, bool lowerBoundActive, bool upperBoundActive)
{
  SortedSet<T>.Node node = this.root;
  while (node != null)
  {
    if (lowerBoundActive && this.comparer.Compare(from, node.Item) > 0)
    {
      node = node.Right;
    }
    else
    {
      if (!upperBoundActive || this.comparer.Compare(to, node.Item) >= 0)
        return node;
      node = node.Left;
    }
  }
  return (SortedSet<T>.Node) null;
}

private void VersionCheckImpl()
{
    if (this.version == this.underlying.version)
      return;
    this.root = this.underlying.FindRange(this.min, this.max, this.lBoundActive, this.uBoundActive);
    this.version = this.underlying.version;
    this.count = 0;
    base.InOrderTreeWalk((TreeWalkPredicate<T>) (n =>
    {
      SortedSet<T>.TreeSubSet temp_31 = this;
      int temp_34 = temp_31.count + 1;
      temp_31.count = temp_34;
      return true;
    }));
}

So, FindRange is obviously O(log N), but after that we call VersionCheckImpl... which does a linear-time traversal of the found subtree only for recounting its nodes!

Why would you need to do that traversal all the time? Why .NET does not contain a O(log N) method for splitting a tree based on key, like C++ or Java? It is really helpful in lots of situations.

解决方案

about the version field

UPDATE1:

In my memory, a lot of(maybe all?) collections in BCL have the field version.

First,about foreach:

according to this msdn link

The foreach statement repeats a group of embedded statements for each element in an array or an object collection. The foreach statement is used to iterate through the collection to get the desired information, but should not be used to change the contents of the collection to avoid unpredictable side effects.

In many other collections, version is protected the data is not modified during the foreach

For example, HashTable's MoveNext():

public virtual bool MoveNext()
{
    if (this.version != this.hashtable.version)
    {
        throw new InvalidOperationException(Environment.GetResourceString("InvalidOperation_EnumFailedVersion"));
    }
    ..........
}

But in the in the SortedSet<T>'s MoveNext() method:

public bool MoveNext()
{
    this.tree.VersionCheck();
    if (this.version != this.tree.version)
    {
        ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
    }       
    ....
}

UPDATE2:

But the O(N) loop maybe not only for version but also for the Count property.

Because the MSDN of GetViewBetween said:

This method returns a view of the range of elements that fall between lowerValue and upperValue, as defined by the comparer .... You can make changes in both the view and in the underlying SortedSet(Of T).

So for every update it should be sync the count field (key and value are already same). To make sure the Count is correct

There were two policies to reach the target:

Microsoft's Mono's

First.MS's,in their code, they sacrifice the GetViewBetween()'s performance and win the Count Property's performance.

VersionCheckImpl() is one way to sync the Count property.

Second,Mono. In mono's code,GetViewBetween() is Faster, but in their GetCount()method:

internal override int GetCount ()
{
    int count = 0;
    using (var e = set.tree.GetSuffixEnumerator (lower)) {
        while (e.MoveNext () && set.helper.Compare (upper, e.Current) >= 0)
            ++count;
    }
    return count;
}

It is always an O(N) operation!