高效的图的遍历与LINQ - 消除递归递归、遍历、高效、LINQ

2023-09-03 08:48:25 作者:羽殇*

今天,我要实现的方法来遍历任意深度图和压平成一个枚举。相反,我首先做了一点搜索,发现这样的:

Today I was going to implement a method to traverse an arbitrarily deep graph and flatten it into a single enumerable. Instead, I did a little searching first and found this:

public static IEnumerable<T> Traverse<T>(this IEnumerable<T> enumerable, Func<T, IEnumerable<T>> recursivePropertySelector)
{
    foreach (T item in enumerable)
    {
        yield return item;

        IEnumerable<T> seqRecurse = recursivePropertySelector(item);

        if (seqRecurse == null) continue;
        foreach (T itemRecurse in Traverse(seqRecurse, recursivePropertySelector))
        {
            yield return itemRecurse;
        }
    }
}

在理论上,这看起来不错,但在实践中我发现它显著执行更差比使用等效的手写code(视情况出现),要经过一个曲线图,做任何需要做的事情。我怀疑这是因为在这种方法中,每次返回项目,时进栈的放松一些任意深度的层次。

In theory this looks good, but in practice I've found it performs significantly more poorly than using equivalent hand-written code (as the situation arises) to go through a graph and do whatever needs to be done. I suspect this is because in this method, for every item it returns, the stack has to unwind to some arbitrarily deep level.

我还怀疑这种方法会更有效地运行很多,如果递归被淘汰。我也正好不是很好消除递归。

I also suspect that this method would run a lot more efficiently if the recursion were eliminated. I also happen to not be very good at eliminating recursion.

有谁知道如何重写这个方法来消除递归?

Does anyone know how to rewrite this method to eliminate the recursion?

感谢您的帮助。

编辑: 非常感谢所有详细的答复。我试过基准原来的解决方案VS埃里克的解决方案,相较于未使用枚举法,而是递归遍历与AA的Lambda和奇怪的是,在lambda递归是显著快于其他两种方法。

Thanks very much for all the detailed responses. I've tried benchmarking the original solution vs Eric's solution vs not using an enumerator method and instead recursively traversing with a a lambda and oddly, the lambda recursion is significantly faster than either of the other two methods.

class Node
{
    public List<Node> ChildNodes { get; set; } 

    public Node()
    {
        ChildNodes = new List<Node>();
    }
}

class Foo
{
    public static void Main(String[] args) 
    {
        var nodes = new List<Node>();
        for(int i = 0; i < 100; i++)
        {
            var nodeA = new Node();
            nodes.Add(nodeA);
            for (int j = 0; j < 100; j++)
            {
                var nodeB = new Node();
                nodeA.ChildNodes.Add(nodeB);
                for (int k = 0; k < 100; k++)
                {
                    var nodeC = new Node();
                    nodeB.ChildNodes.Add(nodeC);
                    for(int l = 0; l < 12; l++)
                    {
                        var nodeD = new Node();
                        nodeC.ChildNodes.Add(nodeD);
                    }
                }
            }
        }            

        nodes.TraverseOld(node => node.ChildNodes).ToList();
        nodes.TraverseNew(node => node.ChildNodes).ToList();

        var watch = Stopwatch.StartNew();
        nodes.TraverseOld(node => node.ChildNodes).ToList();
        watch.Stop();
        var recursiveTraversalTime = watch.ElapsedMilliseconds;
        watch.Restart();
        nodes.TraverseNew(node => node.ChildNodes).ToList();
        watch.Stop();
        var noRecursionTraversalTime = watch.ElapsedMilliseconds;

        Action<Node> visitNode = null;
        visitNode = node =>
        {
            foreach (var child in node.ChildNodes)
                visitNode(child);
        };

        watch.Restart();
        foreach(var node in nodes)
            visitNode(node);
        watch.Stop();
        var lambdaRecursionTime = watch.ElapsedMilliseconds;
    }
}

在哪里TraverseOld是原来的方法,TraverseNew是Eric的方法,显然拉姆达是拉姆达。

Where TraverseOld is the original method, TraverseNew is Eric's method, and obviously the lambda is the lambda.

在我的机器上,TraverseOld需要10127毫秒,TraverseNew需要3038毫秒,拉姆达递归需要1181毫秒。

On my machine, TraverseOld takes 10127 ms, TraverseNew takes 3038 ms, the lambda recursion takes 1181 ms.

这是典型的枚举方式(与产量返程)可能需要3倍,只要不是立即执行?抑或是别的东西怎么回事?

Is this typical that enumerator methods (with yield return) can take 3X as long as opposed to immediate execution? Or is something else going on here?

推荐答案

首先,你是完全正确的;如果图有n个的平均深度d节点那么天真的嵌套迭代器产生一个解决方案,为O(n * D)的时间和O(四)协议栈。如果d为n的大部分那么这可以成为一个为O(n 2 )算法,如果d为大,那么你完全可以吹堆栈。

First off, you are absolutely correct; if the graph has n nodes of average depth d then the naive nested iterators yield a solution which is O(n*d) in time, and O(d) in stack. If d is a large fraction of n then this can become an O(n2) algorithm, and if d is large then you can blow the stack entirely.

如果您有兴趣的嵌套迭代器的性能分析,请参阅以前的C#编译器开发人员韦斯·戴尔的博客文章:

If you are interested in a performance analysis of nested iterators, see former C# compiler developer Wes Dyer's blog post:

http://blogs.msdn.com/b/wesdyer/archive/2007/03/23/all-about-iterators.aspx

dasblinkenlight的解决方案是在标准方法的变化。我通常会写程序是这样的:

dasblinkenlight's solution is a variation on the standard approach. I would typically write the program like this:

public static IEnumerable<T> Traverse<T>(
    T root, 
    Func<T, IEnumerable<T>> children)
{
    var stack = new Stack<T>();
    stack.Push(root);
    while(stack.Count != 0)
    {
        T item = stack.Pop();
        yield return item;
        foreach(var child in children(item))
            stack.Push(child);
    }
}

,然后如果你有多个根:

And then if you have multiple roots:

public static IEnumerable<T> Traverse<T>(
    IEnumerable<T> roots, 
    Func<T, IEnumerable<T>> children)
{
    return from root in roots 
           from item in Traverse(root, children)
           select item ;
}

现在,请注意,一个遍历的没有的你想,如果你有一个高度互联图或循环图什么!如果你有一个向下的箭头指向图:

Now, note that a traversal is not what you want if you have a highly interconnected graph or a cyclic graph! If you have a graph with downward pointing arrows:

          A
         / \
        B-->C
         \ /
          D

则遍历是A,B,D,C,D,C,D。如果你有一个环状或相互连接的图形,那么你想要什么的传递闭的。

public static IEnumerable<T> Closure<T>(
    T root, 
    Func<T, IEnumerable<T>> children)
{
    var seen = new HashSet<T>();
    var stack = new Stack<T>();
    stack.Push(root);

    while(stack.Count != 0)
    {
        T item = stack.Pop();
        if (seen.Contains(item))
            continue;
        seen.Add(item);
        yield return item;
        foreach(var child in children(item))
            stack.Push(child);
    }
}

这变化只产生尚未产生之前的项目。

This variation only yields items that have not been yielded before.

我也正好不是很好消除递归。

I also happen to not be very good at eliminating recursion.

我已经写了一些文章就如何消除递归,有关递归程序一般。如果这个题目感兴趣,请参见:

I have written a number of articles on ways to eliminate recursion, and about recursive programming in general. If this subject interests you, see:

http://blogs.msdn.com/b/ericlippert/archive/tags/recursion/

在特定的:

http://blogs.msdn.com/b/ericlippert/archive/2005/08/01/recursion-part-two-unrolling-a-recursive-function-with-an-explicit-stack.aspx

http://blogs.msdn.com/b/ericlippert/archive/2005/08/04/recursion-part-three-building-a-dispatch-engine.aspx

http://blogs.msdn.com/b/ericlippert/archive/2005/08/08/recursion-part-four-continuation-passing-style.aspx