Nested yield to return IEnumerable<IEnumerable<T>> with lazy evaluation

2023-09-04 02:06:04 Author: 伴我久久不北北

I wrote a LINQ extension method, SplitBetween, analogous to String.Split.

> new List<int>(){3,4,2,21,3,2,17,16,1}
> .SplitBetween(x=>x>=10)

[3,4,2], [3,2], [], [1]

Source:

// partition sequence into sequence of contiguous subsequences
// behaves like String.Split
public static IEnumerable<IEnumerable<T>> SplitBetween<T>(this IEnumerable<T> source, 
                                                          Func<T, bool> separatorSelector, 
                                                          bool includeSeparator = false)
{
    var l = new List<T>();
    foreach (var x in source)
    {
        if (separatorSelector(x))
        {
            if (includeSeparator)
            {
                l.Add(x);
            }
            yield return l;
            l = new List<T>();
        }
        else
        {
            l.Add(x);
        }
    }
    yield return l;
}

In the spirit of LINQ, I think this method ought to evaluate lazily. However, my implementation is lazy over the outer IEnumerable but not over the inner ones. How can I fix this?

Here is a demonstration that the outer behaviour is lazy. Assume ThrowingEnumerable<int> is an IEnumerable<int> that explodes when anyone tries to iterate over it (see Skeet's Edulinq).

(new List<int>(){1,2,3,10,1})
.Concat(Extensions.ThrowingEnumerable<int>())
.SplitBetween(x=>x>=10)
.First().ToList();

[1,2,3]

But the inner behaviour is not lazy:

(new List<int>(){1,2,3,10,1})
.Concat(Extensions.ThrowingEnumerable<int>())
.SplitBetween(x=>x>=10)
.ElementAt(2).First();

BOOM

We expect 1 here.
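For readers without Edulinq at hand, here is a minimal sketch of what such a throwing sequence might look like. The class name ThrowingSequence<T> and the generic factory method are assumptions made to match the call Extensions.ThrowingEnumerable<int>() used above; this is not Edulinq's actual code.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the helper used in the question (names are assumptions).
public sealed class ThrowingSequence<T> : IEnumerable<T>
{
    // Any attempt to start iterating fails immediately.
    public IEnumerator<T> GetEnumerator() =>
        throw new InvalidOperationException("This sequence must not be enumerated.");

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class Extensions
{
    // Factory matching the call shape Extensions.ThrowingEnumerable<int>().
    public static IEnumerable<T> ThrowingEnumerable<T>() => new ThrowingSequence<T>();
}
```

Because Concat only asks its second sequence for an enumerator once the first one is exhausted, taking First() of a list concatenated with this sequence succeeds, while ToList() on the same query throws.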

Recommended answer

Edit: There is nothing wrong with your approach, except that a throwing enumerable really will go "boom" when you enumerate it. That is what it is meant for: it has no working GetEnumerator defined on it. So your code exhibits no real problem. In the first case, by calling First, you only enumerate until the first result set (just { 1, 2, 3 }) is obtained, and never reach the throwing enumerable (the second half of the Concat is never enumerated). In the second example, you ask for the element at index 2 after the split, which means the throwing enumerable gets enumerated too, and it goes "boom". The key here is that ElementAt enumerates the collection up to the requested index and is not inherently lazy (it cannot be).
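To make the difference between First and ElementAt concrete, the sketch below counts how many source items each call pulls through a condensed copy of the question's SplitBetween. The ConsumptionDemo class, the Counted wrapper, the PulledBy helper and the sample data are all assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConsumptionDemo
{
    // Number of items pulled from the source so far.
    public static int Pulled;

    // Wraps a sequence and counts every item handed to the consumer.
    static IEnumerable<int> Counted(IEnumerable<int> items)
    {
        foreach (var item in items)
        {
            Pulled++;
            yield return item;
        }
    }

    // Condensed copy of the question's SplitBetween (separators are dropped).
    static IEnumerable<IEnumerable<T>> SplitBetween<T>(this IEnumerable<T> source,
                                                       Func<T, bool> separatorSelector)
    {
        var l = new List<T>();
        foreach (var x in source)
        {
            if (separatorSelector(x))
            {
                yield return l;
                l = new List<T>();
            }
            else
            {
                l.Add(x);
            }
        }
        yield return l;
    }

    // Runs the given consumer over the split sequence and reports how many
    // source items were pulled while doing so.
    public static int PulledBy(Func<IEnumerable<IEnumerable<int>>, object> consume)
    {
        Pulled = 0;
        var data = new[] { 1, 2, 3, 10, 1, 10, 5 };
        consume(Counted(data).SplitBetween(x => x >= 10));
        return Pulled;
    }
}
```

With this sample data, ConsumptionDemo.PulledBy(g => g.First()) reports 4 items (1, 2, 3 and the separator 10), whereas ConsumptionDemo.PulledBy(g => g.ElementAt(2)) reports all 7, because the third group is only yielded once the source is exhausted.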

I am not sure that fully lazy is the way to go here. The problem is that the whole process of lazily splitting into outer and inner sequences runs on a single enumerator, which can yield different results depending on the enumerator's state. For instance, if you enumerate only the outer sequence, the inner sequences will no longer be what you expect. Or, if you enumerate only half of the outer sequence and one inner sequence, what is the state of the other inner sequences? Your approach is the best.

The approach below is lazy (it will still go boom, since that is warranted) in that it uses no intermediate concrete collections, but it can be slower than your original approach because it traverses the source more than once:

public static IEnumerable<IEnumerable<T>> SplitBy<T>(this IEnumerable<T> source, 
                                                     Func<T, bool> separatorPredicate, 
                                                     bool includeEmptyEntries = false,
                                                     bool includeSeparators = false)
{
    int prevIndex = 0;
    int lastIndex = 0;
    var query = source.Select((t, index) => { lastIndex = index; return new { t, index }; })
                      .Where(a => separatorPredicate(a.t));
    foreach (var item in query)
    {
        if (item.index == prevIndex && !includeEmptyEntries)
        {
            prevIndex++;
            continue;
        }

        yield return source.Skip(prevIndex)
                           .Take(item.index - prevIndex + (!includeSeparators ? 0 : 1));
        prevIndex = item.index + 1;
    }

    if (prevIndex <= lastIndex)
        yield return source.Skip(prevIndex);
}
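The repeated traversal can be observed with a small counting wrapper. In this sketch, CountingSource<T> and SkipTakeSplitDemo are hypothetical names introduced for illustration; the SplitBy body is a copy of the Skip/Take version above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper that counts how many times enumeration is started.
public sealed class CountingSource<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _inner;
    public int Enumerations { get; private set; }

    public CountingSource(IEnumerable<T> inner) { _inner = inner; }

    public IEnumerator<T> GetEnumerator()
    {
        Enumerations++;
        return _inner.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class SkipTakeSplitDemo
{
    // Copy of the Skip/Take based SplitBy shown above.
    static IEnumerable<IEnumerable<T>> SplitBy<T>(this IEnumerable<T> source,
                                                  Func<T, bool> separatorPredicate,
                                                  bool includeEmptyEntries = false,
                                                  bool includeSeparators = false)
    {
        int prevIndex = 0;
        int lastIndex = 0;
        var query = source.Select((t, index) => { lastIndex = index; return new { t, index }; })
                          .Where(a => separatorPredicate(a.t));
        foreach (var item in query)
        {
            if (item.index == prevIndex && !includeEmptyEntries)
            {
                prevIndex++;
                continue;
            }

            yield return source.Skip(prevIndex)
                               .Take(item.index - prevIndex + (!includeSeparators ? 0 : 1));
            prevIndex = item.index + 1;
        }

        if (prevIndex <= lastIndex)
            yield return source.Skip(prevIndex);
    }

    // Walks every group with nested loops and reports how often the
    // underlying source had to be enumerated from the start.
    public static int CountEnumerations()
    {
        var source = new CountingSource<int>(new List<int> { 1, 2, 3, 10, 1 });
        foreach (var group in source.SplitBy(x => x >= 10))
            foreach (var item in group) { }
        return source.Enumerations;
    }
}
```

CountEnumerations() returns a value greater than one: the separator scan enumerates the source once, and every inner Skip/Take sequence starts over from the beginning. For a source that is expensive or impossible to replay, that cost matters.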

Overall, your original approach is the best. If you need something fully lazy, then my answer below fits. Mind you, it is only meant for things like:

foreach (var inners in outer)
    foreach (var item in inners)
    { 
    }

and not for:

var outer = sequence.SplitBy(separatorPredicate);
var inner1 = outer.First();
var inner2 = outer.ElementAt(1); // etc.

In other words, it is not suited to iterating the same sequence more than once. If you are aware of this dangerous construct:

This uses no intermediate concrete collections, calls no ToList on the source enumerable, and is fully lazy and iterator-based:

public static IEnumerable<IEnumerable<T>> SplitBy<T>(this IEnumerable<T> source,
                                                     Func<T, bool> separatorPredicate,
                                                     bool includeEmptyEntries = false,
                                                     bool includeSeparator = false)
{
    using (var x = source.GetEnumerator())
        while (x.MoveNext())
            if (!separatorPredicate(x.Current))
                yield return x.YieldTill(separatorPredicate, includeSeparator);
            else if (includeEmptyEntries)
            {
                if (includeSeparator)
                    yield return Enumerable.Repeat(x.Current, 1);
                else
                    yield return Enumerable.Empty<T>();
            }
}

static IEnumerable<T> YieldTill<T>(this IEnumerator<T> x, 
                                   Func<T, bool> separatorPredicate,
                                   bool includeSeparator)
{
    yield return x.Current;

    while (x.MoveNext())
        if (!separatorPredicate(x.Current))
            yield return x.Current;
        else
        {
            if (includeSeparator)
                yield return x.Current;
            yield break;
        }
}

Short, sweet and simple. I have added an extra flag to indicate whether you want empty sets returned (by default they are skipped). Without that flag, the code is even more concise.
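The caveat about the shared enumerator can be illustrated with a small sketch. It uses a condensed copy of the lazy SplitBy (both optional flags removed), and the LazySplitDemo class, its two helper methods and the sample data are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazySplitDemo
{
    // Condensed copy of the fully lazy SplitBy (no optional flags).
    static IEnumerable<IEnumerable<T>> SplitBy<T>(this IEnumerable<T> source,
                                                  Func<T, bool> separatorPredicate)
    {
        using (var x = source.GetEnumerator())
            while (x.MoveNext())
                if (!separatorPredicate(x.Current))
                    yield return x.YieldTill(separatorPredicate);
    }

    static IEnumerable<T> YieldTill<T>(this IEnumerator<T> x,
                                       Func<T, bool> separatorPredicate)
    {
        yield return x.Current;

        while (x.MoveNext())
        {
            if (separatorPredicate(x.Current))
                yield break;
            yield return x.Current;
        }
    }

    // Supported pattern: finish each inner sequence before asking for the next.
    public static List<List<int>> NestedLoops(IEnumerable<int> data)
    {
        var result = new List<List<int>>();
        foreach (var inner in data.SplitBy(x => x >= 10))
        {
            var group = new List<int>();
            foreach (var item in inner)
                group.Add(item);
            result.Add(group);
        }
        return result;
    }

    // Unsupported pattern: buffering the outer sequence first drives the shared
    // enumerator to the end before any inner sequence has been read.
    public static int GroupCountWhenOuterIsBufferedFirst(IEnumerable<int> data)
    {
        return data.SplitBy(x => x >= 10).ToList().Count;
    }
}
```

For the input { 1, 2, 3, 10, 1 }, NestedLoops produces the two groups [1, 2, 3] and [1]. GroupCountWhenOuterIsBufferedFirst reports four groups for the same input, because every non-separator element starts a new inner sequence when nothing consumes the inner sequences in between; what those inner sequences contain afterwards depends on the state of an enumerator that has already run to the end.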

Thanks for this question; this will go into my extension methods library! :)

 