Memory-optimized OrderBy and Take?

2023-09-03 07:33:28 Author: ▼丢不掉的回忆、关于你

I have 9 GB of data, and I want only 10 rows. When I do:

 data.OrderBy(datum => datum.Column1)
     .Take(10)
     .ToArray();

I get an OutOfMemoryException. I would like to use an OrderByAndTake method, optimized for lower memory consumption. It's easy to write, but I guess someone already did. Where can I find it?

Edit: It's Linq-to-objects. The data comes from a file. Each row can be discarded if its value for Column1 is smaller than every value in the current list of the 10 biggest values.

Recommended answer

I'm assuming you're doing this in Linq to Objects. You could do something like...

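var best = data
    .Aggregate(new List<T>(), (soFar, current) => soFar
        .Concat(new[] { current })          // add the incoming item to the running list
        .OrderBy(datum => datum.Column1)    // re-sort the (at most 11) candidates
        .Take(10)                           // keep only the best 10
        .ToList());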

In this way, not all the items need to be kept in a new sorted collection, only the best 10 you're interested in.

This is the least-code way. Since you know the soFar list is already sorted, testing where (or if) to insert current could be optimized. I didn't feel like doing ALL the work for you. ;-)

PS: Replace T with whatever your type is.
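
For what it's worth, the "where/if to insert current" optimization could look roughly like the sketch below. The names BoundedInsert and InsertSorted are made up for illustration (nothing here is part of LINQ), and it assumes the key type implements IComparable<TKey>. It keeps the smallest keys, matching OrderBy(...).Take(10); flip the comparisons if you want the biggest.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BoundedInsert
{
    // Hypothetical helper: keeps soFar sorted ascending by key and at most `limit` items long.
    public static List<T> InsertSorted<T, TKey>(
        List<T> soFar, T current, Func<T, TKey> keySelector, int limit)
        where TKey : IComparable<TKey>
    {
        if (limit <= 0) return soFar;

        TKey key = keySelector(current);

        // List is full and current is not smaller than the worst kept item: discard it.
        if (soFar.Count == limit &&
            key.CompareTo(keySelector(soFar[soFar.Count - 1])) >= 0)
            return soFar;

        // Binary search for the first position whose key is greater than `key`,
        // so items with equal keys keep their original order.
        int lo = 0, hi = soFar.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (keySelector(soFar[mid]).CompareTo(key) <= 0) lo = mid + 1;
            else hi = mid;
        }

        soFar.Insert(lo, current);

        // Drop the worst item if the list grew past the limit.
        if (soFar.Count > limit) soFar.RemoveAt(soFar.Count - 1);
        return soFar;
    }
}
```

It would plug into the same Aggregate call, e.g. data.Aggregate(new List<T>(), (soFar, current) => BoundedInsert.InsertSorted(soFar, current, datum => datum.Column1, 10)), and avoids re-sorting and re-allocating the list for every row.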

Edit: Thinking about it, the most efficient way would actually be a plain old foreach that compares each item to the running list of the best 10.
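
A minimal sketch of that foreach idea, written as an extension method. The OrderByAndTake name is borrowed from the question and the TopNExtensions class is hypothetical; neither exists in LINQ. It assumes the default comparer for the key type is what you want, and like OrderBy(...).Take(count) it keeps the smallest keys (swap the comparisons for the biggest).

```csharp
using System;
using System.Collections.Generic;

public static class TopNExtensions
{
    // Hypothetical method: equivalent in result to source.OrderBy(keySelector).Take(count).ToArray(),
    // but only ever holds `count` + 1 items in memory.
    public static T[] OrderByAndTake<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector, int count)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        if (count <= 0) return new T[0];

        var comparer = Comparer<TKey>.Default;
        var keys = new List<TKey>(count + 1);   // keys of the kept items, ascending
        var items = new List<T>(count + 1);     // kept items, parallel to keys

        foreach (T item in source)
        {
            TKey key = keySelector(item);

            // Already holding `count` items and this one is no better than the worst: skip it.
            if (items.Count == count && comparer.Compare(key, keys[count - 1]) >= 0)
                continue;

            // Walk back from the end to find the insert position (after any equal keys).
            int pos = items.Count;
            while (pos > 0 && comparer.Compare(key, keys[pos - 1]) < 0) pos--;

            keys.Insert(pos, key);
            items.Insert(pos, item);

            // Evict the worst item if we now hold one too many.
            if (items.Count > count)
            {
                keys.RemoveAt(count);
                items.RemoveAt(count);
            }
        }

        return items.ToArray();
    }
}
```

Usage would be data.OrderByAndTake(datum => datum.Column1, 10). Note this only saves memory if data itself is enumerated lazily (for example, rows parsed from File.ReadLines) rather than loaded into a list or array first.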