DataTable.Select vs DataTable.Rows.Find vs foreach vs Find(Predicate<T>)/lambda

2023-09-02 20:49:02 Author: 瞳孔丶洞悉所有

I have a DataTable/collection cached in memory that I want to use as the source for an autocomplete textbox (using AJAX, of course). I am evaluating various options for fetching the data quickly. The number of items in the collection (or rows in the DataTable) could vary from 10,000 to 2,000,000. (So that we don't get diverted: for the moment, assume the decision has been made that I have ample RAM and will use the cache rather than a database query for this.)

I have some additional business logic for this processing: I have to prioritize the autocomplete list according to a priority column (int) in the collection. So if someone searches for "Micro" and I get, say, 20 results of words/sentences that start with "Micro", I would pick the top 10 items with the highest priority. (Hence the need for a priority property associated with the string value.)
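The selection rule described above can be sketched as follows. This is a minimal illustration in Python rather than .NET; the `Item` type, the `top_matches` helper, the sample data, and the case-insensitive comparison are all assumptions made for the example, not part of the original setup.

```python
import heapq
from typing import NamedTuple


class Item(NamedTuple):
    text: str       # the word or sentence offered as a completion
    priority: int   # higher value means it should be shown first


def top_matches(items, prefix, limit=10):
    """Return up to `limit` items starting with `prefix`, highest priority first."""
    needle = prefix.lower()  # assumption: matching is case-insensitive
    candidates = (it for it in items if it.text.lower().startswith(needle))
    # nlargest keeps only `limit` items at a time instead of sorting every match
    return heapq.nlargest(limit, candidates, key=lambda it: it.priority)


# Hypothetical sample data, already sorted alphabetically as in the question
items = [
    Item("Macro", 9),
    Item("Microphone", 2),
    Item("Microscope", 7),
    Item("Microsoft", 5),
    Item("Microwave", 1),
]
result = top_matches(items, "Micro", limit=3)
print([it.text for it in result])  # ['Microscope', 'Microsoft', 'Microphone']
```

This is the plain linear-scan baseline: every lookup touches every item, which is the cost the options below are trying to avoid.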

The collection items are already sorted alphabetically.

What would be the best solution in this case?
1. Use DataTable.Select().
2. Use DataTable.Rows.Find().
3. Use a custom collection and iterate through its values with foreach or for.
4. Use a generic collection with anonymous delegates or lambdas (do both give the same performance or not?).

Recommended answer

The charts aren't posted on my blog entry; more details can be found at http://msdn.microsoft.com/en-us/library/dd364983.aspx

One other thing that I've since discovered is that, for large data sets, using a chained generic dictionary performs incredibly well. It also helps alleviate many of the issues caused by the sort operations required for aggregation operations such as min and max (either with DataTable.Compute or LINQ).

By "chained generic dictionary," I mean a Dictionary(Of String, Dictionary(Of String, Dictionary(Of Integer, List(Of DataRow)))) or similar technique, where the key for each dictionary is a search term.
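As a rough sketch of what such a chained lookup looks like, here is the same idea in Python, where nested `dict` objects stand in for the nested `Dictionary(Of ...)` types. The key names (`region`, `category`, `year`) and the sample rows are hypothetical, chosen only to show three levels of keys ending in a list of rows.

```python
from collections import defaultdict

# Hypothetical rows; in the original setting these would be DataRow objects
rows = [
    {"region": "EU", "category": "books", "year": 2008, "amount": 10},
    {"region": "EU", "category": "books", "year": 2008, "amount": 25},
    {"region": "EU", "category": "music", "year": 2009, "amount": 7},
    {"region": "US", "category": "books", "year": 2008, "amount": 40},
]

# region -> category -> year -> list of rows
index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
for row in rows:
    index[row["region"]][row["category"]][row["year"]].append(row)

# Three hash lookups replace a scan of the whole table
bucket = index["EU"]["books"][2008]
print(len(bucket))  # 2

# An aggregate such as max only touches the small bucket, not the full table
print(max(r["amount"] for r in bucket))  # 25
```

The index is built once up front; after that, each query pays only for the hash lookups plus the size of the final bucket.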

Granted, this won't be useful in all circumstances, but I have at least one scenario where implementing this approach led to a 500x performance improvement.

In your case, I'd consider using a simple dictionary with the first 1-5 characters, then a List(Of String). You'd have to build up this dictionary once, adding the words to the lists with the first 1-5 characters, but after that you'll be able to get blazingly fast results.
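A minimal sketch of that prefix dictionary, written in Python for brevity (a `dict` of lists plays the role of a Dictionary whose values are List(Of String)). The function names, the `MAX_PREFIX` constant, the sample words, and the case-insensitive matching are assumptions for illustration.

```python
from collections import defaultdict

MAX_PREFIX = 5  # index the first 1-5 characters, as suggested above


def build_prefix_index(words):
    """Map every prefix of length 1..MAX_PREFIX to the words that start with it."""
    index = defaultdict(list)
    for word in words:
        key = word.lower()  # assumption: lookups are case-insensitive
        for n in range(1, min(len(key), MAX_PREFIX) + 1):
            index[key[:n]].append(word)
    return index


def lookup(index, typed):
    """Return the words matching what the user has typed so far."""
    typed = typed.lower()
    bucket = index.get(typed[:MAX_PREFIX], [])
    if len(typed) <= MAX_PREFIX:
        return list(bucket)
    # Longer input: filter the 5-character bucket instead of the whole collection
    return [w for w in bucket if w.lower().startswith(typed)]


words = ["Macro", "Microphone", "Microscope", "Microsoft", "Microwave"]
index = build_prefix_index(words)
print(lookup(index, "mi"))      # ['Microphone', 'Microscope', 'Microsoft', 'Microwave']
print(lookup(index, "micros"))  # ['Microscope', 'Microsoft']
```

The trade-off is memory: each word is referenced from up to five buckets, which fits the question's premise that RAM is plentiful. Because the source collection is already sorted, each bucket comes out sorted as well.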

I generally wrap things like this in a class that lets me do things like add words easily. You may also want to use a sorted collection, such as SortedList(Of TKey, TValue) keyed by the word, to get the results sorted automatically. This way, you can quickly look up the list of words that match the first N characters that have been typed.
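One way such a wrapper class might look, again sketched in Python rather than .NET. The class name `AutoCompleteIndex` and its `add`/`find` methods are hypothetical; `bisect.insort` is used here as a stand-in for a sorted collection, keeping each bucket ordered as words are added.

```python
import bisect
from collections import defaultdict


class AutoCompleteIndex:
    """Hypothetical wrapper around a prefix dictionary whose buckets stay sorted."""

    def __init__(self, max_prefix=5):
        self.max_prefix = max_prefix
        self._buckets = defaultdict(list)

    def add(self, word):
        key = word.lower()
        for n in range(1, min(len(key), self.max_prefix) + 1):
            # insort inserts at the right position, so the bucket stays sorted
            # (ordering is plain string comparison on the original word)
            bisect.insort(self._buckets[key[:n]], word)

    def find(self, typed, limit=10):
        typed = typed.lower()
        bucket = self._buckets.get(typed[: self.max_prefix], [])
        matches = [w for w in bucket if w.lower().startswith(typed)]
        return matches[:limit]


ac = AutoCompleteIndex()
for w in ["Microwave", "Macro", "Microsoft", "Microphone"]:  # unsorted on purpose
    ac.add(w)
print(ac.find("micro"))  # ['Microphone', 'Microsoft', 'Microwave']
```

Hiding the dictionary behind `add` and `find` means the prefix length or the bucket type can change later without touching the calling code.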