Fastest data structure for filtering a schema-less collection

2023-09-11 06:54:03 Author: 情殇

Let's say I have a collection:

var data = [
  { fieldA: 5 },
  { fieldA: 142, fieldB: 'string' },
  { fieldA: 1324, fieldC: 'string' },
  { fieldB: 'string', fieldD: 111, fieldZ: 'somestring' },
  ...
];

Let's assume the fields are not uniform across elements, but I know the number of unique fields in advance, and the collection is not dynamic.

I want to filter it with something like _.findWhere. That is simple enough, but what if I want to prioritize speed over ease of use? Is there a better data structure that always minimizes the number of elements that have to be checked? Perhaps some kind of tree?

Recommended answer

Yes, there is something faster if your queries are of the type "give me all records with fieldX=valueY". However, it comes with overhead: the index has to be built and stored alongside the data.

For each field, build an inverted index that maps every value of that field to the list of record ids (row positions in the original data) that have it:

var indexForEachField = {
    fieldA: { "5": [0], "142": [1], "1324": [2]},
    ...
}
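The index above could be built in a single pass over the data. The following is a minimal sketch, not part of the original answer: `buildIndex` is a hypothetical helper name, and it assumes field values are primitives that can safely be turned into string keys (so the number 5 and the string "5" end up in the same bucket).

```javascript
// Hypothetical helper: builds one inverted index per field.
// Record ids are row positions in `data`, as in the answer.
function buildIndex(data) {
  // Object.create(null) avoids clashes with inherited keys such as "toString".
  var indexForEachField = Object.create(null);
  data.forEach(function (record, rowId) {
    Object.keys(record).forEach(function (field) {
      var byValue = indexForEachField[field] ||
        (indexForEachField[field] = Object.create(null));
      var key = String(record[field]); // assumption: values are primitives
      // Rows are visited in order, so every id list stays sorted ascending.
      (byValue[key] || (byValue[key] = [])).push(rowId);
    });
  });
  return indexForEachField;
}

var data = [
  { fieldA: 5 },
  { fieldA: 142, fieldB: 'string' },
  { fieldA: 1324, fieldC: 'string' },
  { fieldB: 'string', fieldD: 111, fieldZ: 'somestring' }
];
var indexForEachField = buildIndex(data);
// indexForEachField.fieldB["string"] is [1, 3]
```

Because the collection is not dynamic, this only needs to run once; the sorted id lists it produces are also what a merge-style intersection of two result lists relies on.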

When someone asks for "records where fieldX=valueY", you return

indexForEachField["fieldX"]["valueY"]; // an array with all results

Lookup time is therefore constant on average (it takes only two table lookups), but you do need to keep the index up to date if the data ever changes.
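A bare double lookup throws when the field was never indexed and yields `undefined` when the value is missing. One way to guard against that is sketched below; `findByField` is a hypothetical name, and the small hand-written index only stands in for a real one.

```javascript
var has = Object.prototype.hasOwnProperty;

// Hypothetical helper: returns the matching records, or [] when the
// field or the value is not present in the index.
function findByField(data, index, field, value) {
  var key = String(value); // index keys are strings
  if (!has.call(index, field) || !has.call(index[field], key)) {
    return [];
  }
  // Map row positions back to the original records.
  return index[field][key].map(function (rowId) {
    return data[rowId];
  });
}

var data = [
  { fieldA: 5 },
  { fieldA: 142, fieldB: 'string' },
  { fieldA: 1324, fieldC: 'string' },
  { fieldB: 'string', fieldD: 111, fieldZ: 'somestring' }
];
var indexForEachField = {
  fieldA: { "5": [0], "142": [1], "1324": [2] },
  fieldB: { "string": [1, 3] }
};

var hits = findByField(data, indexForEachField, "fieldB", "string");
var none = findByField(data, indexForEachField, "fieldQ", "anything");
// hits holds data[1] and data[3]; none is an empty array
```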

This is a generalization of the strategy search engines use to look up web pages containing certain terms; in that setting it is called an inverted index.

Edit: what if you want to find all records with fieldX=valueX and fieldY=valueY?

You would use the following code, which requires both input arrays to be sorted in ascending order:

var a = indexForEachField["fieldX"]["valueX"];
var b = indexForEachField["fieldY"]["valueY"];
var c = []; // result array: all elements in a AND in b
for (var i=0, j=0; i<a.length && j<b.length; /**/) {
    if (a[i] < b[j]) {
       i++;
    } else if (a[i] > b[j]) {
       j++;
    } else {
       c.push(a[i]);
       i++; j++;
    }
}

Each iteration advances at least one of the two positions, so the loop runs fewer than a.length + b.length times in the worst case and as few as min(a.length, b.length) times in the best case. You can use something very similar to implement OR.
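The OR case can be sketched with the same two-pointer walk. The version below is an illustration rather than code from the original answer; `unionSorted` is a hypothetical name, and both inputs are assumed to be ascending lists of row ids without internal duplicates.

```javascript
// Union of two ascending arrays of row ids; ids present in both
// inputs are emitted only once.
function unionSorted(a, b) {
  var c = [];
  var i = 0, j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] < b[j]) {
      c.push(a[i++]);
    } else if (a[i] > b[j]) {
      c.push(b[j++]);
    } else {
      c.push(a[i]); // same id in both lists
      i++; j++;
    }
  }
  // One list is exhausted; copy whatever remains of the other.
  while (i < a.length) { c.push(a[i++]); }
  while (j < b.length) { c.push(b[j++]); }
  return c;
}

var result = unionSorted([0, 2, 5], [1, 2, 7]);
// result is [0, 1, 2, 5, 7]
```

Unlike the AND loop, the union has to visit every element of both lists, so it always costs on the order of a.length + b.length steps.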
