Fastest data structure for filtering a schema-less collection

2023-09-11 06:54:03 Author: 情殇

Let's say I have a collection:

var data = [
  { fieldA: 5 },
  { fieldA: 142, fieldB: 'string' },
  { fieldA: 1324, fieldC: 'string' },
  { fieldB: 'string', fieldD: 111, fieldZ: 'somestring' },
  ...
];

Let's assume the fields are not uniform across elements, but I know the number of unique fields in advance, and the collection is not dynamic.

I want to filter it with something like _.findWhere. That is simple enough, but what if I want to prioritize speed over ease? Is there a better data structure that always minimizes the number of elements that have to be checked? Perhaps some kind of tree?

Recommended answer

Yes, there is something faster if your queries are of the type "give me all records with fieldX=valueY". However, it comes with overhead: the index takes extra memory and has to be built up front.

For each field, build an inverted index that lists, for each value, the ids of all records that have that value (a record id is the row position in the original data):

var indexForEachField = {
    fieldA: { "5": [0], "142": [1], "1324": [2]},
    ...
}
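The index above could be built in a single pass over the collection. The following is a minimal sketch, not the answer author's code: `buildIndex`, `sampleData`, and `sampleIndex` are hypothetical names, and it assumes every field value is a primitive that can safely be turned into a string key.

```javascript
// Hypothetical helper: builds one inverted index per field.
// Assumption: field values are primitives; they are converted to string
// keys, so the number 5 and the string "5" end up in the same bucket.
function buildIndex(data) {
  // Null-prototype objects avoid clashes with inherited keys
  // such as "constructor".
  var index = Object.create(null);
  for (var row = 0; row < data.length; row++) {
    var record = data[row];
    for (var field in record) {
      if (!Object.prototype.hasOwnProperty.call(record, field)) continue;
      var key = String(record[field]);
      if (!index[field]) index[field] = Object.create(null);
      if (!index[field][key]) index[field][key] = [];
      // Rows are visited in increasing order, so each list stays sorted.
      index[field][key].push(row);
    }
  }
  return index;
}

// Sample data taken from the question (without the elided entries).
var sampleData = [
  { fieldA: 5 },
  { fieldA: 142, fieldB: 'string' },
  { fieldA: 1324, fieldC: 'string' },
  { fieldB: 'string', fieldD: 111, fieldZ: 'somestring' }
];
var sampleIndex = buildIndex(sampleData);
```

Because row positions are appended in increasing order, every list of record ids comes out sorted, which matters for combining several conditions.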

When someone asks for "records where fieldX=valueY", you return

indexForEachField["fieldX"]["valueY"]; // an array with all results

Lookup time is therefore constant on average (it takes only two hash-table lookups), but you do need to keep your index up to date.
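A direct `indexForEachField["fieldX"]["valueY"]` access throws if the field was never indexed and yields `undefined` for an unknown value. The sketch below wraps the two lookups so that a miss returns an empty array; `lookupRows`, `lookupRecords`, `exampleData`, and `exampleIndex` are hypothetical names introduced only for illustration.

```javascript
var hasOwn = Object.prototype.hasOwnProperty;

// Hypothetical helper: returns the record ids for field=value,
// or an empty array when the field or the value is not in the index.
function lookupRows(index, field, value) {
  var key = String(value);
  if (!hasOwn.call(index, field) || !hasOwn.call(index[field], key)) {
    return [];
  }
  return index[field][key];
}

// Hypothetical helper: maps record ids back to the original records.
function lookupRecords(data, index, field, value) {
  return lookupRows(index, field, value).map(function (row) {
    return data[row];
  });
}

// Small hand-written example in the same shape as indexForEachField.
var exampleData = [
  { fieldA: 5 },
  { fieldA: 142, fieldB: 'string' },
  { fieldA: 1324, fieldC: 'string' },
  { fieldB: 'string', fieldD: 111 }
];
var exampleIndex = {
  fieldA: { "5": [0], "142": [1], "1324": [2] },
  fieldB: { "string": [1, 3] }
};
```

Here `lookupRecords(exampleData, exampleIndex, "fieldB", "string")` gives the records at rows 1 and 3, and a query on a field that was never indexed gives an empty array.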

This is a generalization of the strategy search engines use to look up web pages containing certain terms; in that scenario, it is called an inverted index.

Edit: what if you want to find all records with fieldX=valueX and fieldY=valueY?

You would use the following code, which requires both input arrays to be sorted (they are, if record ids were appended in row order while the index was built):

var a = indexForEachField["fieldX"]["valueX"];
var b = indexForEachField["fieldY"]["valueY"];
var c = []; // result array: all elements in a AND in b
for (var i=0, j=0; i<a.length && j<b.length; /**/) {
    if (a[i] < b[j]) {
       i++;
    } else if (a[i] > b[j]) {
       j++;
    } else {
       c.push(a[i]);
       i++; j++;
    }
}

You can see that, in the worst case, the loop runs at most a.length + b.length - 1 times; in the best case it runs min(a.length, b.length) times, which is about half the worst case when both arrays have the same length. You can use something very similar to implement OR.
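For OR, the same two-pointer walk can be turned into a merge that keeps every id once. The following is a sketch under the same assumption as the AND loop (both arrays sorted in ascending order, no duplicates within an array); `unionSorted` is a hypothetical name, not part of the original answer.

```javascript
// Hypothetical helper: all record ids that appear in a OR in b.
// Assumption: a and b are sorted ascending and contain no duplicates.
function unionSorted(a, b) {
  var c = [];
  var i = 0, j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] < b[j]) {
      c.push(a[i]);
      i++;
    } else if (a[i] > b[j]) {
      c.push(b[j]);
      j++;
    } else {
      // Same id in both arrays: keep it once and advance both.
      c.push(a[i]);
      i++; j++;
    }
  }
  // One array is exhausted; copy whatever is left of the other.
  while (i < a.length) c.push(a[i++]);
  while (j < b.length) c.push(b[j++]);
  return c;
}
```

Unlike the AND case, the leftover tail of the longer array has to be copied, so the work is always proportional to a.length + b.length.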

 