数据结构部分多键映射?数据结构、部分

2023-09-11 04:13:57 作者:孤傲一世的神

我的数据是由映射到值键,这样的:

I have data consists of keys mapped to values, like this:

---------------------
Key          | Value
---------------------
(0, 0, 0, 0) | a
(0, 0, 0, 1) | b
(0, 1, 0, 1) | c
(0, 1, 1, 0) | d
....

我要寻找一个数据结构可以有效地执行了键搜索查询,其中,查询可能是完全或部分规范的关键。例如:

I am looking for a data structure the can efficiently perform search queries over keys, where the queries could be complete or partial specification the key. For example:

(0, 0, 0, 1) -> a
(0, *, *, *) -> [a, b, c, d]
(0, 1, *, *) -> [c, d]

这是我所现在的想法是使用正则树,类似这样来实现这一点: 叶节点重新present的值和非树叶节点是零件的关键(即W,X,y和z节点是第一,第二,第三和密钥分别第四部分)。一个简单的BFS算法可以用来回答任何查询。但问题是,这种树与密钥的每一个新的部分呈几何级数增长。

The idea that I've right now is to implement this using a regular tree, similar to this: Leaves nodes represent the values and non-leaves nodes are parts of the key (i.e. w,x,y and z nodes are first, second, third and forth part of the key, respectively.). A simple BFS algorithm could be used to answer any query. But the problem is that this tree is growing exponentially with each new part of the key.

什么数据结构/算法更适合解决这个问题?请注意,关键部位可以是数字或字符串。

What data structure/algorithm is more appropriate to solve this problem? Note that the key parts can be numbers or strings.

推荐答案

这是数组。对真的!你不会有任何空间上的开销,没有指针追逐的开销,并计算指数只需要一点点bitmath和处理器是真的在那个相当不错的。

An array. Yes really! You will have no space overhead, no "pointer chasing" overhead, and calculating the indices only takes a little bitmath, and processors are really rather good at that.

假设你得到的部分键作为面膜其中掩盖有一个0这是通配符和1别处位,而 0的通配符和任何你想要的非通配符。

Assuming you get a partial key as a mask and bits where the mask has a 0 for the bits which are wildcards and 1 elsewhere, and the bits are 0 for the wildcards and whatever you want for the non-wildcards.

该算法收集具有相匹配的模式密钥的所有项目是:

The algorithm to collect all items that have a key that matches that pattern is:

int key = bits;
do {
    yield items[key];
    key = (key | mask) + 1 & ~mask | bits;
} while (key != bits);

这是键=(关键|屏蔽)+ 1安培; 〜面膜|位部分看起来很滑稽,这里是它的工作原理。

That key = (key | mask) + 1 & ~mask | bits part looks funny, here's how it works.

| (按位OR)使所有的非通配符1.这可确保增量一直贯彻的不是通配符位。即添加后,被认为是固定的,该位被销毁(0,如果进通过他们,否则为1),所以他们不得不屏蔽掉(在&放大器;〜面膜),然后重新设置为正确的值( |位)。运营商的precedence使得它,以便它可以在很大程度上被写入没有括号。你也可以写成

The | (bitwise OR) makes all the non-wildcards 1. That makes sure that the increment keeps carrying through the bits that are not wildcards. After that addition, the bits that were supposed to be "fixed" are destroyed (0 if a carry passed through them, 1 otherwise), so they have to be masked out (the & ~mask) and then set back to the right value (| bits). The precedence of the operators makes it so that it can largely be written without parentheses. You can also write it as

key = (((key | mask) + 1) & (~mask)) | bits;

这适用于任何类型的图案。如果你只需要最后x位是变量,可以优化了一下:

This works for any sort of pattern. If you only need "last x bits are variable", you can optimize a bit to:

int wildcards = 0;
int invmask = ~mask;
do {
    yield items[wildcards++ | bits];
} while (wildcards & invmask);

这只是从0到2 数的-通配符,然后放入固定位上方。

That just runs from 0 to 2number-of-wildcards and then puts in the fixed bits in the top.

在最简单的非二进制的情况下,关键的部分仍然比特一些积分数,也就是说,它们的范围从0到2 N -1。可以使用的恰好的相同code在此情况下,但在掩模的间pretation不同的是:代替具有单一0比特为一个通配符或单1位一个非通配符,它​​将有位(对应于宽度在一个关键部分的比特)的一些其他的数字。

In the simplest non-binary case, the parts of the key are still some integral number of bits, that is, they range from 0 to 2n-1. You can use exactly the same code in that case, but the interpretation of the mask is different: instead of having a single 0 bit for a wildcard or a single 1 bit for a non-wildcard, it would have some other number of bits (corresponding to the width in bits of a key-part).

对于非权力 - 的二,它需要一些更多的挂羊头卖狗肉。的问题是,一个进位,必须越早产生通常那样以满足约束的一个关键部分是小于某个值

For non-powers-of-two, it takes some more trickery. The problem is that a carry has to be generated sooner it normally would in order to satisfy the constrains that a key-part is less than some value.

例如,如果所有的键部件可以是0,1或2(但是不3),可以执行(未测试):

For example, if all the key-parts can be 0, 1, or 2 (but not 3), you can do (not tested):

int key = bits;
int increment = (0x55555555 & ~mask) + 1;
do {
    yield items[key];
    int temp = (key | mask) + increment & ~mask;
    int fix = (temp | (temp >> 1)) & 0x55555555;
    key = temp - fix | bits;
} while (key != bits);

的额外增量是1加上的,这是在这种情况下,最接近的2的幂的一个关键部分的最大值差值掩模1对每个密钥部分,所以有一个1在每一个时隙(槽宽2位,其为最窄它们可以在这种情况下)。它只有那些补偿的是通配符的位置。

The extra increment is 1 plus a mask of the "difference of the nearest power of two with the maximum value of a key-part", which is in this case 1 for every key-part, so there's a 1 in every "slot" (the slots are 2 bits wide, which is the narrowest they can be in this case). It only has those "offsets" at positions that are wildcards.

偏移的关键零部件,使他们的最高允许值映射到全一确保进位传播它们。然而,这意味着,它们通常保留在无效状态(除非它接收一个进位和变为零)。因此,然后是恼人的部分:偏移已被撤消的只有的关键零部件未换到零

Offsetting the key-parts so that their highest allowable value maps to "all ones" ensures that the carry propagates through them. However, it means that they are usually left in an invalid state (unless it receives a carry and goes to zero). So then comes the annoying part: the offset has to be undone only for key-parts that didn't wrap to zero.

所以修正的用武之地。它计算的关键部件,是不为零的面具。这得到更恼人,如果关键零部件更宽,并且它得到彻头彻尾的可怕,如果他们的关键零部件都不尽相同大小。

So there fix comes in. It computes a mask of key-parts that aren't zero. That gets more annoying if the key-parts are wider, and it gets downright terrible if they key-parts aren't all the same size.

那么最后一部分,键=温度 - 修复|位,撤销抵消并把非通配符回来。这样减从来没有破坏任何东西,因为只有1只从2位是一起至少1组中减去,所以进位永远不会离开一个关键的部分。

Then the last part, key = temp - fix | bits, undoes the offsetting and puts the non-wildcards back in. That subtract never destroys anything, because only 1 is subtracted only from groups of 2 bits that are together at least 1, so the carry never leaves a key-part.

这样的索引不浪费,当然一些空间,不像幂二的情况下,因为有数组中的漏洞,你永远无法索引。

This way of indexing does waste some space of course, unlike the power-of-two case, because there are "holes" in the array that you can never index into.

 
精彩推荐
图片推荐