Data structure for O(log N) find and update, considering a small L1 cache

2023-09-11 03:38:17, asked by Galaxy

I'm currently working on an embedded device project where I'm running into performance problems. Profiling has located an O(N) operation that I'd like to eliminate.

I basically have two arrays, int A[N] and short B[N]. Entries in A are unique and ordered by external constraints. The most common operation is to check whether a particular value a appears in A[]. Less frequent, but still common, is a change to an element of A[]. The new value is unrelated to the previous one.

Since the most common operation is the find, that's where B[] comes in. It's a sorted array of indices in A[], such that A[B[i]] < A[B[j]] if and only if i<j. That means that I can find values in A using a binary search.

Of course, when I update A[k], I have to find k in B and move it to a new position to maintain the search order. Since I know the old and new values of A[k], that's just a memmove() of the subset of B[] between the old and new positions of k. This is the O(N) operation that I need to fix; since the old and new values of A[k] are essentially random, I'm moving about N/3 elements on average.

I looked into std::make_heap using [](int i, int j) { return A[i] < A[j]; } as the predicate. In that case I can easily make B[0] point to the smallest element of A, and updating B becomes a cheap O(log N) rebalancing operation. However, I generally don't need the smallest value of A; I need to find whether a given value is present. And that's now an O(N log N) search in B (half of my N elements are at heap depth log N, a quarter at (log N)-1, etc.), which is no improvement over a dumb O(N) search directly in A.

Considering that std::set has O(log N) insert and find, I'd say that it should be possible to get the same performance here for update and find. But how do I do that? Do I need another order for B? A different type?

B is currently a short[N] because A and B together are about the size of my CPU cache, and my main memory is a lot slower. Going from 6*N to 8*N bytes would not be nice, but still acceptable if both my find and my update become O(log N).

Accepted Answer

If the only operations are (1) check if value 'a' belongs to A and (2) update values in A, why don't you use a hash table in place of the sorted array B? Especially if A does not grow or shrink in size and the values only change, this would be a much better solution. A hash table does not require significantly more memory than an array. (Alternatively, B could be changed not to a heap but to a binary search tree, which could be self-balancing, e.g. a splay tree or a red-black tree. However, trees require extra memory for the left and right pointers.)

A practical solution that grows memory use from 6N to 8N bytes is to aim for an exactly 50% filled hash table, i.e. use a hash table that consists of an array of 2N shorts. I would recommend implementing the cuckoo hashing mechanism (see http://en.wikipedia.org/wiki/Cuckoo_hashing). If you read the article further, you will find that you can get load factors above 50% (i.e. push memory consumption down from 8N towards, say, 7N) by using more hash functions: "Using just three hash functions increases the load to 91%."

From Wikipedia:

A study by Zukowski et al. has shown that cuckoo hashing is much faster than chained hashing for small, cache-resident hash tables on modern processors. Kenneth Ross has shown bucketized versions of cuckoo hashing (variants that use buckets that contain more than one key) to be faster than conventional methods also for large hash tables, when space utilization is high. The performance of the bucketized cuckoo hash table was investigated further by Askitis, with its performance compared against alternative hashing schemes.