Find out whether two arrays contain the same set of integers, faster than N log N and without extra space

2023-09-10 23:13:02 Author: 半城烟沙自寂寥

I came across this post, which reports the following interview question:

Given two arrays of numbers, find whether the two arrays have the same set of integers. Suggest an algorithm that can run faster than N log N without using extra space.

The best that I can think of is the following:

(a) sort each array, and then (b) move two pointers along the two arrays and check whether you find different values ... but step (a) already has N log N complexity :(
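
A minimal C++ sketch of this sort-and-compare idea (the names are mine, and it assumes the arrays may be sorted in place):

#include <algorithm>
#include <vector>

// Sort both arrays, then compare them element by element (the "two pointers"
// walk); O(N log N) time, no extra space beyond the in-place sorting.
bool same_integers_by_sorting(std::vector<int>& a, std::vector<int>& b) {
    if (a.size() != b.size())
        return false;
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return std::equal(a.begin(), a.end(), b.begin());
}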

(a) scan the shortest array and put its values into a map, and then (b) scan the second array and check whether you find a value that is not in the map ... here we have linear complexity, but we use extra space
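
A sketch of this map-based variant, assuming C++ and a std::unordered_set (note that, exactly as described, it only checks membership, not duplicate counts):

#include <unordered_set>
#include <vector>

// (a) put one array's values into a hash set, (b) scan the other array and
// look every value up; expected O(N) time, but O(N) extra space for the set.
bool same_integers_by_hashing(const std::vector<int>& a, const std::vector<int>& b) {
    std::unordered_set<int> seen(a.begin(), a.end());
    for (int x : b)
        if (seen.find(x) == seen.end())
            return false;  // b contains a value that is not in a
    return true;           // caveat: values of a missing from b go unnoticed
}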

... so, I can't think of a solution for this question.

Any ideas?

Thank you for all the answers. I feel many of them are right, but I decided to accept ruslik's, because it offers an interesting option that I had not thought about.

Recommended answer

You can try a probabilistic approach by choosing a commutative function for accumulation (e.g., addition or XOR) and a parametrized hash function.

unsigned addition(unsigned a, unsigned b);   // commutative accumulator, e.g. a + b or a ^ b
unsigned hash(int n, int h_type);            // hash function parametrized by h_type

// Accumulate the hashes of all elements; the result is order-independent,
// so equal multisets always produce equal values for the same h_type.
unsigned hash_set(int* a, int num, int h_type){
    unsigned rez = 0;
    for (int i = 0; i < num; i++)
        rez = addition(rez, hash(a[i], h_type));
    return rez;
}
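
For illustration, here is one possible way to fill in the two helpers and run the comparison; the mixing constant, the h_type handling and the probably_same wrapper are my own guesses, not part of the answer:

// Hypothetical parametrized hash: h_type perturbs the (odd) mixing constant.
// Any family of reasonably independent hash functions would do.
unsigned hash(int n, int h_type) {
    unsigned x = (unsigned)n * (2654435761u + 2u * (unsigned)h_type);
    return x ^ (x >> 16);
}

unsigned addition(unsigned a, unsigned b) { return a + b; }  // commutative

// Compare the accumulated hashes of two arrays (assumed to have the same
// length) for several h_type values.
bool probably_same(int* a, int* b, int num, int tries) {
    for (int t = 0; t < tries; t++)
        if (hash_set(a, num, t) != hash_set(b, num, t))
            return false;  // any mismatch proves the arrays differ
    return true;           // no mismatch: equal with high probability
}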

This way, the number of tries needed before the probability of a false positive drops below a certain threshold does not depend on the number of elements, so the whole check stays linear.
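
As a rough worked example (my numbers, not from the answer): if a single hashed comparison lets two different multisets collide with probability at most p, then k independent hash functions push the false-positive probability down to about p^k, so the number of tries needed to get below a fixed threshold depends only on the threshold and on p, not on the number of elements.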

EDIT: In the general case the probability of the sets being the same is very small, so this O(n) check with several hash functions can be used for prefiltering: it decides as fast as possible whether the arrays are surely different or whether there is a chance they are equivalent and a slower deterministic method should be used. The final average complexity will be O(n), but the worst case will have the complexity of the deterministic method.
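
A sketch of that prefiltering flow, reusing the hypothetical probably_same helper from above and a sort-based comparison as the deterministic fallback (again, the structure and names are mine, not spelled out in the answer):

#include <algorithm>
#include <vector>

// Average case O(n): unequal inputs are almost always rejected by the cheap
// hash prefilter; the worst case is the deterministic sort-and-compare.
bool same_multiset(std::vector<int> a, std::vector<int> b) {  // copies on purpose
    if (a.size() != b.size())
        return false;
    if (!probably_same(a.data(), b.data(), (int)a.size(), /*tries=*/4))
        return false;                   // surely different
    std::sort(a.begin(), a.end());      // deterministic confirmation
    std::sort(b.begin(), b.end());
    return a == b;
}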