Finding duplicates in an array in O(N) time

2023-09-10 23:32:58 Author: 红颜未老心先死

Is there a way to find all the duplicate elements in an array of N elements in O(N) time?

For example:

Input: 11,29,81,14,43,43,81,29

Output: 29,81,43

Sorting the input and doing a linear scan to detect duplicates destroys the order and gives the output: 29,43,81.

Sorting another array of indices {0,1,...,N-1} by key according to the given array yields the first indices of the duplicated values, {1,4,2}. Sorting that set of indices to get {1,2,4} then gives us {29,81,43}, but this takes O(N log N) time.
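The index-sorting approach described here can be sketched roughly as follows. This is only an illustration of the O(N log N) baseline, not a solution to the question; the function name `duplicates_by_index_sort` is made up for this example, and it relies on Python's `sorted` being stable so that equal values keep their original index order.

```python
def duplicates_by_index_sort(values):
    """Return duplicated values in order of first occurrence, in O(N log N)."""
    # Sort the indices 0..N-1 by the value they point to. The sort is stable,
    # so within a run of equal values the smallest index comes first.
    order = sorted(range(len(values)), key=lambda i: values[i])

    first_indices = []
    for pos in range(1, len(order)):
        prev, cur = order[pos - 1], order[pos]
        if values[cur] == values[prev]:
            # Record the first index of a run only once, when the run starts.
            if pos == 1 or values[order[pos - 2]] != values[prev]:
                first_indices.append(prev)

    # Sorting the collected indices restores the original input order.
    first_indices.sort()
    return [values[i] for i in first_indices]


print(duplicates_by_index_sort([11, 29, 81, 14, 43, 43, 81, 29]))
```

For the example input, the collected indices are {1,4,2} before the final sort and {1,2,4} after it.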

Is there an O(N) algorithm to solve this problem?

P.S. I forgot to add: I don't want to use hash tables. I am looking for a non-hash solution.

Recommended answer

I believe a good solution (decent memory usage, can be used to immediately determine if an entry has already been seen thus preserving order, and with a linear complexity) is a trie.

If you insert the elements into the trie as if they were a string with each digit (starting from the MSD) in each node, you can pull this off with a complexity of O(m N) where m is the average length of numbers in base-10 digits.

You'd just loop over all your entries and insert them into the trie. Each time an element already exists, you skip it and move on to the next. Unlike in my previous answer (a radix sort), duplicates are found immediately, at insertion time, rather than only in the last iteration.
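A minimal sketch of the digit-trie idea, assuming the inputs are integers, might look like the code below. The function name `duplicates_with_trie` and the `"count"` key are made up for this illustration. Each trie node is a plain Python dict used as a stand-in for a fixed 10-slot child array, so a strictly hash-free version would replace it with a list indexed by digit. One detail worth noting: reporting a duplicate the moment its second copy is inserted would give 43,81,29 for the example input, so this sketch adds a second pass over the input to report duplicates in order of first occurrence.

```python
def duplicates_with_trie(values):
    """Return duplicated integers in order of first occurrence, in O(m N)."""
    COUNT = "count"  # hypothetical key; cannot clash with single-character digit keys
    root = {}

    # Pass 1: insert every number digit by digit, most significant digit first,
    # and count how many times each number ends at its terminal node.
    for v in values:
        node = root
        for digit in str(v):
            node = node.setdefault(digit, {})
        node[COUNT] = node.get(COUNT, 0) + 1

    # Pass 2: walk the input again so duplicates come out in input order.
    result = []
    for v in values:
        node = root
        for digit in str(v):
            node = node[digit]
        if node[COUNT] > 1:
            result.append(v)
            node[COUNT] = 0  # mark as reported so later copies are skipped
    return result


print(duplicates_with_trie([11, 29, 81, 14, 43, 43, 81, 29]))
```

Both passes touch at most m digits per element, so the total work stays at O(m N) as described above.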

I'm not sure if you would benefit from using a suffix tree here, as the "base" of the characters being entered into the trie is only 10 (compared to the base-128 for ANSI strings), but it's possible.