How to find the pair with the kth largest sum?

2023-09-10 23:02:49 Author: 执着不该执着的执着

Given two sorted arrays of numbers, we want to find the pair with the kth largest possible sum. (A pair is one element from the first array and one element from the second array.) For example, with arrays

[2, 3, 5, 8, 13] and [4, 8, 12, 16]

the pairs with the largest sums are

13 + 16 = 29
13 + 12 = 25
8 + 16 = 24
13 + 8 = 21
8 + 12 = 20

So the pair with the 4th largest sum is (13, 8). How do we find the pair with the kth largest possible sum?
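For reference, the problem statement can be pinned down with a brute-force baseline. This is only a sketch for checking faster solutions against, not an answer to the question: it enumerates all M*N pairs, so it costs O(MN log(MN)). The function name is made up for illustration, and k is assumed to be 1-based.

```python
from itertools import product


def kth_largest_pair_bruteforce(a, b, k):
    """Return (sum, x, y) for the pair with the k-th largest sum (k is 1-based)."""
    # Enumerate every pair, then sort by sum with the largest first.
    # Ties on the sum are broken by the larger element taken from `a`.
    pairs = sorted(((x + y, x, y) for x, y in product(a, b)), reverse=True)
    return pairs[k - 1]


print(kth_largest_pair_bruteforce([2, 3, 5, 8, 13], [4, 8, 12, 16], 4))  # (21, 13, 8)
```

Note that 5 + 16 is also 21, so the 4th and 5th largest sums are tied in this example.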

Also, what is the fastest algorithm? The arrays are already sorted and have sizes M and N.

I am already aware of the O(k log k) solution using a max-heap, given here.

It is also one of Google's favorite interview questions, and they demand an O(k) solution.

I've also read somewhere that an O(k) solution exists, but I am unable to figure it out.

Can someone explain the correct solution with pseudocode?

P.S. Please DON'T post this link as an answer/comment. It DOESN'T contain the answer.

Recommended answer

I start with a simple but not quite linear-time algorithm. We choose some value between array1[0]+array2[0] and array1[N-1]+array2[N-1]. Then we determine how many pair sums are greater than this value and how many are less. This may be done by iterating over the arrays with two pointers: the pointer into the first array is incremented when the sum is too small, and the pointer into the second array is decremented when the sum is too large. Repeating this procedure for different values and using binary search (or one-sided binary search), we can find the Kth largest sum in O(N log R) time, where N is the size of the largest array and R is the number of possible values between array1[0]+array2[0] and array1[N-1]+array2[N-1]. This algorithm has linear time complexity only when the array elements are integers bounded by a small constant.
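A minimal sketch of this first algorithm, assuming both arrays are sorted in ascending order, hold integers, and that 1 <= k <= M*N. The function names are made up for illustration. The counting step is the two-pointer pass, and the outer loop is a plain binary search over the value range.

```python
def count_greater(a, b, value):
    """Count pairs (x, y) with x + y > value in O(M + N); a and b ascending."""
    count = 0
    j = len(b)  # b[j:] are exactly the elements that push the current x above value
    for x in a:  # pointer into the first array only moves forward
        while j > 0 and x + b[j - 1] > value:
            j -= 1  # pointer into the second array only moves backward
        count += len(b) - j
    return count


def kth_largest_sum(a, b, k):
    """Binary search for the smallest value v with fewer than k sums above it."""
    lo, hi = a[0] + b[0], a[-1] + b[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if count_greater(a, b, mid) < k:
            hi = mid  # mid is at or above the k-th largest sum
        else:
            lo = mid + 1  # at least k sums exceed mid, so the answer is larger
    return lo


print(kth_largest_sum([2, 3, 5, 8, 13], [4, 8, 12, 16], 4))  # 21
```

The search relies on the fact that the kth largest sum S is the smallest integer v for which fewer than k pair sums are strictly greater than v.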

The previous algorithm may be improved if we stop the binary search as soon as the number of pair sums in the binary search range decreases from O(N^2) to O(N). Then we fill an auxiliary array with these pair sums (this may be done with a slightly modified two-pointer algorithm) and use the quickselect algorithm to find the Kth largest sum in this auxiliary array. None of this improves the worst-case complexity, because we still need O(log R) binary search steps. What if we keep the quickselect part of this algorithm but, to get a proper value range, use something better than binary search?
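The "collect, then quickselect" step might look roughly like the sketch below. It assumes a value range (lo, hi] that is already known to contain the kth largest sum; how that range is obtained is left open here, and `lo`, `hi` and all function names are hypothetical. The quickselect shown uses a random pivot, so it is linear only in expectation.

```python
import random


def candidates_in_range(a, b, lo, hi):
    """Return (sums s with lo < s <= hi, number of sums > hi); a and b ascending."""
    candidates, above = [], 0
    start = end = len(b)  # b[start:end] pairs with x to give lo < x + y <= hi
    for x in a:
        while start > 0 and x + b[start - 1] > lo:
            start -= 1
        while end > 0 and x + b[end - 1] > hi:
            end -= 1
        candidates.extend(x + y for y in b[start:end])
        above += len(b) - end  # sums with this x that lie above the range
    return candidates, above


def quickselect_largest(values, k):
    """Return the k-th largest element (1-based) in expected linear time."""
    values = list(values)
    while True:
        pivot = random.choice(values)
        greater = [v for v in values if v > pivot]
        equal = [v for v in values if v == pivot]
        if k <= len(greater):
            values = greater
        elif k <= len(greater) + len(equal):
            return pivot
        else:
            k -= len(greater) + len(equal)
            values = [v for v in values if v < pivot]


def kth_largest_given_range(a, b, k, lo, hi):
    """Select the k-th largest sum, assuming it lies in (lo, hi]."""
    candidates, above = candidates_in_range(a, b, lo, hi)
    return quickselect_largest(candidates, k - above)
```

For the arrays in the question, with k = 4 and the range (19, 25], the candidates are 20, 21, 21, 24, 25, one sum (29) lies above the range, and selecting the 3rd largest candidate gives 21.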

We could estimate the value range with the following trick: take every second element from each array and try to find the pair sum with rank k/4 for these half-arrays (using the same algorithm recursively). Obviously this should give some approximation of the needed value range. In fact, a slightly improved variant of this trick gives a range containing only O(N) elements. This is proven in the following paper: "Selection in X + Y and matrices with sorted rows and columns" by A. Mirzaian and E. Arjomandi. The paper contains a detailed explanation of the algorithm, a proof, complexity analysis, and pseudocode for all parts of the algorithm except Quickselect. If linear worst-case complexity is required, Quickselect may be augmented with the median of medians algorithm.

This algorithm has complexity O(N). If one of the arrays is shorter than the other (M < N), we can assume that the shorter array is extended to size N with some very small elements, so that all calculations in the algorithm use the size of the largest array. We don't actually need to extract pairs with these "added" elements and feed them to quickselect, which makes the algorithm a little faster but does not improve the asymptotic complexity.

If k < N, we can ignore all array elements other than the k largest in each array. In this case the complexity is O(k). If N < k < N(N-1), we simply get better complexity than requested in the OP. If k > N(N-1), we'd better solve the opposite problem: the k'th smallest sum.
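The k < N case can be checked with a small sketch, assuming ascending arrays and k >= 1. An element outside the k largest of its array cannot change the value of the kth largest sum, because at least k pairs built from larger elements have sums at least as big. The helper names are made up, and the brute-force function is there only to compare results.

```python
def trim_to_k_largest(a, b, k):
    """Keep only the k largest elements of each ascending-sorted array."""
    return a[-k:], b[-k:]


def kth_largest_sum_bruteforce(a, b, k):
    """Reference answer: sort all pair sums, largest first."""
    return sorted((x + y for x in a for y in b), reverse=True)[k - 1]


a = list(range(0, 100, 3))  # 34 elements
b = list(range(1, 80, 2))   # 40 elements
k = 5
small_a, small_b = trim_to_k_largest(a, b, k)
# Both calls give the same k-th largest sum, but the second one looks at 25 pairs only.
print(kth_largest_sum_bruteforce(a, b, k) == kth_largest_sum_bruteforce(small_a, small_b, k))
```

Any of the selection algorithms above can then be run on the trimmed arrays, so the array size that enters the complexity bound becomes k.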

I uploaded a simple C++11 implementation to ideone. The code is not optimized and not thoroughly tested. I tried to make it as close as possible to the pseudocode in the linked paper. This implementation uses std::nth_element, which gives linear complexity only on average (not in the worst case).

A completely different approach to finding the K'th sum in linear time is based on a priority queue (PQ). One variation is to insert the largest pair into the PQ, then repeatedly remove the top of the PQ and insert up to two pairs in its place (one with a decremented index in one array, the other with a decremented index in the other array), taking some measures to prevent inserting duplicate pairs. The other variation is to insert all possible pairs containing the largest element of the first array, then repeatedly remove the top of the PQ and insert in its place the pair with a decremented index in the first array and the same index in the second array. In this case there is no need to bother about duplicates.
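Both variations can be sketched with a binary heap, which gives the O(K log K) behaviour rather than linear time. The sketch assumes ascending arrays and 1 <= k <= M*N; the function names are made up, and sums are negated because Python's `heapq` is a min-heap.

```python
import heapq


def kth_largest_pair_two_neighbours(a, b, k):
    """Variation 1: start at the largest pair, push up to two neighbours per pop."""
    i, j = len(a) - 1, len(b) - 1
    heap = [(-(a[i] + b[j]), i, j)]
    seen = {(i, j)}  # prevents inserting the same pair twice
    for _ in range(k - 1):
        _, i, j = heapq.heappop(heap)
        for ni, nj in ((i - 1, j), (i, j - 1)):
            if ni >= 0 and nj >= 0 and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(a[ni] + b[nj]), ni, nj))
    neg_sum, i, j = heap[0]
    return -neg_sum, a[i], b[j]


def kth_largest_pair_seeded(a, b, k):
    """Variation 2: seed with every pair that uses the largest element of a."""
    top = len(a) - 1
    heap = [(-(a[top] + y), top, j) for j, y in enumerate(b)]
    heapq.heapify(heap)
    for _ in range(k - 1):
        _, i, j = heapq.heappop(heap)
        if i > 0:  # same index in b, one step down in a: duplicates cannot occur
            heapq.heappush(heap, (-(a[i - 1] + b[j]), i - 1, j))
    neg_sum, i, j = heap[0]
    return -neg_sum, a[i], b[j]
```

For the arrays in the question and k = 4, both functions report a sum of 21. Ties are broken by index here, so the pair returned for a tied sum (13 + 8 and 5 + 16 are both 21) may differ from the one listed in the question.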

The OP mentions the O(K log K) solution, where the PQ is implemented as a max-heap. But in some cases (when the array elements are evenly distributed integers with a limited range, and linear complexity is needed only on average, not in the worst case) we could use an O(1) time priority queue, for example the one described in this paper: "A Complexity O(1) Priority Queue for Event Driven Molecular Dynamics Simulations" by Gerald Paul (http://arxiv.org/pdf/physics/0606226). This allows O(K) expected time complexity.
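To illustrate the idea of a constant-time priority queue for bounded integers, here is a sketch that swaps in a plain bucket queue. It is not the data structure from the paper, just a simpler stand-in. It assumes integer elements, ascending arrays and 1 <= k <= M*N, and it allocates one bucket per possible sum, so it only makes sense when the range R of sums is small. The function name is made up.

```python
def kth_largest_sum_bucket_queue(a, b, k):
    """Seeded priority-queue variation with a bucket queue instead of a heap."""
    lo = a[0] + b[0]
    hi = a[-1] + b[-1]
    buckets = [[] for _ in range(hi - lo + 1)]  # bucket index = sum - lo
    top = len(a) - 1
    for j, y in enumerate(b):  # seed: every pair using the largest element of a
        buckets[a[top] + y - lo].append((top, j))
    cursor = hi - lo  # highest bucket that may still be non-empty
    for _ in range(k - 1):
        while not buckets[cursor]:
            cursor -= 1  # never moves back up: a pushed sum never exceeds the popped one
        i, j = buckets[cursor].pop()
        if i > 0:
            buckets[a[i - 1] + b[j] - lo].append((i - 1, j))
    while not buckets[cursor]:
        cursor -= 1
    return cursor + lo


print(kth_largest_sum_bucket_queue([2, 3, 5, 8, 13], [4, 8, 12, 16], 4))  # 21
```

Each push and pop is O(1), and the cursor only ever moves downward, so the total work is O(N + R + K).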

The advantage of this approach is the possibility of providing the first K elements in sorted order. The disadvantages are a limited choice of array element type, a more complex and slower algorithm, and worse asymptotic complexity: O(K) > O(N).