Picking five numbers that sum to S

2023-09-11 02:54:57 Author: 果酱味奶糖

Given an array A of N nonnegative numbers, I'm interested in finding the number of ways you can pick 5 numbers (from distinct positions in the array) such that their sum is S.

There is an easy solution in O(N^3):

Let H be a hash table of (sum, position of leftmost element of sum)
for i = 0, N
    for j = i + 1, N
        H.add(A[i] + A[j], i)

numPossibilities = 0
for i = 0, N
    for j = i + 1, N
        for k = j + 1, N
            numPossibilities += H.get(S - (A[i] + A[j] + A[k]), k)

Where H.get(x, y) returns the number of entries in the hash table whose sum is x and whose leftmost element has a position greater than y (in the loop above, greater than k).
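For concreteness, here is a minimal Python sketch of the approach above. The names `count_five_hash` and `pair_left` are made up for illustration. One assumption that differs from the pseudocode: each sum maps to a sorted list of leftmost positions and `H.get` is done with a binary search, which is only one possible way to implement the lookup.

```python
from bisect import bisect_right
from collections import defaultdict


def count_five_hash(a, s):
    """Count picks of 5 distinct positions i<j<k<l<m with a[i]+...+a[m] == s."""
    n = len(a)
    # pair_left[v] holds the left position of every pair (l, m), l < m,
    # with a[l] + a[m] == v. Positions are appended in increasing order,
    # so each list is already sorted.
    pair_left = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n):
            pair_left[a[i] + a[j]].append(i)

    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                lefts = pair_left.get(s - (a[i] + a[j] + a[k]))
                if lefts:
                    # Equivalent of H.get(x, k): pairs whose left position is > k.
                    total += len(lefts) - bisect_right(lefts, k)
    return total
```

With the binary search each lookup costs O(log N) regardless of how many pairs share a sum, so this particular sketch stays at O(N^3 log N) even on inputs like all ones.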

Alternatively, we can add sums of 3 elements to the hash table and then continue with 2 nested for loops. The complexity remains the same, however; we just use more memory.

Assuming the inputs will be fairly random (so no worst-case hashing), is there an algorithm that can solve this in O(N^2) or maybe O(N^2 log N), or even O(N^3) if it holds in all cases? I'm thinking binary search might help, but I don't see how to deal with overlapping indexes.

The above solution is a lot better in practice than the naive solution with 5 nested for loops, but I have a feeling we can do a lot better, hence this question.

If you can prove that no such algorithm exists, how can the above solution be optimized?

Clarification:

The above algorithm is indeed O(N^5) in the worst case, such as when the given array contains nothing but the number 1 and we have S = 5. On average, however, the H.get method is a lot closer to O(1), hence my average cubic complexity.

If you implement this and run it on 1000 random numbers in a big interval (say 0 up to Int32.MaxValue), you will see that it runs relatively fast. Still, it's not hard to find inputs for which it takes a long time. Even if we can't get it running fast enough when all the numbers are equal, what optimizations could we make?

Under the same assumptions, can we do better, asymptotically or at least in practice?

Recommended answer

I think the requirement that the numbers come from distinct positions is a red herring. You can use the inclusion-exclusion principle to count the number of all positions (i,j,k,l,m) where x[i]+x[j]+x[k]+x[l]+x[m]=S and i,j,k,l,m are distinct:

 sums with i!=j,i!=k,i!=l...,l!=m  = all sums 
                                   - sums with i=j
                                   - ...
                                   - sums with l=m
                                   + sums with i=j and j=k
                                   + ...
                                   + sums with k=l and l=m
                                   - ...
                                   + sums with i=j=k=l=m

Computing the sums on the right, except the first one, is doable in O(N^2 log N). For example, to find the number of positions (i,i,k,l,m) such that x[i]+x[i]+x[k]+x[l]+x[m]=S, you can create sorted arrays of the sums {2a+b} and {c+d} and count the pairs of elements x, y (one from each array) such that x+y=S.
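As a rough illustration of that example, the sketch below counts tuples (i,i,k,l,m) by building the {2a+b} sums, sorting the {c+d} sums, and binary-searching for the complement. The function name `count_i_equals_j` is hypothetical, and using `bisect` on one sorted array is just one way to do the matching in O(N^2 log N).

```python
from bisect import bisect_left, bisect_right


def count_i_equals_j(x, s):
    """Count tuples (i, i, k, l, m) with 2*x[i] + x[k] + x[l] + x[m] == s.

    No distinctness is required between i, k, l, m here.
    """
    left = [2 * a + b for a in x for b in x]        # sums of the form 2a+b
    right = sorted(c + d for c in x for d in x)     # sums of the form c+d
    total = 0
    for v in left:
        need = s - v
        # Number of entries in `right` that are exactly `need`.
        total += bisect_right(right, need) - bisect_left(right, need)
    return total
```

For [1,1,3] and S=7 this gives 24: i must point at a 1, and the remaining three slots hold one 3 and two 1s.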

Main algorithm

So it's enough to compute how many positions (i,j,k,l,m) there are where x[i]+x[j]+x[k]+x[l]+x[m]=S and i,j,k,l,m are not necessarily different. Basically, you can use Moron's solution this way:

Create a sorted array of sums {a+b: a, b are numbers from the array}; group equal elements into one, remembering the count. For example, for the array [1,1,3] you get nine sums [2,2,2,2,4,4,4,4,6] of the form a+b. Then you group equal elements, remembering the counts: [(2,4),(4,4),(6,1)]. This step is O(N^2 log N).
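A small sketch of this grouping step in Python, assuming the hypothetical name `grouped_pair_sums`; it sorts all N^2 ordered pair sums and collapses runs of equal values with `itertools.groupby`.

```python
from itertools import groupby


def grouped_pair_sums(x):
    """Return sorted (sum, count) pairs over all ordered pairs (a, b) from x."""
    sums = sorted(a + b for a in x for b in x)
    # groupby collapses consecutive equal values, which works because
    # the list is sorted.
    return [(value, len(list(run))) for value, run in groupby(sums)]


print(grouped_pair_sums([1, 1, 3]))  # [(2, 4), (4, 4), (6, 1)]
```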

For each element e of the original array, count how many pairs of elements in the sums array add up to S-e. As in Moron's solution, you have two pointers, one going right and one going left. If the sum is too low, move the first pointer, increasing the sum; if the sum is too high, move the second pointer, decreasing it.

Suppose the sum is correct. This means one pointer is at (a,x) and the second at (b,y), where a+b=S-e. Increase the counter by x*y and move both pointers. (You could move only one pointer, but on the next step there would be no match, and the second pointer would be moved then.)

For example, for the array [(2,4),(4,4),(6,1)] and S-e=8, the first pointer points at (2,4) and the second at (6,1). Since 2+6=8, you add 4*1=4 and move both pointers. Now they both point at (4,4), so you increase the counter by 16. Don't stop! The pointers pass each other, and you get the first at (6,1) and the second at (2,4), so you increase the counter by 4.

So, in the end, there are 4+16+4=24 ways to get 8 as a sum of 4 elements of [1,1,3]:

>>> len([k for k in itertools.product([1,1,3],repeat=4) if sum(k) == 8])
24

Prelude Control.Monad> length [k | k <- replicateM 4 [1,1,3], sum k == 8]
24
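The two-pointer walk described above can be sketched as follows. The function name `count_group_pairs` is made up, and the input is assumed to be a list of (sum, count) pairs sorted by sum with no repeated sums.

```python
def count_group_pairs(groups, target):
    """Count ordered pairs of pair-sums adding up to target.

    groups: list of (value, count) sorted by value, values distinct.
    """
    lo, hi = 0, len(groups) - 1
    total = 0
    # Keep going after the pointers cross: ordered pairs are counted,
    # so (2, 6) and (6, 2) are separate matches.
    while lo < len(groups) and hi >= 0:
        current = groups[lo][0] + groups[hi][0]
        if current < target:
            lo += 1          # sum too low: move the left pointer right
        elif current > target:
            hi -= 1          # sum too high: move the right pointer left
        else:
            total += groups[lo][1] * groups[hi][1]
            lo += 1
            hi -= 1
    return total


print(count_group_pairs([(2, 4), (4, 4), (6, 1)], 8))  # 24
```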

Repeating that for each e, you'll get the count of ways to get S as a sum of 5 elements.

For [1,1,1,1,1] and S-e=4, the sums array would be [(2,25)], and you'd get that there are 25*25=625 ways to get a sum of 4.

For each e, this step is linear in the size of the sums array (so it's O(N^2)), so the whole loop takes O(N^3).
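Putting the pieces together, a minimal sketch of the whole O(N^3) main loop might look like this. The name `count_all_quintuples` is hypothetical; it counts ordered position tuples (i,j,k,l,m) with repeats allowed, which is exactly the "all sums" term and not yet the final answer.

```python
from itertools import groupby


def count_all_quintuples(x, s):
    """Count ordered (i,j,k,l,m), repeats allowed, with the five values summing to s."""
    sums = sorted(a + b for a in x for b in x)
    groups = [(value, len(list(run))) for value, run in groupby(sums)]

    total = 0
    for e in x:                      # e plays the role of the fifth element
        target = s - e
        lo, hi = 0, len(groups) - 1
        while lo < len(groups) and hi >= 0:
            current = groups[lo][0] + groups[hi][0]
            if current < target:
                lo += 1
            elif current > target:
                hi -= 1
            else:
                total += groups[lo][1] * groups[hi][1]
                lo += 1
                hi -= 1
    return total
```

For example, on [1,1,3] with S=9 this gives 80: a tuple needs two 3s and three 1s, which is 10 placements times 2^3 choices of which 1 to use.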

On inclusion-exclusion

Call a quintuple (i,j,k,l,m) "proper" if x[i]+x[j]+x[k]+x[l]+x[m]=S. The goal is to count the number of proper quintuples (i,j,k,l,m) where i,j,k,l,m are pairwise distinct. The main algorithm can count in O(N^3) how many proper quintuples there are whose components are not necessarily distinct. What remains is to count the "wrong" tuples, those with at least one repeated index.

Consider the subsets of proper quintuples

A_xy = {(i,j,k,l,m): the indices in the x-th and y-th places are the same}

For example, A_24 is the set of proper quintuples (i,j,k,l,m) where j=l.

The set of wrong quintuples is:

A_12 ∪ A_13 ∪ ... ∪ A_45

Its cardinality follows by inclusion-exclusion:

|A_12 ∪ A_13 ∪ ... ∪ A_45| = |A_12| + |A_13| + ... + |A_45| - |A_12 ∩ A_23| - ... - |A_34 ∩ A_45| + ... - |A_12 ∩ A_13 ∩ ... ∩ A_35 ∩ A_45|

(The last term is the intersection of all 10 sets, so it enters with a minus sign.)

There are 2^10=1024 summands here (counting the empty intersection, i.e. the set of all proper quintuples). But many of the cardinalities are the same.

The only things you have to count are:

X1 = |A_12| - quintuples with i=j
X2 = |A_12 ∩ A_23| - quintuples with i=j=k
X3 = |A_12 ∩ A_23 ∩ A_34| - quintuples with i=j=k=l
X4 = |A_12 ∩ A_23 ∩ A_34 ∩ A_45| - quintuples with i=j=k=l=m
X5 = |A_12 ∩ A_34| - quintuples with i=j, k=l
X6 = |A_12 ∩ A_23 ∩ A_45| - quintuples with i=j=k, l=m

You can observe that, by permuting the positions, all other sets are represented here. For example, A_24 has the same cardinality as A_12.

Counting the cardinalities of those 6 sets is rather easy. For the first one, you create arrays {2a+b} and {c+d} and count how many pairs of elements, one from each array, sum to S; for the other ones, there are only 3 or fewer free variables, so even a simple loop will give you O(N^3).
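Here is one way the six counts could be computed, as a sketch only. The name `wrong_tuple_counts` is hypothetical, and it leans on hash-based `Counter` lookups instead of the sorted arrays or plain loops mentioned above, so every count takes roughly O(N^2) expected time.

```python
from collections import Counter


def wrong_tuple_counts(x, s):
    """Return (X1, ..., X6) for array x and target s; repeats are allowed
    among the indices that are not forced to be equal."""
    freq = Counter(x)                                # value -> occurrences
    pair = Counter(c + d for c in x for d in x)      # ordered pair sums

    x1 = sum(pair[s - 2 * a - b] for a in x for b in x)      # i=j
    x2 = sum(freq[s - 3 * a - b] for a in x for b in x)      # i=j=k
    x3 = sum(freq[s - 4 * a] for a in x)                     # i=j=k=l
    x4 = sum(1 for a in x if 5 * a == s)                     # i=j=k=l=m
    x5 = sum(freq[s - 2 * a - 2 * b] for a in x for b in x)  # i=j, k=l
    x6 = sum(1 for a in x for b in x if 3 * a + 2 * b == s)  # i=j=k, l=m
    return x1, x2, x3, x4, x5, x6
```

On an all-zero array of length n with S=0 the counts come out as n^4, n^3, n^2, n, n^3, n^2, one power of n per free index.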

To simplify the sum, I wrote the following Haskell program:

import Control.Monad
import Data.List
import qualified Data.Map as Map

-- Take an equivalence relation, like [(1,2),(2,3)], and return the sorted block sizes of its partition of [1..5], like [1,1,3]
f xs = sort $ map length $ foldr f (map return [1..5]) xs
       where f (x,y) a = let [v1] = filter (x `elem`) a
                             [v2] = filter (y `elem`) a
                         in if v1 == v2 then a else (a \\ [v1,v2]) ++ [v1++v2]

-- All 1024 subsets of [(1,2),(1,3), ..., (4,5)]
subsets = filterM (const [False, True]) [(i,j) | i <- [1..5], j <- [i+1..5]]

res = Map.fromListWith (+) $ map (\k -> (f k, (-1)^(length k))) subsets

*Main> res
Loading package array-0.3.0.1 ... linking ... done.
Loading package containers-0.3.0.0 ... linking ... done.
fromList [([1,1,1,1,1],1),([1,1,1,2],-10),([1,1,3],20),([1,2,2],15),([1,4],-30),([2,3],-20),([5],24)]

which means that the formula is

(all proper quintuples) - 10*X1 + 20*X2 - 30*X3 + 24*X4 + 15*X5 - 20*X6.
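To sanity-check the coefficients on a small input, here is a brute-force sketch. The name `check_formula` is made up, and it enumerates all N^5 index tuples on purpose, so it is only meant for tiny arrays. It returns the value given by the formula next to a direct count of proper quintuples with pairwise distinct indices.

```python
from itertools import product


def check_formula(x, s):
    """Return (count via the formula, direct count) of ordered proper
    quintuples with pairwise distinct positions. Brute force, O(N^5)."""
    n = len(x)
    proper = [t for t in product(range(n), repeat=5)
              if sum(x[p] for p in t) == s]

    def count(pred):
        return sum(1 for t in proper if pred(*t))

    total = len(proper)
    x1 = count(lambda i, j, k, l, m: i == j)
    x2 = count(lambda i, j, k, l, m: i == j == k)
    x3 = count(lambda i, j, k, l, m: i == j == k == l)
    x4 = count(lambda i, j, k, l, m: i == j == k == l == m)
    x5 = count(lambda i, j, k, l, m: i == j and k == l)
    x6 = count(lambda i, j, k, l, m: i == j == k and l == m)

    by_formula = (total - 10 * x1 + 20 * x2 - 30 * x3
                  + 24 * x4 + 15 * x5 - 20 * x6)
    direct = count(lambda *t: len(set(t)) == 5)
    return by_formula, direct
```

Both numbers count ordered tuples; dividing by 5! = 120 gives the number of unordered picks that the question asks for.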

Check:

How many quintuples in [0,0,0,...,0] sum to 0? One way to compute that is directly; the second way is to use the formula (whose terms do not care about distinct positions):

direct x = x*(x-1)*(x-2)*(x-3)*(x-4)
indirect x = x^5 - 10 * x^4 + 20 * x^3 + 15 * x^3 - 30 * x^2 - 20*x^2 + 24*x

*Main> direct 100
9034502400
*Main> indirect 100
9034502400

Other notes:

Also, there's an O(a_n log a_n) solution: compute (x^{a_1} + ... + x^{a_n})^5 using FFT; the result is the coefficient at x^S. This allows some a_i to be used twice, but you can subtract polynomials like (x^{2a_1} + ... + x^{2a_n}) * (x^{a_1} + ... + x^{a_n})^3 etc. according to the inclusion-exclusion principle.
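A small sketch of the polynomial idea, with two loud assumptions: the inputs are nonnegative integers, and plain quadratic polynomial multiplication stands in for the FFT to keep the example dependency-free. The helper names (`poly_mul`, `value_poly`, `all_and_x1`) are made up. It reads off the "all tuples" count and the i=j correction term as coefficients at x^S.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (naive, no FFT)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        if pi:
            for j, qj in enumerate(q):
                out[i + j] += pi * qj
    return out


def value_poly(a, scale=1):
    """Coefficient list of sum over v in a of x^(scale*v)."""
    p = [0] * (scale * max(a) + 1)
    for v in a:
        p[scale * v] += 1
    return p


def all_and_x1(a, s):
    """Return (all ordered 5-tuples summing to s, tuples with i=j summing to s)."""
    base = value_poly(a)
    cube = poly_mul(poly_mul(base, base), base)
    fifth = poly_mul(poly_mul(cube, base), base)
    with_pair = poly_mul(value_poly(a, scale=2), cube)

    def coeff(p):
        return p[s] if 0 <= s < len(p) else 0

    return coeff(fifth), coeff(with_pair)


print(all_and_x1([1, 1, 3], 9))  # (80, 20)
```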

In some restricted models of computation, it has been shown that the decision version of this problem requires Ω(N^3) time.