Which algorithm segments a sequence of numbers into n subsequences so that the standard deviation of the subsequence sums is minimized?

2023-09-11 22:48:32 Author: 旧楹联红褪墨残谁来揭

I'm looking for an algorithm to segment a sequence of positive numbers into n subsequences such that the standard deviation of the subsequence sums is minimized.

The numbers in each subsequence must appear in the same order as in the original sequence.

For example:

Suppose I have the sequence {1,1,1,1,1,1,10,1} and want to segment it into 2 subsequences. I believe the optimal solution is {1,1,1,1,1,1}, {10,1}.

The sum of the first subsequence is 6 and the sum of the second is 11. The (sample) standard deviation of these two numbers is ~3.5, which I believe is the lowest possible.

Suppose I have the sequence {4,1,1,1,1,6} and want to segment it into 3 subsequences. I believe the optimal solution is {4}, {1,1,1,1}, {6}. The sums of the subsequences are 4, 4, and 6. The (sample) standard deviation of these 3 numbers is ~1.15, which I believe is the lowest possible.
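The two figures quoted above match the sample standard deviation (dividing by n - 1) of the subsequence sums. A minimal sketch to reproduce them; `sample_stddev` and `segment_sums` are hypothetical helper names introduced here, not part of the original post:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Sample standard deviation (divides by n - 1) of a list of numbers.
sub sample_stddev {
    my @x    = @_;
    my $mean = sum(@x) / scalar(@x);
    my $ss   = sum( map { ( $_ - $mean ) ** 2 } @x );
    return sqrt( $ss / ( scalar(@x) - 1 ) );
}

# Sum of each segment; segments are passed as array references.
sub segment_sums {
    return map { sum(@$_) } @_;
}

printf "%.2f\n", sample_stddev( segment_sums( [1,1,1,1,1,1], [10,1] ) );  # 3.54
printf "%.2f\n", sample_stddev( segment_sums( [4], [1,1,1,1], [6] ) );    # 1.15
```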

The best algorithm I was able to come up with is to compute the cumulative sums of the sequence and cut it at each multiple of totalSum/numSubsequences.

For example, given the sequence {4,1,1,1,1,6}, the cumulative sums are {4,5,6,7,8,14}. The total of all numbers in the sequence is 14, so, given that I want 3 subsequences, I should cut the sequence where the running total reaches 14/3 = 4.67 and 2 * 14/3 = 9.33.

However, there is no place in the sequence where the cumulative total equals 4.67: the first cumulative total is 4 and the next is 5. So should I round up or round down? In this case, rounding down to 4 gives the optimal solution, but that isn't always the case. The best I can think of is to try every combination of rounding up and down, but that results in O(2^numSubsequences) complexity.
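To make the rounding problem concrete, here is a sketch of the cumulative-sum heuristic that always cuts at the cumulative sum nearest to each target. `nearest_cut_split` is a hypothetical helper, and "round to nearest, ties to the shorter prefix" is an assumption; it is only one of the rounding choices discussed above.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Cut after the prefix whose cumulative sum is nearest to each target
# k * total / n (ties go to the shorter prefix). Heuristic only.
sub nearest_cut_split {
    my( $n, @a ) = @_;
    my $total = sum(@a);

    # $prefix[$i] = sum of the first $i elements
    my @prefix = (0);
    push @prefix, $prefix[-1] + $_ for @a;

    my @cuts = (0);
    for my $k ( 1 .. $n - 1 ) {
        my $target = $k * $total / $n;
        my $best   = $cuts[-1];
        for my $i ( $cuts[-1] .. scalar(@a) ) {
            $best = $i
                if abs( $prefix[$i] - $target ) < abs( $prefix[$best] - $target );
        }
        push @cuts, $best;
    }
    push @cuts, scalar(@a);

    # Turn consecutive cut positions into array slices.
    return map { [ @a[ $cuts[$_] .. $cuts[$_ + 1] - 1 ] ] } 0 .. $#cuts - 1;
}

print join( ", ", map { "@$_" } nearest_cut_split( 3, 4, 1, 1, 1, 1, 6 ) ), "\n";
# prints "4 1, 1 1 1, 6"
```

For this input the nearest-cut rule yields sums 5, 3, 6 (sum of squares 70), whereas {4}, {1,1,1,1}, {6} yields 4, 4, 6 (sum of squares 68), so rounding to the nearest cumulative sum does not always give the best split.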

This seems like the type of problem that would have a preexisting algorithm, but my Googling has failed me. I am aware of the Partition Problem, which is NP-complete, but that deals with unordered sets rather than ordered sequences.

Any help would be appreciated.

Recommended answer

Suppose the length of the original sequence is L and the number of subsequences is N.

You may simplify the expression for the standard deviation to sqrt(E[X^2] - E[X]^2), where E denotes expectation/average and X denotes your random variable: in your case, the sum of a subsequence. (A similar formula applies to the "sample standard deviation".) Note that E[X] does not depend on how you split your sequence, because it is always the total sum divided by N. Thus, we just want to minimize E[X^2] or, equivalently, the sum of X^2 over the subsequences (the two differ by a factor of N, by the definition of average).
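Written out, the reduction looks as follows. The symbols X_j (sum of the j-th subsequence) and S (total of the whole sequence, fixed for any split) are introduced here for illustration only:

```latex
\begin{align*}
\sigma^2 &= \frac{1}{N}\sum_{j=1}^{N} X_j^2 - \left(\frac{S}{N}\right)^2,
  \qquad S = \sum_{j=1}^{N} X_j \\
s^2 &= \frac{1}{N-1}\left(\sum_{j=1}^{N} X_j^2 - \frac{S^2}{N}\right)
\end{align*}
```

Since S and N are constants, both the population variance and the sample variance are increasing functions of the sum of the X_j^2, so minimizing that sum minimizes either form of the standard deviation.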

At this point, we can see that this problem can be solved with dynamic programming. Let f(i,j), for i from 0 to L and j from 1 to N, be the minimal sum of squares of the subsequence sums over all splits of the first i elements of your sequence into j subsequences. Then f(i,j) may be computed in terms of the values f(i',j') with i' <= i and j' < j. More specifically, if your sequence is a[k], indexed from 0 to L-1:

f(i,1) = sum( a[k] for 0 <= k < i )^2
f(i,j) = minimum of  f(l,j-1) + sum( a[k] for l <= k < i )^2  for l from 0 to i



Having computed f(L,N), you can use standard dynamic programming techniques to recover the splits. In particular, for each pair (i,j) you can store the l that minimizes f(i,j).

The runtime of this solution is O(L^2 N), because you compute O(L N) different values of f and each minimum is taken over O(L) different values of l.

Here's a straightforward implementation in Perl:

#!/usr/bin/perl

use strict;
use warnings;

local $\ = $/;
print join ", ", map {"@$_"} best( 2, qw(1 1 1 1 1 1 10 1) );
# prints "1 1 1 1 1 1, 10 1"

print join ", ", map {"@$_"} best( 3, qw(4 1 1 1 1 6) );
# prints "4, 1 1 1 1, 6"

sub best {
    my( $N, @a ) = @_;

    my( @f, @g, $i, $j, $k, $sum );

    # DP base case
    $sum = 0;
    $f[0][1] = $g[0][1] = 0;
    for $i ( 1 .. @a ) {
        $sum += $a[$i-1];
        $f[$i][1] = $sum * $sum;
        $g[$i][1] = 0;
    }

    # DP recurrence
    for $j ( 2 .. $N ) {
        $f[0][$j] = $g[0][$j] = 0;
        for $i ( 1 .. @a ) {
            $sum = 0;
            $f[$i][$j] = $f[$i][$j-1];
            $g[$i][$j] = $i;
            for $k ( reverse 0 .. $i-1 ) {
                $sum += $a[$k];
                if( $f[$i][$j] > $f[$k][$j-1] + $sum * $sum ) {
                    $f[$i][$j] = $f[$k][$j-1] + $sum * $sum;
                    $g[$i][$j] = $k;
                }
            }
        }
    }

    # Extract best expansion
    my( @result );
    $i = @a; $j = $N;

    while( $j ) {
        $k = $g[$i][$j];
        unshift @result, [@a[$k .. $i-1]];
        $i = $k;
        $j--;
    }

    return @result;
}
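
As a sanity check on small inputs, the dynamic-programming result can be compared against an exhaustive search over all cut positions. The sketch below is not part of the original answer; `brute_force_best` is a hypothetical helper that assumes the number of pieces does not exceed the sequence length and that every piece is non-empty. Its running time grows combinatorially, so it is only suitable for short sequences.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Try every way to cut @a into exactly $n non-empty contiguous pieces and
# return the pieces with the smallest sum of squared piece sums.
sub brute_force_best {
    my( $n, @a ) = @_;
    my( $best_cost, @best );

    my $try;
    $try = sub {
        my( $start, $left, $cost, @pieces ) = @_;
        if( $left == 1 ) {
            # The last piece takes everything that remains.
            my @last = @a[ $start .. $#a ];
            $cost += sum(@last) ** 2;
            if( !defined $best_cost || $cost < $best_cost ) {
                ( $best_cost, @best ) = ( $cost, @pieces, \@last );
            }
            return;
        }
        # Leave at least one element for each of the remaining pieces.
        for my $end ( $start .. $#a - ( $left - 1 ) ) {
            my @piece = @a[ $start .. $end ];
            $try->( $end + 1, $left - 1, $cost + sum(@piece) ** 2,
                    @pieces, \@piece );
        }
    };
    $try->( 0, $n, 0 );

    return @best;
}

print join( ", ", map { "@$_" } brute_force_best( 2, qw(1 1 1 1 1 1 10 1) ) ), "\n";
# prints "1 1 1 1 1 1, 10 1"
print join( ", ", map { "@$_" } brute_force_best( 3, qw(4 1 1 1 1 6) ) ), "\n";
# prints "4, 1 1 1 1, 6"
```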



                
                