Splitting a list into parts of different lengths under a special condition

2023-09-11 23:26:33 Author: 原味网民

I need an algorithm for dividing different manufacturing parts into uneven groups. The main condition is that the difference between the maximum number in each group and all the other numbers in that group should be as low as possible.

For example, if we have the list [1,3,4,11,12,19,20,21] and decide to divide it into 3 parts, it should be divided into [1,3,4],[11,12],[19,20,21]. If we instead decide to divide the same list into 4 parts, we would get:

 [1,3,4],[11],[12],[19,20,21].

To clarify "difference between the maximum number in the group and all others": for the split [1,3,4],[11],[12,19],[20,21], the groups give [1,3,4] = (4 - 1) + (4 - 3) + (4 - 4) = 4, [11] = 11 - 11 = 0, [12,19] = (19 - 12) + (19 - 19) = 7, and [20,21] = (21 - 20) + (21 - 21) = 1, so the total difference is 12. For another possible split, [1,3,4] = (4 - 1) + (4 - 3) + (4 - 4) = 4, [11,12,19] = (19 - 11) + (19 - 12) + (19 - 19) = 15, and [20,21] = (21 - 20) + (21 - 21) = 1, so the total difference is 20. This is a calculation of over-performance: the largest number in a group (representing, for example, strength) has to replace the smaller numbers in the group, down to the smallest (the weakest). Using a super-strong part would be too expensive or too heavy, so optimization is needed.
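The cost described above can be written as a short function. This is a minimal sketch, assuming each group is a non-empty list of numbers; the names `group_cost` and `total_cost` are illustrative and do not come from the original post.

```python
def group_cost(group):
    """Sum of (max - x) over every x in the group; the max itself adds 0."""
    top = max(group)
    return sum(top - x for x in group)


def total_cost(groups):
    """Total over-performance of a whole partition."""
    return sum(group_cost(g) for g in groups)


print(total_cost([[1, 3, 4], [11], [12, 19], [20, 21]]))  # 12
```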

So my first idea was to slice the list in every possible way, calculate the "difference between the maximum number in the group and all others in the group" for each slicing, and then pick the slicing with the smallest total difference as the final result.
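The exhaustive idea can be sketched as follows. It assumes the groups are contiguous runs of the sorted list, so that a slicing is fully described by its cut positions; the function name `best_split` is hypothetical. The number of slicings grows combinatorially, so this is only practical for short lists such as the 10-element case in the question.

```python
from itertools import combinations


def best_split(values, k):
    """Try every way to cut the sorted values into k contiguous groups
    and return the partition with the smallest total of (max - x).
    Returns (None, None) if k is larger than the number of values."""
    data = sorted(values)
    n = len(data)
    best_groups, best_cost = None, None
    # Choose k - 1 cut positions among the n - 1 gaps between elements.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        groups = [data[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(max(g) - x for g in groups for x in g)
        if best_cost is None or cost < best_cost:
            best_groups, best_cost = groups, cost
    return best_groups, best_cost


groups, cost = best_split([1, 3, 4, 11, 12, 19, 20, 21], 3)
print(groups, cost)  # [[1, 3, 4], [11, 12], [19, 20, 21]] 8
```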

I was wondering whether there is a built-in function for this in Python, Spyder, or something similar. If I need to write the code myself, could you help me?

I'm trying to get this working on a random list of 10 numbers so that I can reapply it in different situations: l = sorted(random.sample(range(100), 10)).

Recommended answer

Based on your updated comments, it sounds like you are looking for the K-Means algorithm, or something similar, which will cluster your list elements into distinct groups based on their distance from proposed centers (this is what your difference calculation is really measuring).

In your criterion, note that it never makes sense to subtract the max of each subgroup from itself, since that is always zero by definition. So really you're looking at the sum of the max minus each element, over all non-max elements (what to do with duplicates is also a question you need to answer). K-Means will do something different (it looks at every point's distance from the average of the points in its cluster), but in spirit it's the same. You can modify K-Means to use your notion of a group score, although I don't really see any benefit to that in terms of the clustering output; I'd need to see some kind of mathematical proof about the limiting behavior of the different criteria to be convinced that it matters.

You can achieve this reasonably easily with the sklearn and numpy modules:

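from sklearn import cluster
import numpy as np

# KMeans expects a 2-D array of shape (n_samples, n_features),
# hence the [:,None] that turns the flat list into a single column.
km = cluster.KMeans(n_clusters=4)
example_data = np.asarray([1,2,3, 11,12, 20,21,22, 30,35])[:,None]

km.fit(example_data)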

Then look at km.labels_:

In [65]: km.labels_
Out[65]: array([0, 0, 0, 3, 3, 1, 1, 1, 2, 2], dtype=int32)

You can see that this would put together [1,2,3], [11,12], [20,21,22], [30,35]. Below is some code that actually builds those groups for you:

In [74]: example_data.ravel().tolist()
Out[74]: [1, 2, 3, 11, 12, 20, 21, 22, 30, 35]

In [75]: [[x for i,x in enumerate(example_data.ravel().tolist()) if km.labels_[i] == j]
          for j in range(km.n_clusters)]

Out[75]: [[1, 2, 3], [20, 21, 22], [30, 35], [11, 12]]

But note that this is not perfect: it is an iterative method that is not guaranteed to converge to the best ("true") solution, and for bizarre enough input data you can get bizarre output.

Alternatively, a more basic way to state what you want is to choose index integers i[0] through i[k], such that

sub_lists[j] = original_list[i[j]:i[j+1]] 

with i[0] = 0 and i[k+1] understood to mean "everything else in the list." Then define:

sub_lens = [len(s) for s in sub_lists]
max_len  = max(sub_lens)
criterion(k, i[0], ..., i[k]) = max(max_len - s_len for s_len in sub_lens)

So a solution for you is a tuple of parameters (k, i[0], ..., i[k]), and you want the choice that minimizes the criterion expression above.
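One runnable reading of the index formulation above is sketched here. It assumes the cut indices start at 0 and are strictly increasing, and that the final sublist runs to the end of the list; the helper names `split_at` and `length_criterion` are hypothetical. Note that this criterion, as written in the answer, compares sublist lengths rather than the values inside them.

```python
def split_at(original_list, cut_indices):
    """Build sub_lists[j] = original_list[i[j]:i[j+1]], with the last
    sublist running to the end of the list."""
    bounds = list(cut_indices) + [len(original_list)]
    return [original_list[a:b] for a, b in zip(bounds, bounds[1:])]


def length_criterion(original_list, cut_indices):
    """Largest gap between the longest sublist and any other sublist."""
    sub_lens = [len(s) for s in split_at(original_list, cut_indices)]
    max_len = max(sub_lens)
    return max(max_len - s_len for s_len in sub_lens)


data = [1, 3, 4, 11, 12, 19, 20, 21]
print(split_at(data, [0, 3, 5]))          # [[1, 3, 4], [11, 12], [19, 20, 21]]
print(length_criterion(data, [0, 3, 5]))  # 1
```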

A generic solution to this problem is quite complicated. But if you're willing to accept a greedy solution that is very balanced except for the final sublist, many such solutions will do.
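The answer does not spell out a specific greedy scheme, so the following is only one possible interpretation: give the first k - 1 sublists the same length and let the final sublist absorb the remainder. The function name `greedy_chunks` is hypothetical, and the sketch balances lengths only; it does not look at the values.

```python
def greedy_chunks(values, k):
    """Cut values into k contiguous chunks: the first k - 1 chunks get
    len(values) // k items each and the last chunk takes the remainder."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    size = len(values) // k
    chunks = [values[j * size:(j + 1) * size] for j in range(k - 1)]
    chunks.append(values[(k - 1) * size:])
    return chunks


print(greedy_chunks([1, 3, 4, 11, 12, 19, 20, 21], 3))
# [[1, 3], [4, 11], [12, 19, 20, 21]]
```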