遍历大小为k的不同子集子集、遍历、大小、不同

2023-09-11 22:44:10 作者:绣花男神

我有个整数数组(不一定是截然不同的!),我想遍历大小为k的所有子集。不过,我想排除所有重复的子集。

I have an array of n integers (not necessarily distinct!) and I would like to iterate over all subsets of size k. However I'd like to exclude all duplicate subsets.

例如。

array = {1,2,2,3,3,3,3}, n = 7, k = 2

那么我想遍历(各一次)的子集:

then the subsets I want to iterate over (each once) are:

{1,2},{1,3},{2,2},{2,3},{3,3}

什么是一个高效的算法,这样做呢? 是一个递归的方法最有效的/优雅?

What is an efficient algorithm for doing this? Is a recursive approach the most efficient/elegant?

如果你有一个特定语言的答案,我使用C ++。

In case you have a language-specific answer, I'm using C++.

推荐答案

,相同的(或几乎相同),它被用于一组词典编纂顺序唯一值的所产生的组合算法可用于产生一个多重的组合在字典顺序。这样做,这样避免了必要性删除重复,这是可怕的价格昂贵,而且也避免了维护所有生成的组合的必要性。它要求值的原始列表进行排序。

The same (or almost the same) algorithm which is used to generated combinations of a set of unique values in lexicographical order can be used to generate combinations of a multiset in lexicographical order. Doing it this way avoids the necessity to deduplicate, which is horribly expensive, and also avoids the necessity of maintaining all the generated combinations. It does require that the original list of values be sorted.

下面简单的实现找到下的 K 的-combination中的 N 的价值平均一个多重的(和最坏情况下的)时间为O( N )。它期望两个范围:第一范围是一个排序的 K 的-combination,和第二范围是排序的多集。 (如果两个范围是未分类还是在第一个范围值不构成子(多)设定第二范围,则行为是不确定的;没有健全的检查由)

The following simple implementation finds the next k-combination of a multiset of n values in average (and worst-case) time O(n). It expects two ranges: the first range is a sorted k-combination, and the second range is the sorted multiset. (If either range is unsorted or the values in first range do not constitute a sub(multi)set of the second range, then the behaviour is undefined; no sanity checks are made.)

仅从第二范围的结束迭代器实际使用,但是我觉得做了调用约定有点奇怪。

Only the end iterator from the second range is actually used, but I thought that made the calling convention a bit odd.

template<typename BidiIter, typename CBidiIter,
         typename Compare = std::less<typename BidiIter::value_type>>
int next_comb(BidiIter first, BidiIter last,
              CBidiIter /* first_value */, CBidiIter last_value,
              Compare comp=Compare()) {
  /* 1. Find the rightmost value which could be advanced, if any */
  auto p = last;
  while (p != first && !comp(*(p - 1), *--last_value)) --p;
  if (p == first) return false;
  /* 2. Find the smallest value which is greater than the selected value */
  for (--p; comp(*p, *(last_value - 1)); --last_value) { }
  /* 3. Overwrite the suffix of the subset with the lexicographically smallest
   *    sequence starting with the new value */
  while (p != last) *p++ = *last_value++;
  return true;
}

应当清楚的是,步骤1和2合并的化妆至多为O( N 的)比较,因为每一个的 N 的值被用在至多一个比较。步骤3张,最多O( K 的)价值观,我们知道的 K 的和乐; N 的

It should be clear that steps 1 and 2 combined make at most O(n) comparisons, because each of the n values is used in at most one comparison. Step 3 copies at most O(k) values, and we know that k≤n.

此可以改善到O( K 的)中的情况下被重复任何值,通过保持当前组合作为容器迭代入值列表,而不是实际的数值。这也将避免复制值,在额外的解引用的成本。如果除了我们缓存器关联的每个值迭代器迭代器的下一个最大的价值一审的功能,我们可以排除第2步,降低算法O( K 的),即使是重复的值。如果有大量的重复序列和比较是昂贵,可能是值得的。

This could be improved to O(k) in the case where no values are repeated, by maintaining the current combination as a container of iterators into the value list rather than actual values. This would also avoid copying values, at the cost of extra dereferences. If in addition we cache the function which associates each value iterator with an iterator to the first instance of next largest value, we could eliminate Step 2 and reduce the algorithm to O(k) even for repeated values. That might be worthwhile if there are a large number of repeats and comparisons are expensive.

下面是一个简单的使用例子:

Here's a simple use example:

std::vector<int> values = {1,2,2,3,3,3,3};
/* Since that's sorted, the first subset is just the first k values */
const int k = 2;
std::vector<int> subset{values.cbegin(), values.cbegin() + k};

/* Print each combination */
do {
  for (auto const& v : subset) std::cout << v << ' ';
  std::cout << '\n';
} while (next_comb(subset.begin(),  subset.end(),
                   values.cbegin(), values.cend()));

住在 coliru