I'd like to generate all possible combinations (without repetition) in bit representation. I can't use any library like Boost or stl::next_combination; it has to be my own code (computation time is very important).
Here's my code (modified from one StackOverflow user's):
int combination = (1 << k) - 1;
int new_combination = 0;
int change = 0;
while (true)
{
    // return next combination
    cout << combination << endl;
    // find first index to update
    int indexToUpdate = k;
    while (indexToUpdate > 0 && GetBitPositionByNr(combination, indexToUpdate) >= n - k + indexToUpdate)
        indexToUpdate--;
    if (indexToUpdate == 1) change = 1; // move all bits to the left by one position
    if (indexToUpdate <= 0) break; // done
    // update combination indices
    new_combination = 0;
    for (int combIndex = GetBitPositionByNr(combination, indexToUpdate) - 1; indexToUpdate <= k; indexToUpdate++, combIndex++)
    {
        if (change)
        {
            new_combination |= (1 << (combIndex + 1));
        }
        else
        {
            combination = combination & (~(1 << combIndex));
            combination |= (1 << (combIndex + 1));
        }
    }
    if (change) combination = new_combination;
    change = 0;
}
where n is the total number of elements and k is the number of elements in a combination.
GetBitPositionByNr returns the position of the k-th set bit, counting positions from 1 starting at the least significant bit.
For example, GetBitPositionByNr(13, 2) = 3, because 13 is 1101 in binary and its second set bit is at the third position.
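The question does not show the body of GetBitPositionByNr, so the following is only a minimal sketch of what such a helper might look like, assuming 1-based positions counted from the least significant bit (which matches the 13 → 3 example). Returning 0 when the value has fewer set bits than requested is an assumption made here for illustration, not something the question specifies.

```cpp
// Hypothetical helper (not from the original post): returns the 1-based
// position, counted from the least significant bit, of the nr-th set bit
// in value. Returns 0 if value has fewer than nr set bits (assumed behavior).
int GetBitPositionByNr(unsigned value, int nr)
{
    int position = 0;
    while (value != 0) {
        ++position;                 // position of the bit currently at the bottom
        if (value & 1u) {
            if (--nr == 0) {
                return position;    // found the nr-th set bit
            }
        }
        value >>= 1;
    }
    return 0;                       // fewer than nr set bits
}
```

With this definition, GetBitPositionByNr(13, 2) evaluates to 3, as in the example above. Note that a loop like this costs up to one iteration per bit on every call, which matters if computation time is the main concern.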
It gives me the correct output for n=4, k=2, which is:
0011 (3 - decimal representation - printed value)
0101 (5)
1001 (9)
0110 (6)
1010 (10)
1100 (12)
It also gives me the correct output for k=1 and k=4, but the wrong output for k=3, which is:
0111 (7)
1011 (11)
1001 (9) - wrong, should be 1101 (13)
1110 (14)
I guess the problem is in the inner while condition (the second one), but I don't know how to fix it.
Maybe some of you know a better (faster) algorithm to do what I want to achieve? It can't use additional memory (arrays).
Here is code to run on ideone: IDEONE
When in doubt, use brute force. That is, generate all variations with repetition, then filter out the patterns you don't need:
#include <vector>

// Count the set bits in n, one bit at a time.
unsigned bit_count(unsigned n)
{
    unsigned i = 0;
    while (n) {
        i += n & 1;
        n >>= 1;
    }
    return i;
}

int main()
{
    std::vector<unsigned> combs;
    const unsigned N = 4;
    const unsigned K = 3;
    // Try every N-bit value and keep those with exactly K set bits.
    for (unsigned i = 0; i < (1u << N); i++) {
        if (bit_count(i) == K) {
            combs.push_back(i);
        }
    }
    // and print 'combs' here
}
Edit: someone else has already pointed out a solution without filtering and brute force, but I'm still going to give you a few hints about this algorithm:
Most compilers offer some sort of intrinsic population count function. I know of GCC and Clang, which have __builtin_popcount(). Using this intrinsic function, I was able to double the speed of the code.
Since you seem to be working on GPUs, you can parallelize the code. I have done it using C++11's standard threading facilities, and I've managed to compute all 32-bit repetitions for arbitrarily-chosen popcounts 1, 16 and 19 in 7.1 seconds on my 8-core Intel machine.
Here's the final code I've written:
#include <vector>
#include <cstdio>
#include <thread>
#include <utility>
#include <future>

// Count the values in [min, max) that have exactly `popcount` set bits.
// unsigned long long is used so that 1 << 32 is representable even where
// unsigned long is only 32 bits wide.
unsigned popcount_range(unsigned popcount, unsigned long long min, unsigned long long max)
{
    unsigned n = 0;
    for (unsigned long long i = min; i < max; i++) {
        n += static_cast<unsigned>(__builtin_popcountll(i)) == popcount;
    }
    return n;
}

int main()
{
    const unsigned N = 32;
    const unsigned K = 16;
    const unsigned N_cores = 8;
    const unsigned long long Max = 1ull << N;
    const unsigned long long N_per_core = Max / N_cores;

    // Launch one asynchronous task per core, each scanning its own slice.
    std::vector<std::future<unsigned>> v;
    for (unsigned core = 0; core < N_cores; core++) {
        unsigned long long core_min = N_per_core * core;
        unsigned long long core_max = core_min + N_per_core;
        auto fut = std::async(
            std::launch::async,
            popcount_range,
            K,
            core_min,
            core_max
        );
        v.push_back(std::move(fut));
    }

    // Sum the per-slice counts.
    unsigned final_count = 0;
    for (auto &fut : v) {
        final_count += fut.get();
    }
    printf("%u\n", final_count);
    return 0;
}
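For comparison with the filtering approach, here is a rough sketch of one well-known way to step directly from one k-bit pattern to the next without testing every value and without any arrays, often called Gosper's hack. This is not necessarily the solution the other answer refers to; the names next_combination and count_combinations are made up for this illustration, and the sketch assumes 1 <= k <= n <= 31 so that everything fits in 32 bits.

```cpp
#include <cstdint>

// Returns the smallest value greater than x that has the same number of
// set bits as x (Gosper's hack). Assumes x != 0 and that the result fits
// in 32 bits.
std::uint32_t next_combination(std::uint32_t x)
{
    std::uint32_t lowest = x & (0u - x);   // isolate the lowest set bit
    std::uint32_t ripple = x + lowest;     // carry through the lowest run of ones
    // Move the remaining ones of that run down to the least significant bits.
    return (((x ^ ripple) >> 2) / lowest) | ripple;
}

// Visits every n-bit value with exactly k set bits, in increasing numeric
// order, and returns how many were visited. No extra memory is used.
// Assumes 1 <= k <= n <= 31 (assumption of this sketch).
unsigned count_combinations(unsigned n, unsigned k)
{
    const std::uint32_t limit = std::uint32_t{1} << n;
    unsigned count = 0;
    for (std::uint32_t c = (std::uint32_t{1} << k) - 1; c < limit; c = next_combination(c)) {
        ++count;   // c is the current combination; print or process it here
    }
    return count;
}
```

Starting from 0011 with n=4, this visits 3, 5, 6, 9, 10, 12, which is the same set as in the question but in increasing numeric order rather than the order the question's code produces. Each step costs a handful of arithmetic operations, including one division, regardless of n.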