How to count the number of set bits in a 32-bit integer?

2023-09-10 22:20:41 Author: 人生如戏,全靠演技。

8 bits representing the number 7 look like this:

00000111

Three bits are set.

What are algorithms to determine the number of set bits in a 32-bit integer?

Recommended answer

This is known as the 'Hamming Weight', 'popcount' or 'sideways addition'.
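
For reference, the Hamming weight can be computed directly by testing one bit at a time. This is a minimal C sketch; the function name `popcount_naive` is hypothetical and not part of the answer. It always does 32 iterations, and that is the baseline the faster methods below improve on.

```c
#include <stdint.h>

/* Hypothetical baseline: test each of the 32 bits in turn. */
int popcount_naive(uint32_t x)
{
    int count = 0;
    for (int bit = 0; bit < 32; bit++) {
        count += (x >> bit) & 1u; /* add 1 if this bit is set */
    }
    return count;
}
```

For the question's example, `popcount_naive(7)` returns 3, matching the three set bits in 00000111.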

The 'best' algorithm really depends on which CPU you are on and what your usage pattern is.

Some CPUs have a single built-in instruction to do it and others have parallel instructions which act on bit vectors. The parallel instructions (like x86's popcnt, on CPUs where it's supported) will almost certainly be fastest. Some other architectures may have a slow instruction implemented with a microcoded loop that tests a bit per cycle (citation needed).
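
In C, the usual way to reach a hardware popcount instruction is a compiler builtin rather than inline assembly. This minimal sketch assumes GCC or Clang. Their `__builtin_popcount` compiles to a single `popcnt` when the target allows it, for example with `-mpopcnt` or a suitable `-march`, and falls back to a software routine otherwise. MSVC offers `__popcnt` in `<intrin.h>` instead. The wrapper name `popcount_builtin` is hypothetical.

```c
#include <stdint.h>

/* Assumes GCC or Clang: __builtin_popcount takes an unsigned int,
   which is 32 bits wide on mainstream targets. */
int popcount_builtin(uint32_t x)
{
    return __builtin_popcount(x);
}
```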

A pre-populated table lookup method can be very fast if your CPU has a large cache and/or you are doing lots of these operations in a tight loop. However, it can suffer from the expense of a 'cache miss', where the CPU has to fetch part of the table from main memory.
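
Here is a sketch of the table-lookup approach, assuming a 256-entry table indexed by byte. The table names and the `B2`/`B4`/`B6` helper macros are illustrative and do not come from the answer. The macros fill the table at compile time, so it is truly pre-populated: each level adds 0, 1, 1 or 2 for the next two higher bits of the index. The whole table is 256 bytes, which is the amount of data that can be missing from cache.

```c
#include <stdint.h>

/* Hypothetical compile-time table: bits_in_byte[b] = number of set bits in b.
   Each macro level appends two more bits, adding 0, 1, 1, 2 to the count. */
#define B2(n) n, n + 1, n + 1, n + 2
#define B4(n) B2(n), B2(n + 1), B2(n + 1), B2(n + 2)
#define B6(n) B4(n), B4(n + 1), B4(n + 1), B4(n + 2)
static const uint8_t bits_in_byte[256] = { B6(0), B6(1), B6(1), B6(2) };

/* Look up each of the four bytes and add the results. */
int popcount_table(uint32_t x)
{
    return bits_in_byte[x & 0xFF]
         + bits_in_byte[(x >> 8) & 0xFF]
         + bits_in_byte[(x >> 16) & 0xFF]
         + bits_in_byte[x >> 24];
}
```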

If you know that your bytes will be mostly 0's or mostly 1's then there are very efficient algorithms for these scenarios.
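
The answer does not name these algorithms. One well-known choice for the mostly-0's case is the `x & (x - 1)` loop, often called Brian Kernighan's method, which is used here as an example. Each step clears the lowest set bit, so the loop runs once per set bit instead of once per bit position. For mostly-1's input, the same loop can count the zero bits of `~x`, and the result is subtracted from 32. Both function names are hypothetical.

```c
#include <stdint.h>

/* Mostly 0's: x & (x - 1) clears the lowest set bit,
   so the loop body runs once per set bit. */
int popcount_sparse(uint32_t x)
{
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* Mostly 1's: count the (few) zero bits instead and subtract from 32. */
int popcount_dense(uint32_t x)
{
    return 32 - popcount_sparse(~x);
}
```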

I believe a very good general-purpose algorithm is the following, known as the 'parallel' or 'variable-precision SWAR' algorithm. I have expressed it in a C-like pseudo-language; you may need to adjust it to work in a particular language (e.g. using uint32_t for C++ and >>> in Java):

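#include <stdint.h>

int NumberOfSetBits(uint32_t i)
{
     // Java: take an int and use >>> instead of >>
     i = i - ((i >> 1) & 0x55555555);                // each 2-bit field: count of its 2 bits
     i = (i & 0x33333333) + ((i >> 2) & 0x33333333); // each 4-bit field: count of its 4 bits
     i = (i + (i >> 4)) & 0x0F0F0F0F;                // each byte: count of its 8 bits
     return (int)((i * 0x01010101) >> 24);           // top byte: sum of all four bytes
}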
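
To see what each SWAR step computes, the sequence can be traced on the question's example value 7 (binary 00000111). This standalone C sketch repeats the same steps and also reports the intermediate values. The name `swar_popcount` and its out-parameters are hypothetical. After the first step, each 2-bit field holds the count of its own two bits. After the second step, each 4-bit field holds a count. The multiply by 0x01010101 then adds the four byte counts into the top byte.

```c
#include <stdint.h>

/* Hypothetical helper: the SWAR popcount, also reporting intermediate values. */
uint32_t swar_popcount(uint32_t i, uint32_t *pairs, uint32_t *nibbles, uint32_t *bytes)
{
    i = i - ((i >> 1) & 0x55555555u);                 /* 2-bit fields */
    *pairs = i;
    i = (i & 0x33333333u) + ((i >> 2) & 0x33333333u); /* 4-bit fields */
    *nibbles = i;
    i = (i + (i >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit fields */
    *bytes = i;
    return (i * 0x01010101u) >> 24;                   /* sum of the four bytes */
}
```

For 7, `pairs` is 6 (binary 0110: the low field holds 2 and the next field holds 1), `nibbles` and `bytes` are both 3, and the result is 3. For 0xFFFFFFFF the intermediate values are 0xAAAAAAAA, 0x44444444 and 0x08080808, and the result is 32.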

This has the best worst-case behaviour of any of the algorithms discussed, so it will efficiently handle any usage pattern or values you throw at it.

This bitwise-SWAR algorithm can be parallelized across multiple vector elements at once, instead of working in a single integer register. That gives a speedup on CPUs that have SIMD but no usable popcount instruction (e.g. x86-64 code that has to run on any CPU, not just Nehalem or later).
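
As an illustration, the same SWAR steps can be applied to four 32-bit lanes at once without writing intrinsics. This sketch uses a swapped-in technique: the GCC/Clang vector extension (`__attribute__((vector_size(16)))`). With it, the compiler can lower the operations to SIMD instructions such as SSE2 or NEON when they are available. The type and function names are hypothetical. Real code would often use the target's intrinsics directly.

```c
#include <stdint.h>
#include <string.h>

/* Four uint32_t lanes in one 16-byte vector (GCC/Clang extension). */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/* Count set bits in in[0..3] at once; out[k] receives the count for in[k]. */
void popcount4(const uint32_t in[4], uint32_t out[4])
{
    v4u32 v;
    memcpy(&v, in, sizeof v);                         /* load four lanes */
    v = v - ((v >> 1) & 0x55555555u);                 /* 2-bit sums per lane */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* 4-bit sums */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums */
    v = (v * 0x01010101u) >> 24;                      /* per-lane total */
    memcpy(out, &v, sizeof v);                        /* store four results */
}
```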

However, the best way to use vector instructions for popcount is usually a variable shuffle that does a table lookup on 4 bits of each byte at a time, for all bytes in parallel. (The 4 bits index a 16-entry table held in a vector register.)
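
The core of this method is the 16-entry table indexed by a 4-bit value. The scalar sketch below processes one byte at a time. A variable shuffle such as SSSE3 `pshufb` does the same work for 16 bytes at once: it splits each byte into its low and high nibble, looks both up in the table, and adds the two results. The function and table names are hypothetical, and this is a scalar emulation, not a SIMD implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Set-bit count of each 4-bit value 0..15
   (the 16-entry table that would sit in a vector register). */
static const uint8_t nibble_bits[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* Scalar emulation: count the set bits in a byte buffer of length len. */
size_t popcount_nibbles(const uint8_t *buf, size_t len)
{
    size_t total = 0;
    for (size_t k = 0; k < len; k++) {
        total += nibble_bits[buf[k] & 0x0F]; /* low nibble */
        total += nibble_bits[buf[k] >> 4];   /* high nibble */
    }
    return total;
}
```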

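On Intel CPUs, the hardware 64-bit popcnt instruction can outperform an SSSE3 PSHUFB bit-parallel implementation by about a factor of 2, but only if your compiler gets it just right. Otherwise, SSE can come out significantly ahead. Newer compiler versions are aware of the popcnt false dependency issue (http://stackoverflow.com/a/25089720/224132).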