How to compare more than two numbers in parallel?

2023-09-11 04:36:00 Author: 北野

Is it possible to compare more than a pair of numbers in one instruction using SSE4?

The Intel reference says the following about PCMPGTQ:

PCMPGTQ — Compare Packed Data for Greater Than

Performs an SIMD compare for the packed quadwords in the destination operand (first operand) and the source operand (second operand). If the data element in the first (destination) operand is greater than the corresponding element in the second (source) operand, the corresponding data element in the destination is set to all 1s; otherwise, it is set to 0s.

This is not really what I want, because I want to be able to decide which integers in the vector are greater and which are smaller.

For example, if I need to compare:

32 with 45
13 with 78
44 with 12
99 with 66

I was planning to put [32, 13, 44, 99] in one vector and [45, 78, 12, 66] in another vector and compare them using SSE4 in one instruction, and have [0, 0, 1, 1] as result (0 - less, 1 - greater)

But it seems this is not what PCMPGTQ does. Any suggestions on how to use parallelism at this level to speed up this comparison?

Recommended answer

I believe that is actually what the PCMPGT family of instructions does. The suffix specifies the element size: B for 8-bit elements, W for 16-bit elements, D for 32-bit elements, and Q for 64-bit elements. So, if you want to compare four 32-bit numbers at once, use PCMPGTD with 128-bit vector arguments. See this page for a pseudocode description of these opcodes.
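
As a rough sketch of the above (not taken from the linked page), the four pairs from the question can be compared with a single PCMPGTD through the SSE2 intrinsic _mm_cmpgt_epi32. The helper name greater_than_4 is made up for illustration, and the final shift that turns each all-ones lane into 1 is an extra step added here to match the [0, 0, 1, 1] form the question asks for.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; _mm_cmpgt_epi32 maps to PCMPGTD */
#include <stdint.h>

/* Hypothetical helper: flags[i] = 1 if a[i] > b[i] (signed compare), else 0. */
void greater_than_4(const int32_t a[4], const int32_t b[4], int32_t flags[4])
{
    /* Unaligned loads, so the arrays need no special alignment. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* One instruction compares all four lanes: 0xFFFFFFFF where a > b, else 0. */
    __m128i mask = _mm_cmpgt_epi32(va, vb);

    /* Logical right shift by 31 reduces each all-ones lane to 1. */
    __m128i ones = _mm_srli_epi32(mask, 31);

    _mm_storeu_si128((__m128i *)flags, ones);
}
```

With a = [32, 13, 44, 99] and b = [45, 78, 12, 66], flags should come out as [0, 0, 1, 1].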

They don't write just 1 or 0, though; they write all ones or all zeroes to each element. The comparison is signed, so comparing 0x1234567887654321 against 0x8765432112345678 using PCMPGTB should give 0xFF00FFFF00FF0000 (a byte such as 0x87 is negative when read as a signed 8-bit value).
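
A small sketch for checking a byte-wise result like this yourself, assuming an x86 target with SSE2. The function names cmpgt_bytes and cmpgt_bytes_bitmask are hypothetical. Keep in mind that PCMPGTB treats every byte as a signed value, so bytes with the top bit set compare as negative. The second helper shows one common way to collapse the all-ones/all-zeroes bytes into one bit per element with PMOVMSKB.

```c
#include <emmintrin.h>  /* SSE2: _mm_cmpgt_epi8 maps to PCMPGTB */
#include <stdint.h>

/* Hypothetical helper: byte-wise signed a > b on the 8 bytes of each argument.
   Each result byte is 0xFF where the byte of a is greater, 0x00 otherwise. */
uint64_t cmpgt_bytes(uint64_t a, uint64_t b)
{
    /* Load 64 bits into the low half of a vector; the upper half is zeroed. */
    __m128i va = _mm_loadl_epi64((const __m128i *)&a);
    __m128i vb = _mm_loadl_epi64((const __m128i *)&b);
    __m128i m  = _mm_cmpgt_epi8(va, vb);

    uint64_t r;
    _mm_storel_epi64((__m128i *)&r, m);  /* store the low 64 bits of the mask */
    return r;
}

/* Hypothetical helper: same comparison, but returns one bit per byte
   (bit i is set when byte i of a is greater than byte i of b). */
int cmpgt_bytes_bitmask(uint64_t a, uint64_t b)
{
    __m128i va = _mm_loadl_epi64((const __m128i *)&a);
    __m128i vb = _mm_loadl_epi64((const __m128i *)&b);
    /* PMOVMSKB gathers the top bit of each of the 16 bytes; keep the low 8. */
    return _mm_movemask_epi8(_mm_cmpgt_epi8(va, vb)) & 0xFF;
}
```

Calling cmpgt_bytes on the two values above is an easy way to confirm the mask that the hardware actually produces.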

This Intel white paper gives a neat example of performing the operation a[i] = (a[i] > b[i]) ? a[i] : b[i] (i.e. a[i] = max(a[i], b[i])) using vector operations.
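
The white paper's own code is not reproduced here; the following is only a minimal sketch of the same idea, under the assumption that the elements are signed 32-bit integers. The comparison mask selects a[i] where a[i] > b[i] and b[i] elsewhere, using AND / ANDNOT / OR. The function name max_in_place is made up for illustration. (If SSE4.1 is available, PMAXSD, exposed as _mm_max_epi32, computes the same result in one instruction.)

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a[i] = max(a[i], b[i]) for i in [0, n). */
void max_in_place(int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;

    /* Process four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));

        /* gt lane is all ones where a > b, all zeroes otherwise. */
        __m128i gt = _mm_cmpgt_epi32(va, vb);

        /* (gt & a) | (~gt & b): keep a where it is greater, b elsewhere. */
        __m128i r = _mm_or_si128(_mm_and_si128(gt, va),
                                 _mm_andnot_si128(gt, vb));

        _mm_storeu_si128((__m128i *)(a + i), r);
    }

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; ++i) {
        if (b[i] > a[i]) {
            a[i] = b[i];
        }
    }
}
```

This is why the all-ones/all-zeroes result is convenient: the mask can be used directly in bitwise operations, with no branch per element.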