Difference in floating point arithmetic between x86 and x64

2023-09-09 21:13:55 Author: 脑残也是残疾的一种

I stumbled upon a difference in the way floating point arithmetic is done between MS VS 2010 builds for x86 and x64 (both executed on the same 64 bit machine).

This is a reduced code sample:

float a = 50.0f;
float b = 65.0f;
float c =  1.3f;
float d = a*c;
bool bLarger1 = d<b;
bool bLarger2 = (a*c)<b;

The boolean bLarger1 is always false (d is set to 65.0 in both builds). Variable bLarger2 is false for x64 but true for x86!

I am well aware of floating point arithmetic and the rounding effects taking place. I also know that 32 bit builds sometimes use different instructions for floating point operations than 64 bit builds. But in this case I am missing some information.

Why is there a discrepancy between bLarger1 and bLarger2 in the first place? Why is it only present in the 32 bit build?

Recommended answer

The issue hinges on this expression:

bool bLarger2 = (a*c)<b;

I looked at the code generated under VS2008, not having VS2010 to hand. For 64 bit the code is:


000000013FD51100  movss       xmm1,dword ptr [a] 
000000013FD51106  mulss       xmm1,dword ptr [c] 
000000013FD5110C  movss       xmm0,dword ptr [b] 
000000013FD51112  comiss      xmm0,xmm1

For 32 bit the code is:


00FC14DC  fld         dword ptr [a] 
00FC14DF  fmul        dword ptr [c] 
00FC14E2  fld         dword ptr [b] 
00FC14E5  fcompp

So under 32 bit the calculation is performed in the x87 unit, and under 64 bit it is performed by the SSE unit.

The difference is that x87 operations are all performed to higher than single precision. By default the calculations are performed to double precision, so the product a*c stays slightly below 65 and compares as less than b. Storing the product in d rounds it to single precision, which gives exactly 65.0. The SSE instructions used here, on the other hand, are pure single precision calculations.
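The effect can be reproduced without an x87 unit by doing the multiplication in double explicitly. This is a minimal sketch of the two behaviours, not the code either compiler generates; the helper names less_in_single and less_in_double are made up for illustration.

```cpp
// Emulates the two code paths with explicit types.
// Helper names are hypothetical, chosen for this illustration only.

// SSE-style: the product is rounded to single precision before comparing.
bool less_in_single(float a, float c, float b) {
    volatile float product = a * c;  // volatile forces a store, so the value is rounded to float
    return product < b;
}

// x87-style with default precision control: the product is kept in double.
bool less_in_double(float a, float c, float b) {
    double product = static_cast<double>(a) * static_cast<double>(c);
    return product < static_cast<double>(b);
}
```

With a = 50.0f, c = 1.3f and b = 65.0f, the value stored for 1.3f is slightly below 1.3, so less_in_double sees a product slightly below 65 and returns true. The single precision product rounds to exactly 65.0f, so less_in_single returns false.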

You can persuade the 32 bit unit to perform all calculations to single precision accuracy like this:

_controlfp(_PC_24, _MCW_PC);

When you add that to your 32 bit program you will find that the booleans are both set to false.
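If changing the precision for the whole program is too blunt, the setting can be limited to one computation and restored afterwards. The following is only a sketch under assumptions: less_with_single_precision is a hypothetical helper, the x87 branch applies only to MSVC 32 bit builds, and the fallback branch assumes the target already multiplies floats in single precision. Note that _controlfp takes the new value first and the mask second.

```cpp
#include <float.h>  // _controlfp, _MCW_PC, _PC_24 (MSVC)

// Hypothetical helper: evaluates (a*c)<b with the x87 unit limited to
// 24 bit (single) precision, then restores the previous precision setting.
bool less_with_single_precision(float a, float c, float b) {
#if defined(_MSC_VER) && defined(_M_IX86)
    unsigned int saved = _controlfp(0, 0);  // a zero mask only reads the control word
    _controlfp(_PC_24, _MCW_PC);            // new value first, mask second
    bool result = (a * c) < b;
    _controlfp(saved & _MCW_PC, _MCW_PC);   // put the old precision bits back
    return result;
#else
    // Assumption: other targets compute float products in single precision.
    volatile float product = a * c;
    return product < b;
#endif
}
```

An optimizing compiler may fold or move the multiplication, because it does not know the control word is being changed. On MSVC, #pragma fenv_access (on) is the usual way to tell it. Newer MSVC versions may also suggest _controlfp_s in place of _controlfp.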

There is a fundamental difference in the way that the x87 and SSE floating point units work. The x87 unit uses the same instructions for both single and double precision types. Data is loaded into registers in the x87 FPU stack, and those registers are always 10 byte Intel extended. You can control the precision using the floating point control word. But the instructions that the compiler writes are ignorant of that state.

On the other hand, the SSE unit uses different instructions for operations on single and double precision. This means that the compiler can emit code that is in full control of the precision of the calculation.
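The split between single and double precision instructions is visible in the compiler intrinsics, where each one maps to a specific instruction. This is a small sketch assuming an x86 target with SSE2; the wrapper names mul_single and mul_double are hypothetical.

```cpp
#include <xmmintrin.h>  // SSE:  _mm_set_ss, _mm_mul_ss, _mm_cvtss_f32
#include <emmintrin.h>  // SSE2: _mm_set_sd, _mm_mul_sd, _mm_cvtsd_f64

// Hypothetical wrappers, each tied to one scalar SSE multiply instruction.

// mulss: multiplies the low single precision lanes.
float mul_single(float x, float y) {
    return _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(x), _mm_set_ss(y)));
}

// mulsd: multiplies the low double precision lanes.
double mul_double(double x, double y) {
    return _mm_cvtsd_f64(_mm_mul_sd(_mm_set_sd(x), _mm_set_sd(y)));
}
```

Here the precision is fixed by the instruction that is chosen, not by any control word. mul_single(50.0f, 1.3f) gives exactly 65.0f, while mul_double(50.0, 1.3f) gives a value slightly below 65.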

So, the x87 unit is the bad guy here. You can maybe try to persuade your compiler to emit SSE instructions even for 32 bit targets. And certainly when I compiled your code under VS2013 I found that both 32 and 64 bit targets emitted SSE instructions.
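For 32 bit targets the usual switches are /arch:SSE2 for MSVC and -msse2 -mfpmath=sse for GCC. To check which unit a given build was configured for, the predefined macros can be inspected. This is a rough sketch: float_math_unit is a hypothetical helper, and the macro coverage is an assumption limited to MSVC and GCC/Clang.

```cpp
// Hypothetical helper: reports, at compile time, which unit the compiler
// was configured to use for float arithmetic.
const char* float_math_unit() {
#if defined(_M_X64)
    return "SSE";                            // MSVC x64 builds use SSE for scalar math
#elif defined(_M_IX86_FP)
    return _M_IX86_FP >= 1 ? "SSE" : "x87";  // MSVC x86: 0 = x87, 1 = /arch:SSE, 2 = /arch:SSE2 or later
#elif defined(__SSE_MATH__)
    return "SSE";                            // GCC/Clang: float math goes through SSE
#elif defined(__i386__) || defined(__x86_64__)
    return "x87";                            // x86 target without SSE math enabled
#else
    return "unknown";                        // not an x86 target, or an unrecognised compiler
#endif
}
```

Printing the result at startup is a cheap way to confirm that a 32 bit build really picked up the intended switch.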
