Simulate tearing a double in C#

2023-09-03 05:45:24 Author: 拿命拼未来

I'm running on a 32-bit machine and I'm able to confirm that long values can tear using the following code snippet which hits very quickly.

        static void TestTearingLong()
        {
            System.Threading.Thread A = new System.Threading.Thread(ThreadA);
            A.Start();

            System.Threading.Thread B = new System.Threading.Thread(ThreadB);
            B.Start();
        }

        static ulong s_x;

        static void ThreadA()
        {
            int i = 0;
            while (true)
            {
                s_x = (i & 1) == 0 ? 0x0L : 0xaaaabbbbccccddddL;
                i++;
            }
        }

        static void ThreadB()
        {
            while (true)
            {
                ulong x = s_x;
                // Fires only in Debug builds, when x mixes halves of the two written values.
                System.Diagnostics.Debug.Assert(x == 0x0L || x == 0xaaaabbbbccccddddL);
            }
        }

But when I try something similar with doubles, I'm not able to get any tearing. Does anyone know why? As far as I can tell from the spec, only assignment to a float is atomic. The assignment to a double should have a risk of tearing.

    static double s_x;

    static void TestTearingDouble()
    {
        System.Threading.Thread A = new System.Threading.Thread(ThreadA);
        A.Start();

        System.Threading.Thread B = new System.Threading.Thread(ThreadB);
        B.Start();
    }

    static void ThreadA()
    {
        long i = 0;

        while (true)
        {
            s_x = ((i & 1) == 0) ? 0.0 : double.MaxValue;
            i++;

            if (i % 10000000 == 0)
            {
                Console.Out.WriteLine("i = " + i);
            }
        }
    }

    static void ThreadB()
    {
        while (true)
        {
            double x = s_x;

            System.Diagnostics.Debug.Assert(x == 0.0 || x == double.MaxValue);
        }
    }

Solution

It is much harder to demonstrate the effect with a double. The CPU uses dedicated instructions to load and store a double, FLD and FSTP respectively, and each one moves all 8 bytes in a single operation. It is much easier with long, since there is no single general-purpose instruction that loads or stores a 64-bit integer in 32-bit mode, so the access is split into two 32-bit moves. To observe tearing with a double, the variable's address needs to be misaligned so that it straddles a CPU cache line boundary.
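
To make the failure mode concrete, here is a small sketch of what a torn read looks like at the bit level. It does not race any threads. It only splices the upper 32 bits of one value onto the lower 32 bits of another, which is the kind of mix a reader could see when a 64-bit write is split into two halves. The names TornValues and Splice are made up for this illustration and are not part of the original code.

```csharp
using System;

static class TornValues
{
    // Returns the value a reader would see if it picked up the upper
    // 32 bits of `high` and the lower 32 bits of `low`.
    public static ulong Splice(ulong high, ulong low)
    {
        return (high & 0xFFFFFFFF00000000UL) | (low & 0x00000000FFFFFFFFUL);
    }

    // Same idea for doubles, working on the raw IEEE 754 bit pattern.
    public static double Splice(double high, double low)
    {
        ulong highBits = unchecked((ulong)BitConverter.DoubleToInt64Bits(high));
        ulong lowBits = unchecked((ulong)BitConverter.DoubleToInt64Bits(low));
        ulong spliced = Splice(highBits, lowBits);
        return BitConverter.Int64BitsToDouble(unchecked((long)spliced));
    }
}
```

With the values from the question, TornValues.Splice(0xaaaabbbbccccddddUL, 0UL) gives 0xaaaabbbb00000000, which is neither of the two values ThreadA writes, so the assert in ThreadB would fire. Splicing double.MaxValue with 0.0 likewise produces a double that is neither 0.0 nor double.MaxValue.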

That will never happen with the declaration you used: the JIT compiler ensures that the double is aligned properly, stored at an address that's a multiple of 8. You could store it in a field of a class instead, since the GC allocator only aligns to 4 in 32-bit mode. But that's a crap shoot.

The best way to do it is to intentionally misalign the double by using a pointer. Put unsafe in front of the Program class (and compile with unsafe code enabled) and make it look similar to this:

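    // Requires: using System.Runtime.InteropServices; (for Marshal)
    static double* s_x;

    static void Main(string[] args) {
        var mem = Marshal.AllocCoTaskMem(100);
        // 28 is divisible by 4 but not by 8; the value was found by experimentation.
        s_x = (double*)((long)(mem) + 28);
        TestTearingDouble();
    }

    // In ThreadA:
            *s_x = ((i & 1) == 0) ? 0.0 : double.MaxValue;

    // In ThreadB:
            double x = *s_x;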

This still won't guarantee a good misalignment (hehe), since there's no way to control exactly where AllocCoTaskMem() places the allocation relative to the start of a CPU cache line. It also depends on the cache associativity in your CPU core (mine is a Core i5). You'll have to tinker with the offset; I got the value 28 by experimentation. The value should be divisible by 4 but not by 8 to truly simulate the GC heap behavior. Keep adding 8 to the value until the double straddles a cache line and triggers the assert.
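
The offset hunt can be reasoned about with a little arithmetic. The sketch below assumes a 64-byte cache line, which is common on x86 but is an assumption here, not something the original states. The names CacheLineMath, Straddles and FindStraddlingOffset are hypothetical helpers for illustration. They only do address math and do not touch memory.

```csharp
using System;

static class CacheLineMath
{
    // Assumption: 64-byte cache lines. Adjust for other hardware.
    public const int LineSize = 64;

    // True when an 8-byte value starting at `address` crosses a line boundary.
    public static bool Straddles(long address, int lineSize = LineSize)
    {
        // Normalize so that a sign-extended (negative) address still works.
        long offsetInLine = ((address % lineSize) + lineSize) % lineSize;
        return offsetInLine + sizeof(double) > lineSize;
    }

    // Tries offsets 4, 12, 20, ... (divisible by 4 but not by 8) and returns
    // the first one that makes the double straddle a line. Returns -1 when no
    // such offset fits inside a block of `blockSize` bytes.
    public static int FindStraddlingOffset(long baseAddress, int blockSize = 100,
                                           int lineSize = LineSize)
    {
        for (int offset = 4; offset + sizeof(double) <= blockSize; offset += 8)
        {
            if (Straddles(baseAddress + offset, lineSize))
            {
                return offset;
            }
        }
        return -1;
    }
}
```

Under that 64-byte assumption, a block that starts 32 bytes into a cache line yields an offset of 28, because 32 + 28 = 60 and the 8-byte double then covers bytes 60 through 67. That is consistent with the experimentally found value, though it does not prove where the allocation actually landed on that machine. Passing (long)mem to FindStraddlingOffset would give a starting guess that still needs to be confirmed by running the test.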

To make it less artificial, you'll have to write a program that stores the double in a field of a class and gets the garbage collector to move it around in memory so that it ends up misaligned. It's kind of hard to come up with a sample program that ensures this happens.

Also note how your program can demonstrate a problem called false sharing. Comment out the Start() method call for thread B and note how much faster thread A runs. You are seeing the cost of the CPU keeping the cache line consistent between the cores. Sharing is intended here, since the threads access the same variable. Real false sharing happens when threads access different variables that are stored in the same cache line. This is also why alignment matters: you can only observe tearing for a double when part of it is in one cache line and part of it is in another.
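
The "comment out thread B" experiment can also be written as a small harness. This is a rough sketch, not a benchmark: SharingCost and CountWrites are hypothetical names, the numbers vary from run to run and machine to machine, and a meaningful comparison needs a Release build on a multi-core CPU. It counts how many writes one thread completes in a fixed time, with or without a second thread reading the same variable.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class SharingCost
{
    static double s_x;
    static volatile bool s_stop;

    // Counts the writes to s_x completed in roughly `milliseconds`,
    // optionally while a second thread reads s_x in a tight loop.
    public static long CountWrites(int milliseconds, bool withReader)
    {
        s_stop = false;
        Thread reader = null;
        if (withReader)
        {
            reader = new Thread(() =>
            {
                double sink = 0;
                // The volatile read of s_stop keeps s_x from being hoisted.
                while (!s_stop) sink += s_x;
                GC.KeepAlive(sink);
            });
            reader.Start();
        }

        long i = 0;
        var clock = Stopwatch.StartNew();
        while (true)
        {
            s_x = ((i & 1) == 0) ? 0.0 : double.MaxValue;
            i++;
            // Check the clock only occasionally so timing calls do not dominate.
            if ((i & 0xFFFF) == 0 && clock.ElapsedMilliseconds >= milliseconds) break;
        }

        s_stop = true;
        reader?.Join();
        return i;
    }
}
```

Comparing CountWrites(1000, false) with CountWrites(1000, true) gives a rough idea of how much the reader slows the writer down.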
