Which std::sync::atomic::Ordering to use?

2023-09-07 16:21:31 Author: 海誓山盟只是年少无知

All the methods of std::sync::atomic::AtomicBool take a memory ordering (Relaxed, Release, Acquire, AcqRel, and SeqCst), which I have not used before. Under what circumstances should these values be used? The documentation uses confusing "load" and "store" terms which I don't really understand. For example:

A producer thread mutates some state held by a Mutex, then calls AtomicBool::compare_and_swap(false, true, ordering) (to coalesce invalidations), and if it swapped, posts an "invalidate" message to a concurrent queue (e.g. mpsc or a winapi PostMessage). A consumer thread resets the AtomicBool, reads from the queue, and reads the state held by the Mutex. Can the producer use Relaxed ordering because it is preceded by a mutex, or must it use Release? Can the consumer use store(false, Relaxed), or must it use compare_and_swap(true, false, Acquire) to receive the changes from the mutex?

What if the producer and consumer share a RefCell instead of a Mutex?

Recommended answer

I'm not an expert on this, and it's really complicated, so please feel free to critique my post. As pointed out by mdh.heydari, cppreference.com has much better documentation of orderings than Rust (C++ has an almost identical API).

You'd need to use "release" ordering in your producer and "acquire" ordering in your consumer. This ensures that the data mutation occurs before the AtomicBool is set to true.
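
As a rough illustration of the setup in the question with the orderings recommended here, the following is a minimal sketch rather than a definitive implementation. The function name `run_coalescing_demo`, the `u32` state, and the use of `std::sync::mpsc` as the queue are assumptions made for the example. It uses `compare_exchange` and `swap` because `compare_and_swap` is deprecated in current Rust.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical demo: the producer bumps a counter `updates` times and
/// coalesces its "invalidate" messages through the `dirty` flag.
/// Returns (last state read by the consumer, number of messages received).
pub fn run_coalescing_demo(updates: u32) -> (u32, u32) {
    let state = Arc::new(Mutex::new(0u32));
    let dirty = Arc::new(AtomicBool::new(false));
    let (tx, rx) = mpsc::channel::<()>();

    let producer = {
        let (state, dirty) = (Arc::clone(&state), Arc::clone(&dirty));
        thread::spawn(move || {
            for _ in 0..updates {
                *state.lock().unwrap() += 1; // mutate under the lock
                // Release on success: the mutation above is ordered before
                // the flag flip. Relaxed on failure: nothing is published.
                if dirty
                    .compare_exchange(false, true, Ordering::Release, Ordering::Relaxed)
                    .is_ok()
                {
                    // Only the false -> true transition posts a message.
                    tx.send(()).unwrap();
                }
            }
            // `tx` is dropped here, which ends the consumer's loop below.
        })
    };

    let mut last_seen = 0;
    let mut messages = 0;
    for () in rx {
        messages += 1;
        // Reset the flag (Acquire) before reading the state, so an update
        // that arrives after this point posts a fresh message.
        let _was_dirty = dirty.swap(false, Ordering::Acquire);
        last_seen = *state.lock().unwrap();
    }
    producer.join().unwrap();
    (last_seen, messages)
}
```

Calling `run_coalescing_demo(1000)` should leave the consumer with the final state while receiving anywhere from one to a thousand messages, depending on how the threads interleave.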

If your queue is asynchronous, then the consumer will need to keep trying to read from it in a loop, since the producer could get interrupted between setting the AtomicBool and putting something in the queue.

If the producer code might run multiple times before the consumer runs, then you can't use RefCell, because the producer could mutate the data while the consumer is reading it. Otherwise it's fine. (Note that RefCell is not Sync, so safe Rust won't let you share one between threads in the first place; doing so would need unsafe code.)

There are other better and simpler ways to implement this pattern, but I assume you were just giving it as an example.

The different orderings have to do with what another thread sees happen when an atomic operation occurs. Compilers and CPUs are normally both allowed to reorder instructions in order to optimize code, and the orderings affect how much they're allowed to reorder instructions.

You could just always use SeqCst, which basically guarantees everyone will see that instruction as having occurred wherever you put it relative to other instructions, but in some cases, if you specify a less restrictive ordering, LLVM and the CPU can better optimize your code.

You should think of these orderings as applying to a memory location (instead of applying to an instruction).

With Relaxed, there are no constraints besides any modification to the memory location being atomic (so it either happens completely or not at all). This is fine for something like a counter, where the values retrieved or set by individual threads don't matter as long as each operation is atomic.
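
A counter of this kind might look like the sketch below. The function name `count_with_relaxed` and its parameters are made up for illustration. The final read is reliable here because joining the threads already synchronizes with them, not because of the ordering on the atomic itself.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: `threads` threads each add 1 to a shared counter
/// `increments` times using Relaxed, then the total is returned.
pub fn count_with_relaxed(threads: usize, increments: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // Atomic, so no increment is lost, but no ordering is
                    // imposed on any other memory access.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```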

The Acquire constraint says that any variable reads that occur in your code after "acquire" is applied can't be reordered to occur before it. So, say in your code you read some shared memory location and get value X, which was stored in that memory location at time T, and then you apply the "acquire" constraint. Any memory locations that you read from after applying the constraint will have the value they had at time T or later.

This is probably what most people would expect to happen intuitively, but because a CPU and optimizer are allowed to reorder instructions as long as they don't change the result, it isn't guaranteed.

In order for "acquire" to be useful, it has to be paired with "release", because otherwise there's no guarantee that the other thread didn't reorder its write instructions that were supposed to occur at time T to an earlier time.

Acquire-reading the flag value you're looking for means you won't see a stale value somewhere else that was actually changed by a write before the release-store to the flag.

The Release constraint says that any variable writes that occur in your code before "release" is applied can't be reordered to occur after it. So, say in your code you write to a few shared memory locations and then set some memory location t at time T, and then you apply the "release" constraint. Any writes that appear in your code before "release" is applied are guaranteed to have occurred before it.

Again, this is what most people would expect to happen intuitively, but it isn't guaranteed without constraints.
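
To make the pairing concrete, here is a small sketch of a release-store matched with an acquire-load. The names `publish_and_read`, `data`, and `ready` are invented for the example, and the payload is itself an atomic written with Relaxed only so that the sketch stays in safe Rust.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: a writer thread publishes `value` behind a flag,
/// and the calling thread waits for the flag and reads the payload.
pub fn publish_and_read(value: u32) -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let writer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            data.store(value, Ordering::Relaxed); // payload write
            // Release: the payload write cannot be moved after this store.
            ready.store(true, Ordering::Release);
        })
    };

    // Acquire: the payload read below cannot be moved before this load.
    while !ready.load(Ordering::Acquire) {
        thread::yield_now();
    }
    let seen = data.load(Ordering::Relaxed);
    writer.join().unwrap();
    seen
}
```

Once the reader observes `true` in `ready`, it is guaranteed to read the published payload. With Relaxed on the flag instead, that guarantee would go away.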

If the other thread trying to read value X doesn't use "acquire", then it isn't guaranteed to see the new value with respect to changes in other variable values. So it could get the new value, but it might not see new values for any other shared variables. Also keep in mind that testing is hard. Some hardware won't in practice show reordering with some unsafe code, so problems can go undetected.

Jeff Preshing wrote a nice explanation of acquire and release semantics, so read that if this isn't clear.

AcqRel does both Acquire and Release ordering (i.e. both restrictions apply). I'm not sure when this is necessary. It might be helpful in situations with 3 or more threads if some Release, some Acquire, and some do both, but I'm not really sure.
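
One situation where both halves seem to matter is a "last one out" countdown, sketched below under the assumption of at least one worker. Every worker must publish its own result (the release half), and whichever worker happens to finish last must see everyone else's (the acquire half). The name `sum_by_last_finisher` and the slot layout are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: worker `i` stores `i + 1` in its slot, then
/// decrements `remaining`. The worker that brings the count to zero sums
/// all slots. Assumes `workers >= 1`.
pub fn sum_by_last_finisher(workers: usize) -> usize {
    let slots: Arc<Vec<AtomicUsize>> =
        Arc::new((0..workers).map(|_| AtomicUsize::new(0)).collect());
    let remaining = Arc::new(AtomicUsize::new(workers));

    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let (slots, remaining) = (Arc::clone(&slots), Arc::clone(&remaining));
            thread::spawn(move || {
                slots[i].store(i + 1, Ordering::Relaxed);
                // Release half: publishes this worker's slot.
                // Acquire half: if this worker is last, it sees every slot
                // published by the earlier decrements.
                if remaining.fetch_sub(1, Ordering::AcqRel) == 1 {
                    Some(slots.iter().map(|s| s.load(Ordering::Relaxed)).sum::<usize>())
                } else {
                    None
                }
            })
        })
        .collect();

    let results: Vec<Option<usize>> =
        handles.into_iter().map(|h| h.join().unwrap()).collect();
    results
        .into_iter()
        .flatten()
        .next()
        .expect("exactly one worker observes the count reaching zero")
}
```

The same shape shows up in reference counting, where the thread that drops the last reference has to see all writes made through the other references before it frees the data.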

SeqCst is the most restrictive and, therefore, the slowest option. It forces memory accesses to appear to occur in one, identical order to every thread. On x86 this requires a full memory barrier (including StoreLoad), such as an MFENCE instruction, on all writes to atomic variables, while the weaker orderings don't. (SeqCst loads don't require a barrier on x86, as you can see in this C++ compiler output.)

Read-Modify-Write accesses, like atomic increment or compare-and-swap, are done on x86 with locked instructions, which are already full memory barriers. If you care at all about compiling to efficient code on non-x86 targets, it makes sense to avoid SeqCst when you can, even for atomic read-modify-write ops. There are cases where it's needed, though.
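
A commonly cited case is the "store buffering" pattern, where each thread sets its own flag and then checks the other's. The sketch below uses hypothetical names (`store_buffering_once`, `count_both_missed`). With SeqCst, at least one thread must see the other's flag, so the count should be zero. With Release stores and Acquire loads, both threads would be allowed to read `false`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: runs the pattern once and returns what each thread
/// observed of the other thread's flag.
pub fn store_buffering_once() -> (bool, bool) {
    let x = Arc::new(AtomicBool::new(false));
    let y = Arc::new(AtomicBool::new(false));

    let t1 = {
        let (x, y) = (Arc::clone(&x), Arc::clone(&y));
        thread::spawn(move || {
            x.store(true, Ordering::SeqCst); // announce myself
            y.load(Ordering::SeqCst) // then look for the other thread
        })
    };
    let t2 = {
        let (x, y) = (Arc::clone(&x), Arc::clone(&y));
        thread::spawn(move || {
            y.store(true, Ordering::SeqCst);
            x.load(Ordering::SeqCst)
        })
    };
    (t1.join().unwrap(), t2.join().unwrap())
}

/// Counts the rounds in which both threads missed each other's store.
pub fn count_both_missed(rounds: usize) -> usize {
    (0..rounds)
        .filter(|_| store_buffering_once() == (false, false))
        .count()
}
```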

For more examples of how atomic semantics turn into ASM, see this larger set of simple functions on C++ atomic variables. I know this is a Rust question, but it's supposed to have basically the same API as C++. Godbolt can target x86, ARM, ARM64, and PowerPC. Interestingly, ARM64 has load-acquire (ldar) and store-release (stlr) instructions, so it doesn't always have to use separate barrier instructions.

By the way, x86 CPUs are always "strongly ordered" by default, which means they always act as if at least AcqRel mode was set. So on x86, choosing among Relaxed, Acquire, Release, and AcqRel only affects how LLVM's optimizer behaves; SeqCst stores are the exception, since they need the full barrier described above. ARM, on the other hand, is weakly ordered. Relaxed asks for nothing beyond atomicity, which gives the compiler full freedom to reorder things and requires no extra barrier instructions on weakly-ordered CPUs.