In Java, can I depend on reference assignment being atomic to implement copy on write?

2023-09-07 16:04:57 Author: 胖可爱

If I have an unsynchronized Java collection in a multithreaded environment, and I don't want to force readers of the collection to synchronize[1], is a solution where I synchronize the writers and rely on the atomicity of reference assignment feasible? Something like:

private Collection global = new HashSet(); // start threading after this

void allUpdatesGoThroughHere(Object exampleOperand) {
  // My hypothesis is that this prevents operations in the block being re-ordered
  synchronized(global) {
    Collection copy = new HashSet(global);
    copy.remove(exampleOperand);
    // Given my hypothesis, we should have a fully constructed object here. So a 
    // reader will either get the old or the new Collection, but never an 
    // inconsistent one.
    global = copy;    
  }
}

// Do multithreaded reads here. All reads are done through a reference copy like:
// Collection copy = global;
// for (Object elm: copy) {...
// so the global reference being updated half way through should have no impact 

Rolling your own solution often seems to fail in these kinds of situations, so I'd be interested in knowing other patterns, collections or libraries I could use to prevent object creation and blocking for my data consumers.

[1] The reasons being a large proportion of time spent in reads compared to writes, combined with the risk of introducing deadlocks.

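There is a lot of good information in several of the answers and comments. Some important points:

1. There was a bug in the code I posted. Synchronizing on global (a badly named variable) can fail to protect the synchronized block after a swap.
2. You could fix this by synchronizing on the enclosing instance instead (moving the synchronized keyword onto the method), but there may be other bugs. A safer and more maintainable solution is to use something from java.util.concurrent.
3. There is no "eventual consistency guarantee" in the code I posted. One way to make sure that readers do get to see the updates made by writers is to use the volatile keyword.
4. On reflection, the general problem that motivated this question was trying to implement lock-free reads with locked writes in Java. My (solved) problem happened to involve a collection, which may be unnecessarily confusing for future readers. So, in case it is not obvious: the code I posted works by allowing one writer at a time to edit "some object" that is being read, unprotected, by multiple reader threads. The edit is committed through an atomic operation, so readers can only get the pre-edit or the post-edit "object". When (or if) a reader thread gets the update, it cannot happen in the middle of a read, because the read is taking place on the old copy of the "object". It is a simple solution that had probably been discovered, and proved to be broken in some way, before better concurrency support became available in Java.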

Recommended answer

Rather than trying to roll your own solution, why not use a ConcurrentHashMap as your set and just set all the values to some standard value? (A constant like Boolean.TRUE would work well.)

I think this implementation works well with the many-readers-few-writers scenario. There's even a constructor that lets you set the expected "concurrency level".

Update: Veer has suggested using the Collections.newSetFromMap utility method to turn the ConcurrentHashMap into a Set. Since the method takes a Map<E,Boolean>, my guess is that it does the same thing behind the scenes, setting all the values to Boolean.TRUE.
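As a rough illustration of that suggestion, here is a minimal sketch of building a concurrent Set on top of a ConcurrentHashMap. The class and method names (ConcurrentSetSketch, newConcurrentSet) are made up for this example; only Collections.newSetFromMap and ConcurrentHashMap come from the JDK.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentSetSketch {
    // Hypothetical helper: wraps an (empty) ConcurrentHashMap in the Set interface.
    // newSetFromMap requires the backing map to be empty when it is passed in.
    static <E> Set<E> newConcurrentSet() {
        return Collections.newSetFromMap(new ConcurrentHashMap<E, Boolean>());
    }

    public static void main(String[] args) {
        Set<String> set = newConcurrentSet();
        set.add("a");
        set.add("a"); // duplicate, ignored as with any Set
        set.add("b");
        set.remove("b");
        System.out.println(set.contains("a"));
        System.out.println(set.size());
    }
}
```

On Java 8 and later, ConcurrentHashMap.newKeySet() should give an equivalent set without the wrapper call.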

Update: In response to the poster's example

That is probably what I will end up going with, but I am still curious about how my minimalist solution could fail. – MilesHampson

Your minimalist solution would work just fine with a bit of tweaking. My worry is that, although it's minimal now, it might get more complicated in the future. It's hard to remember all of the conditions you assumed when making something thread-safe, especially if you're coming back to the code weeks, months or years later to make a seemingly insignificant tweak. If the ConcurrentHashMap does everything you need with sufficient performance, then why not use that instead? All the nasty concurrency details are encapsulated away, and even six months from now you will have a hard time messing it up!

You do need at least one tweak before your current solution will work. As has already been pointed out, you should probably add the volatile modifier to global's declaration. I don't know if you have a C/C++ background, but I was very surprised when I learned that the semantics of volatile in Java are actually much more complicated than in C. If you're planning on doing a lot of concurrent programming in Java, then it'd be a good idea to familiarize yourself with the basics of the Java memory model. If you don't make global a volatile reference, then it's possible that no thread will ever see any change to the value of global until it tries to update it, at which point entering the synchronized block will flush the local cache and fetch the updated reference value.
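To make the visibility point concrete, here is a small sketch of a writer publishing a new set through a volatile reference while a reader spins until it sees it. The names (VolatilePublishSketch, snapshot, readerSeesUpdate) are invented for this example and are not part of the poster's code.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class VolatilePublishSketch {
    // volatile: a reader that observes the new reference also observes the
    // contents the writer put into the set before publishing it.
    private static volatile Set<String> snapshot = Collections.emptySet();

    // Starts a reader that spins until it sees a non-empty set, publishes a
    // two-element set, and returns the size the reader observed.
    static int readerSeesUpdate() {
        snapshot = Collections.emptySet();
        final int[] seenSize = new int[1];
        Thread reader = new Thread(() -> {
            Set<String> local;
            // Without volatile, the JIT would be allowed to read the field
            // once and this loop might never terminate.
            while ((local = snapshot).isEmpty()) {
                Thread.yield();
            }
            seenSize[0] = local.size();
        });
        reader.start();

        Set<String> copy = new HashSet<>();
        copy.add("x");
        copy.add("y");
        snapshot = copy; // a single volatile write publishes the finished set

        try {
            reader.join(); // join makes the reader's write to seenSize visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seenSize[0];
    }

    public static void main(String[] args) {
        System.out.println(readerSeesUpdate());
    }
}
```

The reader never locks anything; it relies only on the volatile read, which is the property the poster's lock-free readers need.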

However, even with the addition of volatile there's still a huge problem. Here's a problem scenario with two threads:

1. We begin with the empty set, or global={}. Threads A and B both have this value in their thread-local cached memory.
2. Thread A obtains the synchronized lock on global and starts the update by making a copy of global and adding the new key to the set.
3. While Thread A is still inside the synchronized block, Thread B reads its local value of global onto the stack and tries to enter the synchronized block. Since Thread A is currently inside the monitor, Thread B blocks.
4. Thread A completes the update by setting the reference and exiting the monitor, resulting in global={1}.
5. Thread B is now able to enter the monitor (the one belonging to the old {} object it read in step 3) and makes a copy of the global={1} set.
6. Thread A decides to make another update, reads in its local global reference and tries to enter the synchronized block. Since Thread B currently holds the lock on {}, there is no lock on {1} and Thread A successfully enters the monitor!
7. Thread A also makes a copy of {1} for purposes of updating.
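The heart of this scenario is that a monitor belongs to an object, not to the variable that refers to it. The sketch below tries to reproduce just that part deterministically with latches. It is a simplification rather than the poster's code: the class, method and latch names are invented, and the main thread swaps the reference directly to stand in for Thread A's first update.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LockOnMutableFieldSketch {
    private static volatile Collection<Object> global = new HashSet<>();

    // Returns true if thread A got inside synchronized(global) while thread B
    // was still inside a synchronized block on the previous value of global.
    static boolean bothInsideAtOnce() {
        global = new HashSet<>();
        CountDownLatch bInside = new CountDownLatch(1);
        CountDownLatch aInside = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread b = new Thread(() -> {
            synchronized (global) {      // locks the OLD set object
                bInside.countDown();
                await(release);          // stay inside the monitor
            }
        });
        b.start();
        await(bInside);

        global = new HashSet<>();        // stands in for A's completed update

        Thread a = new Thread(() -> {
            synchronized (global) {      // locks the NEW set object: no contention
                aInside.countDown();
            }
        });
        a.start();

        boolean entered = await(aInside); // true only if A did not block on B
        release.countDown();
        join(a);
        join(b);
        return entered;
    }

    private static boolean await(CountDownLatch latch) {
        try {
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(bothInsideAtOnce());
    }
}
```

If both blocks were guarded by the same monitor, thread A could not signal aInside until B was released, and the method would return false after the timeout.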

Now Threads A and B are both inside the synchronized block and they have identical copies of the global={1} set. This means that one of their updates will be lost! This situation is caused by the fact that you're synchronizing on an object stored in a reference that you're updating inside your synchronized block. You should always be very careful about which objects you use to synchronize. You can fix this problem by adding a new variable to act as the lock:

private volatile Collection global = new HashSet(); // start threading after this
private final Object globalLock = new Object(); // final reference used for synchronization

void allUpdatesGoThroughHere(Object exampleOperand) {
  // My hypothesis is that this prevents operations in the block being re-ordered
  synchronized(globalLock) {
    Collection copy = new HashSet(global);
    copy.remove(exampleOperand);
    // Given my hypothesis, we should have a fully constructed object here. So a 
    // reader will either get the old or the new Collection, but never an 
    // inconsistent one.
    global = copy;    
  }
}
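For comparison, the same fix can be packaged as a small typed class. This is only a sketch under assumptions: the names (CopyOnWriteHolder, snapshot, stress) are invented here, and publishing an unmodifiable view is an extra precaution that is not in the answer's code.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class CopyOnWriteHolder<E> {
    private final Object lock = new Object();           // never reassigned
    private volatile Set<E> current = Collections.emptySet();

    void add(E e) {
        synchronized (lock) {                           // one writer at a time
            Set<E> copy = new HashSet<>(current);
            copy.add(e);
            current = Collections.unmodifiableSet(copy); // publish the new snapshot
        }
    }

    void remove(E e) {
        synchronized (lock) {
            Set<E> copy = new HashSet<>(current);
            copy.remove(e);
            current = Collections.unmodifiableSet(copy);
        }
    }

    // Lock-free read: callers iterate over whatever snapshot they obtained.
    Set<E> snapshot() {
        return current;
    }

    // Hypothetical stress helper: each thread adds distinct integers, so the
    // final size equals threads * perThread only if no update is lost.
    static int stress(int threads, int perThread) {
        CopyOnWriteHolder<Integer> holder = new CopyOnWriteHolder<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    holder.add(base + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return holder.snapshot().size();
    }

    public static void main(String[] args) {
        System.out.println(stress(4, 250));
    }
}
```

Every write copies the whole set, so this only pays off when reads vastly outnumber writes. For small sets, java.util.concurrent.CopyOnWriteArraySet offers a ready-made collection built on the same idea.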

This bug was insidious enough that none of the other answers have addressed it yet. It's these kinds of crazy concurrency details that cause me to recommend using something from the already-debugged java.util.concurrent library rather than trying to put something together yourself. I think the above solution would work, but how easy would it be to screw it up again? This would be so much easier:

private final Set<Object> global = Collections.newSetFromMap(new ConcurrentHashMap<Object,Boolean>());

Since the reference is final you don't need to worry about threads using stale references, and since the ConcurrentHashMap handles all the nasty memory model issues internally, you don't have to worry about all the nasty details of monitors and memory barriers!
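One practical consequence for the readers in the question, who iterate with a plain for-each loop, is that a set backed by ConcurrentHashMap has weakly consistent iterators, so a write that lands in the middle of a loop does not raise ConcurrentModificationException. The sketch below shows this in a single thread to keep it deterministic; the class and method names are invented for the example.

```java
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class IterateWhileWritingSketch {
    // Fills the set, then adds an element in the middle of a for-each loop.
    // Returns true if the loop finished without ConcurrentModificationException.
    static boolean survivesMutation(Set<Integer> set) {
        for (int i = 0; i < 10; i++) {
            set.add(i);
        }
        try {
            for (Integer ignored : set) {
                set.add(1000); // write while an iterator is live
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Set<Integer> concurrent =
                Collections.newSetFromMap(new ConcurrentHashMap<Integer, Boolean>());
        System.out.println(survivesMutation(concurrent));
    }
}
```

A plain HashSet would typically throw on the second step of the same loop. The trade-off is that a reader may or may not see elements added after its iterator was created, unlike the copy-on-write approach, where each reader works on a fixed snapshot.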