同步 push_back 和 std::threadpush_back、std、thread

2023-09-07 16:41:45 作者:早知是梦

我的代码

void build(std::vector<RKD <DivisionSpace> >& roots, ...) {
  try {
    // using a local lock_guard to lock mtx guarantees unlocking on destruction / exception:
    std::lock_guard<std::mutex> lck (mtx);
    roots.push_back(RKD<DivisionSpace>(...));
  }
  catch (const std::bad_alloc&) {
    std::cout << "[exception caught when constructing tree]
";
    return;
  }
}

现在,实际的工作应该是串行的,而不是并行的.

Now, the actual work should be done serially, not in parallel.

RKD 的构造函数可以与RKD 的其他构造函数并行运行.但是,将对象推回 std::Vector 是一个临界区,对吧?

The constructor of RKD can run in parallel with other constructors of RKD. However, pushing the objects back in std::Vector is a critical section, right?

我要构建的对象的数量是已知的.实际上,它将在 [2, 16] 范围内.理论上它可以是任何正数.

The number of the objects I am going to be build is known. It is going to be something in the range [2, 16] in practise. In theory it could be any positive number.

另外,我对它们插入容器的顺序不感兴趣.

Also, I am not interesting for the order they will be inserted in the container.

所以我可以这样做:

RKD tree = RKD(...);
mutex_lock(...);
roots.push_back(tree);

但是这意味着复制,不是吗?

However this would imply copying, wouldn't it?

我应该怎么做才能让我的代码并行?

What should I do to make my code parallel?

因为 this 答案,我决定使用锁(而不仅仅是互斥锁).

I decided to use a lock (instead of just a mutex) because of this answer.

推荐答案

Tomasz Lewowski 在他的评论中提出并且我已经扩展的建议非常简单,并且基于以下观察:A push_back 可能需要重新分配后备存储并复制(或者,最好是移动)元素.这构成了需要同步的关键部分.

The suggestion that Tomasz Lewowski has brought up in his comment and I have expanded upon is pretty simple and based upon the following observation: A push_back on a std::vector potentially needs to re-allocate the backing store and copy (or, preferably, move) the elements. This constitutes a critical section that needs to be synchronized.

对于下一个示例,假设我们想要一个向量填充前 12 个素数,但我们不关心它们的顺序.(我刚刚在这里对数字进行了硬编码,但假设它们是通过一些昂贵的计算获得的,这些计算可以并行进行.)在以下场景中存在危险的竞争条件.

For the next examples, assume we want to have a vector filled with the first 12 primes but we don't care about their ordering. (I have just hard-coded the numbers here but assume they are obtained via some expensive computation that makes sense to do in parallel.) There is a dangerous race condition in the following scenario.

std::vector<int> numbers {};  // an empty vector

// thread A             // thread B             // thread C

numbers.push_back( 2);  numbers.push_back(11);  numbers.push_back(23);
numbers.push_back( 3);  numbers.push_back(13);  numbers.push_back(27);
numbers.push_back( 5);  numbers.push_back(17);  numbers.push_back(29);
numbers.push_back( 7);  numbers.push_back(19);  numbers.push_back(31);

push_back 还有另一个问题.如果两个线程同时调用它,它们都将尝试在同一索引处构造一个对象,这可能会带来灾难性的后果.因此,在分叉线程之前,问题并没有通过 reserve(n) 解决.

There is also another problem with the push_back. If two threads call it simultaneously, they will both attempt to construct an object at the same index with potentially disastrous consequences. So the problem is not solved with a reserve(n) before forking the threads.

但是,由于您事先知道元素的数量,您可以简单地将它们分配到 std::vector 内的特定位置,而无需更改其大小.如果不更改大小,则没有临界区.因此,以下场景中不存在比赛.

However, since you know the number of elements in advance, you can simply assign them to a specific location inside a std::vector without changing its size. If you don't change the size, there is no critical section. Therefore, there is no race in the following scenario.

std::vector<int> numbers(12);  // 12 elements initialized with 0

// thread A          // thread B          // thread C

numbers[ 0] =  2;    numbers[ 1] =  3;    numbers[ 2] =  5;
numbers[ 3] =  7;    numbers[ 4] = 11;    numbers[ 5] = 13;
numbers[ 6] = 17;    numbers[ 7] = 19;    numbers[ 8] = 23;
numbers[ 9] = 29;    numbers[10] = 31;    numbers[11] = 37;

当然,如果两个线程都尝试写入 相同 索引,那么竞争将再次出现.幸运的是,在实践中防止这种情况并不困难.如果你的向量有 n 个元素并且你有 p 个线程,线程 i 只写入元素 [i n/p, (i + 1) n/p).请注意,仅当 j mod p = i 因为它会导致更少的缓存失效.所以上面例子中的访问模式是次优的,最好是这样.

Of course, if both threads attempt to write to the same index, the race will be there again. Fortunately, protecting against this is not difficult in practice. If your vector has n elements and you have p threads, thread i writes only to elements [i n / p, (i + 1) n / p). Note that this is preferable over having thread i write to elements at index j only if j mod p = i because it leads to fewer cache invalidations. So the access pattern in the above example is sub-optimal and had better been like this.

std::vector<int> numbers(12);  // 12 elements initialized with 0

// thread A          // thread B          // thread C

numbers[ 0] =  2;    numbers[ 4] = 11;    numbers[ 8] = 23;
numbers[ 1] =  3;    numbers[ 5] = 13;    numbers[ 9] = 29;
numbers[ 2] =  5;    numbers[ 6] = 17;    numbers[10] = 31;
numbers[ 3] =  7;    numbers[ 7] = 19;    numbers[11] = 37;

到目前为止一切顺利.但是如果你没有 std::vector<int> 而是 std::vector<Foo> 怎么办?如果 Foo 没有默认构造函数,那么

So far so good. But what if you don't have a std::vector<int> but a std::vector<Foo>? If Foo does not have a default constructor, then

std::vector<Foo> numbers(10);

将无效.即使它有一个,创建许多昂贵的默认构造对象只是为了尽快重新分配它们而没有检索到值,这将是令人发指的.

will be invalid. And even if it has one, it would be outrageous to create many expensive default-constructed objects just to re-assign them soon without ever retrieving the value.

当然,大多数设计良好的类都应该有一个非常便宜的默认构造函数.例如,std::string 默认构造为不需要内存分配的空字符串.一个好的实现会将默认构造字符串的成本降低到只是

Of course, most well-designed classes should have a very cheap default constructor. For example, a std::string is default constructed to an empty string that requires no memory allocation. A good implementation will reduce the cost of default-constructing a string to just

std::memset(this, 0, sizeof(std::string));

如果编译器足够聪明,可以确定我们正在分配和初始化整个 std::vector