arrays - Which memory barriers do I need, to make the writes to image in thread A visible in Thread B? - Stack Overflow

Where do I need to put memory barriers so that the writes to image in thread A become visible in thread B? The candidate spots are marked in the pseudo-code example below and are derived from this question/answer.

There is currently a discussion in our team about changing the blocking wait into a lock-free busy wait. The mutex and condition variable should therefore not be relied on as a memory barrier.

Global:

_Atomic int request = 0;
_Atomic int reply = 0;
char image[640 * 640];

mtx_t mu;
cnd_t cv;

Thread A:

int last_req = 0;
int curr_req = request;

if (curr_req != last_req)
{
    last_req = curr_req;
    char* buff = GetBufferFromCamera(.....);
    memcpy(image, buff, sizeof(image));
    atomic_thread_fence(memory_order_release);  // Okay?

    mtx_lock(&mu);
      reply = curr_req;
      cnd_signal(&cv);
    mtx_unlock(&mu);
}


Thread B:

int ticket = atomic_fetch_add(&request, 1);
ticket++;
mtx_lock(&mu);
  while (ticket != reply)
     cnd_wait(&cv, &mu);
mtx_unlock(&mu);

atomic_thread_fence(memory_order_acquire); // Okay?

// I want to use image

As a side note: I am on ARMv8.


asked Jan 29 at 16:09 by knivil, edited Jan 30 at 14:04
  • 2 Just use a release-store and an acquire-load (like atomic_store_explicit(&reply, curr_req, memory_order_release)). A data-ready flag is one of the classic use-cases for release / acquire. preshing/20120913/acquire-and-release-semantics. Your current mutex already gives you acquire/release synchronization. – Peter Cordes Commented Jan 29 at 16:43
  • BTW, C++20 has .wait() and .notify_all() member functions if you want to roll your own efficient wait that doesn't spin indefinitely. (e.g. Linux futex, like a condition variable uses). But there's no C23 equivalent of that, unfortunately. And avoiding unnecessary notify calls when there are no waiters is non-trivial, while also never failing to notify when there is a waiter. – Peter Cordes Commented Jan 29 at 16:46
  • 1 request doesn't need to be updated with an atomic RMW, you could use store(&request, 1+ load(&request, relaxed), relaxed) since thread B is the only writer. (Your ticket algo wouldn't work if there were multiple writers; requests could get missed by thread A which only processes the new request number, not all intervening numbers, and your ticket != reply only works if reply matches exactly, not past.) So I don't know why you need the copy to run in another thread; there's no parallelism here. (And spin-wait seems like a bad idea). You can't just call GetBufferFromCamera in thread B? – Peter Cordes Commented Jan 29 at 16:52
  • 1 @knivil: release/acquire synchronization of the load of reply (being implicitly atomic_load_explicit(&reply, memory_order_seq_cst)) creates a happens-before of the stuff in thread A happening-before the stuff in thread B, including all the writes to the non-atomic array happening-before the reads. So yes, it is about synchronization, together with the wait or spin-wait to actually see the value you want to synchronize with. – Peter Cordes Commented Jan 30 at 17:22
  • 1 @Lundin: The example as written has no parallelism, just work in two threads happening serially. Thread B submits a request by incrementing a counter, then spin-waits (if the cond_wait was removed) for reply. B doesn't touch the array while waiting, and doesn't submit another request until it's done processing. (So its store of request should be at least release to make sure its array accesses really are done). And Thread A only works while B waits. reply = curr_req is a release-store (actually seq_cst), while (ticket != reply) is an acquire load (actually seq_cst). – Peter Cordes Commented Jan 30 at 17:31

1 Answer


TL;DR: probably none.

Slightly longer: unlocking a mutex orders all of a thread's previous actions relative to a subsequent locking of the same mutex, not just those actions performed while the mutex was locked.

There is currently a discussion in our team to change the blocked wait into a lock free busy wait. Therefore should be ignored as a memory barrier.

All conflicting accesses to shared memory need to be ordered relative to each other. If there are in fact any such accesses by different threads, then that requires some form of mutual exclusion. Fences are not usually sufficient, and separate fences are not usually necessary. Generally, the appropriate memory-ordering can and should be integrated into the mutual-exclusion mechanism.

The code presented in the question seems a bit incomplete. In particular, I don't see the "busy wait" part of your "lock free busy wait". I suppose the idea is that some or all of the Thread A code presented would spin inside a loop, and when it observes request to have been incremented, it fetches a new image and writes that into image.

I guess your concern is about image being updated by thread A without the mutex locked. That concern is probably unwarranted:

  1. The read of _Atomic variable request has sequential consistency memory semantics. Supposing the combination of that with the if condition is effective for providing the needed mutual exclusion for access to image, that also orders thread A's writes to image relative to other threads' previous accesses to image.

  2. Unlocking the mutex has release (or stronger) memory ordering semantics with respect to the mutex, so it orders A's writes to image (among other things) relative to accesses by other threads that subsequently lock the mutex (which operation has acquire semantics or stronger).

  3. That includes Thread B's proposed accesses following the code presented. Note here in particular that cnd_wait() releases the mutex before blocking and re-locks it before returning. Thus, either the initial mutex lock or the final re-lock will get you the memory ordering you need for image.

Thus, for the purposes you're asking about, and as far as I can determine from only the code you have presented, no explicit fences are needed in either thread.
