Where do I need to put memory barriers so that the writes to image in thread A are visible in thread B? The spots are marked in the pseudocode example and are derived from this question/answer.
There is currently a discussion in our team about changing the blocked wait into a lock-free busy wait. The mutex should therefore be ignored as a memory barrier.
Global:
_Atomic int request = 0;
_Atomic int reply = 0;
char image[640 * 640];
mtx_t mu;
cnd_t cv;
Thread A:
int last_req = 0;
int curr_req = request;
if (curr_req != last_req)
{
last_req = curr_req;
char* buff = GetBufferFromCamera(.....);
memcpy(image, buff, sizeof(image));
atomic_thread_fence(memory_order_release); // Okay?
mtx_lock(&mu);
reply = curr_req;
cnd_signal(&cv);
mtx_unlock(&mu);
}
Thread B:
int ticket = atomic_fetch_add(&request, 1);
ticket++;
mtx_lock(&mu);
while (ticket != reply)
cnd_wait(&cv, &mu);
mtx_unlock(&mu);
atomic_thread_fence(memory_order_acquire); // Okay?
// I want to use image
As a side note: I am on ARMv8.
Asked Jan 29 at 16:09 by knivil, edited Jan 30 at 14:04.

1 Answer:
TL;DR: probably none. Unlocking a mutex orders all of a thread's previous actions relative to subsequent locking of the same mutex, not just those actions performed while the mutex was locked.
"There is currently a discussion in our team about changing the blocked wait into a lock-free busy wait. The mutex should therefore be ignored as a memory barrier."
All conflicting accesses to shared memory need to be ordered relative to each other. If there are in fact any such accesses by different threads, then that requires some form of mutual exclusion. Fences are not usually sufficient, and separate fences are not usually necessary. Generally, the appropriate memory ordering can and should be integrated into the mutual-exclusion mechanism.
The code presented in the question seems a bit incomplete. In particular, I don't see the "busy wait" part of your "lock-free busy wait". I suppose the idea is that some or all of the Thread A code presented would spin inside a loop, and when it observes request to have been incremented, it fetches a new image and writes that into image.

I guess your concern is about image being updated by thread A without the mutex locked. That concern is probably unwarranted:
- The read of the _Atomic variable request has sequential-consistency memory semantics. Supposing the combination of that with the if condition is effective for providing the needed mutual exclusion for access to image, that also orders thread A's writes to image relative to other threads' previous accesses to image.
- Unlocking the mutex has release (or stronger) memory-ordering semantics with respect to the mutex, so it orders A's writes to image (among other things) relative to accesses by other threads that subsequently lock the mutex (an operation that has acquire semantics or stronger).
- That includes Thread B's proposed accesses following the code presented. Note here in particular that cnd_wait() releases the mutex before blocking, and re-locks it before returning. Thus, either the initial mutex lock or the final re-lock will get you the memory ordering you need for image.
Thus, for the purposes you're asking about, and as far as I can determine from only the code you have presented, no explicit fences are needed in either thread.
Comments (Peter Cordes; some comments are truncated):

- … atomic_store_explicit(&reply, curr_req, memory_order_release). A data-ready flag is one of the classic use-cases for release/acquire: preshing/20120913/acquire-and-release-semantics. Your current mutex already gives you acquire/release synchronization. (Jan 29 at 16:43)
- … .wait() and .notify_all() member functions if you want to roll your own efficient wait that doesn't spin indefinitely (e.g. Linux futex, like a condition variable uses). But there's no C23 equivalent of that, unfortunately. And avoiding unnecessary notify calls when there are no waiters is non-trivial, while also never failing to notify when there is a waiter. (Jan 29 at 16:46)
- request doesn't need to be updated with an atomic RMW; you could use store(&request, 1 + load(&request, relaxed), relaxed) since thread B is the only writer. (Your ticket algo wouldn't work if there were multiple writers; requests could get missed by thread A, which only processes the new request number, not all intervening numbers, and your ticket != reply check only works if reply matches exactly, not past.) So I don't know why you need the copy to run in another thread; there's no parallelism here. (And spin-wait seems like a bad idea.) You can't just call GetBufferFromCamera in thread B? (Jan 29 at 16:52)
- … reply (being implicitly atomic_load_explicit(&reply, memory_order_seq_cst)) creates a happens-before: the stuff in thread A happens before the stuff in thread B, including all the writes to the non-atomic array happening before the reads. So yes, it is about synchronization, together with the wait or spin-wait to actually see the value you want to synchronize with. (Jan 30 at 17:22)
- … (cond_wait was removed) for reply. B doesn't touch the array while waiting, and doesn't submit another request until it's done processing. (So its store of request should be at least release to make sure its array accesses really are done.) And Thread A only works while B waits. reply = curr_req is a release-store (actually seq_cst); while (ticket != reply) is an acquire load (actually seq_cst). (Jan 30 at 17:31)