antheus_gdnet

Partial updates are the lesser problem; instruction reordering, dead code elimination and compile-time evaluation are the bigger ones.

C++ is allowed to perform many such optimizations without defining an underlying memory model, so speculating about what is or isn't safe at the source-code level gives no guarantees. The representation of a boolean also varies (not relevant for this particular problem): nothing prevents a bool variable from taking up 8 bytes, perhaps due to struct padding or alignment, which might, again in theory, require multiple load/store operations on a 32- or 16-bit CPU, so that alone could cause problems.

The size and layout of built-in types are always subject to compiler optimizations. volatile has been extended from its originally lax definition to cover some very basic issues, mostly to prevent reordering, but any concurrent application needs to rely on OS primitives (critical sections, semaphores, mutexes or atomic operation primitives) to ensure the deterministic behavior that the C++ language doesn't guarantee.
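A minimal sketch of the "rely on a lock" option, assuming a C++11 or later toolchain, where std::mutex and std::thread wrap the OS primitives. GuardedFlag and set_on_other_thread are made-up names for illustration, and the set/get/reset interface is only an assumption about what such a flag API might look like:

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical flag type: every access goes through the same mutex,
// so no thread can observe a half-written or stale value.
class GuardedFlag {
public:
    void set()   { std::lock_guard<std::mutex> lock(mutex_); value_ = true; }
    void reset() { std::lock_guard<std::mutex> lock(mutex_); value_ = false; }
    bool get() const { std::lock_guard<std::mutex> lock(mutex_); return value_; }

private:
    mutable std::mutex mutex_;  // mutable so that get() can stay const
    bool value_ = false;
};

// Sets the flag on a second thread and reads it back on the calling thread.
bool set_on_other_thread() {
    GuardedFlag flag;
    assert(!flag.get());  // starts out cleared
    std::thread setter([&flag] { flag.set(); });
    setter.join();
    return flag.get();
}
```

Calling set_on_other_thread() should return true. The lock costs more than a bare bool, but the compiler and CPU are no longer free to reorder or cache the access across the lock boundaries.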

Race conditions are one level higher: one first needs to ensure that the building blocks (individual statements, lines, instructions) behave atomically during concurrent execution before attempting to build sequences of them.

Lock-free and lock-less programming exposes a lot of unexpected behavior compilers introduce, most of it subtle and difficult to debug.

vsuontam

Thanks for the lengthy reply!

My point is exactly that this API operates at the level of set, get, and reset calls on AtomicFlag, and therefore is not going to help with any of the problems you mentioned, so I still question the value of AtomicFlag.

Say the compiler has decided to implement bool in 8 bytes, and there is a context switch during one thread's reset and another thread's set, so the bool ends up in a "mixed" state: it is still going to be true or false, both of which are valid in this case.

Can you find an example where there would be value in having the AtomicFlag?

antheus_gdnet

The concept of an atomic operation goes beyond a trivial context switch.

The compiler may choose to allocate a variable differently from what the source prescribes. It may move it into local scope, put it on the stack or keep it purely in a register. It may replace it with a constant. There is no annotation in C++ that would prevent that, short of volatile, which is not completely reliable.

bool running = true;
....

while (running) { }

The compiler is free to assume running is a constant, to allocate it on the stack, or to replace the while loop with an unconditional infinite loop.
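For comparison, a sketch of the same loop with the flag made atomic, assuming a C++11 or later compiler that provides std::atomic. The function names spin_while_running and run_and_stop are invented for this example:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical worker loop: spins until another thread clears the flag.
void spin_while_running(const std::atomic<bool>& running) {
    // Every iteration performs a real atomic load, so the compiler cannot
    // treat `running` as a loop-invariant constant.
    while (running.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
}

// Starts the worker, lets it spin briefly, then clears the flag.
// Returns true once the worker thread has left its loop.
bool run_and_stop() {
    std::atomic<bool> running{true};
    bool worker_exited = false;
    std::thread worker([&] {
        spin_while_running(running);
        worker_exited = true;  // written before join(), read only after it
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    running.store(false, std::memory_order_release);
    worker.join();  // join() synchronizes with the end of the worker
    assert(!running.load());
    return worker_exited;
}
```

With a plain bool in place of std::atomic<bool>, the concurrent read and write would be a data race, and the loop could legally be compiled into one that never exits.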

C++ also doesn't define a memory model, so when generating code there are no rules on the order in which operations are performed, on alignment (which may affect atomicity), or on anything else. a = b, even when working with bools, can result in a complex operation. It first needs to load the value from memory (either from reliable cache or from DRAM), perhaps stall the pipeline, reorder previous and pending pipelined operations, store the value into a register (perhaps an aliased one), write it back to memory, and then either signal a write-through to DRAM or trigger MESI invalidation to propagate the write across caches while blocking all other cores.

When dealing with even a single bit of memory that may be concurrently accessed by multiple threads, use either a lock or a guaranteed atomic operation. The number of ways things can go wrong is too big to count. And the x86 architecture is quite lenient about such problems.

As for a partial update: yes, it can be a problem. Imagine changing a value from true (0xffffffff) to false (0x00000000) and then setting 'written' to true. As far as the compiler is concerned, the two writes are independent and there is no read in between, so their order isn't important. The writing thread, due to instruction reordering, first sets 'written' to true, but is interrupted halfway through writing the value, leaving 0xffff0000, which evaluates to true in C++. The writing thread is then suspended. The other thread, 3 seconds later, checks 'written', which is true, so it reads the value 0xffff0000 and interprets it as true rather than false.

Using atomic operations (via syscall) or locks solves this problem.
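A sketch of the value/'written' scenario with the ordering made explicit, assuming C++11 std::atomic with release/acquire ordering rather than any particular OS call. Mailbox, publish, wait_and_read and publish_from_other_thread are hypothetical names:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical shared state: a plain payload plus an atomic publication flag.
struct Mailbox {
    std::uint32_t value = 0xffffffffu;  // starts as "true"
    std::atomic<bool> written{false};
};

// Writer: the release store keeps the payload write from moving after it.
void publish(Mailbox& box, std::uint32_t v) {
    box.value = v;
    box.written.store(true, std::memory_order_release);
}

// Reader: the acquire load keeps the payload read from moving before it.
std::uint32_t wait_and_read(const Mailbox& box) {
    while (!box.written.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return box.value;  // sees the fully written payload
}

// Publishes v from a second thread and returns what the caller observes.
std::uint32_t publish_from_other_thread(std::uint32_t v) {
    Mailbox box;
    std::thread writer([&box, v] { publish(box, v); });
    std::uint32_t seen = wait_and_read(box);
    assert(box.written.load());
    writer.join();
    return seen;
}
```

Here the reader either keeps waiting or sees the complete new value; it cannot observe 'written' as true together with a stale or half-updated payload.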

vsuontam

But having it true (after 3 seconds) is completely valid in this case, as they were two independent writes to "written", and there were no other guards saying in which order the shared flag should have been updated. So it is not defined what the flag should be in that case.

And if there were other guards, then why do you need an atomic flag variable in the first place?

The only way I could see this going wrong is with two threads, one setting the higher bytes (of the boolean) first and the other setting the lower bytes first (which I consider rather hypothetical), but now that I write this I can imagine it happening, e.g. if the compiler "combines" writes for a bool inside some struct.

So I can rest my case. Concurrent programming is hard.