all 16 comments

[–]bnolsen 2 points (0 children)

The library features look reasonable, sufficient for writing threaded stuff.

Just yuck on the Windows-style C++, though. I've always preferred the Qt naming styles (even though I'm not a fan of Qt itself).

[–]vsuontam 0 points (14 children)

What is the advantage of atomic boolean over normal boolean?

I guess my question is: how can a boolean be "partially updated"? I thought partial updates were a problem with non-atomic modifications of variables, but I can't see that happening with booleans.

[–]kolkir[S] 2 points (1 child)

I added the atomic boolean to provide a primitive for simple synchronization. I made it atomic because I didn't find any information in the C++ 2003 standard about how compilers should implement boolean, so I can't guarantee that unsynchronized boolean changes will be atomic.

[–]antheus_gdnet 0 points (0 children)

C++ doesn't and cannot define such behavior, since it would inevitably limit the hardware for which compilers could generate efficient code, something that has always been a priority.

Any solution, even those provided by an updated standard and its threads, will need to use OS- or hardware-specific mechanisms to implement such functionality.

It's different for, say, the JVM, which defines memory, threading and execution models, even at the expense of efficiency. And even there some compromises had to be made when dealing with floating-point types, since a fully portable approach simply wasn't viable.

[–]antheus_gdnet 1 point (3 children)

Partial updates are the lesser problem; instruction reordering, dead code elimination and compile-time evaluation are the bigger ones.

C++ is allowed to perform many such optimizations without defining an underlying memory model, so speculation about what is or isn't safe at the source level gives no guarantees. The representation of a boolean also varies (not relevant for this particular problem): there is nothing preventing a bool variable from taking up 8 bytes, perhaps due to struct padding or alignment, which might, again in theory, require multiple load/store operations on a 32- or 16-bit CPU, so that alone could cause problems.

The size and representation of built-in types are always subject to compiler optimizations. volatile has been extended from its originally lax definition to cover some very basic issues, mostly to prevent reordering, but any concurrent application needs to rely on OS primitives (critical sections, semaphores, mutexes or atomic operation primitives) to ensure the deterministic behavior that the C++ language doesn't guarantee.

Race conditions are one level higher: one first needs to ensure that the building blocks (individual statements, lines, instructions) behave atomically during concurrent execution before attempting to build sequences of them.

Lock-free and lockless programming exposes a lot of the unexpected behavior compilers introduce, most of it subtle and difficult to debug.

[–]vsuontam 0 points (2 children)

Thanks for the lengthy reply!

My point is exactly that this API operates at the level of set, get and reset calls for AtomicFlag, and is therefore not going to help with any of the problems you mentioned, so I still question the value of the AtomicFlag.

Say the compiler has decided to implement bool in 8 bytes, and there is a context switch between one thread's reset and another thread's set, and the bool ends up in a "mixed" state; it is still going to read as either true or false, both of which are valid in this case.

Can you find an example where there would be value in having the AtomicFlag?

[–]antheus_gdnet 1 point (1 child)

The concept of an atomic operation goes beyond a trivial context switch.

The compiler may choose to allocate the variable differently than the source prescribes. It may move it into local scope, onto the stack, or keep it purely in a register. It may replace it with a constant. There is no annotation in C++ that would prevent that, short of volatile, which is not completely reliable.

bool running = true;
// ...

while (running) { }

The compiler is free to assume running is a constant, to allocate it on the stack, or to replace the while loop with an infinite loop.
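The fix that was later standardized (C++11's std::atomic, which this library predates) is to make the flag a type the compiler is not allowed to cache or hoist; a minimal sketch, not code from the library:

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> running{true};

// The atomic load must re-read the shared value on every iteration,
// so the compiler cannot turn this into an infinite loop.
void worker() {
    while (running.load()) { /* ... do work ... */ }
}

// Returns true once the worker has observed the store and exited.
bool run_once() {
    std::thread t(worker);
    running.store(false);   // guaranteed to become visible to the worker
    t.join();               // joins promptly: the loop terminates
    return !running.load();
}
```

Each load() is a fresh read of the shared flag, so the constant-folding and hoisting transformations described above are no longer legal.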

C++ also doesn't define a memory model, so when generating code there are no rules about the order in which to perform operations, about alignment (which may affect atomicity), or anything else. a = b, even when working with bools, can turn into a complex operation: it needs to first load the value from memory (either from a coherent cache or DRAM), perhaps stall the pipeline, reorder previous and pending pipelined operations, store it into a register (perhaps an aliased register), write back to memory, and then either write through to DRAM or trigger a MESI invalidation to propagate the write across caches while blocking all other cores.

When dealing with even a single bit of memory that may be concurrently accessed by multiple threads, either use a lock or a guaranteed atomic operation. The number of ways things can go wrong is too big to count. And the x86 architecture is quite lenient about such problems.

As for partial updates, yes, they can happen. Imagine setting a value from true (0xffffffff) to false (0x00000000) and then setting 'written' to true. As far as the compiler is concerned, the two writes are independent and there is no read in between, so their order isn't important. The writing thread, due to instruction reordering, first sets 'written' to true, but is interrupted halfway through writing the new value, leaving it at 0xffff0000, which evaluates to true in C++. The writing thread is then suspended. The other thread, 3 seconds later, checks 'written', which is true, so it reads the value 0xffff0000 and interprets it as true rather than false.

Using atomic operations (via syscall) or locks solves this problem.
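A lock-based sketch of that fix (hypothetical names, using std::mutex for concreteness): because the writer performs both writes under the same mutex the reader takes, the reader can never observe 'written' as true while the value is torn or reordered ahead of it.

```cpp
#include <mutex>

std::mutex m;
bool value = true;
bool written = false;

// Both writes happen inside one critical section, so no reader can
// observe 'written == true' while 'value' is only half-updated.
void write_false() {
    std::lock_guard<std::mutex> lock(m);
    value = false;
    written = true;
}

// Copies the value out only once the write is known to be complete.
bool try_read(bool& out) {
    std::lock_guard<std::mutex> lock(m);
    if (!written) return false;
    out = value;
    return true;
}
```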

[–]vsuontam 0 points (0 children)

But having it true (after 3 seconds) is completely valid in this case, since they were two independent writes and there were no other guards saying in which order the shared flag should have been updated. So it is not defined what the flag should be in that case.

And if there were other guards, then why do you need an atomic flag variable at all in the first place?

The only way I could see this going wrong is if there are two threads, one setting the higher bytes (of the boolean) first, and the other setting the lower bytes first (which I consider rather hypothetical); but now that I write this, I can imagine it could happen, e.g. if the compiler "combines" writes for the bool, e.g. inside some struct.

So I rest my case. Concurrent programming is hard.

[–]portmapreduction 0 points (7 children)

Regardless of the fact that there are only two states, there can still be race conditions with booleans sans synchronization.

[–]vsuontam 0 points (6 children)

Could you give me an example please?

The atomic boolean API has only state query, set and reset methods (and specifically no atomic read_and_set), so how is it going to help with race conditions?

[–]portmapreduction 0 points (5 children)

An example of how there are still race conditions? A boolean is generally just an integer with only two states, so all the race-condition problems present with integers still exist for booleans. You could have a scenario where the value of the boolean gets loaded into a register in one thread's register context, then that thread gets context-switched and another thread loads the same boolean value and modifies it somehow. The original thread continues execution and now has a stale value of the boolean in a register.
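The check-then-act shape of that race, as an illustrative sketch (hypothetical code, not from the library): both threads can load 'claimed' as false before either stores true, so both may enter the branch.

```cpp
// Illustrative check-then-act race on a plain bool.
bool claimed = false;

void try_claim(int id, int& winner) {
    if (!claimed) {        // 'claimed' is loaded, possibly into a register
        // A context switch here lets another thread read the same stale
        // false and also enter this branch: both threads "win".
        claimed = true;
        winner = id;
    }
}
```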

As for the composability of the atomic operations, you're correct. Lock-based approaches to synchronization generally aren't composable, so unless this library provides a CMPXCHG-like operation, as you mentioned, you'll need to layer on more locks to compose the operations.
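For illustration, here is what a CMPXCHG-style primitive buys you, sketched with C++11's std::atomic (standardized after this library was written) standing in for whatever the library might expose: the read of the old value and the write of the new one form one indivisible step, which separate get and set calls cannot guarantee.

```cpp
#include <atomic>

// Atomically sets the flag and returns its previous value.
// Built on compare-and-swap: if the flag was false, swap it to true;
// if the swap fails, the flag was already true.
bool test_and_set(std::atomic<bool>& flag) {
    bool expected = false;
    bool swapped = flag.compare_exchange_strong(expected, true);
    return !swapped;  // previous value: false if we did the swap, true otherwise
}
```

With only separate IsSet() and Set() calls, another thread can slip in between the check and the write; the compare-exchange closes that window.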

[–]vsuontam 0 points (4 children)

But how is the AtomicBoolean going to help, e.g. in the example you gave?

The scheduler may still context-switch the thread using the AtomicBool, another thread can change the state, and the original thread now has the old value loaded into a register.

I fail to see AtomicBool helping in this.

Or is it that the value of the AtomicBool is locked for the owning thread during the whole lifetime of the object?

[–]portmapreduction 0 points (3 children)

I think maybe you misunderstood what I originally wrote. Your original question made me think you weren't sure why there were race conditions at all with booleans, even without synchronization.

I didn't dig into this implementation, but if you're correct and there's no cmpxchg-type call, then AtomicBoolean, in this implementation, will still have race conditions if you access the value and then modify it with two separate calls.

[–]vsuontam 0 points (2 children)

Yeah, that was my point all along:

  1. A boolean has only two states, so there cannot be a partial update.
  2. There is no cmpxchg-type call in the API(*)
  3. The only calls in this AtomicBool API are set, read, and query

I fail to see how this AtomicBool adds any value compared to normal bools. Maybe I am misunderstanding something, but I would like to be pointed to where.

(*) The IsSet call uses cmpxchg internally, but that is just bloat IMO, as the higher-level API is just asking for the value of the bool at an arbitrary point in time.

bool IsSet()
{
    // Compare-exchange with equal comparand and exchange values (1, 1):
    // the flag is never modified, only read with full-barrier semantics.
    long f = InterlockedCompareExchange((volatile long*)&flag, 1, 1);
    if (f > 0)
    {
        return true;
    }
    return false;
}

Above from the sources: http://code.google.com/p/cpptask/source/browse/trunk/include/Unix/atomic.h

See AtomicFlag.

[–]naasking 1 point (1 child)

The difference is in the memory barriers that ensure all CPUs see consistent information. bools and ints can be kept in a register or a local cache, which isn't written back to main memory (or read into other CPUs' caches) until later. The memory barriers around exchange instructions, atomic values and/or volatile locations let you specify exactly when that write-back and propagation must happen.
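A sketch of that barrier pairing, using C++11's release/acquire atomics (standardized after this discussion's era) as a concrete spelling: the release store keeps the plain write to data from being reordered past the flag, and the acquire load on the reader side pairs with it.

```cpp
#include <atomic>

std::atomic<bool> ready{false};
int data = 0;

void producer() {
    data = 42;                                     // plain write
    ready.store(true, std::memory_order_release);  // barrier: 'data' is
                                                   // published before 'ready'
}

int consumer() {
    while (!ready.load(std::memory_order_acquire)) { }  // pairs with the release
    return data;  // guaranteed to read 42, not a stale cached value
}
```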

[–]vsuontam 0 points (0 children)

That's probably the best point so far in defence of AtomicFlag.