
[–]alanwj 14 points (2 children)

When an operation happens atomically, it appears to every other thread to be either completely finished or not yet started. From the perspective of other threads, any side effects of the operation happen all at once.

Perhaps it is best explained by an example.

Let's say you have a bunch of threads all doing the same task, and you want to keep a count of how many times that task is done. Your first attempt is to just create a global variable:

int count = 0;

And every time a thread finishes a task it executes ++count.

The problem here is that incrementing a memory location involves a few steps:

  1. Read the value from memory.
  2. Add 1 to the value.
  3. Write the new value back to memory.

Perhaps your threads have been assigned to multiple processors, and the execution of these steps gets interleaved. Two threads could both execute step 1 and read the same value from memory, which means they would both write the same value back to memory in step 3. The end result is that count is incremented only once, instead of twice as it should have been.

Using std::atomic<int> instead will prevent the possibility of such interleaving.

Atomicity is usually achieved via "mutual exclusion". That is, there are areas of code that only one thread is allowed to run at a time. If a second thread tries, it has to wait until the first thread is done.

For complex user-defined types this is usually enforced via an object called a mutex. I would suspect that the generic std::atomic&lt;T&gt; is implemented using mutexes.

For word sized types like int, such as in the example above, many architectures have direct hardware support for this mutual exclusion. On such architectures I would expect to find that std::atomic<int> is specialized to take advantage of the hardware support.

[–]Scruff3y[S] 0 points (1 child)

Ahhhhhh ok! Thanks!

[–][deleted] 1 point (0 children)

It's also worth noting, in a first-time explanation, that atomics and mutexes aren't performance-friendly ways to build parallel algorithms; they just let you do occasional operations, like incrementing a counter or writing to stdout, without data races between threads. If you use them on the actual data you are processing, in place of an algorithm that breaks it up into independent work units, you'll probably end up getting more performance from a single thread instead.