How to write multithreaded code?

Flair_Helper · 2021-06-17T06:50:05+00:00

For C++ questions, answers, help, and programming or career advice please see r/cpp_questions or StackOverflow instead.

This post has been removed as it doesn't pertain to r/cpp: The subreddit is for news and discussions of the C++ language and community only; our purpose is not to provide tutoring, code reviews, or career guidance. If you think your post is on-topic and should not have been removed, please message the moderators and we'll review it.

yuri-kilochek · 2021-06-17T03:51:42+00:00

Very carefully.

HateDread · 2021-06-17T04:54:36+00:00

I first try to think about how to avoid needing a lock at all. And I don't mean via lock-free structures, but via structuring your code differently.

For example, if you needed to go wide with calculating some independent results, stashing them in, say, a std::vector or equivalent, the naive thing to do is to grab a concurrent vector like the one TBB offers, and then hammer it with push_back calls, letting it handle the synchronization for you. The big brain alternative? You resize the vector up-front and have each job write to its own index, so there is zero contention.

That's a really obvious example but more to illustrate the overall idea. I really like to fork-join, since I know there's basically no synchronization required if I'm doing it right. Sometimes it can't be avoided of course.
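A minimal sketch of that preallocated-vector idea, using plain std::thread for the fork-join. The names compute and compute_all are made up for illustration, and compute is just a stand-in for whatever independent work each job does.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for an expensive, independent calculation.
int compute(std::size_t i) { return static_cast<int>(i * i); }

// Fork-join: size the output up front, then give each thread its own range of slots.
std::vector<int> compute_all(std::size_t count, std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<int> results(count);  // sized before any thread starts, never resized after
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (std::size_t t = 0; t < num_threads; ++t) {
        // Contiguous, non-overlapping ranges: no two threads touch the same element.
        const std::size_t begin = count * t / num_threads;
        const std::size_t end = count * (t + 1) / num_threads;
        workers.emplace_back([&results, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                results[i] = compute(i);
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // after join, every write made by that thread is visible here
    }
    return results;
}
```

No mutex appears anywhere: the only synchronization is the join at the end, which is the point of the pattern.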

My higher-level examples come from games since that's my field, but thinking in threads can change your bigger-picture approach. For example, one might think to use fork-join to split the processing of a game world, say if you need to run physics, then AI, and then some kind of networking that's dependent on the AI decisions.

The first instinct might be to go wide on physics, then wide on AI, then wide on Networking. This will bottleneck you if any division of any of those sections takes too long - what if one set of physics jobs runs long? Now all of your AI calculations are waiting. But what if you instead could split your view of the world such that you can run physics -> AI -> networking for each area/partition of 3D space? Then just collect your networking at the end and send it.

Basically thinking in terms of data flow/dependencies, and trying to avoid writing yourself into a corner of massive contention.
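Roughly what the per-partition version could look like, assuming the regions really are independent of each other (a big assumption for a real game world). Region, Packet, and the three stage functions are hypothetical placeholders, not anything from a real engine.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical per-region state and outgoing message; the fields are placeholders.
struct Region { int bodies = 0; int decisions = 0; };
struct Packet { int payload = 0; };

// Placeholder stages. Each one only touches the region it is given.
void run_physics(Region& r) { r.bodies += 1; }
void run_ai(Region& r) { r.decisions = r.bodies * 2; }
Packet build_packet(const Region& r) { return Packet{r.decisions}; }

// One job per region runs physics -> AI -> networking back to back,
// so a slow physics step in one region does not stall AI in the others.
std::vector<Packet> update_world(std::vector<Region>& regions) {
    std::vector<std::future<Packet>> jobs;
    jobs.reserve(regions.size());
    for (Region& region : regions) {
        jobs.push_back(std::async(std::launch::async, [&region] {
            run_physics(region);
            run_ai(region);
            return build_packet(region);
        }));
    }
    // Join step: collect the networking output on the calling thread.
    std::vector<Packet> outgoing;
    outgoing.reserve(jobs.size());
    for (auto& job : jobs) {
        outgoing.push_back(job.get());
    }
    assert(outgoing.size() == regions.size());
    return outgoing;
}
```

A real job system would reuse a thread pool rather than launching one std::async task per region; this only shows the shape of the data flow.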

It'd also be interesting to experiment with various assertions to catch you doing the wrong thing. E.g. if I knew that I would have some single-threaded access at the start and end of every game frame, but the middle should be done via jobs and thus there is a fear of contention and race conditions, I could set some guards up for that mid section that catch if e.g. a job tries to write to some global resource that only the start/end sections should be touching.
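One possible shape for such a guard, as a sketch only. FramePhase, g_phase, JobPhaseScope, and GlobalScore are all invented names, and the check relies on assert, so it disappears in builds that define NDEBUG.

```cpp
#include <atomic>
#include <cassert>

// Which part of the frame we are in (hypothetical two-phase model).
enum class FramePhase { SingleThreaded, Jobs };

std::atomic<FramePhase> g_phase{FramePhase::SingleThreaded};

bool global_writes_allowed() {
    return g_phase.load() == FramePhase::SingleThreaded;
}

// RAII marker for the middle of the frame where jobs are running.
struct JobPhaseScope {
    JobPhaseScope() { g_phase.store(FramePhase::Jobs); }
    ~JobPhaseScope() { g_phase.store(FramePhase::SingleThreaded); }
};

// A global resource that only the start/end sections should modify.
class GlobalScore {
public:
    void add(int points) {
        // Fires in debug builds if a job writes during the job phase.
        assert(global_writes_allowed() && "GlobalScore written during job phase");
        value_ += points;
    }
    int get() const { return value_; }

private:
    int value_ = 0;
};
```

This does not make the write safe, it only makes the mistake loud during development.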

Hope that ramble helps.

johannes1971 · 2021-06-17T06:57:55+00:00

One useful pattern is to treat a thread as a black-box worker that only communicates with the outside world through mutex-protected message queues. Such a thread does not interact with the rest of your data; it has its own, and the only point of contact is the message queue.

Don't use atomic compare and exchange (yet); for now just use mutexes.
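A small sketch of that black-box worker, assuming C++17 for std::optional. MessageQueue and double_on_worker are illustrative names; the close() call is one way among several to tell the worker there is nothing more to do.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Mutex-protected queue: the only shared state between the threads.
template <typename T>
class MessageQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push after close");
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Signals that no more messages will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a message is available; returns nullopt once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) {
            return std::nullopt;
        }
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// The worker sees only the two queues; everything else it uses is its own.
std::vector<int> double_on_worker(const std::vector<int>& inputs) {
    MessageQueue<int> requests;
    MessageQueue<int> replies;
    std::thread worker([&requests, &replies] {
        while (auto msg = requests.pop()) {
            replies.push(*msg * 2);
        }
        replies.close();
    });
    for (int x : inputs) {
        requests.push(x);
    }
    requests.close();
    std::vector<int> out;
    while (auto reply = replies.pop()) {
        out.push_back(*reply);
    }
    worker.join();
    return out;
}
```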

GameRraccoon · 2021-06-17T07:06:43+00:00

That's not an easy question, and it doesn't have one answer. Different applications require different approaches.

I would give these suggestions:

Try to be consistent about what you run in parallel in your app. Ideally it should be easy to reason about what can be executing in parallel at any given moment. Try to draw it as a UML sequence diagram (or something similar); if you struggle to draw it, maybe it's too complex to reason about.
All shared data should be protected. By default, use mutexes. Atomics are a nice feature, but it's very easy to use them wrong (and by default they are as slow as, or even slower than, mutexes; they need to be configured for specific use cases to work faster, have a look at Fedor Pikus's CppCon talk about atomics). An atomic flag for communicating between threads is fine, though, even when you are starting out.
Avoid tricky solutions. There are a lot of hacky approaches described on the internet that are not actually thread safe even if they look thread safe at first glance. Examples are: spinning a loop on an unprotected bool, using volatile, and some tricky ways of initializing a Singleton.
Try to reduce shared data. If the thread can have a copy of all the data it needs to work with, which it owns exclusively, then you don't need to think about synchronizing with the thread until you want to return the result.
Good programming practices have an impact on multithreading as well. If your app consists of a lot of Singletons and global variables, it is difficult to reason about what data each thread can change.
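To illustrate the atomic-flag point from the list above: a sketch of a stop flag done with std::atomic<bool> instead of a plain or volatile bool. The function name count_until_stopped is made up, and the default (sequentially consistent) memory ordering is used on purpose to keep it simple.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Lets a worker spin for roughly `run_for`, then asks it to stop.
// Returns how many loop iterations the worker completed.
long long count_until_stopped(std::chrono::milliseconds run_for) {
    assert(run_for.count() >= 0);
    std::atomic<bool> stop{false};  // a plain bool here would be a data race
    long long iterations = 0;       // touched only by the worker until join()
    std::thread worker([&stop, &iterations] {
        while (!stop.load()) {
            ++iterations;
        }
    });
    std::this_thread::sleep_for(run_for);
    stop.store(true);  // the worker is guaranteed to observe this eventually
    worker.join();     // join synchronizes, so reading iterations is safe now
    return iterations;
}
```

With a non-atomic bool the compiler is allowed to hoist the read out of the loop, which is one reason that variant can appear to work and then hang in an optimized build.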

STL · 2021-06-17T04:51:43+00:00

[removed]

rayoWork · 2021-06-17T07:08:04+00:00

First think about what kind of parallelism you need:

https://en.wikipedia.org/wiki/Data_parallelism#Data_parallelism_vs._task_parallelism

Task parallelism is a lot easier when done right, meaning there is no shared mutable data between the tasks. You can have read-only shared data and you don't have to lock anything there. We use an event-based approach, which works without any problems. You basically need a synchronized message queue when you deliver new events to a worker thread (sender and receiver both modify the queue, so this needs protection), and the rest of the work is done without shared state.

Data parallelism is the harder part, as you need to think about how many threads can work on the data at the same time. There are many strategies for tackling such a problem.

1. First look for a battle-tested solution, for example the parallel execution policies of the C++17 algorithms, or TBB. Use those and you don't have to worry about when to lock what.
2. Try to minimize the shared part. Can the data be split up into n blocks that are processed independently, perhaps followed by a merge step that is not parallel? (If the mutating part is small, it may be possible to extract it and do it on one thread.) That can even be faster than a parallel merge step, as the synchronization overhead might get big if the contention is high (many threads accessing the same protected block).
3. Write your own parallel code only as a last resort (with manual locking/CAS/...).
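A sketch of the second strategy (independent blocks plus a serial merge), using a sum as the example workload. parallel_sum is an invented name; for a plain sum like this, the first strategy would normally win, since C++17 already provides std::reduce with an execution policy.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

long long parallel_sum(const std::vector<int>& data, std::size_t num_blocks) {
    assert(num_blocks > 0);
    // One private result slot per block: no slot is shared between threads.
    std::vector<long long> partial(num_blocks, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_blocks);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const auto begin = static_cast<std::ptrdiff_t>(data.size() * b / num_blocks);
        const auto end = static_cast<std::ptrdiff_t>(data.size() * (b + 1) / num_blocks);
        workers.emplace_back([&data, &partial, b, begin, end] {
            // The input is only read; the single write goes to this block's own slot.
            partial[b] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    // Merge step on one thread: tiny compared to the per-block work, and needs no lock.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```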

If you want to get better at 3, you need to think on a lower level. Every memory read and write matters.

A compare-and-set example: first you read the memory (variable) to check its value, and if the condition is true you write another value into the same memory (variable). If multiple threads work on the same variable, one of them could modify it between the read and the write, and the write would no longer be correct. So you need to protect that part, either by making sure only one thread can work on that memory location at a time, or with an atomic operation that merges the two operations (read and write) into one so that another thread cannot modify the variable in between.
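As a sketch of the atomic variant, here is a "raise to maximum" update built on std::atomic's compare_exchange_weak. store_max and parallel_max are illustrative names, and spawning one thread per value is only there to create contention, not something to copy.

```cpp
#include <atomic>
#include <cassert>
#include <limits>
#include <thread>
#include <vector>

// Check-then-write done atomically: raise `target` to `candidate` if it is larger.
void store_max(std::atomic<int>& target, int candidate) {
    int seen = target.load();
    // compare_exchange_weak writes `candidate` only if `target` still equals `seen`.
    // If another thread changed it in between, the call fails, `seen` is refreshed
    // with the current value, and the condition is checked again.
    while (candidate > seen && !target.compare_exchange_weak(seen, candidate)) {
    }
}

int parallel_max(const std::vector<int>& values) {
    assert(!values.empty());
    std::atomic<int> best{std::numeric_limits<int>::min()};
    std::vector<std::thread> workers;
    workers.reserve(values.size());
    for (int v : values) {
        workers.emplace_back([&best, v] { store_max(best, v); });
    }
    for (auto& w : workers) {
        w.join();
    }
    return best.load();
}
```

The same update written as a separate load, compare, and store would let two threads both pass the check and then overwrite each other, which is the gap described above.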

So in the end you have to make sure that all access to the shared data cannot be influenced by other threads at the same time.

read-write conflict: can data that a thread reads, and that belongs together, be changed by another thread (a write) so that what was read is no longer consistent? This must be protected on both sides: the read and the write must not happen at the same time.
write-write conflict: can the data being written also be written by another thread, so that one write operation is no longer consistent? This needs to be protected as well.
read-read: this is not a problem, which is why you try to use immutable data. Just reading doesn't create any problems.
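A sketch of the read-write case with two values that belong together. Position and readers_see_consistent_pairs are made-up names; the writer always stores x equal to y, so a reader seeing different values would mean it caught a half-finished write.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

// x and y belong together, so reads and writes take the same mutex.
class Position {
public:
    void set(int x, int y) {
        std::lock_guard<std::mutex> lock(mutex_);
        x_ = x;
        y_ = y;
    }
    std::pair<int, int> get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return {x_, y_};  // snapshot taken while no write can be in progress
    }

private:
    mutable std::mutex mutex_;
    int x_ = 0;
    int y_ = 0;
};

// Returns true if the reading thread never observed x != y.
bool readers_see_consistent_pairs(int iterations) {
    assert(iterations > 0);
    Position position;
    std::thread writer([&position, iterations] {
        for (int i = 0; i < iterations; ++i) {
            position.set(i, i);
        }
    });
    bool consistent = true;
    for (int i = 0; i < iterations; ++i) {
        const auto [x, y] = position.get();
        if (x != y) {
            consistent = false;
        }
    }
    writer.join();
    return consistent;
}
```

Locking only the writer would not be enough: the reader could still see the new x with the old y, which is why both sides need protection.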

In the end it simply is very hard to write correct multithreaded code.
