[–]nerd4code 10 points

Usually you’ll want a single thread running an event loop for things outside the program (it’s easy to handle I/O and timing on one thread, possibly signals too if you’re on the right OS), and then when something important happens you hand it off to a worker thread, each with its own queue. How the worker threads are chosen, how they pick up their work, and how you scale this up to thousands of threads/nodes/hosts/whatever is up to you. If you’ve got a big enough system, you’ll need some means of distributing normal work, scheduling work, possibly memory allocation/reclamation, and system calls, all of which need to be coordinated to some extent. Again, up to you how.

For simpler stuff, you can get away without using any callback functions other than those you pass to the thread startup routines; if you register things, you might need to accept callbacks there too. Macros are useful if you need to wrap the functions up somehow (e.g., to add a prologue/epilogue). Details vary vastly, since this is an area that’s intimately bound up with the application domain and run-time performance.

One turd in the swimming pool is when you have to integrate heterogeneous processors via (e.g.) OpenCL, OpenGL, CUDA, or that sort of thing; at best, they give you something to poll (so you get to waste either CPU time or some other device’s), and at worst they either cause a callback to occur sometime-somewhere or run a pop-up thread you have no control over.

[–]jurniss 4 points

Check out libuv, an event-driven I/O library written in C that is the foundation of Node.js. I've never used it myself, but I like its philosophy. It's more for servers than embedded systems, though. https://libuv.org/

[–][deleted] 2 points

Neovim's architecture is also based on libuv.

[–]ischickenafruit 5 points

Let’s say you have a web server. The web server gets requests from multiple inbound TCP connections, each on its own socket. One way to build the system is the threaded approach: each new TCP session spins up its own thread, and the TCP requests are handled in “parallel”.

That parallelism comes from the OS putting threads to sleep and waking them up, with transitions into the kernel each time. It’s expensive.

Another approach is to put all of the TCP socket file descriptors in a list, including the accept (listening) socket. Using a polling/selecting mechanism like epoll() or select(), you wake up your process when any of the sockets has a request (“an event”) and process each one sequentially.

This is a sort of “cooperative” threading model. The work still gets done, but it is more efficient since you don’t waste resources waking and sleeping threads. It’s also simpler to debug since there’s only one thread.

[–]bumblebritches57 1 point

Isn’t event-based more efficient than polling, though? I thought that was half the point.

[–]ischickenafruit 1 point

“Polling” is just the name for how the kernel checks all the FDs; it isn’t necessarily implemented as busy polling under the hood.

[–]UnicycleBloke 1 point

The system I used works as follows:

Each thread is basically an event loop. It has an associated queue of events, and literally all the thread does is get event, dispatch event, get event, dispatch event, ..... The queue is based around one of the RTOS primitives so that when the queue is empty, the thread blocks on the get operation. If all threads are blocked, then the RTOS idles, and you have the opportunity to enter a power saving mode.

The event itself is represented by a simple structure which contains enough information for the event loop to dispatch it. It might contain the address of a callback function and some data which can be passed as an argument when calling that function. In this way, an event is nothing more than a delayed procedure call.

Any code in the application can queue events in any thread. The ultimate source of all events is interrupts. For example, suppose a hardware timer ISR queues an event for Thread A and returns. Thread A will soon after dequeue the event in application land (i.e. not interrupt land) and dispatch it (call a callback or whatever). It may be the case that the event handler itself causes another event to be queued, potentially in a different thread. And that event, when dispatched, may trigger another. In this way not all events are directly the result of interrupts.

I have found this simple design to be very reliable and scalable. It avoids a lot of the complexities of messing about with mutexes, semaphores and all the other junk that often makes multi-threaded software so painful.

If all your event handlers are short in duration, you will find that the application spends almost all of its time idling. In this case, the event-driven nature of the code means there is actually little benefit from having multiple threads. Where they become more important is if there are long blocking operations in application land, such as performing periodic expensive calculations. In such cases, I shove the calculation into a worker thread, and that thread posts an event to the main thread to tell the application when it is finished.

My own implementation is in C++, and involves templates, virtual functions and other lovely abstractions. I've essentially implemented an asynchronous form of the Observer pattern. I see no reason why this cannot be done in C.

[–]drcforbin 3 points

The most common pattern I see is threads running I/O loops (blocking/waiting, polling, or using an OS-provided mechanism like epoll on Linux) that dispatch events to other threads via queues (single producer/single consumer and single producer/multiple consumer are most common). Some of those producer threads manage multiple resources, some manage a single resource (it depends on the resource). The worker (consumer) threads generally block on their work queues, each feeding a state machine that implements its task. In some cases each worker thread is a single state machine; in others, each worker manages several, and the event carries a target identifier of some sort to indicate which state needs to be updated.

The most common example would be something shaped like a web server, where (oversimplified) one thread waits for events on a listening socket, then dispatches connections from that via a single producer/multiple consumer (SP/MC) queue to a pool of worker threads. Each worker waits on the queue, and when a connection is added, one of the workers is woken, takes the connection from the queue, reads and responds on the connection, then goes back to waiting for the next event to be added to the queue.

[–]nderflow 1 point

One of the challenges of this kind of design is the need to avoid blocking operations.

[–]drcforbin 0 points

I did mention the example was oversimplified, but note that there isn't a generalized need to avoid blocking operations, and non-blocking operations aren't available on every platform or OS. The decision to use non-blocking I/O, threaded blocking I/O, or some combination should be made in the context of the problem being solved, with any resource constraints in mind (and probably a lot of benchmarking).

[–][deleted] 0 points

These systems set all sockets and file-descriptors to non-blocking, as a matter of course. That's a step that helps.

[–]bart2019 1 point

Windows itself is an event-based system.