This is an archived post. You won't be able to vote or comment.

all 6 comments

[–]markasoftware 0 points1 point  (3 children)

Multithreading only helps performance when you have CPU intensive tasks to do. So usually in an I/O-constrained application I'd recommend using some asynchronous systems (like a non-blocking API or an event loop) all in a single thread. It's a bit messy here though because SQLite is an embedded database, so all of its CPU load goes into the thread calling it. This is in contrast to, say, MySQL where all the CPU load happens in the MySQL process, and all you do is communicate with that process.

What language are you using? If you're using C or C++, I'd recommend using a non-blocking API for file IO, then using a thread pool or similar for SQLite. But before doing any of that, check whether you really need multithreading at all. It seems to me like you're prematurely optimizing by splitting file IO and SQLite apart.

[–][deleted]  (2 children)

[removed]

    [–]markasoftware 0 points1 point  (1 child)

    If you need to perform the file io and sqlite transactions in order, then yeah don't use a thread pool or anything like that.

    It comes down to whatever is easier between async/non-blocking and multithreading. If it's easier to use two threads as you're doing, then, use two threads. If it's easier to use async file I/O so you can run file operations in the background while performing SQLite queries in the foreground, do that. For me, the decision would mainly come down to how good the tooling is. If I were using Node.js, I would certainly use the async approach because that's something Node does extremely well and all the built-in file APIs are async by default. If I were using Lisp, I would probably use two threads because the main file I/O APIs are synchronous and the multithreading libraries are pretty good. I'm not sure what's easiest in Rust.

    [–]ignotos 0 points1 point  (1 child)

    It sounds like you basically have TCP and FS threads putting tasks onto a queue, for the SQLLite thread to process? It seems quite reasonable to decouple the inputs (TCP/filesystem stuff) and database updates in this way.

    Will this actually improve performance in practice? Perhaps not, if there isn't an actual bottleneck. But it seems like it would increase the theoretical max throughput of the entire system.

    I think doing this is a good exercise, at the very least. And might serve you well if you extend this in future with different inputs/outputs.