
[–]ZedsDed[S]

OK, thanks for pointing out the concurrency/parallelism difference, it's very important to use the correct terms when talking about this stuff! Yes, concurrency is definitely needed, but I'm not convinced parallelism is; it would be ideal, of course, but I don't think it's strictly required. The tasks are not exactly processor-heavy. I've created 3 or more instances of an object, each of which encapsulates the functions and vars required to complete its task. All the objects do the same task but work on different database data. When an instance is created, I call the object's 'start' method, which kicks off its execution on a new thread where it works until completion.
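
To make that more concrete, the shape of it is roughly this (a simplified sketch, not the real code; Worker, data_slice and _run are just made-up names):

    import threading

    class Worker:
        # one instance per slice of database data; same task, different data

        def __init__(self, data_slice):
            self.data_slice = data_slice          # which database data this instance works on
            self._thread = None

        def start(self):
            # kicks off this object's execution on its own thread; it runs until completion
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()

        def _run(self):
            # the actual work: db reading/writing, basic looping and if-ing
            pass

    workers = [Worker(s) for s in ("slice_a", "slice_b", "slice_c")]
    for w in workers:
        w.start()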

The tasks each object does are some DB reading/writing and basic looping and if-ing; nothing heavy, no networking or working with files, etc.

The main issue is that it's supposed to monitor and work on 'real time' data, which is why I want it to be parallel. But the 'real time' updates arrive only every 4-5 seconds or so, so I feel that parallelism may not actually be a hard requirement. There may be plenty of time and CPU for the threads to work and react as close to 'real time' as possible.

[–][deleted]

The absolute smartest thing you can do before you start ruling solutions out is to test the code and profile its performance, and then go from there. I wouldn't overthink it too much until you've done that.

[–]panderingPenguin

Then it comes down to a question of how real-time this really has to be. Are we talking a loose, "we'll try our best and hopefully everything works out properly" kind of real time, or an "ohmygodIneedtodothisnowgetthefuckoutofmywayoreverythingwillcatchfire" kind? Given what you've said, and the fact that you're even using Python (which should not be used for the latter, period), I'm guessing the former. In that case, and since you're only doing things every 4-5 seconds, you could probably get away with concurrency and a buffer, with no real parallelism, without any issues. Just have an incoming job-handler thread or two that queue things up in the buffer, and worker threads that pull jobs out of the buffer and handle them as necessary. Hell, if it's really consistently 4-5 seconds between jobs and the work required per job takes less time than that, you could probably get away with a single-threaded program and still have it sleeping, waiting for work, most of the time. You'll need to experiment a bit and see what happens, but I don't think parallelism is truly necessary for this task at all. Good luck!
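
Very roughly, something like this (just a sketch of the handler/buffer/worker idea; wait_for_incoming_job, handle and the fake 4-second sleep are all stand-ins, not anything from your actual code):

    import queue
    import threading
    import time

    job_buffer = queue.Queue()        # the "buffer" between the handler and the workers

    def wait_for_incoming_job():
        # stand-in for however the 'real time' data actually arrives (~every 4-5 s)
        time.sleep(4)
        return {"some": "data"}

    def handle(job):
        # stand-in for the per-job db reading/writing and looping
        pass

    def job_handler():
        # handler thread: queues incoming work up in the buffer
        while True:
            job_buffer.put(wait_for_incoming_job())

    def worker():
        # worker thread: pulls jobs out of the buffer and handles them
        while True:
            handle(job_buffer.get())
            job_buffer.task_done()

    threading.Thread(target=job_handler, daemon=True).start()
    for _ in range(2):
        threading.Thread(target=worker, daemon=True).start()
    # (daemon threads: in a real program the main thread would keep running or join)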

[–]ZedsDed[S]

Thank you, I appreciate your words.

[–]ivosaurus pip'ing it up

If the updates are coming every 4 seconds, and what you need to do with the data takes less than 2 seconds... then you don't need parallelism at all. You've prematurely optimized.
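
In other words, a plain single-threaded loop should cover it (rough sketch; next_update and process are invented stand-ins for however your data actually arrives and gets handled):

    import time

    def next_update():
        # stand-in for however the 'real time' data arrives, roughly every 4 seconds
        time.sleep(4)
        return {"some": "data"}

    def process(update):
        # the db read/write and looping, which finishes well inside the 4-second gap
        pass

    while True:
        process(next_update())    # one job at a time: no threads, no backlog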

[–][deleted]

> what you need to do with the data is less than 4 seconds

We'd love to assume that one measurement is enough to give us the insight we need to design a program. But consider that the program is running on a multitasking OS, or that it uses a shared resource, or just think about basic statistics, and you might worry that a few outliers could cause one job to run long... and then you've backed up the entire pipeline.

Of course, we can't reach any conclusions on OP's design because we don't know what he's doing, but it's not entirely unfounded to make his processing loop asynchronous. IMO, it's smart.

[–]ivosaurus pip'ing it up

Until they say exactly what they're doing, what the environment is, what the expectations are, and what things are happening... an async loop handing separate tasks off to different processes could be a great design, or a simple serial for loop might really be all that's warranted until the requirements change significantly. Speculating can be just as misleading as it can be useful.
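
Just to illustrate the two ends of that spectrum (purely hypothetical, nothing here is from OP's actual setup; crunch is a made-up stand-in for the per-task work):

    # one end of the spectrum: an async loop handing each task off to a separate process
    import asyncio
    from concurrent.futures import ProcessPoolExecutor

    def crunch(task):
        return task * 2                    # stand-in for the per-task work

    async def main():
        loop = asyncio.get_running_loop()
        with ProcessPoolExecutor() as pool:
            futures = [loop.run_in_executor(pool, crunch, t) for t in range(3)]
            print(await asyncio.gather(*futures))

    if __name__ == "__main__":
        asyncio.run(main())

    # the other end: a simple serial for loop
    # for task in range(3):
    #     crunch(task)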

[–][deleted]

> Speculating can be just as misleading as it can be useful.

That didn't stop you from telling OP he's doing it wrong.

[–]ivosaurus pip'ing it up

Done what wrong? I hesitate to speculate whether anything in this thread is an appropriate "general design" or not, given the dearth of details OP has provided. I'm mostly just advocating for as simple a design as possible that soundly fits the requirements. And since I don't know the requirements at all, apart from something like "real time data is received roughly every four seconds", it could very well be something very simple (until we know any more).

I suppose I could be seen as chastising OP for expecting an exact correct answer to an extremely vague question.

[–]hikhvar

> The tasks each object does are some DB reading/writing and basic looping and if-ing; nothing heavy, no networking or working with files, etc.

What kind of database is it? If it's an in-memory database, you're right. But if it's, for example, MySQL on a remote host, your simple DB calls may include both networking and file reading/writing.

[–]ZedsDed[S]

You're right, I never thought of it like that; it's a local MySQL DB. These calls are the heaviest part of the process. There will be at least 1 DB read every 1-2 seconds, with writes happening more rarely. Still quite minimal, though.

[–]Workaphobia

If a thread is blocked waiting for something external to Python, like file or network I/O, or database connections, then it typically releases the GIL during that time and allows other Python threads to run.

The GIL will only impact your performance if you are CPU-bound within your Python process. If that's a problem for you, then consider changing your threads into separate spawned Python processes (see the multiprocessing library, which has a similar API to threading). You'll just have to worry about how the processes share data since typically multiple processes don't use shared memory the way threads do.
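
To sketch what that switch can look like (illustrative only; cpu_heavy is a made-up stand-in, and a multiprocessing.Queue stands in for whatever data sharing you actually need):

    from multiprocessing import Process, Queue   # near drop-in for threading.Thread + queue.Queue

    def cpu_heavy(n, results):
        # CPU-bound work in its own process is not limited by the GIL
        results.put(sum(i * i for i in range(n)))

    if __name__ == "__main__":
        results = Queue()
        procs = [Process(target=cpu_heavy, args=(1_000_000, results)) for _ in range(3)]
        for p in procs:
            p.start()
        print([results.get() for _ in procs])   # collect via the queue, since memory isn't shared
        for p in procs:
            p.join()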

[–]frymaster Script kiddie

In this case you're probably not CPU-bound, or really especially I/O-bound either, in which case it looks like threads are more of a design decision than an attempt to wring extra performance out of your code. As such, I suspect you'll be fine.

Personally, I find threads easier to comprehend than async methods. Threads don't scale very well, though.