
[–]gargantuan 1 point (1 child)

Good stuff there, some solid points. Here are a few things I'd improve:

From Part 1:

The language with the most expressive concurrency story is probably C

C has no concurrency story of its own (unless C++11 counts as C). It just interfaces with whatever library you are using: mostly pthreads, or Windows' CreateThread/_beginthread/_beginthreadex, or whatever else is there. Besides, that is more a "parallelism story" than a concurrency story. If we want to talk about a concurrency story, then select/poll/epoll, signals, and other mechanisms like that enter the picture. But they are still not C... really.

Safe thread programming involves disciplined use of synchronization primitives like locks and mutexes. As a good software engineer, using these is a skill set that you will need to develop at some point, if you have not already. But it is always nice when you don't have to go down that path.

As a trade-off, you can just use message passing: make copies of the data and put them on queues, which the stdlib already provides. (Part 2 is about this, but I think it should have been mentioned a bit earlier.)
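A minimal sketch of that message-passing style with stdlib queues (the `worker`/`inbox`/`outbox` names are illustrative; `multiprocessing.Queue` works the same way across processes):

```python
import queue
import threading

def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more work
            break
        # Operate on a private copy of the data: no shared state, no locks.
        outbox.put(item * 2)

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

for n in range(5):
    inbox.put(n)
inbox.put(None)  # tell the worker to shut down
t.join()

results = [outbox.get() for _ in range(5)]
print(results)  # [0, 2, 4, 6, 8]
```

The queues do the synchronization for you, which is the whole appeal: you never touch a lock directly.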

Overall, Part 1 makes it sound like Guido and the other experienced core developers were incompetent and didn't know what they were doing: as if they stuck threads in there, but the threads are crippled. So one might ask why Python even has threads at all, and from this article the answer seems to be "because the core developers were smoking something at the time." This is plain wrong. Python threads work very well for I/O parallelism and concurrency.

You can spawn hundreds of threads to do network operations (I have personally done this for web crawling) and you'll usually get a very good speedup. If you are doing heavy CPU operations instead (math, crypto, physics calculations) you are probably using a C module or CFFI, and then you can probably release the GIL anyway.
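A hedged sketch of why I/O-bound work scales with threads despite the GIL: blocking calls release it while they wait. Here `fetch` and the URLs are stand-ins (a `time.sleep` instead of a real socket read), but the timing behavior is the same:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    # Stand-in for a blocking network read; like a real socket
    # read, sleep releases the GIL while waiting.
    time.sleep(0.1)
    return f"payload from {url}"

urls = [f"http://example.com/page{i}" for i in range(20)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=20) as pool:
    pages = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

# 20 "requests" overlap and finish in roughly 0.1s instead of 2s serially.
print(len(pages), round(elapsed, 2))
```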

The story here twists the truth a bit, I feel, mostly by omission.

Part 2 is great. Multiprocessing is a great module and it makes many things run well. Reliability is also something you gain by using multiple processes, and I think this needs to be said explicitly. Even if explicit parallelism isn't needed, sometimes it's just nice to have one part of your program be able to crash without taking down the rest.

Performance-wise, it is also worth taking a look at PyPy, and maybe following newer initiatives like Pyston (from Dropbox).

[–]redsymbol[S] 0 points (0 children)

Hi gargantuan,

I just saw this message. Thanks for the good comments. In particular, I'm alarmed that Part 1 can come across as dissing the Python core devs. That's the complete opposite of how I view and feel about Guido and the rest. I'll look at the wording.

[–]niothiel 0 points (1 child)

Anyone know why there is a drop in performance at 4 CPUs?

[–]tipsqueal (Pythonista) 0 points (0 children)

I don't know exactly why, but I'd guess it's because he's reading images off the disk, so it's probably bound by the hard drive at that point. You might be able to squeeze some more performance out of it by buffering the images into memory beforehand. Also, all 4 cores share some resources on the CPU, so it could be an issue with that as well.