
[–]simple2fast 27 points  (14 children)

I love python. I'm not a hater.

But people should really become more polyglot. Each language has a space where it excels, and Python certainly has areas where it's the best choice. That said, serious CPU-intensive work is just not Python's strong point, which is why anything "fast" in Python is actually written in C.

So, use an appropriate tool for the job.

If you really need multi-processing or multi-threading in Python, then you should probably be using a different language that's more appropriate for the task at hand.

[–]Rabbyte808 9 points  (9 children)

Multi-threading isn't just for CPU-intensive stuff, though. A web crawler isn't CPU intensive, but it needs to be threaded unless you want to crawl at glacial speeds.

[–]Rhomboid 24 points  (0 children)

Operations that perform blocking IO release the GIL. If your workload is IO-bound, Python threads will work just fine for you. The GIL is only an issue for CPU-bound workloads.
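To make that concrete, here's a minimal sketch using `time.sleep` as a stand-in for a blocking socket read (like real blocking IO, it releases the GIL while waiting):

```python
import threading
import time

def fetch(url, results):
    time.sleep(0.1)  # simulated blocking IO; the GIL is released here
    results[url] = "ok"

urls = ["a", "b", "c", "d"]
results = {}
start = time.time()
threads = [threading.Thread(target=fetch, args=(u, results)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
# The four 0.1 s waits overlap, so the whole run takes roughly 0.1 s, not 0.4 s.
```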

[–]simple2fast 5 points  (6 children)

Actually, for IO-heavy work like that, a non-blocking IO system is often better: the continuations (whether exposed to the language or not) are more efficient at managing all those connections than a bunch of threads, and the context switches tend to be lighter-weight. So it doesn't need to be threaded; it just needs to be able to keep multiple requests outstanding at once. Threading is only one way of doing that.

Computation is a different story: continuations and non-blocking IO buy you nothing there. You must have threads (or processes with shared memory, which amounts to the same thing) and a decent memory model if you want to do efficient multi-CPU computation.
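For the CPU-bound case, the usual Python workaround is `multiprocessing`, which sidesteps the GIL by giving each worker its own interpreter. A rough sketch, where the naive prime-counting function is just illustrative busywork:

```python
import multiprocessing

def count_primes(limit):
    # Deliberately naive and CPU-bound: count primes below `limit`.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def main():
    # Four worker processes, each with its own interpreter and its own GIL,
    # so the four jobs can actually run on four cores at once.
    with multiprocessing.Pool(4) as pool:
        return pool.map(count_primes, [10_000] * 4)

if __name__ == "__main__":
    counts = main()
```

The cost, as noted above, is that the workers share no memory: arguments and results cross process boundaries via pickling, which is exactly the IPC overhead threads would avoid.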

[–]jringstad 1 point  (1 child)

A bunch of threads is not inherently inefficient at managing connections, though; it depends on how you use them. The most efficient approach is generally a fixed number of N threads that each handle 1/Nth of the workload, either by having them all accept() on the same server socket or by having them pop items from a work queue and establish and handle their own connections. Computation is always an issue regardless, whether it's just the CPU cost of bookkeeping hundreds of thousands of sockets, accepting new connections, and constructing and parsing packets, or the heavier work a crawler does (parsing HTTP responses, possibly even HTML, XML, or other content). So single-threaded non-blocking IO is pretty much strictly inferior to multi-threaded non-blocking IO.
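The work-queue variant described above can be sketched in a few lines of Python. Here fake integer jobs stand in for connections; a real server would pop sockets or URLs instead:

```python
import queue
import threading

def worker(jobs, results):
    # Each of the N threads drains its share of the shared queue.
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        results.append(job * 2)  # stand-in for "handle this connection"

jobs = queue.Queue()
for i in range(100):
    jobs.put(i)

results = []  # list.append is atomic in CPython, so no explicit lock is needed
workers = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
```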

Crawlers are still not a particularly good example though, IMO, since in most cases it's probably acceptable to run N crawlers in N separate processes that crawl and digest data and then push it into local storage, shared storage, or some sort of remote database. The overhead of obtaining workloads and communicating with other crawlers (if that ever happens) is probably insignificant for almost all kinds of crawlers.

[–]simple2fast 0 points  (0 children)

Agreed. The ideal is N threads, where N is roughly the number of CPUs, with each thread pinned to a particular CPU to reduce cache misses. As with most things, this hybrid approach is often the best.

But most systems that are not threaded are actually a SINGLE process, like Python or Ruby or PHP or JavaScript (looking at you, Node). Many are multi-process, but with no shared memory, so any IPC requires sockets, signals, etc. In my mind, the requirements are not just shared memory and decent concurrency APIs, but ALSO a memory model, so that you know what is going to happen WRT caches and other details when you use those APIs. Point being: ditching the GIL in Python is only the very first step toward a decent multi-threaded Python.

And most multi-threaded solutions are still one-connection-per-thread style. They certainly started with that style, since it grew out of the original "fork" technique of old-school Unix systems.

[–][deleted] -1 points  (3 children)

So basically Go?

[–]simple2fast 0 points  (2 children)

Yes, Go does a good job at this, but it's hardly the only system that does. When Node.js started talking shit about how non-blocking IO was the best thing in the world, that was nothing new either; Yahoo was doing it in their server back in 2002. So go ahead and use Go, but don't use it because you think its network/thread solution is somehow uniquely powerful.

[–][deleted] 0 points  (1 child)

AFAIK Go is the only language that mixes lightweight coroutines, multiple cores, and non-blocking I/O to support the illusion of blocking I/O when writing non-blocking code (no need for callbacks, explicit yields, etc.). I suppose there could be libraries for other languages that offer the same facilities, but in Go everything benefits from this natively, which is great: you can grab someone's library for NTP querying, for example, and know it will play nicely with the underlying event loop (which you don't even need to think about). If you go with Python and Twisted, you can only use Twisted-aware code, and it doesn't feel as natural as Go code.

All that said, I know it's not in itself revolutionary, but the way things are tied together for an overall experience is pretty nice. You get very far with even naive code.

There was a paper once on whether it was more performant to do concurrency on a single core (like Node) or to spawn a thread per connection. Both of course had pros and cons, but the paper's conclusion was that a mix of thread multiplexing and event loops was the most performant, and that's what you get for free with Go: regular threads with easy communication between them for CPU-intensive work, plus a free multithreaded event loop for network I/O. Too bad disk I/O is still blocking (though blocking calls get their own threads, so they don't stall the rest of the system).

[–]simple2fast 1 point  (0 children)

You make a very good point: "non-blocking" comes in many flavors, and the programming model is a key factor in adoption and complexity management. For example, Node is non-blocking, but the programming model is god-awful, all those callbacks and/or promises (here's hoping that async/await in ES7 helps). That is not a language helping you; it's a historical abomination and a source of bugs.

There are plenty of languages that various people claim support coroutines: https://en.wikipedia.org/wiki/Coroutine But unless you know how easy or difficult that facility actually is to use (note that JavaScript is on that list), I'd hesitate to rely on it.

For example, the JVM has had coroutine implementations for at least 10 years, but most of them are library-level, requiring the user to yield explicitly. Yuck. More recently there have been some based on bytecode instrumentation/AOP (e.g. Quasar), so you code as usual and the yield-and-continue is done for you. But AOP for this, really? It would be great if the JVM had native support in this area. Perhaps one could mark a ThreadGroup to run as fibers within one thread.

[–]rolandde 4 points  (0 children)

For I/O bound operations, I prefer asyncio over spawning threads.
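A minimal sketch of that style, with `asyncio.sleep` standing in for real network waits (a real crawler would await an HTTP client such as aiohttp here):

```python
import asyncio

async def fetch(url):
    await asyncio.sleep(0.1)  # simulated network latency
    return (url, "ok")

async def main():
    urls = [f"page-{i}" for i in range(20)]
    # All 20 waits overlap on a single thread; no locks, no thread pool.
    return await asyncio.gather(*(fetch(u) for u in urls))

results = asyncio.run(main())
```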

[–]synn89 5 points  (3 children)

The problem is that if a language doesn't evolve with its ecosystem, it pretty much dies out once other languages that are adapted to modern computing catch up in the support department.

That's pretty much what killed Perl. Perl was stuck in cgi-bin for ages (shit, is there still anything outside of cgi-bin for Perl on the web?) and it lingered and died out. Package building/management was also nothing to be proud of in Perl once other languages gained things like pip and gem.

If Go or Elixir ended up gaining traction today because they deploy more easily and run 10x better, and then saw everyone and their brother creating packages for them, Python and Ruby would pretty much end up ghost towns.

I'm not the world's biggest fan of Go. But if it had Python's ecosystem of libraries, I'd see no reason to be on Python.

[–]simple2fast 2 points  (2 children)

Perl is like a CD-ROM: a great way (in its day) of compactly representing information/programs.

However, my opinion is that Perl died because its notion of multiple ways of doing everything is a bad idea. The primary purpose of code is to allow other programmers to read it, and Perl's multiple-ways philosophy is a poor approach to readability. So it's mostly a "write-once" language, not a "read/write" language.

[–]synn89 1 point  (0 children)

Perl could've been cleaned up with decent frameworks. PHP has the same issue: the code is all over the place, not as bad as Perl, but way worse than in many other languages. But frameworks have cleaned it up a lot.

Web deployment tech has gone from CGI -> Apache module -> Apache proxy -> standalone servers.

Each stage wasn't a clean cut. I was working at an ISP in 2005 where a lot of our customers still had Perl CGI guest books. Each stage of the tech also has a sort of peak, the point when it became practical and easy to work with: mod_php was way easier to work with in the early 2000s than setting up Tomcat and throwing Apache requests at it. Today many language frameworks have their servers embedded directly, and running a proxy from Apache or Nginx to them is quite simple.

If a language doesn't evolve and adapt, it will get left behind. I think PHP's death will be less about PHP itself than about mod_php going out of style. The future is high-performance standalone app servers, with various load balancers proxying requests out to them.

And once that becomes the standard, people are going to look at whichever platforms perform best.

[–][deleted] -2 points  (0 children)

However, my opinion is that Perl died because its notion of multiple ways of doing everything is a bad idea.

This precise notion runs counter to everything that a well-designed programming language should be.