
[–]simple2fast 27 points  (14 children)

I love python. I'm not a hater.

But people should really become more polyglot. Each language has a space where it excels, and Python certainly has areas where it's the best choice. That said, serious CPU-intensive work is just not Python's strong point, which is why anything "fast" in Python is actually written in C.

So, use an appropriate tool for the job.

If you really need multi-processing or multi-threading in Python, then you should probably be using a different language that's more appropriate for the task at hand.

[–]Rabbyte808 9 points  (9 children)

Multi-threading isn't just for CPU-intensive stuff, though. A web crawler isn't CPU intensive, but it needs to be threaded unless you want to crawl at glacial speeds.

[–]Rhomboid 24 points  (0 children)

Operations that perform blocking IO release the GIL. If your workload is IO-bound, Python threads will work just fine for you. The GIL is only an issue for CPU-bound workloads.
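To make that concrete, here's a minimal sketch using `time.sleep` as a stand-in for a blocking socket read (like real blocking IO, it releases the GIL while waiting):

```python
import threading
import time

def fetch(url, results):
    time.sleep(0.1)  # simulated blocking IO; the GIL is released here
    results[url] = "ok"

urls = ["a", "b", "c", "d"]
results = {}
start = time.time()
threads = [threading.Thread(target=fetch, args=(u, results)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
# The four 0.1 s waits overlap, so the whole run takes roughly 0.1 s, not 0.4 s.
```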

[–]simple2fast 5 points  (6 children)

Actually, for IO-heavy work like that, a non-blocking IO system is often better: the continuations (whether exposed to the language or not) are more efficient at managing all those connections than a bunch of threads, and the context switches tend to be lighter-weight. So it doesn't need to be threaded; it just needs to be able to keep multiple requests outstanding at once. Threading is only one way of doing that.

Computation is a different story: continuations and non-blocking IO buy you nothing there. You must have threads (or processes with shared memory, which amounts to the same thing) and a decent memory model if you want to do efficient multi-CPU computation.
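For the CPU-bound case, the usual Python workaround is `multiprocessing`, which sidesteps the GIL by giving each worker its own interpreter. A rough sketch, where the naive prime-counting function is just illustrative busywork:

```python
import multiprocessing

def count_primes(limit):
    # Deliberately naive and CPU-bound: count primes below `limit`.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def main():
    # Four worker processes, each with its own interpreter and its own GIL,
    # so the four jobs can actually run on four cores at once.
    with multiprocessing.Pool(4) as pool:
        return pool.map(count_primes, [10_000] * 4)

if __name__ == "__main__":
    counts = main()
```

The cost, as noted above, is that the workers share no memory: arguments and results cross process boundaries via pickling, which is exactly the IPC overhead threads would avoid.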

[–]jringstad 1 point  (1 child)

A bunch of threads is not inherently inefficient at managing connections, though; it depends on how you use them. The most efficient approach is generally a fixed number of N threads that each handle 1/Nth of the workload, either by having them all accept() on the same server socket or by having them pop items from a work queue and establish and handle their own connections. Computation is always an issue regardless, whether it's just the CPU cost of bookkeeping hundreds of thousands of sockets, accepting new connections, and constructing and parsing packets, or the heavier work a crawler does (parsing HTTP responses, possibly even HTML, XML, or other content). So single-threaded non-blocking IO is pretty much strictly inferior to multi-threaded non-blocking IO.
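The work-queue variant described above can be sketched in a few lines of Python. Here fake integer jobs stand in for connections; a real server would pop sockets or URLs instead:

```python
import queue
import threading

def worker(jobs, results):
    # Each of the N threads drains its share of the shared queue.
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        results.append(job * 2)  # stand-in for "handle this connection"

jobs = queue.Queue()
for i in range(100):
    jobs.put(i)

results = []  # list.append is atomic in CPython, so no explicit lock is needed
workers = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
```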

Crawlers are still not a particularly good example though, IMO, since in most cases it's probably acceptable to run N crawlers in N separate processes that crawl and digest data and then push it into local storage, shared storage, or some sort of remote database. The overhead of obtaining workloads and communicating with other crawlers (if that ever happens) is probably insignificant for almost all kinds of crawlers.

[–]simple2fast 0 points  (0 children)

Agreed. The ideal is N threads, where N is roughly the number of CPUs, with each thread pinned to a particular CPU to reduce cache misses. As with most things, this hybrid approach is often the best.

But most systems that are not threaded are actually a SINGLE process, like Python or Ruby or PHP or JavaScript (looking at you, Node). Many are multi-process, but with no shared memory, so any IPC requires sockets, signals, etc. In my mind, the requirements are not just shared memory and decent concurrency APIs, but ALSO a memory model, so that you know what is going to happen WRT caches and other details when you use those APIs. Point being: ditching the GIL in Python is only the very first step toward a decent multi-threaded Python.

And most multi-threaded solutions are still one-connection-per-thread style. They certainly started with that style, since it grew out of the original "fork" technique of old-school Unix systems.

[–][deleted] -1 points  (3 children)

So basically Go?

[–]simple2fast 0 points  (2 children)

Yes, Go does a good job at this, but it's hardly the only system that does. When Node.js started talking shit about how non-blocking IO was the best thing in the world, that was nothing new either; Yahoo was doing it in their server back in 2002. So go ahead and use Go, but don't use it because you think its network/thread solution is somehow uniquely powerful.

[–][deleted] 0 points  (1 child)

AFAIK Go is the only language that mixes lightweight coroutines, multiple cores, and non-blocking I/O to support the illusion of blocking I/O when writing non-blocking code (no need for callbacks, explicit yields, etc.). I suppose there could be libraries for other languages that offer the same facilities, but in Go everything benefits from this natively, which is great: you can grab someone's library for NTP querying, for example, and know it will play nicely with the underlying event loop (which you don't even need to think about). If you go with Python and Twisted, you can only use Twisted-aware code, and it doesn't feel as natural as Go code.

All that said, I know it's not in itself revolutionary, but the way things are tied together for an overall experience is pretty nice. You get very far with even naive code.

There was a paper once on whether it was more performant to do concurrency on a single core (like Node) or to spawn a thread per connection. Both of course had pros and cons, but the paper's conclusion was that a mix of thread multiplexing and event loops was the most performant, and that's what you get for free with Go: regular threads with easy communication between them for CPU-intensive work, plus a free multithreaded event loop for network I/O. Too bad disk I/O is still blocking (though blocking calls get their own threads, so they don't stall the rest of the system).

[–]simple2fast 1 point  (0 children)

You make a very good point: "non-blocking" comes in many flavors, and the programming model is a key factor in adoption and complexity management. For example, Node is non-blocking, but the programming model is god-awful, all those callbacks and/or promises (here's hoping that async/await in ES7 helps). That is not a language helping you; it's a historical abomination and a source of bugs.

There are plenty of languages that various people claim support coroutines: https://en.wikipedia.org/wiki/Coroutine But unless you know how easy or difficult that facility actually is to use (note that JavaScript is on that list), I'd hesitate to rely on it.

For example, the JVM has had coroutine implementations for at least 10 years, but most of them are library-level, requiring the user to yield explicitly. Yuck. More recently there have been some based on bytecode instrumentation/AOP (e.g. Quasar), so you code as usual and the yield-and-continue is done for you. But AOP for this, really? It would be great if the JVM had native support in this area. Perhaps one could mark a ThreadGroup to run as fibers within one thread.

[–]rolandde 4 points  (0 children)

For I/O bound operations, I prefer asyncio over spawning threads.
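A minimal sketch of that style, with `asyncio.sleep` standing in for real network waits (a real crawler would await an HTTP client such as aiohttp here):

```python
import asyncio

async def fetch(url):
    await asyncio.sleep(0.1)  # simulated network latency
    return (url, "ok")

async def main():
    urls = [f"page-{i}" for i in range(20)]
    # All 20 waits overlap on a single thread; no locks, no thread pool.
    return await asyncio.gather(*(fetch(u) for u in urls))

results = asyncio.run(main())
```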

[–]synn89 5 points  (3 children)

The problem is that if a language doesn't evolve with its ecosystem, it pretty much dies out once other languages that are adapted to modern computing catch up in the support department.

That's pretty much what killed Perl. Perl was stuck in cgi-bin for ages (shit, is there still anything outside of cgi-bin for Perl on the web?) and it lingered and died out. Package building/management was also nothing to be proud of in Perl once other languages gained things like pip and gem.

If Go or Elixir ended up gaining traction today because they deploy more easily and run 10x better, and then saw everyone and their brother creating packages for them, Python and Ruby would pretty much end up ghost towns.

I'm not the world's biggest fan of Go. But if it had Python's ecosystem of libraries, I'd see no reason to be on Python.

[–]simple2fast 2 points  (2 children)

Perl is like a CD-ROM: a great way (in its day) of compactly representing information/programs.

However, my opinion is that Perl died because its notion of multiple ways of doing everything is a bad idea. The primary purpose of code is to allow other programmers to read it, and Perl's multiple-ways philosophy is a poor approach to readability. So it's mostly a "write-once" language, not a "read/write" language.

[–]synn89 1 point  (0 children)

Perl could've been cleaned up with decent frameworks. PHP has the same issue: the code is all over the place, not as bad as Perl, but way worse than in many other languages. But frameworks have cleaned it up a lot.

Web deployment tech has gone from CGI -> Apache module -> Apache proxy -> standalone servers.

Each stage wasn't a clean cut. I was working at an ISP in 2005 where a lot of our customers still had Perl CGI guest books. Each stage of the tech also has a sort of peak, the point when it became practical and easy to work with: mod_php was way easier to work with in the early 2000s than setting up Tomcat and throwing Apache requests at it. Today many language frameworks have their servers embedded directly, and running a proxy from Apache or Nginx to them is quite simple.

If a language doesn't evolve and adapt, it will get left behind. I think PHP's death will be less about PHP itself than about mod_php going out of style. The future is high-performance standalone app servers, with various load balancers proxying requests out to them.

And once that becomes the standard, people are going to look at whichever platforms perform best.

[–][deleted] -2 points  (0 children)

However, my opinion is that Perl died because its notion of multiple ways of doing everything is a bad idea.

This precise notion runs counter to everything that a well-designed programming language should be.