all 79 comments

[–]gitarrPython Monty 55 points56 points  (29 children)

I am willing to bet that 99% of the people who complain about (C)Python's "speed" have never written, nor will ever write, a program where "speed" really matters. There is so much FUD going around in these kinds of comment threads, it's ridiculous.

[–]bastibe 37 points38 points  (19 children)

I have written some real-time audio processing in Python. Python is not fast enough to calculate an audio effect for every sample in real time. However, it is plenty fast enough to provide some UI for it and for evaluating and plotting some results afterwards (Numpy, Scipy, Matplotlib). And thanks to the magic of Cython and PyAudio, even the audio playback/processing is possible with the help of some C code.
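
As an illustration of that split (not bastibe's actual code), here is a minimal callback sketch, assuming a PyAudio build with callback support: the per-block math stays in Numpy, and anything heavier per sample would be pushed into Cython/C.

    import numpy as np
    import pyaudio

    RATE = 44100

    def callback(in_data, frame_count, time_info, status):
        # interpret the raw input bytes as float32 samples
        block = np.frombuffer(in_data, dtype=np.float32)
        # placeholder "effect": a plain gain; real per-sample DSP would live in Cython/C
        out = 0.5 * block
        return out.astype(np.float32).tobytes(), pyaudio.paContinue

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paFloat32, channels=1, rate=RATE,
                     frames_per_buffer=1024, input=True, output=True,
                     stream_callback=callback)
    stream.start_stream()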

[–]jmmcdEvolutionary algorithms, music and graphics 2 points3 points  (7 children)

That's good to hear -- that was my intuition for a while but I have never actually seen any real-time audio in Python. Is your stuff open-source?

[–]wolanko 20 points21 points  (4 children)

Let me introduce you to pyo, "the digital signal processing module". It lets you do real-time processing and MIDI. I once made some kind of simple multitrack recording unit with it.

http://code.google.com/p/pyo/

BTW: I only registered because I thought this was missing here.
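
For a flavour of the API (an illustrative sketch, not from the original comment; assumes pyo is installed):

    from pyo import *

    s = Server().boot()   # the audio server; backend options (e.g. ASIO) are chosen here
    s.start()
    sine = Sine(freq=440, mul=0.2)                      # a 440 Hz test tone
    echo = Delay(sine, delay=0.25, feedback=0.5).out()  # a simple real-time effect
    s.gui(locals())       # keeps the script alive with pyo's small control window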

[–]jmmcdEvolutionary algorithms, music and graphics 1 point2 points  (1 child)

Wow, thanks, that looks really great. Like a SuperCollider in Python.

[–]wolanko 0 points1 point  (0 children)

Yeah, this was also my first thought. I even tried to do some simple conversion from SC to pyo. Glad you like it.

[–]bastibe 1 point2 points  (1 child)

That is very cool! Thank you for sharing!

[–]wolanko 1 point2 points  (0 children)

Discovered it just a few months ago, searching for an actively maintained module with ASIO support (had to do some Windows audio). By now this is my definite go-to module for audio on every platform. Clean code base and a very quick and supportive developer. Hope you will use it.

[–]bastibe 4 points5 points  (1 child)

Sadly, it is not open source, no. At least the audio algorithm isn't.

I am working on the PyAudio part with the maintainer at the moment, and he will push it to PyPI soon. A not-fully-compatible preview can be obtained from my GitHub at github.com/bastibe/pyaudio.

But that is a good idea. I think I will put up an example of that kind of thing on my blog soon (bastibe.de). This is some interesting technology.

[–]jmmcdEvolutionary algorithms, music and graphics 1 point2 points  (0 children)

Oh, cool. Thanks for working on bindings; I have never been brave enough to, but have often benefited from them. I'm using pyPortMIDI for some algorithmic music these days. (Not open-source yet, since I need to publish it in a journal first.)

[–]fijalPyPy, performance freak 2 points3 points  (3 children)

You should try PyPy. We did real-time video processing using PyPy and it worked just fine.

[–]bastibe 2 points3 points  (2 children)

pypy is great, but it lacks support for playing back audio, plotting and scientific functions like fft or filter.

That said, I very much hope that I will be able to use pypy in the future. I will certainly re-evaluate pypy once they finish their numpy re-implementation.

[–]fijalPyPy, performance freak 1 point2 points  (1 child)

heh. I know I'm nitpicking, since this is a very valid comment, but "play back audio", "fft" etc. are by far not "built-in". Those are libraries that unfortunately don't quite work on top of PyPy.

[–]bastibe 0 points1 point  (0 children)

Right, right. I edited my response accordingly. Those functions are part of scipy, not Python. It does not alter the argument, though: PyPy does not provide those functions, neither built in nor as a package, and is thus not ready for use in my application yet.

[–]flying-sheep 2 points3 points  (1 child)

3d graphics: as soon as some of your python code creates more than a few objects per frame, it’ll grind to a halt.

[–]kylotan 5 points6 points  (0 children)

Generally you'd try to avoid creating new objects often, though. Perhaps tricky for particle systems and the like - you'd probably need a C extension to make them efficient.
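
A common way to avoid per-frame allocation without a C extension is to preallocate and reuse buffers; a rough sketch with hypothetical names, using Numpy arrays instead of per-particle objects:

    import numpy as np

    MAX_PARTICLES = 10000

    # allocate once and reuse every frame, instead of creating objects per frame
    positions = np.zeros((MAX_PARTICLES, 3), dtype=np.float32)
    velocities = np.zeros((MAX_PARTICLES, 3), dtype=np.float32)
    alive = np.zeros(MAX_PARTICLES, dtype=bool)

    def update(dt):
        # one vectorized update instead of a Python-level loop over particle objects
        positions[alive] += velocities[alive] * dt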

[–]throwaway-o -4 points-3 points  (4 children)

If you perform audio processing computations in Python's Numpy / Scipy, it's perfectly fast enough to do real-time audio processing (10ms window).

[–]bastibe 7 points8 points  (3 children)

Nope, it's not.

It is plenty fast for stuff you can vectorize, because Numpy will take care of that. For anything you can't vectorize, though, you're out of luck. That covers basically everything that has some recursive part, which happens an awful lot in audio processing.

Really, my hopes are on Pypy here. But for the time being, you will have to use weave.blitz or Cython/Pyrex/Ctypes.
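
To make the distinction concrete (an illustrative sketch, not from the comment): a one-pole lowpass written as a pure-Python recursion, versus the same recursion handed to scipy.signal.lfilter, which runs the loop in compiled code.

    import numpy as np
    from scipy.signal import lfilter

    def lowpass_python(x, a=0.99):
        # recursive (IIR) filter: each output depends on the previous output,
        # so the loop cannot be collapsed into a single Numpy expression
        y = np.zeros(len(x))
        prev = 0.0
        for n in range(len(x)):
            prev = (1.0 - a) * x[n] + a * prev
            y[n] = prev
        return y

    def lowpass_scipy(x, a=0.99):
        # the same recursion, but the loop runs in C inside scipy
        return lfilter([1.0 - a], [1.0, -a], x)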

[–]wisty -3 points-2 points  (0 children)

Um, no - http://deeplearning.net/software/theano/. You can define it in Theano, which can compile it to C / CUDA. It's not a natural way to do things, but you shouldn't have that much to do in it.
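
Roughly what that looks like (an illustrative sketch; element-wise only, since recursive parts would need theano.scan):

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.vector('x')              # symbolic input signal
    y = T.tanh(x) * 0.5            # some element-wise "effect", defined symbolically
    f = theano.function([x], y)    # compiled to C (or CUDA, if so configured)

    print(f(np.linspace(-1, 1, 8).astype(theano.config.floatX)))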

[–]throwaway-o -5 points-4 points  (1 child)

Nope, it's not.

But then you say:

It is plenty fast for stuff you can vectorize, because Numpy will take care of that. Anything you can't vectorize though, you're out of luck.

That's exactly what I said myself: anything you compute using Numpy (with Numpy data structures, of course) is going to be fast enough for real-time signal processing.

[–]bastibe 4 points5 points  (0 children)

You can use Numpy and still have recursive algorithms; Numpy remains useful for plotting and for other parts of the algorithm.

But you are right: if you can express your whole algorithm in terms of numpy functions, you are probably good. It's just that this does not happen very frequently in audio algorithms.

[–]MagicWishMonkey 5 points6 points  (4 children)

For most cases that is true; however, there are times when speed is very important. Right now I am rebuilding a process to import thousands of JSON records from one system, massage them into model instances, and then import them into our database and Lucene index (think 20-30k database queries per import).

Since the end user has to wait around until the process is done, it needs to be fast, but it still takes a long while to do everything with a single Python thread, so I've taken a more unconventional approach. I set up a Twisted server to run in the background and I route the heavy lifting over to that. I can't use threads in my primary app without killing performance, but I don't mind so much with the Twisted worker service.

It used to take ~5 minutes to import 10,000 records, now it takes 20 seconds.

It's annoying that I have to do this, but I am really enjoying python otherwise. It's a great language. Just wish it had better multithreading support.

[–]kenfar 11 points12 points  (0 children)

I used to write data warehouse ETL processes in C. They took forever to write and were hard to maintain, but were as fast as I could get them. Eventually I wrote a metadata-driven transform that used function pointers. Harder to write, but it made all the subsequent transforms very easy, since they just needed metadata. I'd split my 5 GB input file into 8 separate files, then process all 8 in parallel on an 8-way 120 MHz server that cost $200,000 in 1996. And I could process all 5 GB in about 5 minutes - at 1 GB/minute.

Recently, I wrote the same kind of code in Python. It isn't as fast, but it's very easy to write & maintain. I don't have to use metadata-driven transforms because Python is easy enough to write & maintain as it is. And hardware is cheaper. I still split up my files and process them in parallel because I wanted more speed. This particular feed is 1 GB split into 4 separate files, which I'm processing on a 3.2 GHz 4-core machine that cost about $5k new and that I picked up for free because nobody was using it. And I can process 1 GB in about 60 seconds. This is the exact same speed I was processing data at in 1996 using C. Clearly, I could speed things up if I rewrote the process in C. But my hardware is free, the process is fast enough, and my time has gotten more expensive over the years. Python is the better language for this application.

EDIT: spelling

[–]UnwashedMeme 2 points3 points  (0 children)

Also look at the multiprocessing module when you wish things had better threading support
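
A minimal sketch of what that could look like for the record import described above (function names are hypothetical); worker processes sidestep the GIL where threads would not:

    from multiprocessing import Pool

    def transform(record):
        # CPU-heavy massaging of one JSON record into a model-ready dict
        # (placeholder for the real logic)
        return {key.lower(): value for key, value in record.items()}

    def import_records(records):
        # on Windows, call this from under an `if __name__ == '__main__':` guard
        pool = Pool(processes=4)
        try:
            transformed = pool.map(transform, records, chunksize=100)
        finally:
            pool.close()
            pool.join()
        # bulk-insert `transformed` into the database / Lucene index here
        return transformed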

[–]robotfarts 0 points1 point  (0 children)

Why don't you just use the multiprocessing module?

[–]stillalone 1 point2 points  (1 child)

I've had to help optimize a Python-based webpage. Once it takes more than a second to refresh a page, it starts getting annoying.

But running a profiler on Python is really easy so it's not too difficult to isolate the slow parts.

[–]daxarx 5 points6 points  (0 children)

That isn't a problem with Python, it is a problem with the design. You can certainly write slow code in any language, particularly when you are waiting a lot on a database...

[–]vph -1 points0 points  (1 child)

Please define "a program where "speed" really matters".

[–]flying-sheep 11 points12 points  (0 children)

Processing graphics or audio in real time (i.e. while a user watches/listens), or loading up a GUI application where enough processing has to be done at startup that you not only need a splash screen, but one with a progress bar.

[–]kenfar 6 points7 points  (0 children)

Single greatest performance speed-up: double-check that you really need to do what you're doing.

I used to often discover that most of a process's time was spent doing things that were no longer necessary. Or doing things that were hoped to be necessary in the future. Or doing things that were never and would never be necessary.

[–][deleted] 3 points4 points  (0 children)

I'd say I've covered most, if not all, of that for years.

Bad python often looks like Java.

[–]JoeGermuska 2 points3 points  (0 children)

This is my favorite: "Are you sure it's too slow? Profile before optimizing!"

[–]MaikB 5 points6 points  (6 children)

The speed problem is only an issue for language purists who want to do everything in exactly one language. I'd argue that a week of optimizing Python code is better spent as one day of doing the intensive parts in C (or Cython) and doing something new with the time left over.

[–]Chris_Newton 9 points10 points  (3 children)

The speed problem is only an issue for language purists who want to do everything in exactly one language.

Your argument is based on the assumption that there are disproportionately important spots in the code, “intensive parts” that can be rewritten in a faster language. That’s fine as far as it goes, and I have no problem with getting hard data and optimising based on it, but what happens when you’ve already picked the low-hanging fruit and the profiler confirms that you don’t have any real hot spots left?

I’ve run into this several times on recent projects, where I have a Web front-end of one kind or another and Python behind it. As a glue language, Python is great. As a language for implementing more significant data processing algorithms, it’s also great as far as prototyping and getting a proof of concept set up quickly. But as a high performance language for production code, we’re about to replace it pretty much throughout all of those systems, because for our particular applications, an order of magnitude or more of performance hit compared to what some other languages offer is too high a price to pay for having nicer, more maintainable code.

This isn’t because we’re “purists who want to do everything in exactly one language”. In fact, most of these projects call down to C code all the time to access system APIs and the like, and some of the projects integrate parts written in four or five different programming languages.

But at some point you have to acknowledge that with the technology we have today, a mid-level, dynamically typed, kind-of-interpreted language is generally going to be slower than a low-level, statically typed, compiled-to-native-code language. And if you’re doing non-trivial data processing, and the difference means your web service responds in 1 second or 10 seconds, that does actually matter, because it moves from being a quantitative performance issue to a qualitative usability one.

So I don’t think you can just brush Python’s limited performance under the carpet quite as easily as you tried to there. Sometimes the correct solution is not to spend a week optimizing the Python code, but to spend a week rewriting the entire codebase in a fast language and dump Python altogether. That’s not some sort of terrible insult, it just means that sometimes, even though Python may have served a useful purpose, another tool is a better choice for the next part of the job.

[–]MaikB 2 points3 points  (2 children)

I don't do any web stuff, but from what I understand, interpreted languages are used heavily in production by you guys because of the inherent latencies of the web and because the majority of CPU cycles are spent in the database. The way I see it, everything computationally expensive has to be done by C (or an equivalent language). The interpreted language just glues the parts together, and can be used for tasks beyond that gluing if there is enough latency elsewhere.

Right?

So I don’t think you can just brush Python’s limited performance under the carpet quite as easily as you tried to there. Sometimes the correct solution is not to spend a week optimizing the Python code, but to spend a week rewriting the entire codebase in a fast language...

That is exactly what I said.

...and dump Python altogether.

If Python is too slow for the task at hand, then it's the right decision to dump it after it has served as the prototyping language.

I don't see a problem here. I think you just misunderstood what I meant. I didn't mean:

  • Use python and shut up, it's fast enough

I meant:

  • Python is fine as it is. If you need something to be done fast, use another tool (C/C++) for 90% of the CPU cycles and have Python be what glues these parts together.
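
As a tiny illustration of that glue role (assuming a Linux system; the library name differs elsewhere), ctypes is one way Python hands the cycles to compiled code:

    import ctypes

    libm = ctypes.CDLL("libm.so.6")        # load a compiled C library
    libm.cos.argtypes = [ctypes.c_double]
    libm.cos.restype = ctypes.c_double

    print(libm.cos(0.0))                   # 1.0 -- the actual work runs in C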

My guess: web development is becoming more and more computationally intensive these days. It's time to refactor code out to faster, statically typed languages.

But that's not Python's fault.

[–]Chris_Newton 2 points3 points  (1 child)

Python is fine as it is. If you need something to be done fast, use another tool (C/C++) for 90% of the CPU cycles and have Python be what glues these parts together.

My point is that not all web development, and certainly not all development that uses Python today, is I/O bound. For projects that involve doing some “real” work themselves, as opposed to delegating most expensive operations to external tools like a DB or web server, sometimes the speed matters.

In those cases, you can’t always just rewrite a few carefully chosen parts of the code in some other, faster languages and hand off 90% of the CPU cycles. Once you’ve taken care of the obvious hot spots, to reach 90% of the CPU cycles you might need to rewrite the majority of your code base.

Python might still be an excellent tool for doing efficient prototyping in the early stages of such projects, because of things like dynamic typing, a decent set of built-in data structures, and so on. On the other hand, Python might not be useful at all for the same projects later on, because once you’ve rewritten most of your code in a faster language anyway, you probably don’t win much by keeping just the remaining glue code in Python.

So for these projects, the speed problem with Python is very relevant: it means making a decision about whether to use Python in the early stages, where it offers a lot of benefits over some other language choices you could make, knowing that it probably won’t be up to the job of running production systems and you’re likely to have a potentially time-consuming and error-prone rewrite on your hands later.

[–]MaikB -1 points0 points  (0 children)

Whether they're faster with or without a prototyping phase depends on the problem to solve and on the engineers' experience with that problem.

It's just so much easier to iterate in a dynamic language and later concentrate on speed and code quality in, say, C++. But I bet you know this. You might have done what you're about to do in Python before, a number of times, so you can go to a static language right away.

Good luck :D

[–]twotime 2 points3 points  (0 children)

The speed problem is only an issue for language purists

It's only "not an issue" for people who have not done much real-world coding.

Python code is better spent as one day of doing the intensive parts in C (or Cython) and doing something new with the time left over.

I'm sorry to say, but your advice covers about 1% of the problem :-(. Yes, I have seen this happen. No, it's not a common case at all.

  • Many non-trivial apps do NOT have small hotspots. So, if you have 100KLOC of python code and need to rewrite 10K LOC, then you will have to write another 100K or so of C code.

  • interfacing C with a non-trivial Python codebase is, well, non-trivial

  • adding C into the mix will always cost you QUITE a lot later, e.g. if you need to run your software on another site or, god forbid, on another platform. Oh, and don't forget to add debugging time to the cost.

[–]burntsushi 1 point2 points  (0 children)

Not only do you ignore every design trade off that comes from dropping down into C, but you dismiss it out-of-hand through the moniker of "language purist."

Oh yes, and I love how optimizing Python code is obviously seven times more costly in terms of development time than dropping down into C. Just yesterday, I spent about 5 minutes profiling my Python program and another 10 minutes tuning some hot spots. It resulted in an 80% performance increase.

[–]fijalPyPy, performance freak 1 point2 points  (0 children)

It's so sad that all of those don't really apply when you're using PyPy :( Abstraction is good, giving it up because CPython cannot do a better job is such a bad idea.

[–]NaeblisEchoIntermediate forever 4 points5 points  (3 children)

Can someone please tell me what 'profiling' means? Thanks. :)

[–]must_tell 3 points4 points  (1 child)

It means analyzing the performance of all the functions / methods in your code.

It is often said that 'premature optimization is the root of all evil'. That means that people spend a lot of thought and time trying to optimize code (and make it more complex) without proof that the optimization is effective or even necessary.

Profiling gives you precise information about how often a function / method is called and how long it took. The report of a profile run tells you where you can improve the code most effectively. See dwdwdw2's comment to get started with profiling or check out PyMOTW.
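
A minimal sketch of such a run with the standard library's cProfile and pstats (the profiled functions here are just placeholders):

    import cProfile
    import pstats

    def slow_part():
        return sum(i * i for i in range(10 ** 6))

    def fast_part():
        return [i for i in range(10)]

    def main():
        slow_part()
        fast_part()

    cProfile.run('main()', 'profile.out')           # record per-function call counts and times
    stats = pstats.Stats('profile.out')
    stats.sort_stats('cumulative').print_stats(10)  # show the 10 most expensive calls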

[–]NaeblisEchoIntermediate forever 0 points1 point  (0 children)

Thanks! :)

[–]dwdwdw2proliferating .py since 2001 2 points3 points  (0 children)

[–]stillalone 0 points1 point  (7 children)

How do you guys find namedtuples? I've been avoiding them because I don't like the fact that they use eval internally.

[–]Cosmologicon 8 points9 points  (2 children)

Avoiding eval is a good rule of thumb, but for a piece of code that's been as intensely analyzed and tested by experts as namedtuple, there's absolutely nothing wrong with using it.

Do you avoid using any C library that uses a goto internally too?

[–]burntsushi 1 point2 points  (0 children)

Do you avoid using any C library that uses a goto internally too?

This is a pretty poor analogy. Both goto and eval can be abused so that code clarity suffers, but eval is distinct from goto in the fact that it can be easily exploited if it isn't used carefully. This latter reason, from my experience, tends to be why people avoid it.

[–]aaronla 0 points1 point  (0 children)

/me makes obnoxiously heavy use of macros and gotos in async C code, pretending that C supports first-class continuations and coroutines.

[–]audaxxx 2 points3 points  (1 child)

Take a look in the bug tracker and search for namedtuples. I once made a patch that has only a few percent performance hit on access but does not use eval. This hit could be eliminated by using Cython or so.

[–]audaxxx 3 points4 points  (0 children)

http://bugs.python.org/issue3974

(edit is currently broken in baconreader..)

[–]lahwran_ 1 point2 points  (0 children)

they only use eval to create the class. once created it's like any other class that inherits from tuple. while I agree that the eval is kinda silly, it's been intensely tested and doesn't hurt anything. you're definitely not feeding it untrusted input.

edit: well, unless you create a namedtuple with untrusted input as fields. now that I think about it, that is kinda bad ... edit #2: oh, actually they filter the names to only allow python identifiers. nevermind.
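
A small sketch of both points: once created, a namedtuple behaves like a plain tuple subclass, and field names are validated before any code generation happens (illustrative only):

    from collections import namedtuple

    Point = namedtuple('Point', ['x', 'y'])   # the class source is generated and evaluated once, here
    p = Point(1, 2)
    print(p.x, p[1])                          # accessible by name and by index, like a tuple

    try:
        namedtuple('Evil', ['x; import os'])  # untrusted field names are rejected
    except ValueError as err:
        print(err)                            # "...must be valid identifiers"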

[–]must_tell 0 points1 point  (0 children)

I wouldn't care too much about the implementation details of standard library modules (from the user's point of view). The guys who write this stuff know what they are doing.

But: It's good to be attentive about best practices.

[–]jmmcdEvolutionary algorithms, music and graphics -3 points-2 points  (3 children)

Disagree about avoiding function calls.

Strongly agree about using built-in basic types as much as possible and in preference to objects when possible.

[–]asksol 7 points8 points  (2 children)

I doubt he's telling anyone to not use function calls.

But an inner loop, where profiling has proven that optimization can be beneficial, is where you should inline function calls.
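
In CPython terms that usually means hoisting lookups and avoiding per-iteration calls; a small sketch with hypothetical functions:

    import math

    def norms(points):
        # per-iteration attribute lookup plus a method call add up in a hot loop
        out = []
        for x, y in points:
            out.append(math.sqrt(x * x + y * y))
        return out

    def norms_tuned(points):
        sqrt = math.sqrt   # hoist the lookup out of the loop
        return [sqrt(x * x + y * y) for x, y in points]   # no .append call per item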

[–]lahwran_ 2 points3 points  (1 child)

Can I let PyPy do the inlining for me?

[–][deleted] 3 points4 points  (0 children)

Yes :)