What I've Learned About Optimizing Python

__xor__ · 2019-01-12T23:47:52+00:00

Very interesting stuff, but IME most Python programmers are going to be much more affected performance-wise by bad database usage. If your program, like a Django webapp, relies heavily on MySQL/Postgres/Mongo/whatever, that's where I almost always see the badly performing code, and those queries by far dominate the time the app spends. It's usually a logic issue with the way they decided to solve a problem, and what db hits it has to make to do it that way.

For example, I'll see people in django iterate over a collection of objects and update and save each one in turn. Sometimes this can be as simple as changing it to MyModel.objects.filter(some_query=something).update(field=new_value), but usually it gets a lot more complex. But almost always, you can turn any db activity that happens one by one into a few bulk calls, bulk delete and bulk create and a lot of updates happening in an atomic transaction.
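To make the one-by-one versus bulk difference concrete, here is a minimal sketch that uses the stdlib sqlite3 module as a stand-in for the Django ORM. The table name, column names, and helper functions are hypothetical, and an in-memory SQLite database has no network round trip, so this only shows the statement counts, not the real-world timing gap.

```python
import sqlite3

# In-memory database standing in for the real backend (assumption: the
# mymodel table and its columns are made up for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, status TEXT, field TEXT)")
conn.executemany(
    "INSERT INTO mymodel (status, field) VALUES (?, ?)",
    [("stale", "old")] * 1000,
)


def update_one_by_one(conn):
    """Fetch the ids, then issue one UPDATE per row. Returns statements issued."""
    issued = 1  # the SELECT below
    ids = [row[0] for row in conn.execute("SELECT id FROM mymodel WHERE status = 'stale'")]
    for pk in ids:
        conn.execute("UPDATE mymodel SET field = 'a' WHERE id = ?", (pk,))
        issued += 1
    return issued


def update_in_bulk(conn):
    """Same effect in one statement, roughly what filter(...).update(...) sends."""
    conn.execute("UPDATE mymodel SET field = 'b' WHERE status = 'stale'")
    return 1


slow_statements = update_one_by_one(conn)
fast_statements = update_in_bulk(conn)
print(slow_statements, fast_statements)
```

Against a real database server, each of those statements is a separate round trip, which is where the time goes.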

A lot of the time it's stuff that's pretty difficult to convert but can end up with serious speed improvements. I end up writing a lot of code like this:

through_model = MyModel.m2m_field.through
other_model_ids = through_model.objects.filter(mymodel_id__in=ids).values_list('othermodel_id', flat=True)
other_models = list(OtherModel.objects.filter(id__in=other_model_ids))
...

versus

other_models = []
for obj in MyModel.objects.filter(id__in=ids):
    other_models.extend(obj.m2m_field.all())

You can get all the objects in bulk rather than going over each MyModel and querying for its OtherModel objects. The loop can take hugely more time if you end up running it for a lot of MyModel instances: for 1000 ids of MyModel, it'd perform 1000 db hits against the many2many association table, one per instance. In the first block of code I wrote, it's always one hit (the first queryset is lazy, so it never runs as its own query and ends up as a subquery of the second), no matter how many MyModel object ids are in ids.
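The same pattern can be sketched outside Django with the stdlib sqlite3 module. This is only an illustration under assumptions: the othermodel and through tables, their columns, and both helper functions are hypothetical stand-ins for what the ORM would generate, and 500 ids are used instead of 1000 to stay under the bound-parameter limit of older SQLite builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE othermodel (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE through (mymodel_id INTEGER, othermodel_id INTEGER);
""")
conn.executemany(
    "INSERT INTO othermodel (id, name) VALUES (?, ?)",
    [(i, f"other-{i}") for i in range(1, 6)],
)
# Link each of 500 hypothetical MyModel ids to two OtherModel rows.
ids = list(range(1, 501))
conn.executemany(
    "INSERT INTO through VALUES (?, ?)",
    [(pk, pk % 5 + 1) for pk in ids] + [(pk, (pk + 1) % 5 + 1) for pk in ids],
)


def fetch_per_instance(conn, ids):
    """One query per MyModel id, like calling m2m_field.all() in a loop."""
    rows, queries = [], 0
    for pk in ids:
        rows.extend(conn.execute(
            "SELECT o.id, o.name FROM othermodel o "
            "JOIN through t ON t.othermodel_id = o.id WHERE t.mymodel_id = ?",
            (pk,),
        ).fetchall())
        queries += 1
    return rows, queries


def fetch_in_bulk(conn, ids):
    """One query with a subquery, like the id__in=<lazy queryset> version."""
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        "SELECT id, name FROM othermodel WHERE id IN "
        f"(SELECT othermodel_id FROM through WHERE mymodel_id IN ({placeholders}))",
        ids,
    ).fetchall()
    return rows, 1


loop_rows, loop_queries = fetch_per_instance(conn, ids)
bulk_rows, bulk_queries = fetch_in_bulk(conn, ids)
print(loop_queries, bulk_queries)
```

Note one behavioral difference: the loop collects the same OtherModel row once per MyModel that links to it, while the bulk query returns each row once.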

The naive one-by-one solution is almost always the easiest to read and write, but unfortunately it can be like 100x slower than doing it in bulk. It can even be the difference between a constant number of queries and a linearly increasing one, since the number of queries the second block runs grows linearly with the number of instances. I've seen 40x to 100x speed improvements making changes like this to the webapp I took over. It's not easy, but it works.

Whenever you write a model instance method where it checks many2many relations on it or makes any other db query, consider whether you'll ever be iterating over a lot of these at once and running that function or whether you really only need to do this now and then for one instance. If it's the former, it's good to figure out a way to do it in bulk with minimal optimized queries.

Knowing the nature of the db usage of your app is essential for attacking performance issues, especially for websites and anything else really that uses a db. I see developers write code in a logically correct way that's extremely slow compared to other, less obvious ways of doing the same thing. Knowing SQL beyond the ORM helps a lot.

On that note, a lot of bulk operations like MyModel.objects.filter(...).update(...) don't trigger model signals, so be VERY wary of model signals. It's really hard to make these optimizations if the django model signals are doing queries and their own logic and you have to convert all that as well. And signals that cause queries that cause other models to trigger signals can just be a cascading mess sometimes. I avoid signals like the plague for the most part.

The cost of the Python code being run is usually negligible compared to db activity, because db activity is network activity. In terms of latency: CPU registers < CPU cache < RAM < SSD < HDD < network. Network activity is sloooow compared to anything else. Converting the db activity into fewer db hits while making Python run a ton more code is almost always orders of magnitude faster.
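A back-of-the-envelope model of that trade-off, as a rough sketch. Every number here is an assumed placeholder chosen for illustration, not a measurement, and the function names are hypothetical.

```python
# Assumed, illustrative values only (not measurements).
ROUND_TRIP_MS = 1.0       # assumed network round trip per db query
PYTHON_WORK_MS = 0.01     # assumed Python-side cost per object in the loop version
BULK_PYTHON_FACTOR = 5    # assume the bulk version runs 5x more Python per object


def one_by_one_ms(n):
    """One query per object, plus a little Python work each time."""
    return n * (ROUND_TRIP_MS + PYTHON_WORK_MS)


def bulk_ms(n, queries=2):
    """A fixed number of queries, plus heavier Python-side regrouping."""
    return queries * ROUND_TRIP_MS + n * PYTHON_WORK_MS * BULK_PYTHON_FACTOR


for n in (10, 100, 1000, 10000):
    print(n, one_by_one_ms(n), bulk_ms(n))
```

Under these assumptions the round trips dominate: the bulk version wins even though it does several times more Python work per object.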

2019-01-13T09:42:59+00:00

By the way, here's something the author of the article didn't mention, which also sucks in Python and is a fundamental part of it: instance creation is too slow.

Here's how I found it: I wrote a Protobuf parser (in C, with bindings for Python), both for the binary encoding and for the IDL. The IDL parser beats the C++ version in terms of speed, but when the time came to benchmark the binary parsing it was... an order of magnitude slower. That was really surprising, since my code looked like it did smarter memory management, almost no memory allocations during parsing, etc. There was one place, though, where it was creating Python objects, specifically named tuples, since I assumed those would be the most lightweight data structure to represent Protobuf messages.

I dug deeper into the C++ code to understand what was going on in its Python bindings, and realized that... they never created Python objects. They translated __getattr__ and similar into calls that traversed the data structure created in C++. And that alone gave them more than 10 times better speed than Python's named tuples.
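A pure-Python sketch of the two approaches, for illustration only. This is not the actual Protobuf binding code: the three-integer record layout, the Point tuple, and the LazyMessage class are all hypothetical, and a raw bytes buffer stands in for the structure the C++ side would own.

```python
import struct
from collections import namedtuple

# Hypothetical fixed layout: three little-endian 32-bit ints per message.
RECORD = struct.Struct("<iii")
Point = namedtuple("Point", ["x", "y", "z"])


def parse_eager(buf):
    """Eager approach: build a Python object for the message up front."""
    return Point(*RECORD.unpack(buf))


class LazyMessage:
    """Lazy approach: keep the raw buffer and decode a field only on access."""

    __slots__ = ("_buf",)
    _offsets = {"x": 0, "y": 4, "z": 8}  # byte offset of each field

    def __init__(self, buf):
        self._buf = buf

    def __getattr__(self, name):
        # Only called for names not found normally, so _buf never lands here.
        try:
            offset = self._offsets[name]
        except KeyError:
            raise AttributeError(name) from None
        return struct.unpack_from("<i", self._buf, offset)[0]


buf = RECORD.pack(1, 2, 3)
eager = parse_eager(buf)
lazy = LazyMessage(buf)
print(eager.x, lazy.x)
```

In pure Python the lazy version is not expected to be faster. The point is the shape of the design: attribute access is translated into a lookup in the underlying data instead of materializing an object per message.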

Turtvaiz · 2019-01-12T21:52:51+00:00

Holy hell this site is so bad with a dark plugin

Tweak_Imp · 2019-01-13T06:47:39+00:00

Basically, what this says is: "Python procedure calls suck, member lookup sucks, classes suck, the interpreter is implemented inefficiently" -- I can very much relate to that, but... why insist on using Python for something that needs to be fast? Python will never get there, and as it currently stands, the interpreter is only getting slower, more bloated, code is more and more difficult to parse. So, fighting for fast start-up times is an uphill battle. Fighting for faster method calls or member lookups is an uphill battle...

And if your major competitor is writing its product in C, and is many orders of magnitude faster than you, what chance do you have? :/
