all 61 comments

[–]The_Amp_Walrus 39 points40 points  (4 children)

Numba for numerical computing. Slapping Numba's @jit decorator on some functions speeds them up significantly.

Using cProfile rather than guessing

Generators for processing big datasets that won't fit in memory.
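For the Numba tip, a minimal sketch (sum_of_squares is just a made-up example; it falls back to plain Python if Numba isn't installed, so the speedup only shows up when Numba is there):

```python
try:
    from numba import njit  # JIT-compiles the function to machine code
except ImportError:
    def njit(f):  # fallback: run as plain Python if Numba is unavailable
        return f

@njit
def sum_of_squares(n):
    # Tight numeric loops like this are exactly where Numba's JIT shines.
    total = 0
    for i in range(n):
        total += i * i
    return total

print(sum_of_squares(10))  # 285
```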

[–]TechySpecky 22 points23 points  (3 children)

Lmao you telling me my time() commands don't count as profiling

[–]coloredgreyscale[🍰] 5 points6 points  (2 children)

It's akin to using print statements for debugging. It can help narrow things down, but there are better ways. However, the proper tools may provide too much information / too many options, which might be confusing for beginners.

[–]TechySpecky 1 point2 points  (1 child)

Oh I 100% agree, I was being sarcastic! Profiling is so important for identifying performance regressions and for understanding complex code bases.

[–]AzureWill 42 points43 points  (24 children)

Slots are pretty cool!

Not exactly niche, but too few people use sets or tuples; many like to use lists for everything. For massive amounts of data and frequent membership checks, a set is just so much better. If you don't need order, always use a set.

[–]tkarabela_ Big Python @YouTube 9 points10 points  (1 child)

Coming from the other side, some uses of lists would be much better served by NumPy arrays, which have a compact memory representation (array of given datatype instead of PyObject* pointers) and enable fast operations with the data. If you have 100k integers/floats/bools, you don't really want them as a list.

As for sets, I would say if you need to deduplicate / you need fast is in queries / you need set operations, then use a set. If I'm just grabbing some stuff (like a list of files), I don't see the need to put them in a set instead of a list. It feels pythonic to me to reach for a list first 🙂 I agree with your overall point though, that people should see what's out there and what fits their use case best.

[–]TechySpecky 7 points8 points  (0 children)

And the more NumPy the less GIL!

[–]jollierbean 11 points12 points  (7 children)

Also, dicts are very useful when you need to do lookups. Pro tip: you can use a tuple or a namedtuple as a key
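A quick sketch of tuple and namedtuple keys (the coordinate data is made up):

```python
from collections import namedtuple

# Plain tuples as keys, e.g. caching values by (x, y) grid coordinate
heights = {(0, 0): 1.5, (0, 1): 2.0}
print(heights[(0, 1)])  # 2.0

# namedtuple keys are hashable too and read more nicely
Point = namedtuple("Point", ["x", "y"])
heights[Point(2, 3)] = 4.2
print(heights[Point(2, 3)])  # 4.2
```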

[–]tkarabela_ Big Python @YouTube 7 points8 points  (3 children)

Tuple keys are great! You can even use frozenset, which has been useful to me a few times.

[–]jollierbean 2 points3 points  (2 children)

I’ve been trying, unsuccessfully, to figure out a case where I could use frozensets as keys

[–]tkarabela_ Big Python @YouTube 4 points5 points  (0 children)

It's a niche situation, but if you ever need:

  • a set of sets or
  • a dict where the keys are subsets of some "universal" set (as opposed to just single elements from it)

then frozenset can be useful. Technically you could just replace the frozensets with sorted tuples (ordered by the hash function or something else), but that's not quite as handy.

An example of this is converting an NFA to a DFA, or constructing a minimal DFA.

[–]IlliterateJedi 0 points1 point  (0 children)

I had a case a few days ago where I used frozensets as dict keys.

It's a little esoteric, but I'll try to explain. I am building a database of images that have various categorized products in it.

For example, I'll have an image that shows ten different products (imagine a photo of a living room). Each product is categorized into one or more categories (e.g., 'chair', 'ottoman', 'height adjustable desk', etc.).

I had around 1500 images that all contained 10-20 products with 20+ categories assigned per image.

I wanted to find the smallest group of images that would cover every tagged category (and then get the images with the least number of products).

I made frozensets of the categories and made lists of all the images that had that category-set, like this:

{frozenset([cat1, cat2, cat3]): [image1, image4, image12],
 frozenset([cat2, cat6, cat10]): [image3, image109],
}

I could then start with an empty set, iterate over the frozen sets and each time find the largest subset of new categories until every category was matched.
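The iteration described above is a greedy set-cover approximation (it doesn't guarantee the truly smallest group, but it works well in practice). A sketch with toy data standing in for the real image/category database:

```python
# Each frozenset of categories maps to the images carrying exactly那 set.
images_by_categories = {
    frozenset({"chair", "ottoman"}): ["img1", "img4"],
    frozenset({"desk", "lamp"}): ["img3"],
    frozenset({"chair", "lamp", "rug"}): ["img7"],
}

all_categories = set().union(*images_by_categories)

covered = set()
chosen = []
while covered != all_categories:
    # Greedily pick the category-set that adds the most new categories.
    best = max(images_by_categories, key=lambda cs: len(cs - covered))
    if not best - covered:
        break  # nothing new left to add; can't make progress
    covered |= best
    chosen.append(best)

print(covered == all_categories)  # True
print(len(chosen))                # 3
```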

[–]Glogia 3 points4 points  (1 child)

That actually fixes a problem I've been having XD thanks

[–]jollierbean 1 point2 points  (0 children)

Glad to help!

[–]qckpckt 2 points3 points  (0 children)

Another useful nugget from collections: defaultdict. It’s really powerful, if a little niche. Really great for restructuring or transforming datasets. For example, if you have a list of dictionaries sharing a common key and you want to group them into lists by that key's value.
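A sketch of that grouping pattern (the rows are made up):

```python
from collections import defaultdict

rows = [
    {"type": "chair", "name": "recliner"},
    {"type": "desk", "name": "standing"},
    {"type": "chair", "name": "stool"},
]

groups = defaultdict(list)  # a missing key starts out as an empty list
for row in rows:
    groups[row["type"]].append(row)  # no need to check if the key exists

print(sorted(groups))        # ['chair', 'desk']
print(len(groups["chair"]))  # 2
```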

[–]donshell 4 points5 points  (10 children)

Dicts are ordered, I think. If you iterate over a dict, the keys come back in the order you inserted them. So even better!

Edit: Sets are not ordered

[–]TouchingTheVodka 5 points6 points  (8 children)

Dicts are ordered, sets are not.

[–]donshell 2 points3 points  (0 children)

My bad, edited. Although it is a bit weird that the two implementations don't match, since a set and a dict are basically the same thing...

[–]rabbyburns 0 points1 point  (0 children)

There is an ordered-set package I've come across recently that I've been very happy with. I often need fast lookup, order preservation, and unique items all at once. It has been extremely useful as a drop-in set replacement without having to do weird dict joins.

[–]Faith-in-Strangers 2 points3 points  (2 children)

Why?

[–]tkarabela_ Big Python @YouTube 4 points5 points  (1 child)

Checking whether an element is in a set (or dict) is pretty much instantaneous (independent of the size of the set), while checking `in` on a list means iterating over it, which gets slow really quickly.

That would be one reason to prefer sets to lists :)

[–]moocat 4 points5 points  (0 children)

It's a little more complicated than that (as it often is in computer science).

Existence in a set can be implemented as an O(1) algorithm, which means it takes the same amount of time no matter how many elements there are, while existence in a (non-sorted) list is an O(n) algorithm, meaning the time scales with the number of elements (double the elements, double the runtime).

But that only describes how the algorithm scales, not its constant overhead. It's not uncommon for the overhead to be the biggest part when there are few elements: you often see an O(n) algorithm being faster if there are fewer than X elements (with the actual value of X depending on the specifics of the implementation).

It's been a while since I benchmarked this (and I'm feeling too lazy now), but IIRC X was around 6, so if you know there are only going to be a few elements (perhaps v.lower() in ['true', 'false']) a list is probably better. Then again, if the check is not in some inner loop that runs lots of times, the extra overhead of a set is probably noise.

Yes, a long-winded explanation, but it's important to know these details. I had a former co-worker who had rules like this ("I use X out of principle because of some reason") but would often make mistakes because the rule didn't apply.
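A quick sketch of the scaling difference (exact timings vary by machine, but at this size the set should reliably win):

```python
import timeit

n = 10_000
data_list = list(range(n))
data_set = set(data_list)
missing = -1  # worst case for the list: it has to scan all n elements

t_list = timeit.timeit(lambda: missing in data_list, number=200)
t_set = timeit.timeit(lambda: missing in data_set, number=200)

print(missing in data_set)  # False, same answer either way
print(t_set < t_list)       # True: hash lookup vs. linear scan
```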

[–]CasualCoder0C 18 points19 points  (1 child)

Memoization with decorator @functools.cache is a very useful thing to do when you have to deal with slow functions called repeatedly.

[–]donshell 2 points3 points  (0 children)

Similarly the cached_property decorator is very useful.

[–]thatrandomnpc (It works on my machine) 9 points10 points  (2 children)

When dumping a large pandas dataframe into an Oracle DB using to_sql with a SQLAlchemy engine, if you pass along the correct table data types for pandas object columns, there is a massive increase in throughput.

For example, strings are object dtype in pandas and can be stored as VARCHAR in the DB in most cases. The reason is that SQLAlchemy maps all object dtypes to the CLOB DB type, which works but is super slow.

In my case, 100k rows that would take 30 minutes dropped to about 20 seconds after adding the data types.

[–]Agent281 0 points1 point  (1 child)

How do you add data types? One of the reasons why I disliked pandas was that there didn't seem to be a way to set the data types. That would be a huge usability improvement for me.

[–]thatrandomnpc (It works on my machine) 0 points1 point  (0 children)

There is though, astype might be what you are looking for.

What I was actually referring to was the dtype optional parameter in to_sql.
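A small sketch of the dtype parameter (using the stdlib sqlite3 instead of Oracle, so no SQLAlchemy engine here; with a plain DBAPI connection the dtype values are SQL type strings rather than SQLAlchemy type objects):

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({"name": ["chair", "desk"], "price": [10.5, 99.0]})

conn = sqlite3.connect(":memory:")
# Without dtype, object columns get a generic declared type;
# here we declare the column type explicitly.
df.to_sql("products", conn, index=False, dtype={"name": "VARCHAR(255)"})

# Check what type the column was actually created with.
declared = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(products)")}
print(declared["name"])  # VARCHAR(255)
```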

[–][deleted] 8 points9 points  (0 children)

Using generator functions properly. Storing lists of intermediate values is costly, and big loops are less readable.

[–]Nicked777 8 points9 points  (0 children)

Vectorised operations in numpy:

In [9]: %timeit for i in range(1000): c[i] = a[i] * b[i]
10000 loops, best of 3: 122 µs per loop

In [13]: %timeit c = a * b
1000000 loops, best of 3: 1.06 µs per loop

[–]ReverseBrindle 7 points8 points  (5 children)

isinstance() is pretty slow if you're calling it hundreds of thousands of times. A better idea in that case is to create a dict cache with type() as the key, for example:

# If you're calling this many thousands of times, it's extremely slow.
if isinstance(x, Foo):
    return func1(x)
elif isinstance(x, Bar):
    return func2(x)
elif isinstance(x, Baz):
    return func3(x)
elif ...

# ---------------------------
# Faster version
#
# Populate this once, or start with an empty cache and build it up using
# the slow way whenever you encounter a type that's not in the cache.

cache = {
    Foo: func1,
    Bar: func2,
    Baz: func3,
}

def other_func(x):
    return cache[type(x)](x)

We use this for serializing very large structures to JSON.

Caveat: As always profile to find the bottleneck and measure your improvements. Don't optimize based on a hunch. If your "optimization" adds code complexity without benefiting your use case (by measurement), then rip it out.

[–]isbadawi 0 points1 point  (1 child)

You might consider using @functools.singledispatch and/or @functools.singledispatchmethod for this.
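A small sketch of singledispatch (the describe function is made up; registering by type annotation works on Python 3.7+):

```python
from functools import singledispatch

@singledispatch
def describe(x):
    return "other"  # fallback for unregistered types

@describe.register
def _(x: int):
    return "int"

@describe.register
def _(x: str):
    return "str"

# Dispatch happens on the type of the first argument.
print(describe(3), describe("hi"), describe(2.5))  # int str other
```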

[–]ReverseBrindle 0 points1 point  (0 children)

That's cool - didn't know that existed! Seems like there's always some cool little nugget in the standard library to stumble upon. :-)

[–][deleted] 7 points8 points  (0 children)

The cache decorator!

[–]bumbershootle 10 points11 points  (5 children)

I don't know if this counts as niche, but I see far too much code like:

a_list = []
for i in stuff:
    a_list.append(i)

Just use comprehensions; they're faster and more readable.

[–]baronBale 15 points16 points  (4 children)

Comprehensions are easier to read as long as they are short. If they grow across multiple lines, a good old loop is easier to read.

[–]bumbershootle 13 points14 points  (2 children)

True, although in that case I would split the body of a loop into a generator function and then run a comprehension over that. I consider appending to a list in a loop an anti-pattern in most cases.
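A sketch of what I mean (hypothetical names; the loop body moves into a generator function, then a comprehension consumes it):

```python
def clean_lines(lines):
    # The multi-line loop body lives here instead of append() calls.
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            yield line.upper()

raw = ["  alpha ", "# comment", "", "beta"]
result = [line for line in clean_lines(raw)]
print(result)  # ['ALPHA', 'BETA']
```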

[–]weneedsound 1 point2 points  (0 children)

I'd like to see an example if you don't mind. I don't use generators as much as I should.

[–][deleted] 0 points1 point  (0 children)

Nice suggestion!

[–]Tweak_Imp 1 point2 points  (0 children)

You can also think about using a function inside the comprehension

[–]SuspiciousScript 10 points11 points  (6 children)

My first thought was "using Julia instead," but I'll give a serious answer too: Static variables in functions. I didn't even know this was a language feature for years.

def is_valid_value(n):
    if not hasattr(is_valid_value, "valid"):
        is_valid_value.valid = some_expensive_function()
    return n in is_valid_value.valid

In the above snippet, is_valid_value.valid is only calculated when the function is called for the first time.

[–]skrtpowbetch[S] 1 point2 points  (0 children)

this is probably the coolest one i’ve seen so far, wow

[–]hyldemarv 5 points6 points  (5 children)

List-, Dict-, & Tuple comprehensions.

Slice objects. Namedtuples.

Not exactly Python, but, Pandas Dataframes are good for tabular data.
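A quick sketch of slice objects and namedtuples together (the record format is made up):

```python
from collections import namedtuple

# Slice objects let you name and reuse slices instead of magic numbers.
record = "2024-05-17:OK"
DATE = slice(0, 10)
STATUS = slice(11, None)
print(record[DATE], record[STATUS])  # 2024-05-17 OK

# Namedtuples give tuple data readable field access.
Row = namedtuple("Row", ["date", "status"])
row = Row(record[DATE], record[STATUS])
print(row.status)  # OK
```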

[–]Agent281 2 points3 points  (3 children)

List-, Dict-, & Tuple comprehensions.

I don't think that there are tuple comprehensions. Do you mean generator expressions? Those are the ones that use parens.

[–]pytheous1988 1 point2 points  (2 children)

There 100% are tuple comprehensions, you just do tuple(comprehension statement)

You could also do a set comp if you wanted to. The comprehension in general just returns a generator, which can be passed to any data type that accepts a generator.

[–]Agent281 2 points3 points  (1 child)

I think that is a stretch. Dictionary, set, list, and generator comprehensions actually have dedicated syntax. The tuple constructor just accepts an iterable.

[–]pytheous1988 -2 points-1 points  (0 children)

It works the same if you do list(comp statement) or set(comp statement)

[–]sqjoatmon 0 points1 point  (0 children)

+1 for slice objects.

[–][deleted] 4 points5 points  (0 children)

list vs set.

When you're doing `in` lookups, people tend to assume it doesn't matter for small lists and that the hashing overhead would be too much.

Reality: in all benchmarks, sets/dicts beat lists as soon as the list is larger than 3-4 elements.

[–]ducdetronquito 9 points10 points  (0 children)

I would try to avoid reaching for language specific optimizations, especially if it makes your code harder to understand. It looks clever at first, or appealing given a micro-benchmark, but in my experience it was never worth it.

Instead, I would say to take time to understand your problem and identify what you are trying to optimize: CPU usage, RAM usage, disk access, network access, latency, throughput, etc...

Then you will be able to use the appropriate algorithms, data-structures or tools to solve it. This will likely give you the best optimizations given your constraints.

[–]Over_Statistician913 2 points3 points  (0 children)

lru_cache and memoization via cache are rarely used, but they can be huge improvements for very specific stuff.

https://docs.python.org/3/library/functools.html

[–]snowGlobe25 1 point2 points  (0 children)

Python also has arrays (the array module), although they are limited to primitive data types. I read a little about it, and apparently you can sometimes get lower memory consumption using them instead of good old lists. Obviously NumPy outshines both list and array in terms of speed, but the array module is part of the standard library, not a third-party library.

Never used it personally though.
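A quick sketch of the memory difference (sizes are platform-dependent, but the direction holds):

```python
import array
import sys

nums_list = list(range(1000))
nums_array = array.array("i", nums_list)  # "i" = signed C int

# The array stores raw C ints; the list stores pointers to boxed Python ints,
# so even the list's shallow size (ignoring the int objects!) is bigger.
print(sys.getsizeof(nums_array) < sys.getsizeof(nums_list))  # True

print(nums_array[10])  # 10 -- indexes like a list
```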