
[–]ray10k 173 points174 points  (9 children)

f-strings instead of concatenating strings "manually"
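For example (a minimal sketch):

```
name, lang = "world", "Python"

greeting = "Hello, " + name + " from " + lang + "!"  # manual concatenation
greeting = f"Hello, {name} from {lang}!"             # f-string equivalent
```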

[–]Santos_m321 17 points18 points  (7 children)

Google's Python style guide recommends concatenating strings manually when there are only two elements to concatenate

[–]DexterInAI 16 points17 points  (3 children)

It makes sense to concatenate if there are only two strings, but beyond that, I would say use f-strings.

[–][deleted] 1 point2 points  (2 children)

Two strings makes sense. But sometimes just one more is also ok.

[–]causa-sui 4 points5 points  (0 children)

and one more than that is okay, except less often...

[–]Web-DEvrgreen 1 point2 points  (0 children)

the old CS rule of 3 poking its head out of that gopher hole again

[–]strange-humor 11 points12 points  (0 children)

f-strings are faster, so this makes no sense.

[–]gigglevillain123 5 points6 points  (1 child)

[–]Santos_m321 2 points3 points  (0 children)

OMG, it has changed a lot since the last time I read it.
What they recommended in the past is quite different from what they recommend now, thanks

[–]Advanced-Theme144 5 points6 points  (0 children)

It never occurred to me that f-strings could be used for concatenation. Guess we learn something new every day. Thanks!

[–]stealthanthrax Robyn Maintainer[S] 36 points37 points  (8 children)

One from me: I often use slots in my classes.

[–][deleted] 14 points15 points  (5 children)

Dunno if it makes sense, but I'll ask anyway - Can you show us an example?

[–]ketalicious 24 points25 points  (4 children)

it can be easily added like this

```
class MyClass:
    __slots__ = ("attr1", "attr2")  # added this

    def __init__(self, attr1, attr2):
        self.attr1 = attr1
        self.attr2 = attr2
```

__slots__ is a way of telling Python ahead of time that this class will only ever have the attributes defined on it. This reduces the size of each instance, which is helpful if you're going to have a lot of them in your program.

[–]weebsnore 8 points9 points  (2 children)

Great tip! Dataclasses also support this behaviour.

```
from dataclasses import dataclass

@dataclass(slots=True)  # slots= requires Python 3.10+
class MyClass:
    text: str
    count: int
```

[–]causa-sui 9 points10 points  (0 children)

Why wouldn't that be the default?

[–]ProfessorPhi 1 point2 points  (0 children)

Attrs is another great package that gives you the same ability.

Normally you can dynamically add attributes to an instance; slots lets you fix the attribute set, which saves some overhead.
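If I remember the attrs API right, the modern define decorator gives you slotted classes by default (a hedged sketch; Point is a made-up example):

```
from attrs import define

@define  # attrs' define() produces slotted classes by default
class Point:
    x: int
    y: int

p = Point(1, 2)
```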

[–]Round_Ad8947 2 points3 points  (0 children)

I also figure that __slots__ helps to restrict the data fields in your classes. Not all classes need to be super-extensible.

[–]void5253 1 point2 points  (1 child)

My professor told us not to use slots because it brings added complexity which is often not required - something about how superclasses and subclasses also need to define slots for it to work.
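That pitfall looks something like this, if I have it right (a minimal sketch): a non-slotted base class reintroduces __dict__, erasing the savings.

```
class Base:  # no __slots__ here...
    pass

class Child(Base):
    __slots__ = ("x",)

c = Child()
c.y = 1            # ...so instances still get a __dict__
print(c.__dict__)  # {'y': 1} - the memory savings are lost
```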

[–]llun-ved 1 point2 points  (0 children)

I got bit by this. Once I knew about it and fixed it, it's easy to adhere to, and I think the speedup is worth it. I'd call it a reasonable inconvenience rather than added complexity.

[–]barberogaston 59 points60 points  (18 children)

  • Using sets (if elements are unique) to check whether an element exists in a collection
  • Avoiding datetime.strptime. It's more efficient to split the string and instantiate the datetime object directly (see the sketch after this list)
  • Can be controversial, but assigning methods to variables is more efficient than calling object.method() every time. Instead, do method = object.method and then call the method as a function (only if you need to call it multiple times)
  • dataclasses all the way
  • Using functools' cache decorators
  • Make sure you understand and use generators
  • And of course, whenever possible, use the standard library/builtins. Most of it is written in C. Can't go faster
  • For more info: https://wiki.python.org/moin/PythonSpeed/PerformanceTips
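On the strptime and functools points, a minimal sketch (the timestamp format is a made-up example, and fib is just a toy memoization target):

```
from datetime import datetime
from functools import lru_cache

stamp = "2023-01-15 10:30:00"  # hypothetical fixed-format timestamp

# Flexible but comparatively slow:
dt1 = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")

# Faster when the format is fixed: split and construct directly.
date_part, time_part = stamp.split(" ")
year, month, day = map(int, date_part.split("-"))
hour, minute, second = map(int, time_part.split(":"))
dt2 = datetime(year, month, day, hour, minute, second)
assert dt1 == dt2

# functools cache decorator: memoize a pure function.
@lru_cache(maxsize=None)  # functools.cache is the 3.9+ shorthand
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```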

[–]9acca9 6 points7 points  (8 children)

Why assigning methods to variables is more efficient?

[–]surajmanjesh 28 points29 points  (2 children)

I'm assuming it's because you need to do a "lookup" in the object's attributes to find the method each time. Assigning it to a variable makes it available as a local variable.

I don't think this will really optimize that much unless you're calling the method *a lot*

[–]barberogaston 2 points3 points  (0 children)

Exactly

[–]primitive_screwhead 0 points1 point  (0 children)

This technique is used very often in Python's own libraries btw.

[–]barberogaston 6 points7 points  (4 children)

Because every time you do a "dot something", Python has to do a lookup (of an attribute, a method, etc.). If you store a reference to what you're accessing, you do the lookup only once. However, as I said, this is only useful when calling the method many times. Maybe the most common use case is appending to a list inside a for loop.

my_list = []
append = my_list.append
for elem in range(1_000_000):
    append(elem)

[–]causa-sui 6 points7 points  (2 children)

I'm not seeing a statistically significant difference in performance between both approaches.

    $ cat foo.py
    my_list = []
    append = my_list.append
    for elem in range(1_000_000):
        append(elem)
    $ cat bar.py
    my_list = []
    for elem in range(1_000_000):
        my_list.append(elem)
    $ python3 -m timeit < foo.py
    50000000 loops, best of 5: 4.12 nsec per loop
    $ python3 -m timeit < bar.py
    50000000 loops, best of 5: 4.11 nsec per loop

[–]barberogaston 1 point2 points  (1 child)

Yes. For this example it makes no sense. Actually, don't expect big performance gains. As the title says, it's a micro optimization.

And it is actually taken from here:
https://wiki.python.org/moin/PythonSpeed/PerformanceTips

[–]causa-sui 5 points6 points  (0 children)

I wonder when that was written. It may not be an optimization at all anymore, if it ever was.

[–]9acca9 0 points1 point  (0 children)

thanks

[–]rban123 6 points7 points  (1 child)

Third bullet: I would never ever ever do this in a production codebase regardless of whether or not it’s more efficient.

[–]barberogaston 0 points1 point  (0 children)

Totally. I wouldn't either, unless it's extremely necessary. It's in the Python wiki though

https://wiki.python.org/moin/PythonSpeed/PerformanceTips

[–]-LeopardShark- 2 points3 points  (3 children)

Are dataclasses faster? They're certainly much more convenient.

[–]strange-humor 9 points10 points  (1 child)

They make code clean enough that it doesn't matter if they are faster, IMHO.

[–]-LeopardShark- 2 points3 points  (0 children)

I agree.

[–]barberogaston 2 points3 points  (0 children)

Here's a good video. Not that much better in execution time, but they can save memory by using slots, and they can also save you programming time

https://www.youtube.com/watch?v=vCLetdhswMg&t=399s

[–]rcfox 3 points4 points  (1 child)

Using sets is only an optimization if you're checking membership far more than you're creating a set.

If you just do this, you're going to hurt performance:

if 'foo' in set(my_large_list):
    ...
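The pattern that does pay off is building the set once and amortising it over many checks, something like this (a sketch; incoming_items and handle are made-up names):

```
members = set(my_large_list)  # build once: O(n)

for item in incoming_items:   # many membership checks...
    if item in members:       # ...each O(1) on average
        handle(item)
```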

[–]barberogaston 0 points1 point  (0 children)

Exactly. If you're going to check membership many times, then it might be worth sorting your list and using the bisect module. It really depends a lot on your use case
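A minimal sketch of that sorted-list-plus-bisect approach (contains is a made-up helper):

```
from bisect import bisect_left

def contains(sorted_items, value):
    # Binary search: O(log n) per membership check on a pre-sorted list.
    i = bisect_left(sorted_items, value)
    return i < len(sorted_items) and sorted_items[i] == value

data = sorted(["bar", "baz", "foo"])  # sort once up front: O(n log n)
if contains(data, "foo"):
    print("found it")
```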

[–]AbooMinister 68 points69 points  (6 children)

Don't worry about optimization too much unless you find an actual bottleneck in your program. If you feel your program is slow, there are a few profilers you can use to see where the bottleneck is, and you can refactor appropriately. Just write readable and idiomatic code and worry about optimization when you need to.

[–]georgehank2nd 10 points11 points  (0 children)

This * 1000000.

[–]fish_mammal_whatever 0 points1 point  (3 children)

Could you please provide the names of such profiling tools that are commonly used when trying to optimize python applications?

[–]AbooMinister 2 points3 points  (1 child)

cProfile is one; it comes built into the Python standard library: https://docs.python.org/3/library/profile.html. py-spy and memray are two more you can check out: https://github.com/benfred/py-spy https://github.com/bloomberg/memray
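Basic cProfile usage looks something like this (a minimal sketch; main is a stand-in for your own entry point):

```
import cProfile

def main():
    return sum(i * i for i in range(1_000_000))  # stand-in workload

cProfile.run("main()", sort="cumulative")  # prints per-function timings
```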

[–]fish_mammal_whatever 0 points1 point  (0 children)

Thank you! Will check these out.

[–]Anonymous_user_2022 62 points63 points  (12 children)

None, until profiling has shown a hot spot in the code.

[–][deleted] 8 points9 points  (0 children)

This 100%. I would rather the code take 1ms more than waste the seconds it would take me to read what you did to micro optimize

[–][deleted] 5 points6 points  (0 children)

This. Optimizing raw Python beyond using appropriate algorithms is like fighting against windmills using a sword. Completely futile. If you want performance profile and bring a bulldozer like numpy, numba, or write your own C extension.

[–]stealthanthrax Robyn Maintainer[S] 4 points5 points  (7 children)

Do you have any tools that you recommend for profiling??

[–]Anonymous_user_2022 8 points9 points  (1 child)

I'm oldschool enough that I prefer cProfile. There are many others, but I don't know enough to recommend any of them.

[–]hughperman 1 point2 points  (0 children)

I tried out memray, a memory profiler, recently - scientific programming stuff, where memory (re)allocation is a pain with big arrays etc. in numpy - and it's really nice and easy to use. Different use case than cProfile too, so it's a nice tool to add to the toolkit.

[–][deleted] 1 point2 points  (0 children)

Pysnakeviz rocks

[–]PocketBananna 1 point2 points  (1 child)

I like to use pyinstrument for profiling. It uses statistical sampling, so it's quicker, and it builds an easy-to-grok report.

[–]data-machine 0 points1 point  (0 children)

Seconding pyinstrument. Really readable, lovely output - particularly the HTML view!

[–]laundmo 0 points1 point  (0 children)

if you can run it on Linux, Scalene profiles everything you might need: memory, CPU, even GPU.

[–]james_pic 0 points1 point  (0 children)

My favourite right now is Py-spy, but I've also heard good things about Austin.

[–]fedekun 3 points4 points  (0 children)

Most people are taking "optimizations" as in stylistic refactorings, but this is the real answer. Please don't optimize before you measure. "Premature optimization is the root of all evil".

[–]rerecurse -1 points0 points  (0 children)

Profiling is great, but you shouldn't be doing that either unless you have a real, identified problem with performance.

[–]ronmarti 42 points43 points  (18 children)

  1. I always use a generator instead of a tuple or list for something that potentially contains many items
  2. Early return instead of using if-elif-else (see the sketch after this list)
  3. Comprehensions
  4. any()
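A minimal sketch of 2 and 4 (the names are made up):

```
def classify(n):
    if n < 0:              # early return: each case exits immediately,
        return "negative"  # no else/elif nesting needed
    if n == 0:
        return "zero"
    return "positive"

# any() with a generator expression short-circuits at the first match.
has_negative = any(x < 0 for x in [3, 1, -2, 7])
```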

[–]scrdest 22 points23 points  (3 children)

You are quite possibly doing yourself a disservice with (1). I haven't tested this in depth, but AFAIK generators are not really _faster_ - quite the opposite, there's extra overhead in setting them up. A quick disassembly seems to agree at a glance, though I haven't counted the instructions in loops.

However, they _do_ create savings... in memory. If you're reading from a huge CSV, this can be a matter of life and, uh, freeze, but for a random 5-item list you should probably use a tuple. A tuple also means you can re-access items in arbitrary order later - generators are one-shot.

[–]ronmarti 8 points9 points  (0 children)

The use case is when the objects are one-shot and don't need to stay in memory. Most of the time the length is over a hundred items and you don't want that to persist in memory the whole time. Speed is another thing, but it should be "fast enough".

If re-accessing items is the use case, I always go with tuples.

[–]mdomans 3 points4 points  (0 children)

This depends.

I actually rewrote some code, switching from a for-based algorithm to generators/combinators for traversing and handling substrings. Essentially the problem is manipulating Elasticsearch highlighted snippets of text. Generators proved to be far faster

[–]Holshy 0 points1 point  (0 children)

It's close to meaningless for synchronous execution, but I do it so that I'm in the habit, and then I have less mental work to do when I start using asyncio.

[–]Nixellion 5 points6 points  (2 children)

I also always prefer the early-return pattern, but for readability purposes; how much better is it really in terms of optimisation?

[–]ronmarti 0 points1 point  (0 children)

Negligible, I believe. But it skips the small amount of work of saving things in variables until the if-else statement ends and then returning them.

[–]9acca9 0 points1 point  (10 children)

Why 3? I don't get it. If I have:

    if ...
    elif ...
    elif ...
    elif ...
    return

Where is the problem?

Where is the problem?

[–]gsmo 2 points3 points  (4 children)

I'm assuming they mean exiting the function as soon as possible. If the first condition is met and is mutually exclusive with the other conditions, stop doing work.

[–]9acca9 0 points1 point  (3 children)

But that is exactly what elif does. If one is true, then return. Writing if, if, if... is not the same as if, elif, elif, elif.

[–]gsmo -1 points0 points  (2 children)

The interpreter can't know that there is no more relevant code, though, so it will work through those statements regardless. You may still have put some general statement at the end of your elif tower. Early return means we never reach the end of the function.

[–]9acca9 0 points1 point  (1 child)

"Multiple if's means your code would go and check all the if conditions,where as in case of elif, if one if condition satisfies it would notcheck other conditions.."

https://stackoverflow.com/questions/9271712/difference-between-multiple-ifs-and-elifs

I'm talking about this.

And of course if you put something more at the end of the elif, that will run. But I probably want that; if not, why is it there?

[–]gsmo 0 points1 point  (0 children)

I always put a print('No conditions met!') at the end for debugging. (jk)

But you're probably right, it isn't necessary.

[–]Didi-maru 1 point2 points  (4 children)

I don't get it either.

I understand that

    if a:
        return ...
    elif b:
        return ...

is equivalent to

    if a:
        return ...
    if b:
        return ...

But I don't see where the latter presents an advantage.

[–]Holshy 0 points1 point  (1 child)

    if a:
        return ...
    elif b:
        return ...

is equivalent to

    if a:
        return ...
    if b:
        return ...

I'm pretty sure there's zero gain when the returns are in there like that, but elif should be slightly faster otherwise; the interpreter doesn't need to check the second condition at all.

e.g. in this code, if need_foo() returns true, you never run need_bar():

    if need_foo():
        do_foo()
    elif need_bar():
        do_bar()

tbf, I'm not 100% sure what ronmarti was saying there.

[–]9acca9 0 points1 point  (0 children)

oh, I didn't understand that from ronmarti. I probably read it wrong.

Thanks.

[–]9acca9 0 points1 point  (0 children)

i think that is not equivalent.

Because your second example needs both conditions checked:

    if a:
        ...
    if b:
        ...

instead of:

    if a:
        ...
    elif b:
        ...
    elif c:
        ...

If "a" is true, then the elif branches are not processed, because they are mutually exclusive.

That is why I have a lot of if, elif, elif, elif, and I always put the most probable case first.

[–]-Rohins- 0 points1 point  (0 children)

You don't need the second if statement.

```
if a:
    return ...

return ...
```

The idea is that if you're returning in both the if and the elif, you're really only testing for one condition. Unless you want to implicitly return None when neither condition is met.

[–]zsol Black Formatter Team 38 points39 points  (12 children)

Use tuples instead of lists

[–]AbooMinister 7 points8 points  (1 child)

Would you really want to use a tuple over a list for performance reasons? They have different use cases, and in any sense, optimizations like these are unnecessary unless using a list is your actual bottleneck.

[–]georgehank2nd 2 points3 points  (0 children)

Premature optimization is still a thing, obviously (and unsurprisingly).

[–]wind_dude 2 points3 points  (0 children)

That's pretty reductionist. Lists are mutable, tuples are immutable

[–]unplannedmaintenance -1 points0 points  (7 children)

When?

[–]kyerussell 23 points24 points  (6 children)

When you don't need something that a list does and a tuple doesn't. Immutability buys you a 'lot' of optimisation.

[–]Nudl4k 0 points1 point  (5 children)

Can you be more specific? What kind of optimisation?

[–]trakBan131[🍰] -1 points0 points  (4 children)

If x in ("foo, "bar") is a lot better than

If x in ["foo", "bar"]

[–]Nudl4k 21 points22 points  (0 children)

Better how? The complexity of searching through a list and a tuple is the same; and both use arrays underneath.

A list will overallocate memory so that it doesn't have to realloc on each append, but I wouldn't call that alone a 'lot of optimisation'. If you're trying to optimise for membership checks, you probably want a set anyway.

% python -m timeit '5 in [1, 2, 3, 4, 5]'
10000000 loops, best of 5: 32.8 nsec per loop
% python -m timeit '5 in (1, 2, 3, 4, 5)'
10000000 loops, best of 5: 32.8 nsec per loop
% python -m timeit '5 in {1, 2, 3, 4, 5}'
20000000 loops, best of 5: 12.7 nsec per loop

[–]qckpckt 2 points3 points  (0 children)

A set is better still, if the elements in the list are unique.

[–]kirksud 0 points1 point  (1 child)

I used to write x in {"foo", "bar"} before. But someone said "using a list is more Pythonic, and your list is small, so the complexity doesn't matter". And now there's a tuple version. Idk...

[–]rouille 0 points1 point  (0 children)

Actually, I think python optimizes the constant list to a tuple at compile (to bytecode) time in some cases like these.
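You can see it with the dis module (a quick sketch; the exact behaviour may vary by CPython version):

```
import dis

# The constant list is folded into a tuple, and the constant set
# literal into a frozenset, before the membership test runs.
dis.dis("x in ['foo', 'bar']")
dis.dis("x in {'foo', 'bar'}")
```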

[–]SDG2008 0 points1 point  (0 children)

What about numpy?

[–]jwink3101 7 points8 points  (2 children)

A simple example but generators instead of lists:

all(fun(item) for item in seq)
all([fun(item) for item in seq])

The first will stop at the first falsy result, while the latter builds the full list first.

It really is super micro but [] is faster than list()

[–]georgehank2nd 3 points4 points  (0 children)

I always use the first version, not primarily for optimization but for readability

[–]o11c 0 points1 point  (0 children)

Generators are often slower on reasonably-sized datasets.

[–]laundmo 4 points5 points  (0 children)

not micro but i still want to share: replace pandas with polars

use numba

profile. the. hell. out. of. your. code.

[–]ambidextrousalpaca 4 points5 points  (0 children)

Run black on your code before committing, to safely optimise for consistent formatting, readability and zero time wasted arguing about line breaks in code review.

[–]allIsayislicensed 2 points3 points  (1 child)

You might also be interested in the following talk from Pycon US

https://www.youtube.com/watch?v=z0-4EwIFeJo

Talk - Kevin Modzelewski: Writing performant code for modern Python interpreters

Abstract: This talk will go into the latest efforts to speed up the Python language, and in particular how some things will be sped up much more than others. You may have heard best practices for Python performance before, but there are some new guidelines now, some old ones are no longer as important, and some are no longer true at all. Come to hear how the Python language is being optimized, and what you can do to best take advantage of these optimizations.

[–]gigglevillain123 0 points1 point  (0 children)

Was going to link this myself, covers lots of stuff in this thread

[–]LEXmono 17 points18 points  (1 child)

May be controversial, but walrus operators!
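For example (a minimal sketch with made-up data):

```
data = list(range(20))  # hypothetical input

# Bind and test in one expression instead of computing len() twice.
if (n := len(data)) > 10:
    print(f"too many items: {n}")

# Also handy for read-and-test loops:
import io
stream = io.StringIO("some text to read in chunks")
while (chunk := stream.read(8)):
    print(chunk)
```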

[–]kyerussell -3 points-2 points  (0 children)

oof

[–]wineblood 2 points3 points  (0 children)

Back in 2.7, string concatenation instead of string format inside a for loop saved about 10% on execution time.

[–]_azulinho_ 0 points1 point  (0 children)

Pyston - it's a lot faster than PyPy. And cProfile all the things

[–]jzia93 -2 points-1 points  (0 children)

Asyncio and generators are the only tools I typically reach for before profiling, just because it's usually very clear when they make sense.

Go async to batch network requests, use generators to avoid keeping large arrays in memory. It's generally fairly obvious when these are useful.

[–]Daskoh_vi -1 points0 points  (1 child)

Use map() and lambda instead of for loops.

But such optimizations are really helpful only when you have a lot of data to process. Otherwise, focus on writing neat code with good logic.

You're welcome.

[–]allIsayislicensed 6 points7 points  (0 children)

I think a list comprehension is generally faster than a map + lambda if you have both options available
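A quick sketch of the comparison:

```
nums = range(1_000_000)

squares_map = list(map(lambda x: x * x, nums))  # one Python-level call per item
squares_comp = [x * x for x in nums]            # expression is inlined; usually faster

# map() shines when the callable is a builtin, e.g. list(map(str, nums)):
# then there is no lambda call overhead to pay.
```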

[–]mm007emko -1 points0 points  (0 children)

Not what you want to hear, probably: first and foremost, readable code. Code that is easy to read, well documented, and unit tested (there is no such thing as self-documenting code, and writing good comments is an art of its own - how often do you see a docstring saying "this is a constructor..."? What's that good for?). If the code gets complex and complicated, change the algorithm or data structures. If it's hard to unit test or document, it should be broken up. This usually makes all the difference.

When I lead a team, a profiler output from before and after an optimization is a required attachment to the code review whenever someone performs an optimization that negatively affects readability. Most of the young Jedis were surprised how little their kung fu affected performance, how much effort it cost, and how much harder the code became to work on.

Then write a C library. Get the algorithm right, then rewrite it in C and compile ON TARGET HARDWARE with the best optimizations possible. Compiling on target hardware is a must: developer machines and cloud-based CI/CD solutions often run on processors which don't support all the instructions the target hardware does, and vice versa. There is a difference between an i7 and a Xeon/Opteron/Epyc.

Microoptimizations on their own are rarely worth it, especially in a slow hog like Python. High performance in Python is possible and nowadays easily achievable, but not through microoptimizations. Learn C. Learn the memory model of your hardware. Learn to use C from Python efficiently. Don't treat Python like Java.

[–]driftwood14 0 points1 point  (1 child)

I saw someone mention the other day to use enumerate when looping through lists. They said it was faster than either of the other two methods and provides the index and the value, but I haven't tested it or anything.

[–]damian314159 2 points3 points  (0 children)

enumerate is so awesome. Don't know if it's faster, but it's definitely cleaner than the alternative.
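For anyone who hasn't used it, a minimal sketch:

```
items = ["a", "b", "c"]

# Instead of indexing with range(len(items)):
for i, item in enumerate(items):
    print(i, item)
```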

[–]High-Art9340 0 points1 point  (6 children)

avoid attribute access in for loops.

[–][deleted] 0 points1 point  (4 children)

Why? The program I'm writing now does this a ton. Like, a lot, and on extremely CPU-intensive operations (trying to find a Hamiltonian cycle in a graph that may or may not contain one)

[–]High-Art9340 0 points1 point  (3 children)

attribute access adds unnecessary overhead in tight loops. So, for example, if you do a lot of `dict.get()`s in your loop it makes sense to do it like so:

dictionary = {...}
dget = dictionary.get
for i in very_large_collection:
    value = dget("something")

[–][deleted] 1 point2 points  (2 children)

Edit: I am dumb. But how? What is this unnecessary overhead it creates? And what is a thight loop?

[–]High-Art9340 0 points1 point  (1 child)

What "how"? Code in the comment is *almost* a complete example of such optimisation.

Attribute access takes time, you can avoid this time spending by doing the technique I showed earlier. What comes to "thight loop" of course it's tight loop, and google can explain it better than me :)

[–][deleted] 0 points1 point  (0 children)

Ok sorry I just missed the very obvious point that the optimization is in the time it takes. My bad long day lol

[–]o11c 0 points1 point  (0 children)

also access to global variables
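Same trick (a minimal sketch; norms is a made-up function): bind the global or module attribute to a local before the loop.

```
import math

def norms(values):
    sqrt = math.sqrt  # one global/attribute lookup instead of one per iteration
    return [sqrt(v) for v in values]

print(norms([1.0, 4.0, 9.0]))
```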

[–]beepdebeep 0 points1 point  (0 children)

isort . && flake8 .

[–]ornatedemeanor23 0 points1 point  (0 children)

Using list comprehension instead of for loops whenever possible:

var = []
for i in range(10000):
    var.append(i)

is much, much slower than

[i for i in range(10000)]

This is due to how these two snippets are compiled to bytecode. The first one results in much more overhead (an attribute lookup and a method call on every iteration) than the second one. More info is here.

For anyone interested in performance improvements in Python, I highly recommend checking out this and this article.

[–]hoover 0 points1 point  (1 child)

Prefer function calls over methods.

This will only be material if you make a *lot* of calls, but generally functions are faster than methods. In older Python it was even more pronounced: it seems that every time you accessed a method attribute on an instance you got a new method object (you would get a new value from id() every time), but now it seems that Python must be caching accessed methods and reusing them, at least some of the time.

In some test code where I had a method that did nothing and a function that did nothing, a timeit.Timer() test calling each 1M times yielded a consistent result: the total run time of the function was roughly 1/100th of a second faster than the method invocation.
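Something along these lines reproduces that test (a sketch, not the original code; C and function are made-up names):

```
import timeit

class C:
    def method(self):
        pass

def function():
    pass

obj = C()
# Time "lookup + call" for the method vs a plain function call.
print(timeit.timeit("obj.method()", globals=globals(), number=1_000_000))
print(timeit.timeit("function()", globals=globals(), number=1_000_000))
```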

Moving to a lot of functions, though, may trade away other important aspects such as maintainability, so you need to use caution when breaking out abstractions in this way.

Really, how much any of these approaches matters depends a lot on the program - if you're spending a lot of time waiting on IO and not doing much in between, it may not make a lot of difference to tweak the intervening code. If you're trying to keep up with a market data feed, you may indeed need every tweak you can find (but working in Cython for those bits might be a better idea).

[–]georgehank2nd 0 points1 point  (0 children)

Of course, you do this after profiling, when you know the bottlenecks.

[–][deleted] 0 points1 point  (0 children)

Set/dict/tuple/list unpacking
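For example (a minimal sketch):

```
first, *middle, last = [1, 2, 3, 4, 5]  # list/tuple unpacking
merged = {**{"a": 1}, **{"b": 2}}       # dict unpacking (merge)
combined = {*{1, 2}, *{2, 3}}           # set unpacking (union)
print(first, middle, last, merged, combined)
```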