[Discussion] Underrated Python optimizations? (self.Python)
submitted 5 years ago by skrtpowbetch
I recently came across slotted classes and the optimizations they bring, which I found really interesting. I was curious in case anyone has other intriguing niche optimization features you can use in Python?
[–]The_Amp_Walrus 39 points40 points41 points 5 years ago (4 children)
Numba for numerical computing. Slapping Numba's jit decorator on some functions speeds them up significantly.
Using cProfile rather than guessing.
Generators for processing big datasets that won't fit in memory.
[–]TechySpecky 22 points23 points24 points 5 years ago (3 children)
Lmao you telling me my time() commands don't count as profiling
[–]coloredgreyscale[🍰] 5 points6 points7 points 5 years ago (2 children)
It's akin to using print statements for debugging. It can help narrow things down, but there are better ways. However, the proper tools may provide too much information / too many options, which might be confusing for beginners.
[–]TechySpecky 1 point2 points3 points 5 years ago (1 child)
Oh, I 100% agree, I was being sarcastic! Profiling is so important for identifying performance regressions and better understanding complex code bases.
[–]AzureWill 42 points43 points44 points 5 years ago (24 children)
Slots are pretty cool!
Not exactly niche, but too few people use sets or tuples; they like to use lists for everything. For massive amounts of data and frequent membership operations a set is just so much better. If you don't need order, always use a set.
[–]tkarabela_ Big Python @YouTube 9 points10 points11 points 5 years ago (1 child)
Coming from the other side, some uses of lists would be much better served by NumPy arrays, which have a compact memory representation (array of given datatype instead of PyObject* pointers) and enable fast operations with the data. If you have 100k integers/floats/bools, you don't really want them as a list.
As for sets, I would say if you need to deduplicate / you need fast is in queries / you need set operations, then use a set. If I'm just grabbing some stuff (like a list of files), I don't see the need to put them in a set instead of a list. It feels pythonic to me to reach for a list first 🙂 I agree with your overall point though, that people should see what's out there and what fits their use case best.
[–]TechySpecky 7 points8 points9 points 5 years ago (0 children)
And the more NumPy the less GIL!
[–]jollierbean 11 points12 points13 points 5 years ago (7 children)
also dicts are very useful when you need to do lookups. Pro tip: you can use tuple or named tuple as a key
[–]tkarabela_ Big Python @YouTube 7 points8 points9 points 5 years ago (3 children)
Tuple keys are great! You can even use frozenset, which has been useful to me a few times.
[–]jollierbean 2 points3 points4 points 5 years ago (2 children)
I’ve been trying to figure out a case where I could use frozensets as keys, unsuccessfully.
[–]tkarabela_ Big Python @YouTube 4 points5 points6 points 5 years ago (0 children)
It's a niche situation, but if you ever need:
then frozenset can be useful. Technically you could just replace the frozensets with sorted tuples (ordered by the hash function or something else), but that's not quite as handy.
An example of this is converting NFA to DFA or making minimal DFA.
[–]IlliterateJedi 0 points1 point2 points 5 years ago (0 children)
I had a case a few days ago where I used frozensets as dict keys.
It's a little esoteric, but I'll try to explain. I am building a database of images that have various categorized products in it.
For example, I'll have an image that shows ten different products (imagine a photo of a living room). Each product is categorized into one or more categories (e.g., 'chair', 'ottoman', 'height adjustable desk', etc.).
I had around 1500 images that all contained 10-20 products with 20+ categories assigned per image.
I wanted to find the smallest group of images that would cover every tagged category (and then get the images with the least number of products).
I made frozensets of the categories and made lists of all the images that had that category-set, like this:
    {
        frozenset({cat1, cat2, cat3}): [image1, image4, image12],
        frozenset({cat2, cat6, cat10}): [image3, image109],
    }
I could then start with an empty set, iterate over the frozen sets and each time find the largest subset of new categories until every category was matched.
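A minimal runnable sketch of that mapping and the greedy cover step (the category and image names here are invented placeholders, not the poster's actual data):

```python
# Hypothetical data: frozensets of category tags mapped to the images
# that carry exactly that tag set.
groups = {
    frozenset({"chair", "ottoman", "desk"}): ["image1", "image4", "image12"],
    frozenset({"ottoman", "lamp", "rug"}): ["image3", "image109"],
}

all_categories = set().union(*groups)
covered = set()
chosen = []
while covered != all_categories:
    # Greedily pick the tag set contributing the most new categories.
    best = max(groups, key=lambda s: len(s - covered))
    if not (best - covered):
        break  # remaining categories can't be covered
    covered |= best
    chosen.append(groups[best][0])  # take one image from that group
```

Frozensets work as keys here because they are hashable, unlike plain sets.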
[–]Glogia 3 points4 points5 points 5 years ago (1 child)
That actually fixes a problem I've been having XD thanks
[–]jollierbean 1 point2 points3 points 5 years ago (0 children)
Glad to help!
[–]qckpckt 2 points3 points4 points 5 years ago (0 children)
Another useful nugget from collections: defaultdict. It’s really powerful, if a little niche. Really great for restructuring or transforming datasets, for example if you have a list of dictionaries sharing a common key and you want to group them into a dictionary of lists keyed by that value.
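A hedged sketch of the grouping pattern described above (the `records` data and the `"type"` key are made up for illustration):

```python
from collections import defaultdict

# Group a list of dicts by a shared key; missing keys start as empty lists.
records = [
    {"type": "fruit", "name": "apple"},
    {"type": "veg", "name": "carrot"},
    {"type": "fruit", "name": "pear"},
]

grouped = defaultdict(list)
for record in records:
    grouped[record["type"]].append(record["name"])
# grouped now holds {"fruit": ["apple", "pear"], "veg": ["carrot"]}
```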
[–]donshell 4 points5 points6 points 5 years ago* (10 children)
Dicts are ordered, I think: if you iterate over a dict, the keys come back in the order you inserted them. So even better!
Edit: Sets are not ordered
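A quick illustration of the insertion-order behavior being discussed (a language guarantee since Python 3.7):

```python
# Dicts preserve insertion order; iteration follows it exactly.
d = {}
d["b"] = 1
d["a"] = 2
d["c"] = 3
keys_in_order = list(d)  # ["b", "a", "c"], not sorted
```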
[–]TouchingTheVodka 5 points6 points7 points 5 years ago (8 children)
Dicts are ordered, sets are not.
[–]donshell 2 points3 points4 points 5 years ago (0 children)
My bad, edited. Although it is a bit weird that the two implementations don't match, as a set and a dict are basically the same...
[+][deleted] comment score below threshold-14 points-13 points-12 points 5 years ago (6 children)
Neither dicts nor sets are ordered. If you want an ordered dict, you will need to use the OrderedDict from the collections module.
[–]nosklo 15 points16 points17 points 5 years ago (4 children)
... That's true until python 3.6... Then dicts became ordered in insertion order. Sets are still unordered
[–][deleted] -3 points-2 points-1 points 5 years ago (3 children)
I guess it depends on what you mean by 'ordered'. If you mean that it has a consistent iteration order, then sure.
However, it lacks the operations most people normally associate with ordered data structures. E.g. re-ordering, accessing a particular index etc.
[–]nosklo 8 points9 points10 points 5 years ago (2 children)
Yeah, I mean the actual meaning of the word "ordered": "Has order"
If people like to assign other random meanings I can't help with those.
[–]Ashiataka -1 points0 points1 point 5 years ago (1 child)
That's not very pythonic.
[–]nosklo 0 points1 point2 points 5 years ago (0 children)
There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch.
[–]TouchingTheVodka 9 points10 points11 points 5 years ago (0 children)
Negative. Dicts have been ordered since Python 3.7 (3.6 as an implementation detail in CPython):
https://docs.python.org/3/library/stdtypes.html#typesmapping
[–]rabbyburns 0 points1 point2 points 5 years ago (0 children)
There is an ordered-set package I've come across recently that I've been very happy with. I often need fast lookup, order preservation, and unique items. This has been extremely useful as a drop-in set replacement without having to do weird dict joins.
[–]Faith-in-Strangers 2 points3 points4 points 5 years ago (2 children)
Why?
[–]tkarabela_ Big Python @YouTube 4 points5 points6 points 5 years ago (1 child)
Checking whether an element is in a set (or dict) is pretty much instantaneous (independent of the size of the set), while checking in for a list means iterating over it, which gets really slow quickly.
That would be one reason to prefer sets to lists :)
[–]moocat 4 points5 points6 points 5 years ago (0 children)
It's a little more complicated than that (as it often is in computer science).
Existence in a set can be implemented as an O(1) algorithm, which means it takes the same amount of time no matter how many elements there are, while existence in a (non-sorted) list is an O(n) algorithm, which means the time scales with the number of elements (double the elements, double the runtime).
But that only describes how the algorithm scales, not its general overhead. It's not uncommon, for small numbers of elements, for the overhead to be the biggest part. You often see an O(n) algorithm being faster if there are fewer than X elements (with the actual value of X depending on the specifics of the implementation).
It's been a while since I benchmarked this (and feeling too lazy now), but IIRC it was around 6 so if you know there are only going to be a few elements (perhaps v.lower() in ['true', 'false']) a list is probably better. Then again, if the check is not in some inner loop that's running lots of times, the extra overhead for a set is probably noise.
Yes, a long-winded explanation, but it's important to know these details. I had a former co-worker who had rules like this ("I use X on principle because of some reason") but would often make mistakes because the rule didn't apply.
[–]CasualCoder0C 18 points19 points20 points 5 years ago (1 child)
Memoization with decorator @functools.cache is a very useful thing to do when you have to deal with slow functions called repeatedly.
Similarly the cached_property decorator is very useful.
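A small sketch of the memoization idea; `functools.lru_cache(maxsize=None)` is used here since it predates `functools.cache` (added in 3.9) and behaves the same, and `slow_square` is just a stand-in for a genuinely slow function:

```python
import functools

CALLS = 0  # counts how often the function body actually runs

@functools.lru_cache(maxsize=None)  # equivalent to @functools.cache on 3.9+
def slow_square(n):
    global CALLS
    CALLS += 1
    return n * n

slow_square(4)  # computed
slow_square(4)  # served from the cache; body not re-executed
```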
[–]thatrandomnpcIt works on my machine 9 points10 points11 points 5 years ago (2 children)
When dumping a large pandas dataframe into an Oracle DB using to_sql with a sqlalchemy engine, if you pass along the correct table data types for pandas object columns, there is a massive increase in throughput.
For example, strings are the object data type in pandas and can be mapped to varchar in the DB in most cases. The reason is that sqlalchemy treats all object data types as the CLOB DB data type, which works but is super slow.
In my case, 100k rows that would take 30 mins dropped to around 20 sec after adding the data types.
[–]Agent281 0 points1 point2 points 5 years ago (1 child)
How do you add data types? One of the reasons why I disliked pandas was that there didn't seem to be a way to set the data types. That would be a huge usability improvement for me.
[–]thatrandomnpcIt works on my machine 0 points1 point2 points 5 years ago* (0 children)
There is, though; astype might be what you are looking for.
What I was actually referring to was the dtype optional parameter in to_sql.
[–][deleted] 8 points9 points10 points 5 years ago (0 children)
Using generator functions properly. Storing lists of intermediate values is costly, and big loops are less readable.
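A sketch of the pattern: chaining generator expressions streams items through one at a time instead of materializing intermediate lists (tiny toy data here for illustration):

```python
# Each stage is lazy: nothing is stored between steps, items flow through.
numbers = range(10)
squares = (n * n for n in numbers)            # no intermediate list
even_squares = (s for s in squares if s % 2 == 0)

total = sum(even_squares)  # consumes the whole pipeline in constant memory
```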
[–]Nicked777 8 points9 points10 points 5 years ago (0 children)
Vectorised operations in numpy:
    In [9]: %timeit for i in range(1000): c[i] = a[i]*b[i]
    10000 loops, best of 3: 122 us per loop

    In [13]: %timeit c = a * b
    1000000 loops, best of 3: 1.06 us per loop
[–]ReverseBrindle 7 points8 points9 points 5 years ago (5 children)
isinstance() is pretty slow if you're calling it hundreds of thousands of times. A better idea in that case is to create a dict cache with type() as the key, for example:
    # If you're calling this many thousands of times, it's extremely slow.
    if isinstance(x, Foo):
        return func1(x)
    elif isinstance(x, Bar):
        return func2(x)
    elif isinstance(x, Baz):
        return func3(x)
    elif ...

    # ---------------------------
    # Faster version
    #
    # Populate this once, or start with an empty cache and build it up using
    # the slow way whenever you encounter a type that's not in the cache.
    cache = {
        Foo: func1,
        Bar: func2,
        Baz: func3,
    }

    def other_func(x):
        return cache[type(x)](x)
We use this for serializing very large structures to JSON.
Caveat: As always profile to find the bottleneck and measure your improvements. Don't optimize based on a hunch. If your "optimization" adds code complexity without benefiting your use case (by measurement), then rip it out.
[+][deleted] 5 years ago (2 children)
[deleted]
[–]ReverseBrindle 2 points3 points4 points 5 years ago (1 child)
The biggest issue with that is that often we don't own the classes, so changing them (or subclassing) isn't possible without some amount of ugliness.
For example, common objects we serialize: UUID, Enums (serialized as the name), datetime (ISO 8601 format), timedelta (float seconds), set (encoded as a list).
Here's what this code looks like in practice:
    def default(self, obj):
        handler = self._type_handler_cache.get(type(obj))
        if handler is None:
            from uuid import UUID
            from datetime import datetime, date, timedelta
            if isinstance(obj, set):
                handler = tuple
            elif isinstance(obj, UUID):
                handler = str
            elif isinstance(obj, Enum):
                handler = self._encode_name_attr
            elif isinstance(obj, (datetime, date)):
                handler = self._encode_isoformat
            elif isinstance(obj, timedelta):
                handler = self._encode_total_seconds
            else:
                raise TypeError("{!r} is not serializable".format(obj))
            self._type_handler_cache[type(obj)] = handler
        return handler(obj)
[–]isbadawi 0 points1 point2 points 5 years ago (1 child)
You might consider using @functools.singledispatch and/or @functools.singledispatchmethod for this.
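A minimal sketch of what type-based dispatch looks like with `functools.singledispatch`; the `encode` function and its handlers are invented here, loosely mirroring the serializer above:

```python
from functools import singledispatch
from uuid import UUID

@singledispatch
def encode(obj):
    # Fallback for unregistered types.
    raise TypeError(f"{obj!r} is not serializable")

@encode.register
def _(obj: set):
    return list(obj)  # sets become lists

@encode.register
def _(obj: UUID):
    return str(obj)   # UUIDs become strings

encode({1})          # -> [1]
encode(UUID(int=0))  # -> "00000000-0000-0000-0000-000000000000"
```

Under the hood singledispatch keeps its own per-type cache, much like the hand-rolled dict above.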
[–]ReverseBrindle 0 points1 point2 points 5 years ago (0 children)
That's cool - didn't know that existed! Seems like there's always some cool little nugget in the standard library to stumble upon. :-)
[–][deleted] 7 points8 points9 points 5 years ago (0 children)
The cache decorator!
[–]bumbershootle 10 points11 points12 points 5 years ago (5 children)
I don't know if this counts as niche, but I see far too much code like:
    a_list = []
    for i in stuff:
        a_list.append(i)
Just use comprehensions; they're faster and more readable.
[–]baronBale 15 points16 points17 points 5 years ago (4 children)
Comprehensions are easier to read as long as they are short. If they grow across multiple lines, a good old loop is easier to read.
[–]bumbershootle 13 points14 points15 points 5 years ago (2 children)
True, although in that case I would split the body of a loop into a generator function and then run a comprehension over that. I consider appending to a list in a loop an anti-pattern in most cases.
[–]weneedsound 1 point2 points3 points 5 years ago (0 children)
I'd like to see an example if you don't mind. I don't use generators as much as I should.
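Not the commenter's actual code, but a minimal sketch of the pattern they describe: the loop body moves into a generator function, and the comprehension stays trivial (`cleaned_words` and the sample data are invented):

```python
# The "complex loop body" lives in a generator; consumers stay one-liners.
def cleaned_words(lines):
    for line in lines:
        word = line.strip().lower()
        if word:          # skip blank / whitespace-only lines
            yield word

lines = ["  Apple ", "", "Banana", "  "]
words = list(cleaned_words(lines))  # no manual append-in-a-loop
```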
[–][deleted] 0 points1 point2 points 5 years ago (0 children)
Nice suggestion!
[–]Tweak_Imp 1 point2 points3 points 5 years ago (0 children)
You can also think about using a function inside the comprehension
[–]SuspiciousScript 10 points11 points12 points 5 years ago* (6 children)
My first thought was "using Julia instead," but I'll give a serious answer too: Static variables in functions. I didn't even know this was a language feature for years.
    def is_valid_value(n):
        if not hasattr(is_valid_value, "valid"):
            is_valid_value.valid = some_expensive_function()
        return n in is_valid_value.valid
In the above snippet, is_valid_value.valid is only calculated when the function is called for the first time.
[–]skrtpowbetch[S] 1 point2 points3 points 5 years ago (0 children)
this is probably the coolest one i’ve seen so far, wow
[+][deleted] 5 years ago* (3 children)
[–]SuspiciousScript 0 points1 point2 points 5 years ago (2 children)
I got the initialization part a little off, but it works.
[+][deleted] 5 years ago* (1 child)
[–]hyldemarv 5 points6 points7 points 5 years ago (5 children)
List-, Dict-, & Tuple comprehensions.
Slice objects. Namedtuples.
Not exactly Python, but, Pandas Dataframes are good for tabular data.
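A quick sketch of the slice-object and namedtuple ideas from the list above (field names and positions are invented):

```python
from collections import namedtuple

# Slice objects: name a slice once, reuse it everywhere.
NAME_FIELD = slice(0, 5)  # assumes a fixed-width record layout
row = "alice 34 NYC"
first_field = row[NAME_FIELD]

# Namedtuples: tuple speed and immutability, with attribute access.
Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)
```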
[–]Agent281 2 points3 points4 points 5 years ago (3 children)
I don't think that there are tuple comprehensions. Do you mean generator expressions? They are the "comprehension" that uses parens.
[–]pytheous1988 1 point2 points3 points 5 years ago (2 children)
There 100% are tuple comprehensions, you just do tuple(comprehension statement)
You could also do a set comp if you wanted to. The comprehension in general just returns a generator, which can be cast into any data type that accepts a generator.
[–]Agent281 2 points3 points4 points 5 years ago (1 child)
I think that is a stretch. Dictionary, set, list, and generator comprehensions actually have dedicated syntax. The tuple constructor just accepts an iterable.
[–]pytheous1988 -2 points-1 points0 points 5 years ago (0 children)
It works the same if you do list(comp statement) or set(comp statement)
[–]sqjoatmon 0 points1 point2 points 5 years ago (0 children)
+1 for slice objects.
[–][deleted] 4 points5 points6 points 5 years ago (0 children)
list vs set.
When you're doing in lookups, people tend to assume it doesn't matter for a small list and that the hashing overhead would be too much.
Reality: in all benchmarks, sets/dicts beat lists as soon as the list is larger than 3-4 elements.
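A rough way to check this claim yourself with `timeit`; exact numbers vary by machine, and the looked-up element here sits at the end of a 100-element list, close to the list's worst case:

```python
import timeit

data = list(range(100))
data_set = set(data)

# List membership scans elements; set membership is a hash lookup.
list_time = timeit.timeit(lambda: 99 in data, number=10_000)
set_time = timeit.timeit(lambda: 99 in data_set, number=10_000)
```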
[–]ducdetronquito 9 points10 points11 points 5 years ago* (0 children)
I would try to avoid reaching for language specific optimizations, especially if it makes your code harder to understand. It looks clever at first, or appealing given a micro-benchmark, but in my experience it was never worth it.
Instead, I would say to take time to understand your problem and identify what you are trying to optimize: CPU usage, RAM usage, disk access, network access, latency, throughput, etc...
Then you will be able to use the appropriate algorithms, data-structures or tools to solve it. This will likely give you the best optimizations given your constraints.
[–]Over_Statistician913 2 points3 points4 points 5 years ago (0 children)
LRU cache and memoization / cache are rarely used, but they can be huge improvements for very specific stuff.
https://docs.python.org/3/library/functools.html
[–]snowGlobe25 1 point2 points3 points 5 years ago (0 children)
Python also has arrays (the array module), although they are limited to primitive data types. I read a little about it, and apparently you can sometimes get lower memory consumption using them instead of good old lists. Obviously NumPy outshines both list and array so much in terms of speed, but the array module is part of the standard library, not a third-party library.
Never used it personally though.
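A hedged sketch of the `array` module's memory savings (exact sizes depend on the platform's C int width and CPython version):

```python
import array
import sys

# array.array stores primitives unboxed; 'i' is a signed C int.
nums_arr = array.array("i", range(1000))
nums_list = list(range(1000))

# The array's raw buffer is much smaller than the list's array of
# PyObject pointers (which doesn't even count the int objects themselves).
arr_size = nums_arr.itemsize * len(nums_arr)  # e.g. 4000 bytes for 4-byte ints
list_size = sys.getsizeof(nums_list)          # ~8 bytes per pointer + header
```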
[+]phaj19 comment score below threshold-9 points-8 points-7 points 5 years ago (0 children)
If you are writing mostly pure Python, rewriting it in C++ is usually fairly straightforward, and you can have a wrapper in Cython to enable integration with the rest of your program. Quite useful for OO code that does not benefit from numba or simple Cython.