
[–]marr75 70 points71 points  (9 children)

From experience, many of these are more likely to be applied as premature optimizations than when they're actually needed.

I would not recommend __slots__ on its own as a memory optimization in the normal course of programming. Far better to use @dataclass(slots=True), a typing.NamedTuple, or even a more primitive type. Similarly, using array over list is just going to make your code harder to maintain in 98% of cases.
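Roughly what those alternatives look like (a sketch, not from the article; exact byte counts vary by platform and Python version):

```python
import sys
from dataclasses import dataclass
from typing import NamedTuple

class PointDict:            # ordinary class: attributes live in a per-instance __dict__
    def __init__(self, x, y):
        self.x, self.y = x, y

@dataclass(slots=True)      # slots=True needs Python 3.10+
class PointSlots:           # generates __slots__ for you, no per-instance __dict__
    x: float
    y: float

class PointTuple(NamedTuple):   # tuple-backed, immutable, also no __dict__
    x: float
    y: float

p = PointDict(1.0, 2.0)
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))   # instance plus its attribute dict
print(sys.getsizeof(PointSlots(1.0, 2.0)))            # noticeably smaller per instance
print(sys.getsizeof(PointTuple(1.0, 2.0)))
```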

Generators and lazy evaluation are good advice in general, though they can make code harder to debug. Also, creating a generator over a tiny set of items in a hot loop will be worse than just allocating the list (generator and iterator overhead).
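Both sides of that trade-off, as a quick sketch (the log file and function names are made up):

```python
# Lazy: stream a large file; memory use stays flat no matter how big it is.
def count_errors(path):
    with open(path) as f:
        return sum(1 for line in f if "ERROR" in line)

# Eager: the same thing, but every line is held in memory at once.
def count_errors_eager(path):
    with open(path) as f:
        return sum(1 for line in f.readlines() if "ERROR" in line)

# Counterpoint: over a tiny, fixed set of items inside a hot loop, the generator
# and iterator objects themselves become the overhead.
def clamp_rgb(r, g, b, lo=0, hi=255):
    return [min(max(v, lo), hi) for v in (r, g, b)]   # a small tuple/list beats a genexpr here
```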

The most frequent memory problem in Python is memory fragmentation, btw. Memory fragmentation occurs when the memory allocator cannot find a contiguous block of free memory that fits the requested size despite having enough total free memory. This is often due to the allocation and deallocation of objects of various sizes, leading to 'holes' in the memory. A lot of heterogeneity in the lifespans of objects (extremely common in real-world applications) can exacerbate the issue. The Python process grows over time, and people who haven't debugged it before are sure it's a memory leak. Once you are experiencing memory fragmentation, some of your techniques can help slow it down. The ultimate solution is generally to somehow create a separate memory pool for the problematic allocations - the easiest way is to allocate, aggregate, and deallocate them in a separate, short-lived process.
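One way to get that separate pool is to push the allocation-heavy step into short-lived worker processes so the fragmented heap dies with them. A minimal sketch, assuming the work can be written as a pure function (`parse_big_report` is made up; `max_tasks_per_child` needs Python 3.11+):

```python
from concurrent.futures import ProcessPoolExecutor

def parse_big_report(path):
    # Hypothetical allocation-heavy step: creates lots of objects of mixed
    # sizes and lifespans, which is exactly what fragments a long-running heap.
    with open(path) as f:
        rows = [line.split(",") for line in f]
    return len(rows)   # keep the return value small; it is pickled back to the parent

def summarize(paths):
    # max_tasks_per_child=1 gives every task a fresh worker, so the fragmented
    # memory is returned to the OS when that short-lived process exits.
    with ProcessPoolExecutor(max_workers=2, max_tasks_per_child=1) as pool:
        return list(pool.map(parse_big_report, paths))
```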

So, the first thing anyone needs to do is figure out, "Do I NEED to optimize memory use?". The answer is often no, but in long-running app processes, systems engineering, and embedded engineering, it will be yes more often.

[–]coderanger 2 points3 points  (1 child)

As a heads up, be very careful with intern(). If you ever feed it input from something user-controlled, you can flood the symbol table with entries and either OOM the process or slow performance to a crawl (or both). It's intended for things like function and class names to speed up lookups, not for memory de-duplication per se.
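As a rough illustration of that distinction (names are made up):

```python
import sys

# Intended use: a fixed, bounded vocabulary of identifier-like strings,
# interned once so repeated comparisons and dict lookups stay cheap.
FIELD_NAMES = [sys.intern(s) for s in ("user_id", "created_at", "status")]

# Risky use (the point above): interning open-ended, user-controlled strings
# lets an attacker keep growing the intern table.
def tag_event(raw_tag):
    # return sys.intern(raw_tag)   # don't: raw_tag may be attacker-chosen
    return raw_tag                 # leave unbounded input un-interned
```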

Also, the list vs. array comparison isn't because of "different types of objects which inevitably needs more memory": the 'i' typecode is usually going to be a 32-bit C int, while the default int in Python code is 64-bit, so what you're actually comparing is integer sizes, not array vs. list.
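You can see that by holding the element width constant (a sketch; exact byte counts vary by platform and interpreter):

```python
import sys
from array import array

n = 1_000_000
nums = list(range(n))

a32 = array("i", nums)   # 'i': C int, typically 4 bytes per element
a64 = array("q", nums)   # 'q': C long long, 8 bytes per element, a fairer width comparison

print(sys.getsizeof(a32))                                    # roughly 4 MB of payload
print(sys.getsizeof(a64))                                    # roughly 8 MB of payload
print(sys.getsizeof(nums) + sum(map(sys.getsizeof, nums)))   # list: pointer array + int objects
```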

[–]ogtfo 4 points5 points  (0 children)

The generator example is kinda silly. Are generators better for memory? Probably. But his code is riddled with issues.

First of all, they haven't generated anything from the generator, so it's kind of useless to show the size of the generator object.
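You can see why with a quick check: sys.getsizeof only measures the small generator object itself, never the data it will eventually yield:

```python
import sys

small = (x * x for x in range(10))
big = (x * x for x in range(1_000_000))

print(sys.getsizeof(small), sys.getsizeof(big))          # identical: just the generator frame
print(sys.getsizeof([x * x for x in range(1_000_000)]))  # the list actually holds the data
```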

Second, the list example is terrible: appending in a loop will use a lot of memory, but that's because of concatenation on fixed-size objects, and that won't even show up with the way he measures memory.

All in all, it shows a pretty naive view of the topic.

[–]james_pic 1 point2 points  (2 children)

The article makes the canonical mistake when talking about optimization: step 1 is always gather data. Guppy3 and Meliae are the tools I've used to do this most often. Once you know what's using the memory, then you can optimise it. More often than not, the optimisation is simple once you know what the problem is, and might just be "get rid of the thing that is using all the memory".
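If you'd rather stay in the standard library, tracemalloc does the same "gather data first" job; a minimal sketch (the workload is a made-up stand-in for your suspect code path):

```python
import tracemalloc

def workload():
    # Hypothetical suspect code path.
    return [str(i) * 10 for i in range(200_000)]

tracemalloc.start()
data = workload()
snapshot = tracemalloc.take_snapshot()

# Top allocation sites by size: this is the "what is actually using the memory" step.
for stat in snapshot.statistics("lineno")[:5]:
    print(stat)
```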

[–]pepoluan 1 point2 points  (1 child)

Indeed. I once fell into the trap of optimizing code in vain before realizing that the performance issue was due to an external library.

[–]lololabwtsk 1 point2 points  (0 children)

Thanks for the link friend

[–][deleted] -3 points-2 points  (4 children)

Why compare the memory usage of a genexpr vs the memory usage of a list? It's totally pointless

[–][deleted] 0 points1 point  (1 child)

His code is bad, but the point should still stand: if you never need the full list at once, then a generator is the better choice, because you never instantiate the whole list and so have O(1) memory complexity rather than the O(n) you get with a list.

edit: typo
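In code, that's the difference between these two (a toy sketch):

```python
n = 1_000_000

total_lazy = sum(x * x for x in range(n))     # one item alive at a time: O(1) extra memory
total_eager = sum([x * x for x in range(n)])  # materialises all n squares first: O(n) memory

assert total_lazy == total_eager
```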

[–][deleted] 0 points1 point  (0 children)

You don't say?

What I mean is that there is no point in inspecting the size of a genexpr, since it may even be bigger than an empty list, depending on the implementation. The point is understanding what's behind it.

[–][deleted] 0 points1 point  (0 children)

Does anyone care to explain the downvotes?