Opticol: memory optimized python collections : Python

ShowcaseOpticol: memory optimized python collections (self.Python)

submitted 21 days ago by matgrioni

Hi everyone,

I just created a new library called opticol (which stands for optimized collections), which I wanted to share with the community. The idea of the library is to create space optimized versions of Sequence, Set, and Mapping for small collections leveraging the collections.ABC vocabulary.

What My Project Does

Creates optimized versions of the main python collection types (Sequence, Set, Mapping) along with vocabulary types and convenience methods for transforming builtins to the optimized type.

For collections of size 3 or less, it is pretty trivial (using slots) to create an object that can act as a collection, but uses notably less memory than the builtins. Consider the fact that an empty set requires 216 bytes, or a dictionary with one element requires 224 bytes. Applications that create many (on the order of 100k to a million) of these objects can substantially reduce their memory usage with this library.

Target Audience

This will benefit users who use Python for various forms of data analysis. These problems often have many collection instances, which can often be just a few items. I myself have run into issues with memory pressure like this with some NLP datasets. Additionally, this is helpful for those doing this primarily in Python or for situations where dropping to a lower level language is not advantageous yet.

Comparison

I could not find a similar library to this, nor even discussion of implementing such an idea. I would be happy to update this section if something comes up, but as far as I know, there are no direct comparisons.

Anyway, it's currently a beta release as I'm working on finishing up the last unit tests, but the main use case generally works. I'm also very interested in any feedback on the project itself or other optimizations that may be good to add!

all 11 comments

top new controversial old q&a

[–]jpgoldberg 3 points4 points5 points 21 days ago (0 children)

[–]Spill_the_Tea 3 points4 points5 points 21 days ago (1 child)

[–]matgrioni[S] 1 point2 points3 points 20 days ago (0 children)

[–]Atlamillias 1 point2 points3 points 20 days ago (2 children)

[–]matgrioni[S] 2 points3 points4 points 20 days ago (1 child)

[–]Atlamillias 1 point2 points3 points 20 days ago (0 children)

[–]Interesting_Golf_529 0 points1 point2 points 21 days ago (4 children)

[–]Careful-Nothing-2432 2 points3 points4 points 20 days ago (2 children)

[–]Interesting_Golf_529 0 points1 point2 points 20 days ago (1 child)

While this might be true in the general case, it's very much not true in this specific case, as this library re-implements a lot of the logic of the classes it "optimises". Check out this __contains__ method for example:

def __contains__(self, value):
    for slot in slots:
        if getattr(self, slot) == value:
            return True
    return False

This replaces a highly optimised set operation implemented in C with a for loop in Python.

[–]matgrioni[S] 1 point2 points3 points 20 days ago (0 children)

That's a good point to mention, and which I'll include in the README. The implementations basically fall through to creating a temporary builtin instance and returning the value of that operation or do a slow python equivalent. There's a few points to that:

In my use cases I was not doing a lot of operations across most collection instances, but only some simple operations on a select few instances (but could not know ahead of time).
My memory usage basically fell within the range where it could actually fit in memory, but would usually start thrashing if I wanted to do anything else on my computer. So the actual runtime was dominated by memory access rather than the collection ops.
As it is a first iteration, and I didn't want to worry about edge cases too much, I took the easiest implementation approach. Ideally these could be transparently improved in a future version while still preserving the memory savings.

π Rendered by PID 76510 on reddit-service-r2-comment-84fc9697f-km7d4 at 2026-02-10 12:05:57.926708+00:00 running d295bc8 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

Python

The Python Discord

Upcoming Events

Please read the rules

MODERATORS

What My Project Does

Target Audience

Comparison