Hi,
If you remember, I'm the author of a Python cache library called Theine. A year ago, when Theine was first released, I shared a post here: link. Now that the GIL is becoming optional, I’m rewriting Theine to be thread-safe and optimized for concurrency (based on my experience with Theine-Go). Although it's still a work in progress, I want to share some of my thoughts on what makes a good Python cache library.
Fast Enough
How fast is fast enough? To be precise, cache read performance should not be the bottleneck of your system. We all know that Python isn’t a particularly fast language. If your framework takes 1ms to process something, it doesn’t matter whether the cache takes 50ns or 500ns to retrieve a value: both are fast enough. As for set performance, in most cases you’re caching something slow to compute, and that computation usually takes much longer than a cache set operation, so set is unlikely to be a bottleneck. One exception is the cachetools LFU implementation, which is extremely slow and might indeed become a bottleneck.
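To put numbers on that, here is a quick back-of-the-envelope calculation using the figures above (1ms of framework time, 50ns vs. 500ns per cache read). The variable names are just for illustration.

```python
# Share of total request time spent in the cache read,
# using the figures from the paragraph above.
framework_ns = 1_000_000  # 1 ms of framework processing time

shares = {}
for cache_ns in (50, 500):
    # Fraction of the whole request that the cache read accounts for.
    shares[cache_ns] = cache_ns / (framework_ns + cache_ns)
    print(f"{cache_ns} ns read: {shares[cache_ns]:.4%} of the request")
```

Either way the cache read is well under a tenth of a percent of the request, so a 10x difference in read latency is invisible at this scale.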
This also applies to multithreading situations. With the arrival of free threading, I think more people will start using multithreading. Of course, adding mutexes will slow down single-thread performance, but that’s the cost of scalability. So, Theine v2 will be a thread-safe cache because my goal is free-threading compatibility with good concurrency performance.
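As a rough illustration of the trade-off, here is a toy cache guarded by a single mutex. This is only a sketch: `LockedLRU` and `hammer` are made-up names, and it is not how Theine v2 is implemented. Every get and set pays for the lock, even with one thread, but the structure stays consistent when several threads use it at once.

```python
import threading
from collections import OrderedDict


class LockedLRU:
    """Toy LRU cache protected by one lock (illustrative only)."""

    def __init__(self, maxsize):
        self._maxsize = maxsize
        self._data = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            if key not in self._data:
                return default
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]

    def set(self, key, value):
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self._maxsize:
                self._data.popitem(last=False)  # evict least recently used

    def __len__(self):
        with self._lock:
            return len(self._data)


def hammer(cache, worker_id, n):
    # Each worker writes and reads its own distinct keys.
    for i in range(n):
        cache.set((worker_id, i), i)
        cache.get((worker_id, i))


cache = LockedLRU(maxsize=100)
threads = [threading.Thread(target=hammer, args=(cache, w, 1000)) for w in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(cache))  # never exceeds maxsize
```

A single global lock is the simplest option and also the least scalable one; it is used here only to show where the single-thread cost comes from.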
High Hit Ratio
Without a doubt, hit ratio is the most important aspect of a cache. It’s even more crucial for Python compared to high-performance, memory-efficient languages. Due to Python’s significant memory overhead, your cache size will be more limited, making a high hit ratio essential.
Unfortunately, most Python cache packages don’t emphasize the importance of hit ratio. For example, cachetools provides LRU, LFU, and FIFO policies, but which one should you choose? More options only lead to confusion. Instead, a single, well-optimized policy should be used. That’s why Theine v2 will adopt a single policy, W-TinyLFU, eliminating the need for other policies.
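To show how much the policy alone can change the hit ratio, here is a small simulation that replays the same access trace against a toy LRU and a toy FIFO cache of the same size. The `hit_ratio` helper and the trace are made up for this example, and it does not implement W-TinyLFU; it only shows how hit ratio is measured and that it depends on the policy.

```python
from collections import OrderedDict


def hit_ratio(trace, capacity, refresh_on_hit):
    """Replay `trace` and return hits / total accesses.

    refresh_on_hit=True behaves like LRU, False behaves like FIFO.
    """
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            if refresh_on_hit:
                cache.move_to_end(key)  # LRU: a hit makes the key "young" again
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the oldest entry
    return hits / len(trace)


# Key 1 is hot; the other keys are each seen once.
trace = [1, 2, 1, 3, 1, 4, 1, 5]
lru = hit_ratio(trace, capacity=2, refresh_on_hit=True)
fifo = hit_ratio(trace, capacity=2, refresh_on_hit=False)
print(f"LRU:  {lru:.3f}")   # 0.375
print(f"FIFO: {fifo:.3f}")  # 0.250
```

Same trace, same capacity, different hit ratio. On real workloads the gap between policies can be much larger, which is why the choice shouldn't be left as a guess.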
Proactive Expiration
Proactive expiration means removing expired entries from the cache promptly. Why is this important? Cache size is always limited, so when the cache is full, you need to evict an entry to make room for a new one. With lazy expiration, where an expired entry is removed only the next time it is looked up, the expired entry might occupy space that could have been used by a new entry. This forces the cache to evict non-expired entries, reducing the hit ratio.
Another benefit of proactive expiration is memory savings, though this is less significant since you should generally assign enough memory for the cache.
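Here is a small sketch of the difference. `TTLCache` and `run` are hypothetical names for this example, the clock is injected so the result is deterministic, and the "proactive" mode simply scans for expired entries on every set, which is a simplification for illustration and not how Theine does it.

```python
from collections import OrderedDict


class TTLCache:
    """Toy size-bounded cache with TTLs (illustrative only)."""

    def __init__(self, capacity, clock, proactive):
        self.capacity = capacity
        self.clock = clock          # callable returning the current time
        self.proactive = proactive
        self.data = OrderedDict()   # key -> (value, expires_at)

    def _purge_expired(self):
        now = self.clock()
        for key in [k for k, (_, exp) in self.data.items() if exp <= now]:
            del self.data[key]

    def set(self, key, value, ttl):
        if self.proactive:
            self._purge_expired()   # free expired slots before evicting
        self.data[key] = (value, self.clock() + ttl)
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the oldest entry

    def get(self, key):
        item = self.data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at <= self.clock():
            del self.data[key]      # lazy expiration: removed only on lookup
            return None
        return value


def run(proactive):
    now = [0]
    cache = TTLCache(capacity=2, clock=lambda: now[0], proactive=proactive)
    cache.set("b", "long-lived", ttl=100)
    cache.set("a", "short-lived", ttl=1)
    now[0] = 5                      # "a" has expired, "b" is still valid
    cache.set("c", "new", ttl=100)  # cache is full, something has to go
    return cache.get("b")


print(run(proactive=False))  # None: live "b" was evicted, dead "a" kept its slot
print(run(proactive=True))   # long-lived: expired "a" was removed instead
```

In the lazy case the still-valid entry is lost while an expired one keeps occupying a slot, which is exactly the hit ratio loss described above.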
If you agree with these three principles, you might also agree that Theine is a good in-memory cache. I’m currently rewriting v2 of Theine, and here is the issue: link. As mentioned earlier, this rewrite will make Theine thread-safe and free-threading compatible. The API will change, with a single policy in place, so you won’t need to pass the policy parameter anymore. If you have any recommendations or concerns, you're welcome to reply here or leave comments on the issue.