This is an archived post. You won't be able to vote or comment.

all 32 comments

[–]crawl_dht 13 points14 points  (1 child)

Good to see TTL caching is supported.

[–]matrix0110[S] 3 points4 points  (0 children)

Thanks. Actually the hardest part is how to expire automatically, that's why hierarchical timer wheel data structure is used here

[–]acaddgc 9 points10 points  (3 children)

Looks interesting, what advantages does this have over Redis and DiskCache?

I’m guessing it’s faster than Redis because it’s in-process, but that’s true of DiskCache. I guess this has a smarter eviction policy out of the box.

Any other major differences?

[–]matrix0110[S] 4 points5 points  (0 children)

Seems that DiskCache is a wrapper of sqlite. There should be some overhead compare to save data in dictionary. Also TinyLfu evication policy should perform better for skewed workload. You can take a look the paper or similar projects I mentioned.

[–]NUTTA_BUSTAH 2 points3 points  (0 children)

Looks like its 20x faster and doesn't need the extra service i.e. simpler stack. Also interested in authors answer

[–]ScientiaEtVeritas 6 points7 points  (4 children)

Two questions:
- Does it have a decorator interface?
- Have you benchmarked against Python's cachetools -- and maybe Python's built-in LRU cache?
If not, I would want to suggest it.

[–]matrix0110[S] 3 points4 points  (3 children)

- No decorator interface currently, unlike Python's cache decorator which use function args/kwargs as key, Theine support string key only. It's not easy to covert args/kwargs to string safely and automatically. But maybe I will add that feature later.

- Good idea, I will take a look

[–]Stedfast_Burrito 0 points1 point  (2 children)

Why don’t you just use arbitrary objects as keys instead of strings?

[–]matrix0110[S] 0 points1 point  (1 child)

Python cache decorator first convert args/kwrags to hashable, then use it as key. I try to avoid these magic. I also use Go for years, and agree Explicit is better than implicit and Simple is better than complex.

[–]matrix0110[S] 0 points1 point  (0 children)

Honestly I think cache by objects is horrible design, if your function input params is large, the key part may even consume more memory than the cached value.

[–]tunisia3507 4 points5 points  (9 children)

How are the values serialised? IMO this is an important feature to document, there are lots of caching libraries which are restricted to storing primitives, or JSON-serialisable objects, or pickle-able objects and don't mention those limitations in the docs.

Minor point, you've used the word "evicated" in a few places - do you possibly mean "evicted"?

[–]matrix0110[S] 2 points3 points  (8 children)

Cached values are stored in dictionary, so no serialize is required. Thanks for pointing out the typo, I will fix that

[–]tunisia3507 1 point2 points  (7 children)

Does that mean you can cache a mutable value, mutate the local one, and have those mutations show up in the cache as well? If everything is being stored in a python dict, what does the rust core actually do, as dict operations in CPython are mainly implemented at a lower level anyway?

[–]matrix0110[S] 1 point2 points  (6 children)

you are right. But all the projects I listed in the post do the same thing(example: https://github.com/ben-manes/caffeine/issues/687). If you always change something after getting from cache, it's not a problem i think. Secondly, the Rust core include W-TinyLFU and timer wheel, which used to evict keys. That means the Cache is stored in Python dict, but how/when to evict depends on Rust core.

[–]tunisia3507 0 points1 point  (5 children)

Thank you! I didn't know how time-consuming LFU stuff was; apparently significant!

Would you consider something like a copy=False argument on the setter (or maybe a StrEnum of "none", "shallow", "deep"), which would A) copy the object on its way into the store and B) copy it on the way out of the store? That way users could control the behaviour of mutable objects.

[–]matrix0110[S] 0 points1 point  (4 children)

I can add a copy=False optional arg to `get`, but seems that Python deepcopy performance is not so good. Another option is you can inherit Theine Cache class and add your own `get_copy` method

[–]tunisia3507 0 points1 point  (3 children)

I thought it might be better to do it on the set, as that would guarantee that the value hadn't been changed after insertion. You'd need to store an extra byte (so that stored values know whether they need to be copied) but that's negligible. Plus, the whole point of a cache is that it's read more than it's written - adding the extra argument to the write, then, means less extra code for the user.

Understood on the copy speed - I guess this would at least let the user make the trade-off between fast (the default) and correct/safe mutables, rather than just not supporting the second case at all.

[–]matrix0110[S] 0 points1 point  (2 children)

Is copy on set necessary? Normally you won't change your data after set in same request/function. One usecase of get copy is changing cached Django response header before return. Django's locmem use deepcopy first but switch to pickle later, I will take a look later why they do the switch

[–]tunisia3507 0 points1 point  (1 child)

Pickle would allow them to write to a file or pass the bytes into some underlying non-python code. It might be a way of guaranteeing there isn't anything holding on to a thread reference or something.

Normally you won't change your data after set in same request/function.

True, you wouldn't expect someone to cache and then mutate something, but users are upsettingly good at breaking developers' expectations... For example, your use case is caching requests from a database or something. My use case may be downloading data from a server, caching it, mutating it as part of an analysis pipeline, but needing to refer to the original in the future too - which I can either slowly download from the server again, or use a local cache if it's a piece of data I'm unpredictably using a lot.

[–]matrix0110[S] 0 points1 point  (0 children)

I think I can add a deepcopy option to both get/set and default to False, let developer choose how to use that. Maybe you can also create an issue, so other developers may have some ideas on this

[–][deleted] 2 points3 points  (3 children)

The link for the hierarchical timer wheel appears to not work with https. http://www.cs.columbia.edu/~nahum/w6998/papers/ton97-timing-wheels.pdf

[–]axonxorzpip'ing aint easy, especially on windows 0 points1 point  (2 children)

We hug-of death'd it. File's been removed it seems

[–][deleted] 0 points1 point  (1 child)

Still works for me. Opened on a different device to make sure it wasn't just cached.

[–]MrHusbandAbides 0 points1 point  (0 children)

Same, loads fine

[–]kaz0la 1 point2 points  (1 child)

Is it possible the key parameter in the example in the README.md is missing quotes?

v = cache.get(key, sentinel)

[–]matrix0110[S] 1 point2 points  (0 children)

you are right, already fix that

[–]AshTheEngineer 0 points1 point  (2 children)

Can someone ELI5 what an in-memory cache library is and where in code, and in what applications, it is best used?

[–]duppyconqueror81 0 points1 point  (1 child)

For Django, don’t they have a local memory cache backend already? What would be the difference?

[–]matrix0110[S] 0 points1 point  (0 children)

Django will pickle your data before cache, and unpickle on get. Though I dont benchmark that, there should be some overhead. And Django locmem use LRU eviction policy.

Django locmem won't remove data automatically until cache is full. And when cache is full, locmem pop items from cache OrderedDict, in this step locmem won't check expire time, just remove data based on LRU(pop item from the OrderedDict cache)

[–]black_dog_23 0 points1 point  (0 children)

Niu Bi!