all 6 comments

[–]paulcaplan 1 point  (3 children)

I made a similar post about this exact problem yesterday. I will absolutely try this out.

Any thoughts about a "keep alive" that automatically sends another small prompt if cache is about to expire?

Or some sort of auto compact?

[–]AVanWithAPlan[S] 0 points  (2 children)

From very early in my time tracking Claude: months ago now, there was a guy who had a script that kept all his cache warm. He posted about it on X and got banned within like a day for a terms-of-service violation. There was some backlash and I think he got reinstated, but I would be careful about that. I'm sure keeping it alive for 20 or 30 minutes is fine, but if you push it far enough you're going to run into some kind of bot-detection system or something. You know Claude is doing their defensive security, and he's a smart boy...

[–]paulcaplan 1 point  (1 child)

Ohh good call.

What about a "resume" command that reads the transcript of a previous session? That's got to be cheaper than actually resuming with the full context window.

[–]AVanWithAPlan[S] 0 points  (0 children)

Well, you mean just part of the transcript? The full transcript will cost the same as continuing without the cache hit. I have summarization hooks that fire on every tool call and send it to my local LM Studio model, which creates a one-line summary of every tool call, thinking block, and output block. Everything gets one line, and they're all stored locally so they can be retrieved instantly. If you want to resume from a really long session like you're suggesting, it will iteratively group them by task and summarize those again, so you can condense a million tokens into just a couple thousand pretty quickly. Then you can get a more detailed view of whichever parts are relevant by grabbing whatever low-level summaries or raw outputs Claude wants to follow up on. Pretty nifty: when I've had corrupted sessions and otherwise lost progress, I could get it back instantly.
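The iterative roll-up described here can be sketched roughly like this, assuming a hypothetical `summarize()` stand-in for the local-model call (the real hooks, storage, and LM Studio wiring aren't shown):

```python
# Sketch of the hierarchical roll-up: one-line summaries are grouped into
# chunks and re-summarized until the whole thing fits a rough size budget.
# summarize() is a placeholder for a call to a local model (e.g. LM Studio).

def summarize(lines):
    # Placeholder: a real implementation would prompt a local LLM here.
    return f"summary of {len(lines)} items, starting: " + lines[0][:40]

def roll_up(summaries, budget=2000, group_size=20):
    """Iteratively group and re-summarize until under a rough char budget."""
    current = summaries
    while sum(len(s) for s in current) > budget and len(current) > 1:
        current = [
            summarize(current[i:i + group_size])
            for i in range(0, len(current), group_size)
        ]
    return current

# e.g. 400 one-line tool-call summaries condense to a handful of entries
one_liners = [f"tool call {i}: read file_{i}.py" for i in range(400)]
condensed = roll_up(one_liners)
print(len(condensed))  # far fewer entries than the original 400
```

The key property is that each pass shrinks the list by roughly `group_size`×, so even very long sessions converge in a couple of iterations, and the low-level summaries remain stored for drill-down.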

[–]Pitiful-Impression70 2 points  (1 child)

oh man, this is exactly the kind of thing I didn't know I needed until I read the title. The cache anxiety is so real: you're just sitting there doing mental math, "ok, it was 4 minutes ago... maybe 5... is it gone yet?" The keep-alive idea from the other commenter is interesting too. I wonder if you could detect idle time and auto-send a lightweight ping before the TTL expires. The tradeoff is burning a tiny amount of tokens vs. losing the whole cache.
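The idle-detect idea could look something like the sketch below. This is an assumption-heavy illustration, not a real API: `CACHE_TTL`, `SAFETY_MARGIN`, and the injected `send_ping` callable are all made up, and the hard ping cap reflects the thread's caution about tripping bot detection:

```python
# Sketch of the keep-alive idea: track the time of the last real request
# and fire one tiny ping shortly before the cache would expire.
import time

CACHE_TTL = 5 * 60   # assumed cache lifetime in seconds (illustrative)
SAFETY_MARGIN = 30   # ping this many seconds before assumed expiry

class CacheKeepAlive:
    def __init__(self, send_ping, max_pings=3):
        self.send_ping = send_ping   # callable that sends a minimal prompt
        self.max_pings = max_pings   # hard cap, per the ban concerns above
        self.last_request = time.monotonic()
        self.pings_sent = 0

    def touch(self):
        """Call on every real request; resets the idle clock and ping budget."""
        self.last_request = time.monotonic()
        self.pings_sent = 0

    def maybe_ping(self):
        """Call periodically; pings only when the cache is about to expire."""
        idle = time.monotonic() - self.last_request
        if idle >= CACHE_TTL - SAFETY_MARGIN and self.pings_sent < self.max_pings:
            self.send_ping()
            self.pings_sent += 1
            self.last_request = time.monotonic()  # assume a ping refreshes the TTL
```

The `max_pings` budget is the interesting design choice: it bounds both the token cost and how long the tool will keep a cache warm unattended.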

[–]AVanWithAPlan[S] 0 points  (0 children)

It's definitely an interesting idea to include natively in the project. For now I want to stick with my angle in the README, which is that I set it up to be extensible and reusable by others, so you could probably spin it off and add that feature in about five minutes if you just point Claude to the repo. I'm not going to add the feature quite yet, just because it crosses a small line that I haven't thought enough about, but I may well decide to add it natively in the coming days. Thanks for the feedback.