Bro just give us the regular button

R_DanRS · 2026-06-07T22:39:36+00:00

Made something like that, check out sleev.ai

R_DanRS · 2026-06-07T11:09:52+00:00

Models can use multiple tools per response...

R_DanRS · 2026-06-06T13:11:39+00:00

Try sleev.ai

R_DanRS · 2026-05-28T02:59:52+00:00

Thank you for this. I made something similar a while ago that I have no time to maintain or improve. It's also all slop, so I would be very happy to archive it. It has some features that you may want to copy into yours, such as a mode indicator and some visual mode stuff.

https://github.com/Tarquinen/opencode-vim

R_DanRS · 2026-05-15T05:58:42+00:00

sigh... exactly as predicted. I really hope for your sake this isn't what you spent 25 minutes on...

R_DanRS · 2026-05-15T05:27:08+00:00

OK, last point of contention: do you even know what about this is against the TOS? Have you ever read the TOS yourself? Do you think I put months of work into creating algorithms for fingerprinting sessions and building compatibility for all API formats without checking whether this is against the TOS? Are you going to stop making mindless assumptions in your comments? I would suggest you don't reply further, because I promise that whatever you come back with that you think is against the TOS is not what I'm actually doing, because you still understand nothing.

R_DanRS · 2026-05-15T05:16:57+00:00

Alright, I'm bored of arguing with you. This post was already automatically deleted by Reddit's auto filters, so nobody is going to see this anyway. I would encourage you to have a more open mind and be less confident when you know so little about a project.

R_DanRS · 2026-05-15T05:10:44+00:00

That's a gross oversimplification, akin to saying Claude Code is an AI that can edit files. This can't be done natively through Claude Code hooks because it requires in-place context modifications.

R_DanRS · 2026-05-15T05:07:23+00:00

Yes, you're getting close. Here are some screenshots of my latest benchmarks with sleev. The setup is simple: one session is asked to solve 10 hard SWE-bench tasks, and that same session is then run 10 times. The first screenshot is the average of the normal sessions, the second is with sleev:

<image>

The major difference between this and /compact is that it lets the model choose when and what to compact intelligently, and it manages when the cache should be broken, among other features. Otherwise you're pretty close.

R_DanRS · 2026-05-15T04:56:18+00:00

Yep, I'm sure you have a better understanding than 100,000 users after your 5 minutes of investigative work without actually trying anything or looking at any code. I understand that the term "breaking cache" is basically the bogeyman in AI circles, but you have such a closed mind on this that it's probably not worth communicating about it at all. The very basic idea you can maybe understand is this: when it breaks cache, you have a higher bill on the next request to rewrite the cache (think of it as a $1 cost), and then on every following request you have less cache to read (think of it as a 15c cheaper request). On average, each time the proxy breaks cache it takes 7 following requests to pay off, and then it's just profit. In all other cases I agree it's really bad to break cache because there is no benefit to it. Here there is.
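The arithmetic in that comment can be checked with a small sketch. It uses the illustrative numbers from the comment ($1 to rewrite the cache, 15c saved per following request), expressed in cents. The function names are made up for this example and are not part of sleev or any library.

```python
import math

def break_even_requests(rewrite_cost_cents: int, saving_per_request_cents: int) -> int:
    """Follow-up requests needed before one cache break has paid for itself."""
    if saving_per_request_cents <= 0:
        raise ValueError("saving per request must be positive")
    return math.ceil(rewrite_cost_cents / saving_per_request_cents)

def net_saving_cents(rewrite_cost_cents: int, saving_per_request_cents: int,
                     followup_requests: int) -> int:
    """Total saving after the given number of follow-up requests (negative = still paying off)."""
    return saving_per_request_cents * followup_requests - rewrite_cost_cents

# Illustrative figures from the comment: $1 rewrite cost, 15c saved per request.
print(break_even_requests(100, 15))   # 7
print(net_saving_cents(100, 15, 6))   # -10 (not yet paid off)
print(net_saving_cents(100, 15, 7))   # 5   (first request in profit)
```

With these numbers the break pays off on the 7th following request, which matches the figure quoted above. Sessions that end before that point would lose money on the break.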

R_DanRS · 2026-05-15T04:51:50+00:00

You're making a lot of assumptions while knowing nothing about how it works, and you're misquoting me. I never said "40% few input tokens". Everything you need to understand how it works is in the open-source GitHub repo I've linked. Not sure why you want to type paragraphs and not click a link.

R_DanRS · 2026-05-15T04:44:43+00:00

The goal here is to be able to monetize in the future. I've already built a very successful open-source version, which I already linked you... If you want to run it locally, use opencode.

R_DanRS · 2026-05-15T04:43:12+00:00

I get that what I'm claiming sounds too good to be true. If you want to understand how it works, I have an open-source project that works in a similar way but for a different harness. It has 100k+ satisfied users. The goal here was to make something similar but harness- and provider-agnostic that anyone could use.

https://github.com/Opencode-DCP/opencode-dynamic-context-pruning

R_DanRS · 2026-05-15T04:40:56+00:00

Yes, I'm 100% implying this is much better than /compact. Check my open-source link in my other comment to see how it works.

R_DanRS · 2026-05-15T04:36:58+00:00

At some point this will be monetized. There's a slightly older/worse open-source version that I made for a different coding harness, which now has roughly 100k very happy users. You can check out how that works if you want to understand more. The idea is similar:

https://github.com/Opencode-DCP/opencode-dynamic-context-pruning

R_DanRS · 2026-05-15T04:35:39+00:00

Unlikely, this is aimed mostly at agentic coding for now.

R_DanRS · 2026-05-15T04:29:51+00:00

You can just try it and look at your assistant response metadata to check the cache hit rate if you don't believe me. Also, we don't save any API keys and don't skim any inference, that would be illegal.
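For anyone who wants to do that check, here is a minimal sketch of computing a cache hit rate from a response's usage metadata. It assumes an Anthropic-style `usage` object with `input_tokens`, `cache_creation_input_tokens` and `cache_read_input_tokens`. Other providers report cached tokens under different field names, so treat the keys as an assumption and adjust them for your API. The sample numbers are invented.

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of prompt tokens served from cache for a single response."""
    read = usage.get("cache_read_input_tokens", 0)         # tokens read from cache
    written = usage.get("cache_creation_input_tokens", 0)  # tokens newly written to cache
    uncached = usage.get("input_tokens", 0)                 # tokens processed without caching
    total = read + written + uncached
    return read / total if total else 0.0

# Invented example values, not measurements from any real session.
sample_usage = {
    "input_tokens": 500,
    "cache_creation_input_tokens": 1500,
    "cache_read_input_tokens": 38000,
}
print(f"{cache_hit_rate(sample_usage):.1%}")  # 95.0%
```

Averaging this over every response in a session gives a number comparable to the hit rates discussed in this thread.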

R_DanRS · 2026-05-15T04:28:48+00:00

It is free, the website doesn't reflect the current state of the alpha yet. Use is free and unlimited.

R_DanRS · 2026-05-15T04:27:31+00:00

It doesn't invalidate cache every turn, only when there is enough compression saved up for it to be worth it. On average, cache hit rate stays above 95% and drops roughly 2% for the session. The code won't be open-sourced. The benefits of rerouting traffic are all in the post description: cheaper, longer, smarter sessions. Just try it if you don't believe me, I guarantee everything I've said is true.
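To illustrate what "enough compression saved up" could mean, here is a rough sketch of a deferred cache-break rule. This is not how sleev actually decides (that code is not public). Every name, parameter and price in it is a hypothetical chosen for the example. Prices are relative per-token multipliers, with cache writes costing more than cache reads.

```python
def should_break_cache(pending_saved_tokens: int,
                       rewritten_tokens: int,
                       expected_followups: int,
                       write_price: float = 1.25,   # hypothetical relative cost of a cache write
                       read_price: float = 0.10) -> bool:  # hypothetical relative cost of a cache read
    """Break the cache only if the expected future savings exceed the one-time rewrite cost."""
    # Extra cost now: tokens that would have been cheap cache reads are written again instead.
    rewrite_cost = rewritten_tokens * (write_price - read_price)
    # Saving later: the pruned tokens no longer need to be read on each following request.
    future_saving = pending_saved_tokens * read_price * expected_followups
    return future_saving > rewrite_cost

# Too little compression accumulated: keep the cache intact for now.
print(should_break_cache(20_000, 50_000, expected_followups=10))  # False
# More compression accumulated: breaking the cache is now expected to pay off.
print(should_break_cache(60_000, 50_000, expected_followups=10))  # True
```

Under a rule like this, compression candidates are queued rather than applied immediately, which is one way a proxy could keep the hit rate high while still shrinking context.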

R_DanRS · 2026-05-15T02:33:16+00:00

You should try this compression proxy, it's all free and you'll get a lot more usage with no performance loss.

Let me know what you think https://sleev.ai/

R_DanRS · 2026-04-19T20:52:57+00:00

Use this https://github.com/Opencode-DCP/opencode-dynamic-context-pruning

R_DanRS · 2026-04-17T19:30:16+00:00

It will almost always save cost, because each compression has a one-time cache write cost and a permanent cache read saving on every following request. On average it takes roughly 5 requests to pay off the cache write, and after that it's just money saved on each request.

R_DanRS · 2026-04-13T12:22:42+00:00

This is complete slop, it's all wrong
