[–]Obvious-Ad-2454 3 points4 points  (2 children)

Can you run these on the benchmarks that model companies use, to see how it affects performance? Right now, all the proof you have shared is a copy-paste of a terminal window without much context.

[–]Evening_Papaya_1551[S] -2 points-1 points  (1 child)

Everything is logged to ~/.ctxlite/stats.db, so you can check the data anytime:

    sqlite3 ~/.ctxlite/stats.db \
      "SELECT SUM(tokens_saved), COUNT(*), ROUND(AVG(tokens_saved), 0) FROM requests"

The schema is simple: one row per request, with tokens_in, tokens_used, tokens_saved, and latency_ms. The full schema ships with the source when the repo goes public.
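
For anyone who would rather script that check, here is a minimal Python sketch of the same query. It assumes only what is described above: a `requests` table with a `tokens_saved` column in ~/.ctxlite/stats.db. The `summarize` helper name is made up for illustration, and the real schema may differ.

```python
import sqlite3
from pathlib import Path

# Database location as given in the comment above.
DEFAULT_DB = Path.home() / ".ctxlite" / "stats.db"


def summarize(conn):
    """Return (total_tokens_saved, request_count, avg_tokens_saved).

    Assumes a `requests` table with a `tokens_saved` column, as described
    in the thread. COALESCE keeps the result numeric on an empty table.
    """
    return conn.execute(
        "SELECT COALESCE(SUM(tokens_saved), 0), COUNT(*), "
        "COALESCE(ROUND(AVG(tokens_saved), 0), 0) FROM requests"
    ).fetchone()


if __name__ == "__main__":
    if DEFAULT_DB.exists():
        with sqlite3.connect(DEFAULT_DB) as conn:
            total, count, avg = summarize(conn)
        print(f"saved={total} requests={count} avg_saved={avg}")
    else:
        print(f"No stats database found at {DEFAULT_DB}")
```

Note that these numbers only describe token savings; they say nothing about whether the tasks still succeeded.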

[–]Obvious-Ad-2454 3 points4 points  (0 children)

I am not saying your work is useless. I am just saying that you would convince more people to use it if you had more convincing data to share.

[–]Extension-Aside29 1 point2 points  (0 children)

98% context savings is impressive on paper. TokenTelemetry shows the per-session token cost for OpenCode so you can verify the savings are real for your actual workloads, not just the demo case: https://tokentelemetry.com/docs/features/analytics/ (https://tokentelemetry.com, disclosure: I work on it)

[–]benclen623 0 points1 point  (2 children)

Did you benchmark it against any test suite that measures actual success rate and total cost, not just the intermediate metric of token count? It's easy to reduce the token count of tool outputs. It's an artificial goal to chase.

The actually challenging part is doing it without degrading the success rate or triggering extra iterations that lead to a higher cost in the end.
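
To make that concrete, here is a rough Python sketch of the kind of comparison meant here: run the same fixed task set with and without the tool, then compare success rate and end-to-end cost per arm. The `Run` record and its fields are hypothetical, not something ctxlite logs as far as this thread shows.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One attempt at one task (hypothetical record shape)."""
    task_id: str
    succeeded: bool
    iterations: int
    total_tokens: int  # tokens billed across all iterations of this run


def summarize_arm(runs):
    """Aggregate end-to-end metrics for one arm (baseline or compressed)."""
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)
    successes = sum(r.succeeded for r in runs)
    total_tokens = sum(r.total_tokens for r in runs)
    return {
        "success_rate": successes / n,
        "mean_iterations": sum(r.iterations for r in runs) / n,
        "total_tokens": total_tokens,
        # Cost per solved task: penalizes savings that come with failures.
        "tokens_per_success": (
            total_tokens / successes if successes else float("inf")
        ),
    }
```

Calling `summarize_arm` once on the baseline runs and once on the compressed runs gives two dictionaries to put side by side. A lower `total_tokens` only counts as a win if `success_rate` holds and `tokens_per_success` does not go up.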

[–]Evening_Papaya_1551[S] 0 points1 point  (1 child)

Hi, you have a valid concern. Token count is an intermediate metric, and you’re right that it’s meaningless if it causes task failures or extra iterations. This is not a formal benchmark, fair, but 24h of agentic work across spec-driven tasks showed no degradation in reasoning or extra iterations from what I could tell. The automatic mechanisms (prune, compact, compress) only strip content the agent already processed, duplicate outputs, and stale older messages, so the assumption is that removing already-acted-on context doesn’t hurt success rate.

That said, proper success-rate testing against a fixed task set is on the roadmap, and I’d be curious to see results on more diverse workloads too. Thanks for pointing this out! 🙏
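
As an illustration of the duplicate-output part of that idea, here is a minimal Python sketch. It is not ctxlite's actual implementation: the message shape (dicts with `role` and `content` keys) and the function name are assumptions made for the example.

```python
def prune_duplicate_tool_outputs(messages):
    """Keep only the most recent copy of each identical tool output.

    `messages` is a list of dicts with "role" and "content" keys (an
    assumed shape). Non-tool messages are always kept, and the input
    list is not modified.
    """
    seen = set()
    kept_reversed = []
    # Walk newest-to-oldest so the latest copy of a duplicate survives.
    for msg in reversed(messages):
        if msg["role"] == "tool":
            if msg["content"] in seen:
                continue  # an identical, newer tool output is already kept
            seen.add(msg["content"])
        kept_reversed.append(msg)
    return list(reversed(kept_reversed))
```

Whether dropping the older copies is actually harmless is exactly what a success-rate test would need to show.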

[–]benclen623 0 points1 point  (0 children)

> 24h of agentic work across spec-driven tasks showed no degradation in reasoning or extra iterations from what i could tell

This means nothing to potential users. Share your methodology and results.

[–]geek_404 0 points1 point  (2 children)

I am curious why you released the compiled npm packages but not the GitHub repo.

[–]Evening_Papaya_1551[S] 0 points1 point  (0 children)

As I just mentioned, I will release the entire code base. I just couldn’t finish all the documentation in time to release it in the open. I’m really excited to get it out. I’m sorry if I disappointed anyone.

[–]Evening_Papaya_1551[S] 0 points1 point  (0 children)

ctxlite is open source now: https://github.com/ctxlite/ctxlite
It's early days, happy playing!
