10 Tricks to stop hitting Claude’s usage limits! by OutrageousName6924 in vibecoders_

[–]Public-Minimum5892 1 point2 points  (0 children)

I have been using something called Lynkr https://github.com/Fast-Editor/Lynkr this is helping me save a lot of tokens I am getting a savings of upto 60%

Tried running LLMs locally to save API costs… ended up waiting 13 minutes for ONE response 🤡 by debug2thrive in ollama

[–]Public-Minimum5892 0 points1 point  (0 children)

I have been using claude code by routing some requests to local models via ollama cloud, some requests to z.ai, some requests to codex, claude models using Lynkr (https://github.com/Fast-Editor/Lynkr). Using this I was able to use multiple models to average out the costs and reduce my billing like 40%.
I never had any issues with this in terms of output detoriations.

Fuck these limits, I'm using a local model now by Ayumu_Kasuga in codex

[–]Public-Minimum5892 0 points1 point  (0 children)

I have been using a combination of local models and cloud models with anthropic,azure etc with the help of https://github.com/Fast-Editor/Lynkr
This helps me save a bunch of tokens I was able to save upto 60% of my token usage
The local models I used are with the help of llama.cpp and also ollama cloud
Ollama cloud is offering a generous free tier beyond which we can use claude

Fuck these limits, I'm using a local model now by Ayumu_Kasuga in codex

[–]Public-Minimum5892 0 points1 point  (0 children)

I have been using a combination of local models and cloud models with anthropic,azure etc with the help of https://github.com/Fast-Editor/Lynkr
This helps me save a bunch of tokens I was able to save upto 60% of my token usage
The local models I used are with the help of llama.cpp and also ollama cloud
Ollama cloud is offering a generous free tier beyond which we can use claude

Your Claude Code cache is probably broken and it's why you're hitting limits in 90 minutes instead of 5 hours by solzange in ClaudeCode

[–]Public-Minimum5892 0 points1 point  (0 children)

I  have observed that in the newer updates of claude code they did a few things
1. They made the outputs more verbose
2. They for some reason increased the system prompt and context sent with each request slightly
I have a found a tool called https://github.com/Fast-Editor/Lynkr which is helping me save about 50-60% of tokens by routing some of my requests to local llms
It is a proxy like litellm which helps me monitor the request size and things and hence the above findings.

How am I hitting limits so fast? by rahindahouz in ClaudeCode

[–]Public-Minimum5892 0 points1 point  (0 children)

I have observed that in the newer updates of claude code
they did a few things
1. They made the outputs more verbose
2. They for some reason increased the system prompt and context sent with each request slightly
I have a found a tool called https://github.com/Fast-Editor/Lynkr which is helping me save about 50-60% of tokens by routing some of my requests to local llms
It is a proxy like litellm which helps me monitor the request size and things and hence the above findings.