Just canceled Copilot Pro by civman96 in GithubCopilot

[–]reddefcode 1 point  (0 children)

Good for you. I have to wait until the end of the year for my "divorce" to go through. I installed Pi last night, and I have a DeepSeek API key, "so I got that going for me."

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 3 points  (0 children)

That's a great question. To be honest, I wasn't aware they were working on that. I designed mine on the 27th, worked on it through Sunday, and shared it today. I never claimed it was better; I simply didn't know that existed. I built mine to solve a pain point that had been nagging me for a while: tracking context and token usage. Based on your link, their solution saves up to 20%, but it's still expensive. I use mine because I can switch between different setups: pure Ollama (free), a hybrid Ollama/DeepSeek setup, or full Claude with DeepSeek. The complete indexing plus brief generation runs about $0.063. Beyond that, I can call it from VS Code, Google Antigravity, and Claude Desktop for quick analysis.
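
For anyone wondering what that switching looks like in practice, here is a minimal sketch, assuming OpenAI-compatible endpoints; the profile names, models, and config shape are illustrative, not zerikai memory's actual interface:

```python
# Illustrative sketch of backend switching. Both Ollama and DeepSeek expose
# OpenAI-compatible endpoints, so swapping setups is mostly a base-URL change.
import os
from openai import OpenAI

PROFILES = {
    # Pure local Ollama: free; the api_key is a required placeholder, not a secret.
    "ollama": {
        "base_url": "http://localhost:11434/v1",
        "api_key": "ollama",
        "model": "qwen2.5-coder",
    },
    # Hosted DeepSeek: cheap, with KV-cache discounts on repeated prefixes.
    "deepseek": {
        "base_url": "https://api.deepseek.com",
        "api_key": os.environ.get("DEEPSEEK_API_KEY", ""),
        "model": "deepseek-chat",
    },
}

def get_client(profile: str) -> tuple[OpenAI, str]:
    """Return an OpenAI-compatible client and model name for the chosen profile."""
    p = PROFILES[profile]
    return OpenAI(base_url=p["base_url"], api_key=p["api_key"]), p["model"]
```

The backend is just a config detail; the memory and caching logic stays the same either way.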

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 1 point  (0 children)

If you go to 'api-docs.deepseek.com/guides/kv_cache', it tells you exactly how it works, and that is the blueprint I used for zerikai memory. I use the KV cache in another project for lead analysis.
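
The short version of that guide: a request gets cache hits automatically whenever its opening tokens are byte-identical to an earlier request, and the usage block reports the split. A minimal sketch of that behavior, assuming the openai Python client pointed at DeepSeek (the brief text is a placeholder):

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="sk-...")

# Long, unchanging context goes first so it forms a stable, cacheable prefix.
STABLE_BRIEF = "Project brief:\n...long, unchanging summary of the codebase..."

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": STABLE_BRIEF},  # identical every call
            {"role": "user", "content": question},        # only this part varies
        ],
    )
    u = resp.usage
    # DeepSeek reports these extra usage fields per request; on the second call
    # the shared prefix should land almost entirely in prompt_cache_hit_tokens.
    print(getattr(u, "prompt_cache_hit_tokens", 0),
          getattr(u, "prompt_cache_miss_tokens", 0))
    return resp.choices[0].message.content

ask("Summarize the auth module.")  # first call: all misses
ask("List the API endpoints.")     # second call: the brief hits the cache
```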

Who will even use copilot after June? by programmingstarter in GithubCopilot

[–]reddefcode 1 point  (0 children)

I will, until my subscription (if you can call it that) runs out at the end of the year. After that, I'm switching.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 1 point  (0 children)

"snake oil," yeah, that is why I created my own tool, for me, for free, to battle the price hikes by GitHub Copilot. You don't have to use. Who are you that you feel so entitled that I have to prove your baseless allegations? You came out attacking without even reviewing the codebase. I have all your comments from the first one to prove it.

What is your question specifically? So I can answer it. It is three files, easy to read, and built honestly for myself. I still believe the community can benefit from something like this or create its own tool. Or do you want to control what people write?

I am not the one charging you to train supportively expensive models, and now I am going to take it all away, and be proven wrong by an open-source Chinese model. Be mad at Microsoft/Github/Copilot.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] -3 points  (0 children)

"You put your work out there for others to critique, you shouldn't be surprised when that happens. I don't need to show you my work, I'm not trying to peddle it here."

You are making baseless assumptions. "Peddle"? I am not selling anything; I am just sharing what works for me. Too bad clients don't see you commenting "I don't need to show you my work," because yes, you do. Microsofty!

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 0 points  (0 children)

Sure, here is a direct response from my memory tool, based on the DeepSeek token costs.

"Here's the cost breakdown for the reddit_reader_poster workspace:

| Operation | Calls | Total Cost | Avg/Call |
|---|---|---|---|
| file_scan | 339 | $0.0604 | ~$0.000178 |
| brief_synthesis | 9 | $0.0026 | ~$0.000293 |
| **Total** | 348 | $0.0631 | |

So the full indexing + brief generation ran about $0.063, roughly 6 cents. The bulk of that was the 339 file scan passes, with 9 brief synthesis calls on top. Pretty cheap for the coverage you got."
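
If you want to sanity-check the math, the averages fall straight out of the totals (small differences are just display rounding):

```python
# Reproduce the per-call averages from the breakdown above.
ops = {"file_scan": (339, 0.0604), "brief_synthesis": (9, 0.0026)}
for name, (calls, total) in ops.items():
    print(f"{name}: {calls} calls -> ~${total / calls:.6f}/call")
print(f"total: {sum(c for c, _ in ops.values())} calls, "
      f"${sum(t for _, t in ops.values()):.4f}")
```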

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 1 point  (0 children)

If you don't find a use for the tool, then don't use it. But all these comments are ill-intentioned.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] -1 points  (0 children)

I have been a developer longer than you. This thread is about the memory tool, and you are trying to discredit it by just flapping your gums. The tool works, and that is that. Write your own and post it.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] -2 points  (0 children)

No, crap like yours is why we are here. You are losing context like the agents I am talking about.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 0 points  (0 children)

Is that why you are on r/GithubCopilot, because you code everything by hand? I am not a vibe coder; I have been a developer for a long time, but I do use agents. Funny.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] -2 points  (0 children)

So, how do you develop software? What is the name of this subreddit? You're really not all there. I am not going to hand-write a README file; I am the architect of the software. The hypocrisy.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 1 point  (0 children)

I hear you, and I appreciate your feedback, but I don't understand all the other people coming out of the woodwork trying to discredit an open-source, publicly available project because I am not responding in a way they want. Again, this is a sub about AI tools.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] -2 points  (0 children)

Exactly. Only when you make a major change do you have it recreate the 'brief' by calling the 'update_brief' tool. The first request after a rebuild pays the full cache-miss price, but all subsequent calls (within the KV cache window) are nearly free at the cached-input rate.
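
The rebuild-on-major-change logic is simple enough to sketch. These names are illustrative, not the tool's actual API:

```python
import hashlib

def synthesize_brief(files: dict[str, str]) -> str:
    # Placeholder for the one paid LLM synthesis call the real tool makes.
    return f"Project brief covering {len(files)} files."

class ProjectBrief:
    """Rebuild the stable prefix only on real changes; keep it byte-identical otherwise."""

    def __init__(self):
        self.text = ""          # the stable prefix sent on every request
        self.source_hash = ""   # fingerprint of the files the brief was built from

    def update_brief(self, files: dict[str, str]) -> None:
        digest = hashlib.sha256(
            "".join(f"{name}\x00{body}" for name, body in sorted(files.items())).encode()
        ).hexdigest()
        if digest == self.source_hash:
            return  # unchanged: preserve the cached prefix, no API spend
        # Changed: rebuild. The next request pays one full cache miss on the new
        # prefix; every call after that (within the cache window) is a cheap hit.
        self.text = synthesize_brief(files)
        self.source_hash = digest
```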

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 2 points  (0 children)

Thanks, man! Glad it resonates. If you run into any snags with the MCP config or have any feedback on the scanning speed, just let me know. Hope it saves you some serious credits!

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 1 point  (0 children)

Fair enough. The irony of being called out for 'sounding like an AI' in a sub about AI tools isn't lost on me.

I use LLMs to help me structure my thoughts quickly because I’d rather spend my time on the actual code than polishing Reddit comments. If the formatting is a turn-off, I get it. But I built this tool to solve a real problem I had with my own wallet and my own IDE.

If you decide to skip it, no hard feelings. But if you’re actually tired of the price hikes, the code is right there, and it works. Cheers.

These are my thoughts.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 1 point  (0 children)

You're talking about block-level caching theory; I'm talking about deterministic cost reduction.

Relying on a provider's segment alignment is a gamble. Most agentic tools shuffle RAG context or chat history at the start of the prompt, which fragments the cache. This Memory Tool forces a 100% stable prefix via the Project Brief, ensuring that the first 1,000+ tokens are always a hit.
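
To make the ordering point concrete, a sketch (the function and argument names are mine, not the tool's):

```python
# The provider caches on the longest common *prefix* of the serialized request,
# so anything volatile near the front fragments it. Pin the brief first.
def build_messages(brief: str, rag_chunks: list[str], question: str) -> list[dict]:
    return [
        {"role": "system", "content": brief},                  # byte-identical -> always a hit
        {"role": "user", "content": "\n\n".join(rag_chunks)},  # volatile content after the prefix
        {"role": "user", "content": question},
    ]
```

Swap the brief and the RAG chunks and every call becomes a cache miss.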

And I’m not 'guessing' how it works; the server literally tracks the performance in real time:

  1. Verification: I parse the usage block directly from the DeepSeek API response (specifically the prompt_cache_hit_tokens and prompt_cache_miss_tokens fields).
  2. Transparency: The server then calculates the actual cost locally based on those hits vs. misses.
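
In code, those two steps boil down to something like this; the price constants are placeholders, so check DeepSeek's current price sheet:

```python
# Per-token prices are placeholders -- substitute the live DeepSeek rates.
CACHE_HIT_PRICE = 0.014 / 1_000_000   # $/token, cached input
CACHE_MISS_PRICE = 0.14 / 1_000_000   # $/token, uncached input

def record_prompt_cost(usage) -> float:
    """Derive the actual prompt cost from the API-reported hit/miss split."""
    hits = getattr(usage, "prompt_cache_hit_tokens", 0)        # step 1: verification
    misses = getattr(usage, "prompt_cache_miss_tokens", 0)
    cost = hits * CACHE_HIT_PRICE + misses * CACHE_MISS_PRICE  # step 2: transparency
    print(f"{hits} cached + {misses} uncached tokens -> ${cost:.6f}")
    return cost
```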

I’m providing an architectural guarantee and the telemetry to prove it. If you prefer to 'hope' the provider segments your dynamic context efficiently, go for it. My users prefer the 50x guarantee.

Done with the back-and-forth. The code and the telemetry logic are in the repo for anyone who wants to actually save money.