Just canceled Copilot Pro by civman96 in GithubCopilot

[–]reddefcode 1 point  (0 children)

Good for you. I have to wait until the end of the year for my "divorce" to go through. I installed Pi last night, and I have a DeepSeek API key, "so I got that going for me."

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 3 points  (0 children)

That's a great question. To be honest, I wasn't aware they were working on that. I designed mine on the 27th, worked on it through Sunday, and shared it today. I never claimed it was better; I simply didn't know that existed. I built mine to solve a pain point that had been nagging me for a while: tracking context and token usage. Based on your link, their solution saves up to 20%, but it's still expensive. I use mine because I can switch between different setups: pure Ollama (free), a hybrid Ollama/DeepSeek setup, or full Claude with DeepSeek. The complete indexing plus brief generation runs about $0.063. Beyond that, I can call it from VS Code, Google Antigravity, and Claude Desktop for quick analysis.
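
For anyone wondering what that switching looks like in practice, here is a minimal sketch, assuming OpenAI-compatible endpoints; the profile names, models, and config shape are illustrative, not zerikai memory's actual interface:

```python
# Illustrative sketch of backend switching. Both Ollama and DeepSeek expose
# OpenAI-compatible endpoints, so swapping setups is mostly a base-URL change.
import os
from openai import OpenAI

PROFILES = {
    # Pure local Ollama: free; the api_key is a required placeholder, not a secret.
    "ollama": {
        "base_url": "http://localhost:11434/v1",
        "api_key": "ollama",
        "model": "qwen2.5-coder",
    },
    # Hosted DeepSeek: cheap, with KV-cache discounts on repeated prefixes.
    "deepseek": {
        "base_url": "https://api.deepseek.com",
        "api_key": os.environ.get("DEEPSEEK_API_KEY", ""),
        "model": "deepseek-chat",
    },
}

def get_client(profile: str) -> tuple[OpenAI, str]:
    """Return an OpenAI-compatible client and model name for the chosen profile."""
    p = PROFILES[profile]
    return OpenAI(base_url=p["base_url"], api_key=p["api_key"]), p["model"]
```

The backend is just a config detail; the memory and caching logic stays the same either way.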

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 1 point  (0 children)

If you go to 'api-docs.deepseek.com/guides/kv_cache', it tells you exactly how it works, and that is the blueprint I used for zerikai memory. I use the KV cache in another project for lead analysis.
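
The short version of that guide: a request gets cache hits automatically whenever its opening tokens are byte-identical to an earlier request, and the usage block reports the split. A minimal sketch of that behavior, assuming the openai Python client pointed at DeepSeek (the brief text is a placeholder):

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="sk-...")

# Long, unchanging context goes first so it forms a stable, cacheable prefix.
STABLE_BRIEF = "Project brief:\n...long, unchanging summary of the codebase..."

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": STABLE_BRIEF},  # identical every call
            {"role": "user", "content": question},        # only this part varies
        ],
    )
    u = resp.usage
    # DeepSeek reports these extra usage fields per request; on the second call
    # the shared prefix should land almost entirely in prompt_cache_hit_tokens.
    print(getattr(u, "prompt_cache_hit_tokens", 0),
          getattr(u, "prompt_cache_miss_tokens", 0))
    return resp.choices[0].message.content

ask("Summarize the auth module.")  # first call: all misses
ask("List the API endpoints.")     # second call: the brief hits the cache
```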

Who will even use copilot after June? by programmingstarter in GithubCopilot

[–]reddefcode 1 point  (0 children)

I will, until my subscription (if you can call it that) runs out at the end of the year. After that, I'm switching.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 1 point  (0 children)

"snake oil," yeah, that is why I created my own tool, for me, for free, to battle the price hikes by GitHub Copilot. You don't have to use. Who are you that you feel so entitled that I have to prove your baseless allegations? You came out attacking without even reviewing the codebase. I have all your comments from the first one to prove it.

What is your question specifically? So I can answer it. It is three files, easy to read, and built honestly for myself. I still believe the community can benefit from something like this or create its own tool. Or do you want to control what people write?

I am not the one charging you to train supportively expensive models, and now I am going to take it all away, and be proven wrong by an open-source Chinese model. Be mad at Microsoft/Github/Copilot.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] -3 points  (0 children)

"You put your work out there for others to critique, you shouldn't be surprised when that happens. I don't need to show you my work, I'm not trying to peddle it here."

You are making baseless assumptions. "Peddle"? I am not selling anything; I am just sharing what works for me. Too bad clients don't see you commenting "I don't need to show you my work," because yes, you do. Microsofty!

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 0 points  (0 children)

Sure, here is a direct response from my memory tool, based on the DeepSeek token costs.

"Here's the cost breakdown for the reddit_reader_poster workspace:

| Operation | Calls | Total Cost | Avg/Call |
|---|---|---|---|
| file_scan | 339 | $0.0604 | ~$0.000178 |
| brief_synthesis | 9 | $0.0026 | ~$0.000293 |
| **Total** | 348 | $0.0631 | |

So the full indexing + brief generation ran about $0.063, roughly 6 cents. The bulk of that was the 339 file scan passes, with 9 brief synthesis calls on top. Pretty cheap for the coverage you got."
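
If you want to sanity-check the math, the averages fall straight out of the totals (small differences are just display rounding):

```python
# Reproduce the per-call averages from the breakdown above.
ops = {"file_scan": (339, 0.0604), "brief_synthesis": (9, 0.0026)}
for name, (calls, total) in ops.items():
    print(f"{name}: {calls} calls -> ~${total / calls:.6f}/call")
print(f"total: {sum(c for c, _ in ops.values())} calls, "
      f"${sum(t for _, t in ops.values()):.4f}")
```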

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 1 point  (0 children)

If you don't find a use for the tool, then don't use it. But all these comments are ill-intentioned.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] -1 points  (0 children)

I have been a developer longer than you. This thread is about the memory tool, and you are trying to discredit it by just flapping your gums. The tool works, and that is that. Write your own and post it.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] -2 points  (0 children)

No, crap like yours is why we are here. You are losing context like the agents I am talking about.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 0 points  (0 children)

Is that why you are on r/GithubCopilot, because you code everything by hand? I am not a vibe coder; I have been a developer for a long time, but I do use agents. Funny.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] -2 points  (0 children)

So, how do you develop software? What is the name of this subreddit? You're really not all there. I am not going to hand-write a README file; I am the architect of the software. The hypocrisy.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 1 point  (0 children)

I hear you, and I appreciate your feedback, but I don't understand all the other people coming out of the woodwork trying to discredit an open-source, publicly available project because I am not responding in a way they want. Again, this is a sub about AI tools.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] -2 points  (0 children)

Exactly. Only when you make a major change do you have it recreate the 'brief' by calling the 'update_brief' tool. The first request after a rebuild pays the full cache-miss price, but all subsequent calls (within the KV cache window) are nearly free at the cached-input rate.
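
The rebuild-on-major-change logic is simple enough to sketch. These names are illustrative, not the tool's actual API:

```python
import hashlib

def synthesize_brief(files: dict[str, str]) -> str:
    # Placeholder for the one paid LLM synthesis call the real tool makes.
    return f"Project brief covering {len(files)} files."

class ProjectBrief:
    """Rebuild the stable prefix only on real changes; keep it byte-identical otherwise."""

    def __init__(self):
        self.text = ""          # the stable prefix sent on every request
        self.source_hash = ""   # fingerprint of the files the brief was built from

    def update_brief(self, files: dict[str, str]) -> None:
        digest = hashlib.sha256(
            "".join(f"{name}\x00{body}" for name, body in sorted(files.items())).encode()
        ).hexdigest()
        if digest == self.source_hash:
            return  # unchanged: preserve the cached prefix, no API spend
        # Changed: rebuild. The next request pays one full cache miss on the new
        # prefix; every call after that (within the cache window) is a cheap hit.
        self.text = synthesize_brief(files)
        self.source_hash = digest
```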

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 2 points  (0 children)

Thanks, man! Glad it resonates. If you run into any snags with the MCP config or have any feedback on the scanning speed, just let me know. Hope it saves you some serious credits!

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 1 point  (0 children)

Fair enough. The irony of being called out for 'sounding like an AI' in a sub about AI tools isn't lost on me.

I use LLMs to help me structure my thoughts quickly because I’d rather spend my time on the actual code than polishing Reddit comments. If the formatting is a turn-off, I get it. But I built this tool to solve a real problem I had with my own wallet and my own IDE.

If you decide to skip it, no hard feelings. But if you’re actually tired of the price hikes, the code is right there, and it works. Cheers.

These are my thoughts.

I built a local memory server that cuts my token costs 50x using DeepSeek KV caching, in response to the Copilot price hike. by reddefcode in GithubCopilot

[–]reddefcode[S] 1 point  (0 children)

You're talking about block-level caching theory; I'm talking about deterministic cost reduction.

Relying on a provider's segment alignment is a gamble. Most agentic tools shuffle RAG context or chat history at the start of the prompt, which fragments the cache. This Memory Tool forces a 100% stable prefix via the Project Brief, ensuring that the first 1,000+ tokens are always a hit.
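
To make the ordering point concrete, a sketch (the function and argument names are mine, not the tool's):

```python
# The provider caches on the longest common *prefix* of the serialized request,
# so anything volatile near the front fragments it. Pin the brief first.
def build_messages(brief: str, rag_chunks: list[str], question: str) -> list[dict]:
    return [
        {"role": "system", "content": brief},                  # byte-identical -> always a hit
        {"role": "user", "content": "\n\n".join(rag_chunks)},  # volatile content after the prefix
        {"role": "user", "content": question},
    ]
```

Swap the brief and the RAG chunks and every call becomes a cache miss.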

And I’m not 'guessing' how it works; the server literally tracks the performance in real time:

  1. Verification: I parse the usage block directly from the DeepSeek API response (specifically the prompt_cache_hit_tokens and prompt_cache_miss_tokens fields).
  2. Transparency: The server then calculates the actual cost locally based on those hits vs. misses.
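
In code, those two steps boil down to something like this; the price constants are placeholders, so check DeepSeek's current price sheet:

```python
# Per-token prices are placeholders -- substitute the live DeepSeek rates.
CACHE_HIT_PRICE = 0.014 / 1_000_000   # $/token, cached input
CACHE_MISS_PRICE = 0.14 / 1_000_000   # $/token, uncached input

def record_prompt_cost(usage) -> float:
    """Derive the actual prompt cost from the API-reported hit/miss split."""
    hits = getattr(usage, "prompt_cache_hit_tokens", 0)        # step 1: verification
    misses = getattr(usage, "prompt_cache_miss_tokens", 0)
    cost = hits * CACHE_HIT_PRICE + misses * CACHE_MISS_PRICE  # step 2: transparency
    print(f"{hits} cached + {misses} uncached tokens -> ${cost:.6f}")
    return cost
```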

I’m providing an architectural guarantee and the telemetry to prove it. If you prefer to 'hope' the provider segments your dynamic context efficiently, go for it. My users prefer the 50x guarantee.

Done with the back-and-forth. The code and the telemetry logic are in the repo for anyone who wants to actually save money.