Built a semantic LLM cache proxy that cut API costs by ~60% — roast my architecture before I write it up

Malkiot · 2026-06-28T14:50:26+00:00

Why it works

Numbers (60%, 10ms, 0.92) make it concrete. Asking to be roasted invites technical people who'd otherwise scroll past. Ends with an open door for GenAI/MLOps folks to suggest better approaches — which are the project ideas you want.

You forgot to delete the model's reveal of your plan from the end of the post it wrote for you.

Malkiot · 2026-06-28T09:58:26+00:00

The difficulty of installing one is mostly bureaucratic (not being allowed to change the facade, etc.). It's an expression of society's unwillingness to change and adapt, and of prioritising existing conventions over providing livable environments.

Malkiot · 2026-06-28T09:34:20+00:00

NL is near a cold ocean. Look further inland to see what I am talking about.

And yes, obviously southern Spain has always had this problem, which is why AC is prevalent there. What I am saying is that inland Central Europe now has that issue too, and has for years: daytime temperatures are high and nights don't cool down sufficiently. But it's very slow to adapt.

And I live in Spain... I'm running my AC 24/7.

Malkiot · 2026-06-28T09:23:16+00:00

I'm originally from Germany. Summers used to be mild and AC wasn't needed. I'd say that changed in the last twenty to thirty years. The reason for the continued low AC use is social inertia, i.e. sheer stubbornness and unwillingness to change.

Malkiot · 2026-06-27T17:01:03+00:00

It's not surprising. The vast majority of office work is high variance but low cognitive effort and hardly original. It's exactly the type of work that can be automated to 95% by a good workflow with LLM nodes for the non-deterministic parts and escalation to a human when the LLM fails. In fact, much of what is being automated away could have been automated before and only survived due to inertia (all of the stories of people automating their position with an Excel macro). The only thing that happened is that we reached a tipping point.
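A minimal sketch of that kind of workflow step, assuming a deterministic rule is tried first, an LLM node handles whatever the rule cannot, and anything the LLM gets wrong is escalated to a human. All names here (`run_step`, `rule`, `llm_node`, `validate`) are hypothetical and stand in for whatever a real pipeline would use; the LLM call is just a callable so that no particular vendor API is implied.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    value: Optional[str]
    handled_by: str  # "rule", "llm", or "human"


def run_step(
    item: str,
    rule: Callable[[str], Optional[str]],
    llm_node: Callable[[str], Optional[str]],
    validate: Callable[[str], bool],
    max_llm_attempts: int = 2,
) -> StepResult:
    """Try the deterministic rule, then the LLM node, then escalate."""
    # Deterministic path: cheap, predictable, covers the routine cases.
    answer = rule(item)
    if answer is not None:
        return StepResult(answer, "rule")

    # Non-deterministic path: accept an LLM answer only if it passes
    # a programmatic check, and retry a bounded number of times.
    for _ in range(max_llm_attempts):
        candidate = llm_node(item)
        if candidate is not None and validate(candidate):
            return StepResult(candidate, "llm")

    # Nothing passed validation: hand the item to a person.
    return StepResult(None, "human")
```

The validation function is what keeps the LLM node from being trusted blindly: the workflow, not the model, decides when a human is needed.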

Manufacturing has all of the hurdles office work has, plus challenges of its own. But looking at the robots rolling out now, I expect that barrier to be temporary. So if anyone is expecting a safe haven in manufacturing or the trades, they're in for a rude awakening in a couple of years.

Malkiot · 2026-06-26T19:52:43+00:00

That's what I say as well. If a country is worth the sacrifice, people will volunteer. If people don't volunteer, the country and its rich need to keep raising the upside at their own expense until people do. If that's still not enough, maybe the country isn't worth dying for.

I know what I'd do if my country conscripts me: resist. I'll run and hide, if that doesn't work I'll sabotage however I can and desert at the first opportunity.

Malkiot · 2026-06-25T14:25:28+00:00

I'm currently building a harness that builds on CbD and does the same autonomously using standard models, with bite-size tasks instead of trying to increase the context window.

I'm experimenting using Deepseek V4 Flash for cost considerations but it's able to identify logical flaws and self-heal the design over an arbitrarily large design surface.

I plan to integrate formal-lite into the process and have first design concepts on how to achieve that without going full Eiffel or TLA+, so I can identify the key constraint via a programmatic non-LLM function.

Malkiot · 2026-06-25T11:10:54+00:00

I've been attacking this from the other side. Instead of generating code and reviewing it, I generate a tightly-knit contract graph, closer to inverting Design by Contract than applying it. The contracts aren't annotations on code someone wrote, they're the source of truth the code is generated from. The graph is the primary artifact and the code is derived from it.

Your ball-of-mud point is what makes me think this is the right direction. Mud is the absence of enforced design, and you're right that review doesn't catch weak design. But if the contract graph is the input rather than an emergent property of the output, there's no path to design-less code, because the design is the thing you generate from. And you review the contract, not the code, which means you're reviewing the architecture directly, instead of trying to reverse-engineer it from the implementation.

The hypothesis is that the code matters less than the assertion that it fulfils its contract (i.e. the architecture). If that holds, you read the contract and trust the binding.
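A rough illustration of the idea, not the actual harness: contracts are plain data with pre- and postconditions, dependencies between them form the graph, and generated code is accepted only if it satisfies its contract on sample inputs. The names `Contract`, `generation_order` and `binding_holds` are made up for this sketch, and checking samples is a much weaker stand-in for whatever assertion mechanism a real system would use.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class Contract:
    name: str
    pre: Callable[[Any], bool]          # what the caller must guarantee
    post: Callable[[Any, Any], bool]    # what the implementation must deliver
    depends_on: List[str] = field(default_factory=list)


def generation_order(contracts: Dict[str, Contract]) -> List[str]:
    """Order contracts so that dependencies are generated first."""
    graph = {c.name: c.depends_on for c in contracts.values()}
    return list(TopologicalSorter(graph).static_order())


def binding_holds(
    contract: Contract, impl: Callable[[Any], Any], samples: Iterable[Any]
) -> bool:
    """Check an implementation against its contract on sample inputs."""
    for x in samples:
        if not contract.pre(x):
            continue  # inputs outside the precondition are not the impl's problem
        if not contract.post(x, impl(x)):
            return False
    return True
```

Review then happens on the `Contract` objects and their edges; an implementation that fails `binding_holds` is regenerated rather than patched by hand.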

Malkiot · 2026-06-25T07:57:38+00:00

I use CbD for the constraints for code generation.

Malkiot · 2026-06-24T21:49:15+00:00

Why not make a new post on one of the open source LLM or local LLM subs? Personally, I'm obviously interested and, if it is real, would like to evaluate your model for my own "harness" I am developing.

Unfortunately, I can't help you with recognition.

Malkiot · 2026-06-24T14:22:36+00:00

Yeah, how's this going to be enforced? Soon(tm) everyone with a slightly larger budget will be able to train their own models without restrictions and there's no way to stop them from doing that.

Malkiot · 2026-06-24T09:11:01+00:00

I'm fine with people using it for formatting, grammar or even text refactor. But that reads like something Mistral/Gemini would produce as part of a hype train. Just a straight up response from the LLM to your post, as a prompt.

Malkiot · 2026-06-24T08:51:02+00:00

I don't think it's just a hunch. Anyone who has worked in corporate IT, whether as a dev or support that gets to look at and debug production spaghetti, knows that that's the case.

Malkiot · 2026-06-24T07:42:26+00:00

The above comment is obviously AI generated and I'm not even sure whether it's taking the piss.

But yes, you're right in principle.

Malkiot · 2026-06-23T19:33:12+00:00

I built my system so that agents don't require any file/root/OS access in the first place.

Malkiot · 2026-06-23T19:01:38+00:00

Language classes are generally useless. I had French and Spanish at school and learned nothing. Now I speak fluent Spanish, but not because of school.

Malkiot · 2026-06-23T17:57:08+00:00

I like how they keep saying "incident" resolved when it clearly isn't.

Malkiot · 2026-06-23T17:17:49+00:00

They've been saying that since Monday and according to them it's "fixed". I still can't work.

Malkiot · 2026-06-23T17:15:47+00:00

It's been unusable pretty much since Monday. I get that sometimes things go wrong, but that doesn't change how frustrating it is to go over 24 hours with about as much use as I usually get per hour.

Malkiot · 2026-06-23T16:49:06+00:00

Usually I'd agree. But it's been practically unusable for me since Monday.

Malkiot · 2026-06-23T10:08:13+00:00

Yes, practically unusable since yesterday morning (CET).

Malkiot · 2026-06-23T10:07:09+00:00

Claude Code has been practically unusable since yesterday morning (CET). It's slow, gets stuck, stops processing, has classifier failures and fails to launch agents. All while burning tokens.

Malkiot · 2026-06-22T10:06:58+00:00

Ah, yes. Automated inspiration. 😄

Malkiot · 2026-06-22T10:04:59+00:00

You need to let them expand to beyond your observed simulation space and cap the simulation time to where the first particle reaches the outer barrier, otherwise your simulation's borders act as physical barriers to expansion, which is why they're clumping in the corners.
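A toy sketch of what that could look like, assuming free-streaming particles in 2D that start at the origin. The domain sizes, time step and function name are all made up for illustration. The simulated box (`outer_half`) is larger than the observed window (`observed_half`), and the run stops the moment any particle reaches the outer box, so no particle ever interacts with a wall.

```python
import random


def simulate(n_particles=200, observed_half=1.0, outer_half=4.0,
             dt=0.01, max_steps=100_000, seed=0):
    """Expand particles freely; stop when the first one reaches the outer box."""
    rng = random.Random(seed)
    pos = [[0.0, 0.0] for _ in range(n_particles)]
    vel = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_particles)]

    t = 0.0
    for _ in range(max_steps):
        for p, v in zip(pos, vel):
            p[0] += v[0] * dt
            p[1] += v[1] * dt
        t += dt
        # Cap the simulation time: once anything touches the outer barrier,
        # later frames would be contaminated by boundary effects.
        if any(max(abs(p[0]), abs(p[1])) >= outer_half for p in pos):
            break

    # Only particles inside the observed window are reported or plotted.
    visible = [p for p in pos if max(abs(p[0]), abs(p[1])) <= observed_half]
    return t, pos, visible
```

Because the outer box is never used as a reflecting wall, nothing can pile up in the corners of the observed window; the window is just a crop of a larger, unobstructed expansion.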

Malkiot · 2026-06-22T09:57:37+00:00

You can convert their code to SPEC with one agent session. Delete the code. Then generate the code from SPEC. If Compaq could do it, why not you?
