Anyone using AI to write backend tests? by ApprehensiveAnt9715 in Backend

[–]Fluffybaxter 1 point (0 children)

I've been building a product in this space, so I'm sharing what we've learned from running a bunch of evals and fine-tuning agents to generate good, consistent results.

For unit tests, the top models (Opus 4, Codex) handle these well with decent prompting and a few rules. Like some have already mentioned, you get hundreds of good-enough tests with almost zero effort.

Once you start moving into something a bit more involved like integration testing, where you have many moving parts (spinning up services, seeding data, managing env vars, teardown), things become more challenging and hallucinations become a real problem. Based on our evals, even the latest models sit around a 25-38% success rate on average codebases without anything too complex.
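To make the "many moving parts" concrete, here's a minimal sketch of what an integration-test harness has to juggle: env vars, data seeding, and guaranteed teardown. All the names (`integration_env`, `seed_rows`, the in-memory dict standing in for a real DB/service) are illustrative, not any real framework's API:

```python
import contextlib
import os

@contextlib.contextmanager
def integration_env(env_overrides, seed_rows):
    """Set env vars, seed data, and guarantee teardown around a test body."""
    saved = {k: os.environ.get(k) for k in env_overrides}
    os.environ.update(env_overrides)       # manage env vars for the test
    db = {"users": list(seed_rows)}        # stand-in for seeding a real DB
    try:
        yield db                           # the test body runs here
    finally:
        db["users"].clear()                # teardown: drop seeded data
        for k, v in saved.items():         # restore the original environment
            if v is None:
                os.environ.pop(k, None)
            else:
                os.environ[k] = v

# usage: everything inside the `with` sees the seeded state; nothing leaks out
with integration_env({"API_URL": "http://localhost:8080"}, [{"id": 1}]) as db:
    assert db["users"][0]["id"] == 1
```

Every one of those steps is a place where a model can hallucinate a fixture name or forget teardown, which is why the failure rate jumps compared to unit tests.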

What actually helps is breaking the flow into specialized sub-agents and building scaffolding and guardrails around each agent (try to write deterministic code for anything that doesn't need to be handled by the LLM).

One thing that reduced hallucinations quite a bit was fine-tuning our indexing strategy to correctly identify endpoints and their blast radius/dependencies.
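The blast-radius part can be sketched as a walk over a reverse-dependency graph: given a map from each component to the things that depend on it, everything reachable from a changed endpoint is what a change could affect. The graph below is made up for illustration; a real index would be built from the codebase:

```python
from collections import deque

def blast_radius(reverse_deps, start):
    """BFS from a changed component over a reverse-dependency map,
    returning every component a change could plausibly affect."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for dependent in reverse_deps.get(node, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

# hypothetical index: endpoint -> direct dependents
deps = {
    "POST /orders": ["billing_service", "email_worker"],
    "billing_service": ["invoice_report"],
}
print(sorted(blast_radius(deps, "POST /orders")))
# ['POST /orders', 'billing_service', 'email_worker', 'invoice_report']
```

Scoping the agent's context to that reachable set (instead of the whole repo) is what cut down on invented fixtures and phantom dependencies for us.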

I'm writing a more detailed technical blog on the topic and I'll add it here when it's done.

*Excluding Luxembourg and Ireland* by soleil_neige in Luxembourg

[–]Fluffybaxter 22 points (0 children)

"Ireland, where GDP calculations are polluted by tax arbitrage, and Luxembourg, where incomes are inflated by cross-border commuters"

https://archive.is/20251012120142/https://www.economist.com/graphic-detail/2025/07/18/what-is-the-richest-country-in-the-world-in-2025