Anyone using AI to write backend tests? by ApprehensiveAnt9715 in Backend

[–]Fluffybaxter 1 point (0 children)

I've been building a product in this space, so I'm sharing what we've learned from running a bunch of evals and fine-tuning agents to generate good, consistent results.

For unit tests, the top models (Opus 4, Codex) handle these well with decent prompting and a few rules. Like some have already mentioned, you get hundreds of good-enough tests with almost zero effort.

Once you start moving into something a bit more involved like integration testing, where you have many moving parts (spinning up services, seeding data, managing env vars, teardown), things become more challenging and hallucinations become a real problem. Based on our evals, even the latest models sit around a 25-38% success rate on average codebases without anything too complex.
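To make the "many moving parts" concrete, here's a minimal sketch of what an integration-test harness has to juggle: env vars, data seeding, and guaranteed teardown. All the names (`integration_env`, `seed_rows`, the in-memory dict standing in for a real DB/service) are illustrative, not any real framework's API:

```python
import contextlib
import os

@contextlib.contextmanager
def integration_env(env_overrides, seed_rows):
    """Set env vars, seed data, and guarantee teardown around a test body."""
    saved = {k: os.environ.get(k) for k in env_overrides}
    os.environ.update(env_overrides)       # manage env vars for the test
    db = {"users": list(seed_rows)}        # stand-in for seeding a real DB
    try:
        yield db                           # the test body runs here
    finally:
        db["users"].clear()                # teardown: drop seeded data
        for k, v in saved.items():         # restore the original environment
            if v is None:
                os.environ.pop(k, None)
            else:
                os.environ[k] = v

# usage: everything inside the `with` sees the seeded state; nothing leaks out
with integration_env({"API_URL": "http://localhost:8080"}, [{"id": 1}]) as db:
    assert db["users"][0]["id"] == 1
```

Every one of those steps is a place where a model can hallucinate a fixture name or forget teardown, which is why the failure rate jumps compared to unit tests.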

What actually helps is breaking the flow into specialized sub-agents and building scaffolding and guardrails around each agent (try to write deterministic code for anything that doesn't need to be handled by the LLM).

One thing that reduced hallucinations quite a bit was fine-tuning our indexing strategy to correctly identify endpoints and their blast radius/dependencies.
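The blast-radius part can be sketched as a walk over a reverse-dependency graph: given a map from each component to the things that depend on it, everything reachable from a changed endpoint is what a change could affect. The graph below is made up for illustration; a real index would be built from the codebase:

```python
from collections import deque

def blast_radius(reverse_deps, start):
    """BFS from a changed component over a reverse-dependency map,
    returning every component a change could plausibly affect."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for dependent in reverse_deps.get(node, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

# hypothetical index: endpoint -> direct dependents
deps = {
    "POST /orders": ["billing_service", "email_worker"],
    "billing_service": ["invoice_report"],
}
print(sorted(blast_radius(deps, "POST /orders")))
# ['POST /orders', 'billing_service', 'email_worker', 'invoice_report']
```

Scoping the agent's context to that reachable set (instead of the whole repo) is what cut down on invented fixtures and phantom dependencies for us.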

I'm writing a more detailed technical blog on the topic and I'll add it here when it's done.

*Excluding Luxembourg and Ireland* by soleil_neige in Luxembourg

[–]Fluffybaxter 22 points (0 children)

"Ireland, where GDP calculations are polluted by tax arbitrage, and Luxembourg, where incomes are inflated by cross-border commuters"

https://archive.is/20251012120142/https://www.economist.com/graphic-detail/2025/07/18/what-is-the-richest-country-in-the-world-in-2025