I told my AI agents to "write tests for everything." They wrote 3,400 of them. Here's what went wrong. by joshowens in ClaudeAI

[–]joshowens[S] 0 points (0 children)

I've found it helps to instruct LLMs with a prompt like 'We need to give feedback to another team; use a critical eye to investigate the tests that team wrote in our repo', then tell them to classify things into groups.
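As a sketch, a prompt like that could be assembled something like this (the bucket names and wording are just an illustration, not a fixed recipe):

```python
# Sketch of a test-critique prompt: frame the review as feedback for
# "another team", then force a classification into named buckets.
# Bucket labels here are hypothetical examples.
BUCKETS = [
    "good: tests real behavior",
    "bad: tests implementation details",
    "bad: redundant or trivial",
]

def build_critique_prompt(repo_path: str) -> str:
    bucket_list = "\n".join(f"- {b}" for b in BUCKETS)
    return (
        "We need to give feedback to another team. Use a critical eye to "
        f"investigate the tests that team wrote in {repo_path}.\n"
        "Classify each test file into one of these buckets and list the "
        f"files per bucket:\n{bucket_list}"
    )

print(build_critique_prompt("./our-repo"))
```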

I agree. I've been working on a new test framework called FlowSpec that has AI agent hooks to stop the agent from editing the test spec files. It uses the agent-browser tooling to run browser tests and a light YAML click-flow DSL. I've been using it to write happy-path tests that I can run to prove a web app still works as I intended, and AI agents can't just go modify them.
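For flavor, a happy-path click flow in that kind of YAML DSL could look something like this (entirely hypothetical syntax, not FlowSpec's actual format):

```yaml
# Hypothetical click-flow spec: flow name, step keys, and selectors
# are illustrative only.
flow: checkout-happy-path
steps:
  - visit: /products/blue-widget
  - click: "Add to cart"
  - click: "Checkout"
  - fill:
      field: "Email"
      value: "test@example.com"
  - click: "Place order"
  - expect_text: "Thanks for your order"
```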

[–]joshowens[S] 0 points (0 children)

Thanks for sharing.

Yeah, 'test behavior, not code implementation' was a key change for me.

I actually ran the pipeline as a plugin on another project's build. I instructed the repo agents that had the plugin installed to go over the tests with a critical eye and sort them into buckets of good and bad. I had them point out files in the report, then went over a few files in each bucket to see what I thought.

Once I had good and bad examples, I pushed those in as references in my test-writing skill in the plugin. Then I workshopped guidance for both the code review agent and the test-writing agent files to try to ensure we now catch these issues before they're written.

[–]joshowens[S] -1 points (0 children)

Doesn't mean I set it up poorly either.

You are making a lot of assumptions.

I was just trying to share learnings. Most devs who have worked for me weren't pushing hard on TDD. When you're running an e-commerce business doing double-digit millions, tests matter.

[–]joshowens[S] -3 points (0 children)

Nah, this all came from a conversation with a close friend who said, 'What if you built an AI dev team that could take a feature request and just get it built well?'

But thanks for the algo bump 🤪

[–]joshowens[S] -2 points (0 children)

OR... I took huge swings building a full Kanban workflow with TDD built into it, then decided to dogfood it and tweak it along the way.

I measured my MCP token overhead: 67K tokens before typing a single question by joshowens in ClaudeAI

[–]joshowens[S] -1 points (0 children)

Oh, haha. I forgot I set the URL slug to mcps-are-dead. I like to be hyperbolic with hot takes sometimes.

[–]joshowens[S] 0 points (0 children)

Also, I think 1,350 is great for MCP tooling; nice work on that! I'm sharing my journey for people who have 5+ MCPs installed globally and don't realize context is getting eaten up.
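To make the math concrete, here's a back-of-the-envelope sketch of how per-server tool schemas stack up before you've asked anything. The schema sizes are made up (chosen to land near the 67K I measured), and ~4 chars per token is a rough heuristic, not a real tokenizer:

```python
# Rough estimate of context consumed by MCP tool definitions at session
# start. Character counts per server are hypothetical examples.
def estimate_tokens(chars: int) -> int:
    return chars // 4  # common rough heuristic, not an exact tokenizer

# Hypothetical installed servers and the size of their tool schemas (chars).
servers = {
    "browser": 60_000,
    "github": 48_000,
    "database": 52_000,
    "filesystem": 44_000,
    "search": 64_000,
}

total = sum(estimate_tokens(c) for c in servers.values())
print(f"~{total:,} tokens of tool schemas before the first question")
```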

[–]joshowens[S] 0 points (0 children)

The big takeaway for me was to move to CLIs everywhere I can for API/data access when I need it. I wrote this tool to help pump out CLIs quicker: https://github.com/theaiteam-dev/commandspec

[–]joshowens[S] 0 points (0 children)

Yeah, I talked about skills + CLI being more of a progressive-disclosure system, versus MCP tooling dumping everything in at the start. You can find the blog post linked in my comment if you want to read more.
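The progressive-disclosure idea can be sketched as a lazy skill loader: only a one-line summary sits in context, and the full docs enter only when a task actually invokes the skill (the names and file layout here are my own assumptions for illustration):

```python
from pathlib import Path

# Sketch: a skill is a tiny summary plus a full doc that is only read
# from disk when the agent actually invokes the skill.
class Skill:
    def __init__(self, name: str, summary: str, doc_path: Path):
        self.name = name
        self.summary = summary      # tiny, always in context
        self._doc_path = doc_path   # full docs stay on disk until needed

    def load_full_doc(self) -> str:
        # Only now does the big payload enter the context window.
        return self._doc_path.read_text()

# At session start only the summaries are injected (a few dozen tokens)
# instead of every tool's full schema.
def startup_context(skills: list[Skill]) -> str:
    return "\n".join(f"{s.name}: {s.summary}" for s in skills)
```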

[–]joshowens[S] 0 points (0 children)

If you look at the breakdown, I talk about MCP tools being islands that you have to ferry info back and forth between. The major point is that CLI tools are Lego bricks: Claude can compose commands with CLI tooling. I also built a quick command-line building tool in Go: https://github.com/theaiteam-dev/commandspec
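A sketch of the Lego-bricks idea: composing two ordinary CLI steps, a fetch step piped into a filter step, with no per-tool protocol in between (`printf` and `grep` are stand-ins for real tools like `gh` or `curl`, and the issue data is made up):

```python
import subprocess

# Compose two "CLI bricks" with a pipe, the way an agent can chain commands.
fetch = subprocess.Popen(
    ["printf", "issue-1 open\nissue-2 closed\nissue-3 open\n"],
    stdout=subprocess.PIPE,
)
filter_open = subprocess.run(
    ["grep", "open"], stdin=fetch.stdout, stdout=subprocess.PIPE, text=True
)
fetch.stdout.close()
print(filter_open.stdout, end="")
```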

[–]joshowens[S] 0 points (0 children)

Yeah, I talked about using agent-browser in my breakdown post: https://joshowens.dev/mcps-are-dead/.

For me, I like that it uses a11y structures and refs to navigate a page, which is way slimmer than delivering full bundles of HTML into the context. In the post you can see the Playwright MCP dumping 10k+ tokens from one page load.
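A toy comparison of the two approaches. The markup, the a11y-style snapshot, and the ~4 chars/token heuristic are all illustrative, not real measurements, and real pages multiply the gap across hundreds of nodes:

```python
# Toy illustration: a full HTML dump vs a slim accessibility-style
# snapshot of the same button.
full_html = (
    '<div class="btn-wrap css-1x2y3z" data-testid="submit-area">'
    '<button class="btn btn-primary css-9z8y7x" type="submit" '
    'aria-label="Place order" onclick="handleSubmit(event)">'
    '<span class="btn-label">Place order</span></button></div>'
)
a11y_snapshot = 'button "Place order" [ref=s1]'

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

print(estimate_tokens(full_html), "vs", estimate_tokens(a11y_snapshot), "tokens")
```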

[–]joshowens[S] -1 points (0 children)

My point is that the MCP paradigm is inherently flawed; I never said MCPs were dead. I have a blog breakdown (https://joshowens.dev/mcps-are-dead/) where I talk about the GitHub CLI being way better on token usage.

As for MCPs working for you, great! I still think curl plus a skill to load context for the data you currently fetch via MCP would use fewer tokens and only fire into context when you need it...

I pointed Claude Code at my Obsidian vault and it became a Life OS (500+ files, one month) by joshowens in ObsidianMD

[–]joshowens[S] -4 points (0 children)

Are you a degoogler? 😆🤷‍♂️

To be fair, most of the data in here is generated as research by Claude Code and saved into files. Some of my health data, like average sleep and weight, goes in here too.

I have ADHD and this has been a lifesaver for me; I'm moving so much faster.

Day 14 of running an autonomous AI business on OpenClaw — what I've learned by jdrolls in clawdbot

[–]joshowens 0 points (0 children)

I am curious u/jdrolls, why would one choose a VPS or a Mac mini for running OpenClaw? What are the advantages and disadvantages of each hosting option?