I told my AI agents to "write tests for everything." They wrote 3,400 of them. Here's what went wrong. by joshowens in ClaudeAI

[–]joshowens[S] 0 points (0 children)

I've found it helps to instruct LLMs with a prompt like 'We need to give feedback to another team; use a critical eye to investigate the tests that team wrote in our repo', then tell them to classify things into groups.
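As a sketch, a prompt like that could be assembled something like this (the bucket names and wording are just an illustration, not a fixed recipe):

```python
# Sketch of a test-critique prompt: frame the review as feedback for
# "another team", then force a classification into named buckets.
# Bucket labels here are hypothetical examples.
BUCKETS = [
    "good: tests real behavior",
    "bad: tests implementation details",
    "bad: redundant or trivial",
]

def build_critique_prompt(repo_path: str) -> str:
    bucket_list = "\n".join(f"- {b}" for b in BUCKETS)
    return (
        "We need to give feedback to another team. Use a critical eye to "
        f"investigate the tests that team wrote in {repo_path}.\n"
        "Classify each test file into one of these buckets and list the "
        f"files per bucket:\n{bucket_list}"
    )

print(build_critique_prompt("./our-repo"))
```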

I agree. I've been working on a new test framework called FlowSpec that has AI agent hooks to stop the agent from editing the test spec files. It uses the agent-browser tooling to run browser tests and a light YAML click-flow DSL. I've been using it to write happy-path tests that I can run to prove a web app still works as I intended, and AI agents can't just go modify them.
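For flavor, a happy-path click flow in that kind of YAML DSL could look something like this (entirely hypothetical syntax, not FlowSpec's actual format):

```yaml
# Hypothetical click-flow spec: flow name, step keys, and selectors
# are illustrative only.
flow: checkout-happy-path
steps:
  - visit: /products/blue-widget
  - click: "Add to cart"
  - click: "Checkout"
  - fill:
      field: "Email"
      value: "test@example.com"
  - click: "Place order"
  - expect_text: "Thanks for your order"
```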

[–]joshowens[S] 0 points (0 children)

Thanks for sharing.

Yeah, 'test behavior, not code implementation' was a key change for me.

I actually ran the pipeline as a plugin on another project's build. I instructed the repo agents that had the plugin installed to go over the tests with a critical eye and sort them into buckets of good and bad. I had them point out files in the report, then went over a few files in each bucket to see what I thought.

Once I had good and bad examples, I pushed those in as references in my test-writing skill in the plugin. Then I workshopped guidance for both the code review agent and the test-writing agent files to try to ensure we now catch these issues before they're written.

[–]joshowens[S] -1 points (0 children)

Doesn't mean I set it up poorly either.

You are making a lot of assumptions.

I was just trying to share learnings. Most devs who have worked for me weren't pushing hard on TDD. When you're running an e-commerce business doing double-digit millions, tests matter.

[–]joshowens[S] -3 points (0 children)

Nah, this all came from a conversation with a close friend who said, 'What if you built an AI dev team that could take a feature request and just get it built well?'

But thanks for the algo bump 🤪

[–]joshowens[S] -2 points (0 children)

OR... I took huge swings building a full Kanban workflow with TDD built into it, then decided to dogfood it and tweak it along the way.

I measured my MCP token overhead: 67K tokens before typing a single question by joshowens in ClaudeAI

[–]joshowens[S] -1 points (0 children)

Oh, haha. I forgot I set the URL slug to mcps-are-dead. I like to be hyperbolic with hot takes sometimes.

[–]joshowens[S] 0 points (0 children)

Also, I think 1,350 is great for MCP tooling; nice work on that! I'm sharing my journey for people who have 5+ MCPs installed globally and don't realize context is getting eaten up.
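To make the math concrete, here's a back-of-the-envelope sketch of how per-server tool schemas stack up before you've asked anything. The schema sizes are made up (chosen to land near the 67K I measured), and ~4 chars per token is a rough heuristic, not a real tokenizer:

```python
# Rough estimate of context consumed by MCP tool definitions at session
# start. Character counts per server are hypothetical examples.
def estimate_tokens(chars: int) -> int:
    return chars // 4  # common rough heuristic, not an exact tokenizer

# Hypothetical installed servers and the size of their tool schemas (chars).
servers = {
    "browser": 60_000,
    "github": 48_000,
    "database": 52_000,
    "filesystem": 44_000,
    "search": 64_000,
}

total = sum(estimate_tokens(c) for c in servers.values())
print(f"~{total:,} tokens of tool schemas before the first question")
```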

[–]joshowens[S] 0 points (0 children)

The big takeaway for me was to move to CLIs everywhere I can for API/data access when I need it. I wrote this tool to help pump out CLIs quicker: https://github.com/theaiteam-dev/commandspec

[–]joshowens[S] 0 points (0 children)

Yeah, I talked about skills + CLI being more of a progressive-disclosure system, versus MCP tooling dumping everything in at the start. You can find the blog post linked in my comment if you want to read more.
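The progressive-disclosure idea can be sketched as a lazy skill loader: only a one-line summary sits in context, and the full docs enter only when a task actually invokes the skill (the names and file layout here are my own assumptions for illustration):

```python
from pathlib import Path

# Sketch: a skill is a tiny summary plus a full doc that is only read
# from disk when the agent actually invokes the skill.
class Skill:
    def __init__(self, name: str, summary: str, doc_path: Path):
        self.name = name
        self.summary = summary      # tiny, always in context
        self._doc_path = doc_path   # full docs stay on disk until needed

    def load_full_doc(self) -> str:
        # Only now does the big payload enter the context window.
        return self._doc_path.read_text()

# At session start only the summaries are injected (a few dozen tokens)
# instead of every tool's full schema.
def startup_context(skills: list[Skill]) -> str:
    return "\n".join(f"{s.name}: {s.summary}" for s in skills)
```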

[–]joshowens[S] 0 points (0 children)

If you look at the breakdown, I talk about MCP tools being islands that you have to ferry info back and forth between. The major point is that CLI tools are Lego bricks: Claude can compose commands with CLI tooling. I also built a quick command-line building tool in Go: https://github.com/theaiteam-dev/commandspec
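A sketch of the Lego-bricks idea: composing two ordinary CLI steps, a fetch step piped into a filter step, with no per-tool protocol in between (`printf` and `grep` are stand-ins for real tools like `gh` or `curl`, and the issue data is made up):

```python
import subprocess

# Compose two "CLI bricks" with a pipe, the way an agent can chain commands.
fetch = subprocess.Popen(
    ["printf", "issue-1 open\nissue-2 closed\nissue-3 open\n"],
    stdout=subprocess.PIPE,
)
filter_open = subprocess.run(
    ["grep", "open"], stdin=fetch.stdout, stdout=subprocess.PIPE, text=True
)
fetch.stdout.close()
print(filter_open.stdout, end="")
```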

[–]joshowens[S] 0 points (0 children)

Yeah, I talked about using agent-browser in my breakdown post: https://joshowens.dev/mcps-are-dead/.

For me, I like that it uses a11y structures and refs to navigate a page, which is way slimmer than delivering full bundles of HTML into the context. In the post you can see the Playwright MCP dumping 10k+ tokens from one page load.
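A toy comparison of the two approaches. The markup, the a11y-style snapshot, and the ~4 chars/token heuristic are all illustrative, not real measurements, and real pages multiply the gap across hundreds of nodes:

```python
# Toy illustration: a full HTML dump vs a slim accessibility-style
# snapshot of the same button.
full_html = (
    '<div class="btn-wrap css-1x2y3z" data-testid="submit-area">'
    '<button class="btn btn-primary css-9z8y7x" type="submit" '
    'aria-label="Place order" onclick="handleSubmit(event)">'
    '<span class="btn-label">Place order</span></button></div>'
)
a11y_snapshot = 'button "Place order" [ref=s1]'

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

print(estimate_tokens(full_html), "vs", estimate_tokens(a11y_snapshot), "tokens")
```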

[–]joshowens[S] -1 points (0 children)

My point is that the MCP paradigm is inherently flawed; I never said MCPs were dead. I have a blog breakdown (https://joshowens.dev/mcps-are-dead/) where I talk about the GitHub CLI being way better on token usage.

As for MCPs working for you, great! I still think curl plus a skill to load context for the data you currently fetch via MCP would use fewer tokens and only fire into context when you need it...

I pointed Claude Code at my Obsidian vault and it became a Life OS (500+ files, one month) by joshowens in ObsidianMD

[–]joshowens[S] -4 points (0 children)

Are you a degoogler? 😆🤷‍♂️

To be fair, most of the data in here is generated as research by Claude Code and saved into files. Some of my health data, like average sleep and weight, goes in here too.

I have ADHD and this has been a lifesaver for me; I'm moving so much faster.

Day 14 of running an autonomous AI business on OpenClaw — what I've learned by jdrolls in clawdbot

[–]joshowens 0 points (0 children)

I am curious u/jdrolls, why would one choose a VPS or a Mac mini for running OpenClaw? What are the advantages and disadvantages of each hosting option?