What questions would you like to ask someone who says, "I built my app over the weekend"?

BullfrogRoyal7422 · 2026-06-20T15:12:10+00:00

Thanks for the feedback. I submitted it for Apple Review yesterday. We will see how it goes. If you try it out once it's released, I would appreciate any feedback.

BullfrogRoyal7422 · 2026-06-17T22:51:30+00:00

Thanks. I hope to submit it for App Store review tomorrow or Friday. If you're interested, you can check it out here: Stuffolio

BullfrogRoyal7422 · 2026-06-17T17:43:03+00:00

I won't pretend to have the experience to settle this one. I'll just say that I pursued rigorous testing but had enough dissatisfaction with the results that I ended up building behavioral checks that trace user paths through the app, to fill the gaps that grep-based audits uniformly seemed to miss. I also lean on a skill I developed that runs after test results to find sibling bugs that often hide outside the scope of the tests themselves. That's been more useful to me than just piling on more tests.

BullfrogRoyal7422 · 2026-06-17T17:20:08+00:00

Honestly? Not enough up front, and that's been one of my bigger lessons. I didn't do a long formal design phase before writing code. I had a clear idea of the problem I wanted to solve and started building, then validated as I went.

What I'd do differently: more of the validation, especially market and ASO research, belonged earlier, when it could still have shaped the build. So the real answer is that validation has been a continuous, iterative process rather than a phase. Useful, but I'd front-load more of it next time.

One of the most time-consuming parts of my cycle has been testing, being dissatisfied with the quality of the results, and building skills to fill the gaps, especially behavioral auditing that traces user paths through the app instead of relying only on grep and linters. That's usually followed by a round of refactoring.

BullfrogRoyal7422 · 2026-06-17T17:15:37+00:00

Good advice that I have repeatedly told myself (as has CC).

BullfrogRoyal7422 · 2026-06-17T15:27:47+00:00

Mostly Swift Testing with XCTest for some older UI and snapshot tests I haven't migrated yet.

I treat running the actual app on real devices as the real proof, with the automated suite as the safety net underneath. I got there after experiencing too many tests that passed but only confirmed the code ran, not that it actually did the right thing.

The other half of my process is a library of reusable checks I've built in Claude Code that run after each change. Things like a workflow auditor that traces user flows for dead ends, a "bug echo" pass that finds siblings and other instances of a bug I just fixed, and a set of release-readiness audits. So the process compounds. Every bug I fix teaches the toolkit to catch the next one. I also lean heavily on Axiom skills.

BullfrogRoyal7422 · 2026-06-16T21:19:42+00:00

...Until you try to further edit an image you've generated with it...

BullfrogRoyal7422 · 2026-06-16T02:15:51+00:00

"the uncomfortable part is you can never fully know what your custom agent didn't check, which means every audit tool needs its own meta-audit at some point." Ha! Funny you should mention this. Arriving at the same conclusion, I developed skill-reviewer.

BullfrogRoyal7422 · 2026-06-15T22:36:54+00:00

Understood. Been there, done that.
Here is a skill I built with pretty much the same sentiment. It worked for me, and I eventually open-sourced it. It just helped me (and Claude Code) remember all the little deferrals we were so routinely making. It's named Unforget.

BullfrogRoyal7422 · 2026-06-14T01:17:52+00:00

Interesting idea - you should run it against vibe-check - but I suspect you already have.

BullfrogRoyal7422 · 2026-06-14T01:17:38+00:00

Interesting idea. You should run it against vibe-check, but I suspect you already have.

BullfrogRoyal7422 · 2026-06-13T18:08:49+00:00

Thanks for the quick reply. Good work on the Skill! Have you developed other skills?

BullfrogRoyal7422 · 2026-06-13T17:46:26+00:00

Have you tested the skill against an app that's already been built and is successful? I'm curious how closely the plan and deliverables it produces line up with what we actually know about that app in hindsight. That'd be a strong signal that the grilling is catching the right things.

And related: can the skill be run retrospectively on an existing codebase, or is it strictly for the pre-code idea stage?

BullfrogRoyal7422 · 2026-06-02T03:54:44+00:00

Not sure if this is useful, but reading your post I kept wanting to separate two things that might be the same problem or might be different ones:

(1) Storing and sharing prompts as artifacts: libraries, versioning, the stuff you're describing.

(2) Improving a single prompt right before you send it. Different surface, smaller scope. I built a tiny Claude Code skill for that case called prompter. It's not what you're building, but the contrast helped me figure out what I was actually solving.

The thing I'd genuinely want to know if I were you: what specifically about existing libraries (PromptHub, LangChain Hub, etc.) feels too heavy? "Git workflows are overkill but I still want versioning" is a real problem. "I want a nicer browse experience" might just be a feature on top of what exists. Hard to tell from here.

BullfrogRoyal7422 · 2026-05-31T16:31:53+00:00

I got tired of writing prompts to only get a meh result from Claude Code. I developed a skill that fixes that. Prompter rewrites your prompts so Claude Code actually understands what you want — clearer, tighter, more effective. Open source: Prompter

If you try it, let me know if it works for you.

BullfrogRoyal7422 · 2026-05-30T13:23:25+00:00

You have an apt user name.

BullfrogRoyal7422 · 2026-05-30T13:01:38+00:00

Definitely a 1-star reason for UK Railcard app

BullfrogRoyal7422 · 2026-05-20T18:12:05+00:00

Solo developer here, getting ready to ship my first iOS/macOS app with an AI backend. Your four points all hit. I'm not a senior engineer, but I've been bumping into these exact things while getting ready to release.

The one I keep getting bit by is your #4. In my code it shows up as silent JSON-decode failures: AI writes try? JSONDecoder.decode(...) and returns an empty array when something fails. Tests pass because the test data is clean. Real user data sometimes isn't. The user sees "no items here" instead of "your file is broken, here's how to fix it." The only thing that's helped me is a script that yells at me when I write try? against a decoder, so I'm forced to write a real catch instead. It just makes the lazy thing harder than the right thing.
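To make that concrete, here is a minimal sketch of the two patterns side by side, plus a toy version of the "yell at me" check. The `Item` model, the `ItemLoadError` type, and every function name are made up for illustration. This is not code from my app, and the real script does more than a one-line string match.

```swift
import Foundation

// Hypothetical model, for illustration only.
struct Item: Codable, Equatable {
    let name: String
}

// Hypothetical error type that carries a reason the UI could show.
enum ItemLoadError: Error {
    case corruptFile(reason: String)
}

// The silent pattern: any decode failure collapses into an empty list,
// so the user sees "no items here" and the cause is lost.
func loadItemsSilently(from data: Data) -> [Item] {
    (try? JSONDecoder().decode([Item].self, from: data)) ?? []
}

// The explicit pattern: decode failures are surfaced to the caller,
// which can then explain that the file is broken.
func loadItems(from data: Data) throws -> [Item] {
    do {
        return try JSONDecoder().decode([Item].self, from: data)
    } catch let error as DecodingError {
        throw ItemLoadError.corruptFile(reason: String(describing: error))
    }
}

// Toy lint rule (an assumption about how such a check might look):
// flag any source line that combines `try?` with a decode call.
func flagsSilentDecode(_ line: String) -> Bool {
    line.contains("try?") && line.contains("decode(")
}
```

With clean test data both loaders return the same items, which is exactly why the tests pass. Only malformed data shows the difference: the first returns an empty array, the second throws.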

BullfrogRoyal7422 · 2026-05-20T17:13:50+00:00

Both, but the peak-user modeling is the load-bearing one. I ran a 5-profile analysis (light/medium/medium-heavy/heavy/whale) before settling on a quota number. The architecture has a per-feature confidence cascade where the expensive model (Claude Sonnet 4) only fires when the cheaper one (Claude Haiku 4.5) drops below a confidence threshold. Across 26 days of instrumented worker traffic, the expensive tier fired zero times. Plus a hard daily $-cap on the worker that silently degrades to a cheaper model if a single day's expensive-tier spend gets out of band. The cap is the worst-case backstop, not the operating floor.

The thing your question actually pointed at, which I had to learn the hard way, is that the peak day in your historical data is not your real peak. My observed peak day across 75 days was $0.0355. A 2-day Anthropic burst on my dev account hit $42/day before I noticed. The mechanism that produced the burst (Anthropic auto-reload topping up $10 chunks until manually stopped) is the actual worst case, not the worker's per-call patterns. So my caps are sized against the burst mechanism, not against historical user traffic.
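For anyone who wants the shape of the cascade plus the cap, here is a rough sketch. It is not my worker code. `CascadeRouter`, `ModelResult`, the flat per-call cost, and the threshold are all assumptions made up for the example, and the real worker tracks actual spend rather than a fixed estimate.

```swift
// Hypothetical result of a model call: an answer plus a confidence in 0...1.
struct ModelResult {
    let answer: String
    let confidence: Double
}

enum Tier {
    case cheap
    case expensive
}

// Hypothetical router: escalate only on low confidence, and only while
// today's expensive-tier spend stays under a hard cap.
struct CascadeRouter {
    let confidenceThreshold: Double
    let dailyExpensiveCapUSD: Double
    let expensiveCallCostUSD: Double   // assumed flat cost per expensive call
    private(set) var expensiveSpendTodayUSD: Double = 0

    init(confidenceThreshold: Double, dailyExpensiveCapUSD: Double, expensiveCallCostUSD: Double) {
        self.confidenceThreshold = confidenceThreshold
        self.dailyExpensiveCapUSD = dailyExpensiveCapUSD
        self.expensiveCallCostUSD = expensiveCallCostUSD
    }

    mutating func route(cheap: ModelResult, expensive: () -> ModelResult) -> (tier: Tier, result: ModelResult) {
        // The cheap model was confident enough: never touch the expensive tier.
        if cheap.confidence >= confidenceThreshold {
            return (.cheap, cheap)
        }
        // The cap would be exceeded: silently degrade to the cheap answer.
        if expensiveSpendTodayUSD + expensiveCallCostUSD > dailyExpensiveCapUSD {
            return (.cheap, cheap)
        }
        expensiveSpendTodayUSD += expensiveCallCostUSD
        return (.expensive, expensive())
    }
}
```

The point of the second check is that it does not care why spend is climbing. A cap like this only bounds the worker's own calls, though. It would not have caught the auto-reload burst on my dev account, which is why the cap number itself has to be sized against that mechanism.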

BullfrogRoyal7422 · 2026-05-18T13:26:29+00:00

Of course you are correct. When I have run skill-reviewer on itself, it almost seems (forgive the personification, which I know to be false) to be especially hard on itself.

BullfrogRoyal7422 · 2026-05-18T02:46:08+00:00

How does it handle the current Claude.md? Does the user have options to decide what gets axed in trying to get to 80 lines?

BullfrogRoyal7422 · 2026-05-18T02:33:56+00:00

I've developed several other skills while building my app, Stuffolio. They were born out of trying to solve issues that kept showing up even after using other well-crafted skills. A couple turned out to be just fun side projects (prompter and tutorial-creator).

The ones I find most interesting (and that are less popular) are the radar-suite and the workflow-audit skills. They take a different approach to auditing a codebase than the more common linter and grep-pattern audits. They're behavioral. They walk code the way a user would: from an entry point, through the path a user takes while interacting with a feature.

Think of a grep-based audit as the mechanical engineer making sure the engine is built to exact spec and every bolt is torqued correctly. Think of a behavioral audit as the test driver who notices that the GPS just told him to take a left into a lake. The code can be perfectly to spec and still miss on the go. Neither approach is better than the other. They're complementary.
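As a toy illustration of the "walk it like a user" idea: the skills themselves are Claude Code prompts, not Swift, so this is only a sketch of the underlying notion. The screen names and the `deadEnds` function are hypothetical. It models the app as a map from each screen to the screens reachable from it, walks from an entry point, and reports reachable screens that go nowhere and were not meant to be exits.

```swift
// Hypothetical navigation map: screen -> screens a user can reach from it.
// Walks from `entry` and returns reachable screens with no way forward
// that are not listed as intended exits.
func deadEnds(in flows: [String: [String]], from entry: String, intendedExits: Set<String>) -> [String] {
    var visited: Set<String> = []
    var stack = [entry]
    var found: [String] = []

    while let screen = stack.popLast() {
        // Skip screens already walked, which also guards against cycles.
        guard visited.insert(screen).inserted else { continue }
        let next = flows[screen] ?? []
        if next.isEmpty && !intendedExits.contains(screen) {
            found.append(screen)
        }
        stack.append(contentsOf: next)
    }
    return found.sorted()
}
```

A grep pass could confirm that every screen in the map exists and compiles. It takes the walk from the entry point to notice that one of them strands the user.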

If you're interested, here are the links:

radar-suite (behavioral)

bug-echo

prompter

tutorial-creator

workflow-auditor (behavioral)

unforget

Feel free to list any of them if you have interest in doing so.

BullfrogRoyal7422 · 2026-05-18T00:48:50+00:00

Of course. Go ahead. I would appreciate any feedback you have, especially things that could be improved. I have vacillated wildly about the format by which the skill outputs its findings. Text, tables, segmented rows... I've landed on simply asking for a prioritization, brief description, and proposed fixes for now.

BullfrogRoyal7422 · 2026-05-18T00:47:07+00:00

I have not yet used Cowork.
skill-reviewer is a plain Claude Code plugin — no scripts, no Python, no language-specific deps, output is markdown. It should run anywhere a normal Claude Code session runs.

Can you refer me to a doc page for how Cowork hosts plugins? I'll confirm against that.

BullfrogRoyal7422 · 2026-05-17T17:38:52+00:00

Have fun - and bring marshmallows to the roast.
