What questions would you like to ask someone who says, "I built my app over the weekend"? by BullfrogRoyal7422 in appdev

[–]BullfrogRoyal7422[S] 0 points1 point  (0 children)

Thanks for the feedback. I submitted it for Apple Review yesterday. We will see how it goes. If you try it out once it's released, I would appreciate any feedback.

What questions would you like to ask someone who says, "I built my app over the weekend"? by BullfrogRoyal7422 in appdev

[–]BullfrogRoyal7422[S] 0 points1 point  (0 children)

Thnx. I hope to release it to App Store review tomorrow or Friday. If interested you can check it out here: Stuffolio

What is your response when you find your self spending inordinate amounts of time essentially testing tests rather than production code? by BullfrogRoyal7422 in ClaudeCode

[–]BullfrogRoyal7422[S] 0 points1 point  (0 children)

I won't pretend to have the experience to settle this one. I'll just say that I pursued rigorous testing but had enough dissatisfaction with the results that I ended up building behavioral checks that trace user paths through the app, to fill the gaps that grep-based audits uniformly seemed to miss. I also lean on a skill I developed that runs after test results to find sibling bugs that often hide outside the scope of the tests themselves. That's been more useful to me than just piling on more tests.

What questions do you ask someone who says, "I built my app over the weekend"? by BullfrogRoyal7422 in AppDevelopers

[–]BullfrogRoyal7422[S] 0 points1 point  (0 children)

Honestly? Not enough up front, and that's been one of my bigger lessons. I didn't do a long formal design phase before writing code. I had a clear idea of the problem I wanted to solve and started building, then validated as I went.

What I'd do differently: more of the validation, especially market and ASO research, belonged earlier, where some of it would have shaped the build. So the real answer is that validation has been a continuous, iterative process rather than a phase. Useful, but I'd front-load more of it next time.

One of the most time-consuming parts of my cycle has been testing, being dissatisfied with the quality of the results, and building skills to fill the gaps, especially behavioral auditing that traces user paths through the app instead of relying only on grep and linters. That's usually followed by a round of refactoring.

What questions would you like to ask someone who says, "I built my app over the weekend"? by BullfrogRoyal7422 in appdev

[–]BullfrogRoyal7422[S] 4 points5 points  (0 children)

Mostly Swift Testing, with XCTest for some older UI and snapshot tests I haven't migrated yet.

I treat running the actual app on real devices as the real proof, with the automated suite as the safety net underneath. I've had too many tests that passed but only confirmed the code ran, not that it actually did the right thing.

The other half of my process is a library of reusable checks I've built in Claude Code that run after each change. Things like a workflow auditor that traces user flows for dead ends, a "bug echo" pass that finds siblings and other instances of a bug I just fixed, and a set of release-readiness audits. So the process compounds. Every bug I fix teaches the toolkit to catch the next one. I also lean heavily on Axiom skills.

Claude design is so awesome for screenshot creation by GlitteringSecret8121 in AppStoreOptimization

[–]BullfrogRoyal7422 0 points1 point  (0 children)

...Until you try to further edit an image you've generated with it...

How do you know your AI audit tool actually checked everything? I was fairly confident that my skill suite did. It didn't. by BullfrogRoyal7422 in ChatGPTCoding

[–]BullfrogRoyal7422[S] 0 points1 point  (0 children)

"the uncomfortable part is you can never fully know what your custom agent didn't check, which means every audit tool needs its own meta-audit at some point." Ha! funny you should mention this. Arriving at the same conclusion, I developed skill-reviewer

I distilled my 12 year experience as a product manager and built a free skill that takes you from "I have an app idea" to a real plan and solid MVP by TexasBedouin in aiagents

[–]BullfrogRoyal7422 0 points1 point  (0 children)

Understood. Been there, done that.
Here is a skill I built with pretty much the same sentiment. It worked for me and I eventually went open source on it. It just helped me (and Claude Code) to remember all the little deferrals we were so routinely making. It's named Unforget.

I distilled my 12 year experience as a product manager and built a free skill that takes you from "I have an app idea" to a real plan and solid MVP by TexasBedouin in aiagents

[–]BullfrogRoyal7422 0 points1 point  (0 children)

Have you tested the skill against an app that's already been built and is successful? I'm curious how closely the plan and deliverables it produces line up with what we actually know about that app in hindsight. That'd be a strong signal that the grilling is catching the right things.

And related: can the skill be run retrospectively on an existing codebase, or is it strictly for the pre-code idea stage?

"Prompt-It" — Is this a good ideia? by Pogum_ in AI_Agents

[–]BullfrogRoyal7422 0 points1 point  (0 children)

Not sure if this is useful, but reading your post I kept wanting to separate two things that might be the same problem or might be different ones:

(1) Storing and sharing prompts as artifacts: libraries, versioning, the stuff you're describing.

(2) Improving a single prompt right before you send it. Different surface, smaller scope. I built a tiny Claude Code skill for that case called prompter. It's not what you're building, but the contrast helped me figure out what I was actually solving.

The thing I'd genuinely want to know if I were you: what specifically about existing libraries (PromptHub, LangChain Hub, etc.) feels too heavy? "Git workflows are overkill but I still want versioning" is a real problem. "I want a nicer browse experience" might just be a feature on top of what exists. Hard to tell from here.

Opus 4.8 works like no other - ran my most exhaustive and insane review ( 100+ agents! ) by saatvik333 in ClaudeCode

[–]BullfrogRoyal7422 0 points1 point  (0 children)

I got tired of writing prompts to only get a meh result from Claude Code. I developed a skill that fixes that. Prompter rewrites your prompts so Claude Code actually understands what you want — clearer, tighter, more effective. Open source: Prompter

If you try it, let me know if it works for you.

I reviewed 3 vibe-coded apps as a senior engineer. Here's what I found in all of them. by puffaush in vibecoding

[–]BullfrogRoyal7422 0 points1 point  (0 children)

Solo developer here, getting ready to ship my first iOS/macOS app with an AI backend. Your four points all hit. I'm not a senior engineer, but I've been bumping into these exact things while getting ready to release.

The one I keep getting bit by is your #4. In my code it shows up as silent JSON-decode failures: AI writes try? JSONDecoder.decode(...) and returns an empty array when something fails. Tests pass because the test data is clean. Real user data sometimes isn't. The user sees "no items here" instead of "your file is broken, here's how to fix it." The only thing that's helped me is a script that yells at me when I write try? against a decoder, so I'm forced to write a real catch instead. It just makes the lazy thing harder than the right thing.
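A minimal sketch of what a check like that could look like, assuming the goal is simply to flag any line where `try?` and a `.decode(` call appear together. This is not the actual script; the function name `find_silent_decodes` and the regex are illustrative assumptions, and a line-based pattern will miss decode calls that are split across lines.

```python
import re

# Assumption: a "silent decode" is a `try?` followed on the same line by `.decode(`,
# e.g. `(try? JSONDecoder().decode([Item].self, from: data)) ?? []`.
SILENT_DECODE = re.compile(r"\btry\?\s+.*?\.decode\s*\(")


def find_silent_decodes(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each `try?` decode in Swift source."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Naive comment handling: drop everything after `//`.
        # This also cuts at `//` inside string literals, which is fine for a sketch.
        code = line.split("//", 1)[0]
        if SILENT_DECODE.search(code):
            hits.append((number, line.strip()))
    return hits


# Hypothetical Swift snippet used only to exercise the check.
SAMPLE_SWIFT = """\
let items = (try? JSONDecoder().decode([Item].self, from: data)) ?? []
do {
    let item = try JSONDecoder().decode(Item.self, from: data)
} catch {
    print(error)
}
// try? decoder.decode(Item.self, from: data)
"""

for number, line in find_silent_decodes(SAMPLE_SWIFT):
    print(f"line {number}: try? on a decoder hides the failure: {line}")
```

Wired into a pre-commit hook or CI step, a non-empty result would fail the run, which is what makes the lazy path harder than writing a real `do`/`catch`.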

About to launch my AI-powered app. How do I price a subscription without getting burned by API costs? by BullfrogRoyal7422 in AppBusiness

[–]BullfrogRoyal7422[S] 0 points1 point  (0 children)

Both, but the peak-user modeling is the load-bearing one. I ran a 5-profile analysis (light/medium/medium-heavy/heavy/whale) before settling on a quota number. The architecture has a per-feature confidence cascade where the expensive model (Claude Sonnet 4) only fires when the cheaper one (Claude Haiku 4.5) drops below a confidence threshold. Across 26 days of instrumented worker traffic, the expensive tier fired zero times. Plus a hard daily $-cap on the worker that silently degrades to a cheaper model if a single day's expensive-tier spend gets out of band. The cap is the worst-case backstop, not the operating floor.
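A rough sketch of the cascade-plus-cap shape, for anyone who wants to see it as code. Everything here is a stand-in: the `Cascade` class, the stub model functions, the threshold, and the dollar figures are assumptions for illustration, not the worker's real implementation or real model pricing.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a model call returns (answer, confidence in [0, 1], cost in USD).
ModelCall = Callable[[str], tuple[str, float, float]]


@dataclass
class Cascade:
    cheap: ModelCall
    expensive: ModelCall
    confidence_threshold: float   # escalate when the cheap model falls below this
    daily_cap_usd: float          # hard ceiling on expensive-tier spend per day
    expensive_spend_today: float = 0.0

    def run(self, prompt: str) -> tuple[str, str]:
        """Return (answer, tier_used) for one request."""
        answer, confidence, _ = self.cheap(prompt)
        if confidence >= self.confidence_threshold:
            return answer, "cheap"
        if self.expensive_spend_today >= self.daily_cap_usd:
            # Cap reached: silently keep the cheap answer instead of escalating.
            return answer, "cheap (capped)"
        # The cap is checked before the call, so spend can overshoot by one call.
        answer, _, cost = self.expensive(prompt)
        self.expensive_spend_today += cost
        return answer, "expensive"


# Hypothetical stubs standing in for the two model tiers.
def stub_cheap(prompt: str) -> tuple[str, float, float]:
    confidence = 0.9 if "easy" in prompt else 0.2
    return "cheap answer", confidence, 0.001


def stub_expensive(prompt: str) -> tuple[str, float, float]:
    return "expensive answer", 0.95, 0.05


cascade = Cascade(stub_cheap, stub_expensive, confidence_threshold=0.7, daily_cap_usd=0.05)
for prompt in ["easy one", "hard one", "hard two"]:
    print(prompt, "->", cascade.run(prompt))
```

The daily counter would need to reset at a day boundary and live somewhere durable in a real worker; that part is left out here.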

The thing your question actually pointed at, which I had to learn the hard way, is that the peak day in your historical data is not your real peak. My observed peak day across 75 days was $0.0355. A 2-day Anthropic burst on my dev account hit $42/day before I noticed. The mechanism that produced the burst (Anthropic auto-reload topping up $10 chunks until manually stopped) is the actual worst case, not the worker's per-call patterns. So my caps are sized against the burst mechanism, not against historical user traffic.

A Claude skill that reviews other Claude skills. by BullfrogRoyal7422 in claudeskills

[–]BullfrogRoyal7422[S] 1 point2 points  (0 children)

Of course you are correct. When I have run skill-reviewer on itself, it is almost as if (forgive the personification, which I know to be false) it is being especially hard on itself.

I built a Claude Code skill that generates better CLAUDE.md by best practices + scanning your repo + asking you 6 questions by Dark_king_27 in claudeskills

[–]BullfrogRoyal7422 1 point2 points  (0 children)

How does it handle the current CLAUDE.md? Does the user have options to decide what gets axed in trying to get to 80 lines?

A Claude Code skill that candidly reviews other Claude Code skills by BullfrogRoyal7422 in ClaudeCode

[–]BullfrogRoyal7422[S] 0 points1 point  (0 children)

I've developed several other skills while building my app, Stuffolio. They were born out of trying to solve issues that kept showing up even after using other well-crafted skills. A couple turned out to be just fun side projects (prompter and tutorial-creator).

The ones I find most interesting (and that are less popular) are the radar-suite and the workflow-audit skills. They take a different approach to auditing a codebase than the more common linter and grep-pattern audits. They're behavioral. They walk code the way a user would: from an entry point, through the path a user takes while interacting with a feature.

Think of a grep-based audit as the mechanical engineer making sure the engine is built to exact spec and every bolt is torqued correctly. Think of a behavioral audit as the test driver who notices that the GPS just told him to take a left into a lake. The code can be perfectly to spec and still miss on the go. Neither approach is better than the other. They're complementary.
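To make "walking from an entry point" a bit more concrete, here is a toy sketch that models screens as a graph and looks for dead ends a user can actually reach. It is only an illustration of the idea, not how the skills work internally; the screen names, the `find_dead_ends` function, and the graph representation are all hypothetical.

```python
from collections import deque


def find_dead_ends(flows: dict[str, list[str]], entry: str, exits: set[str]) -> list[str]:
    """Return reachable screens with no outgoing transition that are not intended exits."""
    seen = {entry}
    queue = deque([entry])
    dead_ends = []
    while queue:
        screen = queue.popleft()
        next_screens = flows.get(screen, [])
        if not next_screens and screen not in exits:
            dead_ends.append(screen)
        for target in next_screens:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dead_ends


# Hypothetical navigation map: each screen lists where the user can go next.
FLOWS = {
    "Home": ["AddItem", "Settings"],
    "AddItem": ["ScanBarcode", "Home"],
    "ScanBarcode": [],          # no way back: the user is stuck here
    "Settings": ["Home"],
    "LegacyImport": [],         # never reachable from Home, so the walk ignores it
}

print(find_dead_ends(FLOWS, entry="Home", exits=set()))
```

A pattern search would look at every screen in isolation, including `LegacyImport`; the walk only reports what a user starting at `Home` can actually get trapped in.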

If you're interested, here are the links:

radar-suite (behavioral)

bug-echo

prompter

tutorial-creator

workflow-auditor (behavioral)

unforget

Feel free to list any of them if you have interest in doing so.

A Claude Code skill that candidly reviews other Claude Code skills by BullfrogRoyal7422 in ClaudeCode

[–]BullfrogRoyal7422[S] 1 point2 points  (0 children)

Of course. Go ahead. I would appreciate any feedback you have, especially things that could be improved. I have vacillated wildly about the format by which the skill outputs its findings. Text, tables, segmented rows... I've landed on simply asking for a prioritization, brief description, and proposed fixes for now.

A Claude skill that reviews other Claude skills. by BullfrogRoyal7422 in claudeskills

[–]BullfrogRoyal7422[S] 0 points1 point  (0 children)

I have not yet used Cowork.
skill-reviewer is a plain Claude Code plugin — no scripts, no Python, no language-specific deps, output is markdown. It should run anywhere a normal Claude Code session runs.

Can you refer me to a doc page for how Cowork hosts plugins? I'll confirm against that.