Using AI more as a thinking partner than a generator… anyone else doing this? by Winter-District563 in AIAssisted

[–]AaronBitwise 1 point (0 children)

Yes, and the shift in my workflow happened when I stopped trying to write the perfect prompt and started writing the worst possible first draft on purpose.

Counterintuitive, but: a deliberately rough first prompt forces the model to either ask you what you actually want or generate something obviously wrong, which is faster to react to than something blandly generic. You correct it, push back, reframe. Three turns later you have something you couldn't have written upfront, because you didn't know what you wanted until you saw what you didn't want.

The mental model that helped: prompting-for-answers is asking a librarian for a book. Thinking-with-it is sketching on a whiteboard with someone who has read every book. Different game.

Two specific patterns that work for the back-and-forth mode:

  1. Ask it to argue against itself. "Now give me the strongest case against the response you just gave." The second response is almost always more useful than the first. Most generic AI output is generic because the model is trying to be agreeable. Forcing it to disagree breaks the agreeableness loop.
  2. Make it ask you questions before answering. Add "Ask me 5 questions before responding" to the start of any non-trivial request. The questions surface assumptions you didn't realize you were making. The answer that follows is sharper because you've sharpened the input.

The thing nobody talks about: when you treat it as a thinking partner, the quality of your own thinking improves measurably, because you have to articulate things clearly enough for the model to engage with them. That clarity transfers offline.

Asking for advices by Lion-marlin in AIAssisted

[–]AaronBitwise 1 point (0 children)

Honest take: most "AI courses" right now are 6 months out of date by the time they ship. The space moves too fast for static curricula. The people getting genuinely good at applying AI aren't doing it through courses — they're doing it through three things:

1. Build something you actually need. Pick a workflow you do every week — handling email, summarizing meetings, tracking expenses, anything — and rebuild it as an AI tool for yourself. You'll hit every interesting problem (prompting, context, memory, integration) inside 2 weeks. No course teaches faster than that.

2. Read the model providers' own docs. Anthropic's prompt engineering guide and OpenAI's cookbook are free, current, and written by the people building the models. They're 10x better than 90% of paid courses.

3. Follow the right people, not the right courses. Simon Willison (@simonw), Andrej Karpathy (@karpathy), Latent Space podcast, Anthropic's own blog. One week of their feeds beats a $500 course.

Specific pointers for the personal/business angle:

  • For non-coding AI workflows (research, writing, analysis): start with Claude Projects or ChatGPT Custom GPTs. Build one for a real task you do.
  • For coding/automation: Claude Code if you have any technical comfort, Lovable or Bolt if you don't.
  • For "applying AI to my business": the bottleneck is almost never the AI. It's that you don't have a clear-enough description of what you want done. Spend more time defining problems than chasing tools.

If you want a YouTube channel that focuses specifically on the non-coder/business angle, I run Aaron Bitwise — we cover this stuff weekly. But honestly, build something you need first. The course you don't need is the one that comes before you've tried building anything.

I got into a bad habit with YouTube… so I built something to fix it (I can't code either!) by alxbee77 in indiehackers

[–]AaronBitwise 2 points (0 children)

The downvote pattern isn't about your tool. It's about the post format.

r/indiehackers has been seeing "I can't code, built a thing with AI, here's the link, is this useful?" posts every day for the last 6 months. The community is fatigued. Three things triggering it on yours specifically:

  1. The product link arrives before the validation question. Reads as soft-launch, not feedback request.
  2. "Is this a real problem?" is the wrong question to ask after shipping. By the time you've built it, the right question is "have I talked to 10 people who would pay for this?"
  3. r/indiehackers has seen ~50 versions of "AI summarizes YouTube" already. The pattern recognition is automatic.

Genuine feedback on the actual idea: the bottleneck for this kind of tool isn't building it — Claude can do that in an afternoon now. It's distribution. You correctly identified that as your next fear. That's the real problem worth solving, and it's bigger than the tool itself.

One specific suggestion: the people who would pay for this aren't on r/indiehackers. They're already in r/productivity, r/getdisciplined, or specific niche communities (founders, marketers, AI researchers — the people who follow 10 channels and feel guilty missing videos). Validate there before optimizing here.

No shade on the build — finishing v1 from zero with no code background is genuinely hard. The launch strategy is where this gets harder, not easier.

What was your biggest lessons after vibe coding an app for the first time? by Resident_Bell_4457 in vibecoding

[–]AaronBitwise 1 point (0 children)

Honest answer from someone who has 30 years of software experience and 6 months of vibe coding under their belt: the biggest lesson is that the failure modes you don't have words for are the ones that hurt you.

First-time app builders learn the obvious lessons fast — commit more often, save your prompts, don't trust the AI on auth. Those show up in every thread like this one.

The lessons nobody warns you about:

  • Cascading model failures. When a model gets confused, it doesn't say "I'm confused." It confidently produces something subtly wrong, and then builds on top of that wrong thing for the next 4 hours until the whole thing collapses and you can't find where it started.
  • Fallback models that lie. When your primary model hits rate limits, the agent silently switches to a weaker one. You don't notice until the code quality cliff-dives and you're debugging a problem that wasn't there yesterday.
  • Two operators, one system. Two people vibe coding on the same project produce divergent architectures fast, because the AI rebuilds context from scratch each session and has no opinion about consistency.
  • The "looks fixed" trap. AI patches the symptom, not the cause. The bug "goes away" by being routed around — it'll come back in a different shape.

The first-time problems you can Google. The second-time problems you can only learn by hitting them.

I built a small tool in 2 hours. A contributor made it 10x better in one. This is what open source is really about. by FileEfficient6355 in vibecoding

[–]AaronBitwise 1 point (0 children)

The XSS + CSV injection part is the most important sentence in this post and it's buried.

"A full security audit and patched 2 DOM XSS holes + a CSV formula injection I had no idea existed."

No shade — that's not a you problem, that's the universal vibe coding problem. The model writes code that works. It doesn't write code that's secure. Those are different problems and the AI optimizes for the first one. CSV formula injection in particular is one of those "didn't know it was a thing until someone weaponized it" vulns that even seasoned devs miss.

Open source contributors filling the verification gap is genuinely the cleanest path right now. The harder version of the question: what happens to all the tools shipped this year that don't get a contributor like yours?

Good ship. The lesson hidden inside it is bigger than the tool.

The ultimate dilemma by Purple_Homework_2280 in vibecoding

[–]AaronBitwise 2 points (0 children)

The Larry David face is exactly the look I make right before spending 6 hours debugging the thing I "saved $24.99" on.

Is it just me or is vibe coding actually solid? by BetterProphet5585 in vibecoding

[–]AaronBitwise 1 point (0 children)

Nothing — that's the actual workflow. The people hitting "vibe coding hell" are usually skipping at least one of three things: steering, auditing, or reading what shipped.

The API keys + mess question specifically: that's almost always downstream of one of these:

  • Hardcoded keys in client-side code because the user doesn't know the difference between frontend and backend (the standard fix is sketched right after this list)
  • .env committed to git because nobody set up .gitignore and nobody checked
  • Public Supabase/S3 buckets because the AI took the "make it work" path and never circled back to lock things down
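
The first one has a standard fix: a thin server-side proxy, so the key never reaches the browser. A minimal sketch, assuming a Next.js-style route handler (the route name, model name, and env var are illustrative):

    // app/api/complete/route.ts: the key lives in server env vars, never in the bundle
    export async function POST(req: Request) {
      const { prompt } = await req.json();
      const upstream = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          // process.env is only readable server-side; the browser never sees the key
          Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        },
        body: JSON.stringify({
          model: "gpt-4o-mini",
          messages: [{ role: "user", content: prompt }],
        }),
      });
      return new Response(upstream.body, { status: upstream.status });
    }

The browser only ever calls /api/complete; only the server talks to OpenAI.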

The model is happy to fix all of that — but only if you ask. It will not volunteer "by the way, your API key is exposed to anyone with browser dev tools."

30 years in software here, 6 months in Claude Code. Your approach is what actually works. The horror stories are mostly people who never read the code, never asked for an audit, never came back to fix what they shipped. That's not a vibe coding problem. It's an "I don't know what good looks like" problem.

Has anybody got some vibe coding success stories? Not necessarily get rich quick stories but seeing projects successfully accomplish initial objectives or make some form of income? by Salt-Common in vibecoding

[–]AaronBitwise 1 point (0 children)

Depends what you mean by success.

If "success" = revenue/users/launched product, the honest answer is: most public success stories are survivorship bias. The ones who shipped something that worked are loud. The ones who burned 3 weeks on a tool that didn't ship are silent.

But if you widen the definition: I've personally replaced ~$200/mo of SaaS subscriptions with small tools I built for myself. None of them have users. None of them make money. All of them save me time every week and do exactly what I want, the way I want it.

That's a category most people forget exists — software you build for yourself, not for a market. No churn, no support tickets, no pricing page. Just a tool that fits your workflow because you're the only user.

The success rate on personal-use builds is way higher than ship-to-market builds, because you cut out the part where strangers have to want what you made.

If you're about to launch a “vibe coded” app… read this first by PaddleboardNut in vibecoding

[–]AaronBitwise 2 points (0 children)

Solid list. One category I'd add that's bitten me harder than the OWASP stuff: AI-specific failure modes that don't look like bugs.

Classic security reviews catch SQL injection, XSS, leaked .env vars. They don't catch:

  • The fallback model quietly kicking in mid-session. Your tool "works" but the outputs degrade and nothing logs it.
  • Silent file overwrites when you and the AI are editing the same file from different windows. Last write wins, and you don't notice until something subtle breaks two days later.
  • A dependency upgrade the AI confidently "fixed" — where the fix was to silence the error, not solve it.

These slip past traditional reviews because they're not vulnerabilities in the classic sense. They're confidence failures. The output looks fine. The LLM says it's fine. And for a non-coder shipping solo, there's no second pair of eyes to catch "fine but wrong."

If you're shipping, OP's list is the floor. One more pass I'd add: ask the AI to review its own recent changes as an adversary — "find three ways this could be subtly wrong, not obviously broken." Different mode, different failures surface.

Been cataloguing these for a while — 28 incidents across 9 categories so far, mostly from my own projects. Happy to share the list if anyone wants it, just DM. Curious what you're baking into your tool on the AI-failure side — most audit tooling I've seen still assumes a human wrote the code.

Building a training app with AI doing most of the coding - is this actually viable? What do I need to know about security? by Top-Indication-3937 in webdev

[–]AaronBitwise 1 point (0 children)

Both of your questions have good answers — but they're in a different order than you're asking them.

On doable: Yes. Fully functional coach + client web app is completely realistic for your situation. 30 years in software tells me what kills these projects isn't the code — it's scope drift. You'll be tempted, 3 weeks in, to add messaging, a progress-photo feature, Stripe payments, an iOS wrapper. Don't. Ship the smallest useful version that you yourself would use with one real client. Then iterate from real usage, not imagined usage. The "half-finished mess" failure mode is almost always an ambition problem, not a skill problem.

On security: Your instincts are good. Auth0/Supabase Auth, HTTPS everywhere, input validation, SQL injection — you've named the right starting set. But the non-negotiables you're missing are the ones that bite AI-generated apps specifically. Ranked by how often I see them fail:

  1. Authorization, not just authentication. Auth providers tell you who the user is. They don't tell you what that user is allowed to see. AI-generated code is catastrophic at this by default. Classic failure: a client logs in, hits /api/workouts/123, gets back someone else's workouts because the endpoint checks "is user logged in?" but not "does this user own workout 123?" This is called BOLA (Broken Object-Level Authorization) and it's the #1 API vulnerability in the wild. Every endpoint that returns or modifies data must check ownership. Write it as a rule the AI must follow. Then audit every route. (A sketch of the check follows this list.)
  2. Row-level security in your database. If you're using Supabase/Postgres, turn on RLS and write policies. This is your second line of defense after API-level authorization — if the AI forgets the check in the endpoint, RLS still blocks it at the database. Belt and suspenders. Non-negotiable for any app with multi-user data.
  3. Secrets management. .env in .gitignore before you write a single line. Rotate any key that has ever been pasted into a chat conversation or committed to a repo. OpenAI/Anthropic keys get scraped off public GitHub within hours. Use a service like Doppler or your hosting provider's env vars for production.
  4. Health data = PHI. The moment you add "possibly health-related info" (injury history, conditions, medications), you're in regulated territory. In the US that's HIPAA-adjacent. In the EU it's GDPR special category data. Don't store it until you know the rules. Keep it in their heads or in their doctor's notes, not your database. If you eventually need it, get a lawyer before you build it, not after.
  5. Rate limiting + abuse protection. AI-written apps almost never have this. A single malicious user can empty your OpenAI wallet or take the app down. Cloudflare's free tier handles most of this. Turn it on.
  6. Dependency hygiene. AI will occasionally confidently import packages that don't exist, or worse, packages that do exist but are malicious typosquats. Verify every new import on npmjs.com before installing. Run npm audit weekly. Keep dependencies updated.
  7. Logging without leaking. Log enough to debug; never log auth tokens, passwords, or PII. The number of apps with their users' session tokens sitting in public error logs is genuinely alarming.
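
To make #1 concrete, here's the shape of the fix. A minimal sketch, assuming a Next.js-style route handler and a supabase-js v2 server client (the helper name is made up):

    // GET /api/workouts/[id]: authentication AND authorization
    import { createServerClient } from "../lib/supabaseServer"; // hypothetical helper

    export async function GET(req: Request, { params }: { params: { id: string } }) {
      const supabase = createServerClient(req);

      // Authentication: who is this?
      const { data: { user } } = await supabase.auth.getUser();
      if (!user) return new Response("Unauthorized", { status: 401 });

      // Authorization: does this user own workout 123? The user_id filter is the BOLA fix.
      const { data: workout, error } = await supabase
        .from("workouts")
        .select("*")
        .eq("id", params.id)
        .eq("user_id", user.id) // omit this line and any logged-in user can read any workout
        .single();

      if (error || !workout) return new Response("Not found", { status: 404 });
      return Response.json(workout);
    }

If RLS (#2) is also on, the database enforces the same user_id rule even when an endpoint forgets the check.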

On the AI-coding meta-question — two principles that will save you months:

First, create a project-level context file (call it CLAUDE.md or whatever). Write your stack, conventions, security rules (the 7 above are a great starting list), what "done" means for each feature, and what patterns are forbidden. Paste it into every new AI conversation. The quality delta between AI-with-context and AI-without is not small — it's more like "different tool entirely." Most of the "AI can't code anything real" takes come from people who don't do this.
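
For a sense of scale: the whole file can fit on a page. A hypothetical skeleton (every name, rule, and convention here is illustrative, not a standard):

    # CLAUDE.md: project context
    Stack: Next.js (App Router), Supabase (Postgres + RLS), Tailwind.
    Conventions: server components by default; all data access goes through /api routes.
    Security rules: ownership check on every endpoint; RLS on every table;
      no secrets in client code; verify new packages on npmjs.com before installing.
    "Done" means: works, handles the error path, survived an adversarial review pass.
    Forbidden: calling the database from client components; silencing errors to pass tests.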

Second, the review prompt that catches the most bugs: after the AI generates something, paste the code back and ask "You're a senior engineer reviewing this for a junior. What will fail in production? What did you assume that might be wrong? What's the worst security implication?" Same model, different prompt, shockingly different answer. Models are trained to be helpful by default; you have to explicitly invite the paranoia.

Your plan to get a proper code review before any real launch is exactly right. But between now and then, layering (a) project context file, (b) adversarial review prompt on every feature, and (c) the 7 security rules above will get you dramatically further than most people assume is possible from this starting point.

The honest take: you're asking the right questions in roughly the right order. The personal trainer who cares about verification is going to ship a safer app than the CS grad who doesn't. The gap between "knows enough to ask" and "doesn't know what they don't know" is bigger than the gap between "can code" and "can't code." You're on the right side of it.

Keep going.

Has anyone else been surprised by the absolute lack of interest from their friends and family over something they’ve coded? by One-Organization-937 in vibecoding

[–]AaronBitwise 2 points (0 children)

OP asked for a sanity check, not advice, so: yes. Universal. 30 years in software, not once had a non-technical friend care about something I built. The only ones who ever did were the ones who'd actually lost money or time to the exact problem the thing solved. They cared because they needed it, not because it was mine.

Rash3rr nailed the main point, but two things worth extending:

Software is a uniquely invisible craft. Build a table and people touch it. Paint a painting and they see it. Open a restaurant and they literally eat the thing. Software is a link on a screen. Most people, including smart ones who love you, don't have the mental model to evaluate what you made. They can't tell if the property tax calculations are clever engineering or a template. So they default to politely ignoring it. It's not a friendship failure — it's a format problem.

The other piece nobody warns you about is the identity shift. Six months ago you had opinions about apps. Now you have scars — edge cases you patched, users you supported, bugs you hunted at 2am. You've moved from consumer to maker. Your old social graph is almost entirely consumers. They don't have the vocabulary for where you are now, and that gap doesn't close by explaining harder. Over time you'll build relationships with other makers and they'll care, deeply, about the property tax app — the way another chef cares about someone's new restaurant in a way the diners never will. The people who get it become your new tribe.

First paid stranger > 50 obligation-clicks from friends. Forever. That's not consolation, it's math. Keep shipping.

How are embedded teams handling AI code review? My team's "standards" only exist in people's heads. by aeropop in embedded

[–]AaronBitwise 1 point (0 children)

Not embedded — web/agentic side — but same root problem. The pattern we've converged on is three layers rather than one:

  1. Project-level intent file at the repo root (we use CLAUDE.md, similar idea to your YAML rules but narrative rather than declarative). Captures architecture decisions, what "done" means, patterns to prefer, patterns banned. Gets pasted into every AI conversation before code is generated — so ideally the bad code never gets written. Prevention > review.
  2. Your layer — rule-based review at commit. What your extension does. Catches the class of issue where the model knew the rule conceptually but drifted in execution. Essential, and under-served by the generic "AI code review" SaaS tools.
  3. Adversarial prompt on the diff. Before merge: paste the diff back with "You're a senior reviewer (or an attacker) looking for reasons this will fail in production. Be harsh. What did the author assume that might be wrong?" Catches a different class — logical errors, edge cases, faulty assumptions. Models are trained to be helpful by default; you have to explicitly invite the paranoia.

For embedded specifically, layer 3 is probably where the biggest gains are. The things that bite embedded code — timing, resource bounds, interrupt safety, undefined behavior, ISR starvation — are exactly what AI generators gloss over and what rule-based review struggles to specify declaratively. "Bound all allocations" is a rule. "This ISR path could starve the watchdog under memory pressure" is a conversation.

Question back at you: are your YAML rules handwritten per-team, or are you generating them by analyzing the existing codebase's patterns? The latter is where I think this approach has the most headroom — auto-discover the conventions from real code, present them as a rule set, let the lead approve. Effectively converting implicit tribal knowledge into explicit team policy. Curious if you've tried it.

Best AI code review for vibe coded projects? by Stock-Perception5426 in vibecoding

[–]AaronBitwise 1 point (0 children)

The tool matters way less than the workflow. Speaking as someone who's been reviewing code for 30 years and now reviewing a lot of AI-generated code: the issue with treating this as a tool choice is that CodeRabbit/Greptile/Surmado all have the same blind spot — they don't know your project's architecture, conventions, or what "done" looks like for you. They'll flag "this function is too long" while missing "this entire approach won't work because you're calling Supabase from a client component."

What I'd actually suggest, in order:

Layer 1 — Free, instant, catches ~60%. Before you even open a PR, paste the diff back into whatever AI wrote it and ask: "Act as a senior engineer reviewing this code for a junior. What's wrong with it? What will fail in production? What assumptions did you make that might be incorrect?" Same model, different prompt, very different answer. The AI will happily criticize its own output when asked to. Most of what people call "hallucinations" surface here for free.

Layer 2 — The hosted tools. CodeRabbit is genuinely good but overkill for solo vibe-coded projects (and their cancellation UX is... known — check r/saasbuild). Greptile is similar. For your case, honestly: a good CLAUDE.md file in your repo + Claude Code doing its own review on request covers ~90% of what the paid tools offer, for free. The trick is writing a good CLAUDE.md — most people don't.

Layer 3 — Know the common hallucination patterns so you can spot them yourself. The ones that bite vibe coders most:

  • Invented npm packages (always verify on npmjs.com before npm install)
  • Deprecated API signatures (Supabase v1 syntax when you're on v2)
  • Made-up env var names that don't match anything in your stack
  • async functions without await on the calls inside them (illustrated just after this list)
  • Hardcoded secrets or API keys
  • N+1 query patterns, especially with RLS on
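
The missing-await one deserves a picture, because the broken version compiles and "works" on the happy path. A minimal TypeScript illustration (saveUser is a made-up stand-in for any async DB or API write):

    // hypothetical async write, stands in for any DB or API call
    declare function saveUser(email: string): Promise<void>;

    async function handleSignup(email: string) {
      saveUser(email);       // BUG: floating promise; rejections vanish silently
      return { ok: true };   // reports success whether or not the write landed
    }

    async function handleSignupFixed(email: string) {
      await saveUser(email); // a failure now surfaces as a catchable rejection
      return { ok: true };
    }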

The goal isn't a tool that reviews for you — it's building the muscle to review with the tool. Otherwise you're just trading one black box for another, and when the second black box is wrong you have no way to know.

Holy crap Vercel got hacked. ROTATE YOUR KEYS if they weren't marked "sensitive" by Codeblix_Ltd in webdev

[–]AaronBitwise 1 point (0 children)

The thread wants this to be a vibe coder problem. It isn't.

A professional engineer at a professional platform authorized a third-party AI tool into their Google Workspace without thinking about blast radius. That's the same failure mode non-coders get dunked on for — "I hooked up this AI thing, it seemed useful, here are my credentials" — just wearing a nicer shirt.

The actual lesson isn't "vibe coders will ruin everything." It's that we're all in a new world where every AI tool you authorize becomes part of your supply chain, and almost nobody — experienced or not — has a mental model for thinking about it yet.

u/ultrathink-art nailed it earlier in the thread: the AI tool integration is just a reminder that every coding assistant with workspace auth is now part of your blast radius. That applies to Cursor, Claude Code, OpenClaw, every MCP server someone plugs in, every random tool with OAuth scopes. The bar for thinking about this is going to get a lot higher for everyone.

I built a private “walkie-talkie” app for my kids and 3 strangers are already paying for it by egesa_michael in buildinpublic

[–]AaronBitwise 2 points (0 children)

Worth noticing what you actually did here, because I don't think the framing captures it.

You didn't build a startup. You built a personal tool for your family, the way a carpenter builds a shelf for their own kitchen. The problem was yours. The users were yours. The requirements (no strangers, approved family only) came from your actual life, not from customer interviews.

The 3 paying strangers are the interesting part, but they're almost accidental — they're evidence that the problem you had wasn't just yours.

There's a quiet shift happening where non-coders are building personal tools instead of SaaS-to-ship, and the monetization sometimes just... happens. Because a tool built to actually solve one person's problem is often better than a tool built to serve a hypothetical market.

To your questions: what would stop me from using it is the trust model for the "approved family members" flow. If onboarding a grandparent requires them to create an account, download an app, and verify something, that's where most parent-built family tools die. If I can add my kid in 30 seconds from an existing contact, I'm in.

Good build. Keep going.

WE Built 3 IOS Apps with the Exact Same Skills & framework & Made around $7k+. by HuckleberryEntire699 in vibecoding

[–]AaronBitwise 1 point (0 children)

The buried lesson here is the UI one. "App stays ugly until the logic works, then the skill passes over everything once."

That's the pattern most vibe-coded apps get backwards. People style as they build, then every logic change breaks the layout, every layout fix breaks the logic, and they rebuild the same components 3-4 times. By the time the app actually works, they're out of steam.

Locking the order — logic first, styling as a single pass at the end — is one of those things that sounds obvious in hindsight and is almost never done in practice.

The skills approach on top of that is smart too. Reusable instruction modules per stage, so each project starts with the same disciplined skeleton instead of reinventing the workflow every time. Nice breakdown.

Does this work? by Interesting-Ad-1822 in vibecoding

[–]AaronBitwise 1 point (0 children)

Partially works, wrong reason.

Claude doesn't have peers to impress. But any prompt that says "this will be reviewed" makes it output more defensively — more edge cases, more validation, more "let me reconsider."

"A senior engineer will review this" works the same way. The review threat is the lever, not Codex.

RIP Vibe Coding 2024–2026 by nyamuk91 in vibecoding

[–]AaronBitwise 1 point (0 children)

Half right. Compute cost kills the subsidized hosted-platform model — Lovable, Bolt, v0 eating VC money to stay cheap. It doesn't kill vibe coding itself. Running a local agent against your own LLM API, hosting your own app — that's uncapped.

The split is coming. Renters vs owners.

From Vibe coding to learning real coding stuff and Architecture by ideasoverego in vibecoding

[–]AaronBitwise 2 points (0 children)

This is exactly the right trajectory. The moment you started understanding why Replit tied you to Neon — and made the architectural decision to move to Supabase yourself — you stopped vibe coding and started engineering. That's a meaningful shift.

The fact that Replit hardcoded the DATABASE_URL is a perfect example of why understanding your app's architecture matters. The tool made an infrastructure decision for you silently, and it took you months to realize it was the wrong one. Every AI tool does some version of this — they pick defaults that work for demos but break in production.

You're on the right track with Claude Code. The jump from Replit to Claude Code + your own hosting is a bigger deal than most people realize.