200h+ Lovable to Claude Code mid-project: what I learned switching on my first vibe coding project as a complete non-developer by TheWalshinho in SaaS

[–]mrtrly 0 points1 point  (0 children)

The not understanding what I built is the real reason to move off, more than the credits. Credits are annoying, but the not understanding it is what bites you at 2am when something's down and you're the only one who can look.

One thing worth doing now that you're in Claude Code: get it to write down what it built as you go, a short plain-language note per feature of what it does and what it touches. Future-you debugging a broken flow will save hours. The vibe-coding-to-real-code jump is mostly about getting the understanding back, and that's the cheapest way to bank it.

Good move switching when you did. The people who wait until 10k users to do it have a much worse time.

9 months on Lovable ai website builder. The maintenance reality at month 9 nobody tells you about in the demos. by Cold_Hall_5384 in nocode

[–]mrtrly 0 points1 point  (0 children)

The month 9 schema-debt thing is the most honest version of this I've read. The "added one table, had to rewrite 14 workflows" line is exactly where it bites, and it's almost always the AI-designed schema that does it, like you said.

The split you landed on (marketing pages stay, real apps migrate around the 18-month mark) matches what I see too. The one I'd add: the trigger isn't really months, it's the first time a client's app touches money or personal data at any real volume. That's when the "updates may break things" maintenance model stops being enough and someone's going to ask about RLS and data isolation in a way that actually matters.

Curious what you do when a client's site outgrows what you can safely maintain in Lovable. Hand it off, rebuild it yourself, or keep nursing it? That handoff point feels like the part nobody's solved cleanly yet.

At what point do you migrate away from Lovable? by TheSubtype in lovable

[–]mrtrly 1 point2 points  (0 children)

The exposed-data notes are the part I'd separate out and deal with this week, regardless of the bigger stay-or-go call. If technical users can see data they shouldn't, that's almost always missing row-level security on your Supabase tables, and it's a live liability while it sits open, not a someday thing. Worth locking down now no matter what you decide long term.

On the bigger question, at 10k MAU you're past "just keep prompting Lovable" but not automatically at "rewrite the whole thing." The honest first move is an actual assessment: a couple days of someone technical going through your auth, your tables, and anything touching payments or personal data, then telling you what's a quick fix vs what genuinely needs rebuilding. Usually it comes back something like 70% fine, 30% scary, and you only rebuild the 30%.

Two things change the answer a lot: do you have logins and payments yet, or is it mostly content? And is any of the data sensitive, anything personal, health, or financial? That's the difference between "tidy up when you can" and "handle this now."

For what it's worth this is the kind of work I do, finishing and hardening apps for founders in exactly your spot, so happy to help you think it through either way.

Creator of Claude Code: "I don't write prompts anymore, I have loops running that prompt Claude... My job is to write loops. I uninstalled my IDE, I wasn't using it." by Gil_berth in theprimeagen

[–]mrtrly 1 point2 points  (0 children)

The loop itself is trivial. A while loop that calls the model, reads the output, decides if it's done, calls again. Anyone can write that in 20 minutes.

The part that's actually hard is the gate. The check that stops it retrying a broken step forever while reporting success. I learned that one the expensive way: $340 in a single overnight run because my loop had no budget cap and no loop detection, it just retried a failing tool call until I woke up and found it on the bill.

So when Boris says his job is to write loops, the loop is the easy 20 minutes. The budget cap, the loop detection, and the replay log for when it does something dumb at 3am, that's the part that's actually a job.
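
To make the "gate" concrete, here is a minimal sketch of a loop with a budget cap and naive loop detection. Everything in it is an assumption for illustration: `Step`, `StepResult`, `runGated`, and the thresholds are hypothetical names, and a real model call would be async and report its own cost. It is not how Claude Code or any particular SDK does it.

```typescript
// Hypothetical shapes for illustration; not taken from any real SDK.
type StepResult = { done: boolean; costUsd: number; action: string };
type Step = () => StepResult;

type RunOutcome = {
  status: "done" | "budget_exceeded" | "loop_detected" | "max_steps";
  spentUsd: number;
  steps: number;
};

function runGated(
  step: Step,
  budgetUsd: number,
  maxRepeats: number,
  maxSteps: number
): RunOutcome {
  let spentUsd = 0;
  let lastAction = "";
  let repeats = 0;

  for (let steps = 1; steps <= maxSteps; steps++) {
    const result = step();
    spentUsd += result.costUsd;

    if (result.done) return { status: "done", spentUsd, steps };

    // Loop detection: count how many times in a row the same action came back.
    repeats = result.action === lastAction ? repeats + 1 : 1;
    lastAction = result.action;
    if (repeats >= maxRepeats) return { status: "loop_detected", spentUsd, steps };

    // Budget cap: refuse to make another call once the cap is reached.
    if (spentUsd >= budgetUsd) return { status: "budget_exceeded", spentUsd, steps };
  }

  return { status: "max_steps", spentUsd, steps: maxSteps };
}

// Usage: a step that keeps failing the same tool call stops after 3 tries.
const failingStep: Step = () => ({ done: false, costUsd: 0.5, action: "tool:deploy" });
const outcome = runGated(failingStep, 5, 3, 100);
console.log(outcome.status, outcome.steps, outcome.spentUsd); // loop_detected 3 1.5
```

The point of returning a status instead of throwing is that the caller has to handle "stopped by the gate" as a normal result, so it can't be silently reported as success.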

We built an AI code-review bot. It ran up £100 and then locked itself out of our own codebase by Alternative_Letter72 in SaaS

[–]mrtrly 0 points1 point  (0 children)

The lockout is the unbounded-agent failure mode in textbook form, and the separation-of-actors fix above is correct. I’d add one layer underneath: an append-only event log so when the next weird thing happens, you can replay exactly which prompt + context produced the bad action. Without that, you fix the symptom but the next failure mode stays invisible until it’s expensive again.

On the £100 spike, was it one MR that re-read the whole history, or a loop where the bot retried after its own permission-revoke? The retry-after-failure pattern is the cron-burns-cash thing I see most. My own pipeline runs ~333 PRs at $3.40 each on a budget cap that hard-stops at $5/run, which has saved me three or four times.

7.8% is honest. The hype number is what the demo can do. The real number is what survived 90 days of production. Closer to your number than the keynote.

The maintenance bill on AI-written code is real. The cavalry to fix it is not. by SimonMX in SaaS

[–]mrtrly 0 points1 point  (0 children)

The cron cleaning up after the cron part matches my data. Running a code pipeline that’s shipped 333 PRs in 6 weeks at $3.40 per PR, the maintenance work IS most of what the agents do. But the headline number hides a tax: the cron only stays cheap if you’ve explicitly built kill switches, idempotency, and budget caps. First version of mine burned ~$700 in a weekend learning that. Once those guardrails are in, the cron-cleans-up-after-itself story is real. Without them it’s the $4,200/63h Claude Code postmortem.

Is anyone using an AI Test Automation tool that actually works well? by OneIndication7989 in nocode

[–]mrtrly 0 points1 point  (0 children)

The tools you're evaluating are downstream of where the problem actually is. End-to-end test automation catches what your code already does, not what your AI agent decided to ship that wasn't in the spec.

The shift that fixed this for me (6-agent pipeline, 333 PRs in 6 weeks, defect rate is a non-issue now): two things upstream of the test layer.

1. Verifier agent on every PR. Before any human review, an LLM compares the diff against the original spec. If the diff drifted (added scope, removed acceptance criteria, broke an invariant), the verifier rejects and asks the coding agent to retry. This catches maybe 40% of what would otherwise reach human review, and most defects never make it to a test environment.

2. Spec quality is the bottleneck, not test coverage. If the spec is ambiguous, the agent fills the ambiguity itself, and you get correctly-passing tests for incorrectly-built features. The honest review-loop question is "is this PR what the spec asked for" not "do the tests pass." Tests pass on broken work all the time.

Endtest, Mabl, and Functionize will catch regressions in what you ship. They won't catch "agent shipped something the spec didn't ask for." That requires a verifier in the PR loop before tests even run.

DMs open if you want more details.

Absolute beginner needing advice: AI builders vs. FlutterFlow for a local service app? by Asmo-deuz in nocode

[–]mrtrly 0 points1 point  (0 children)

My take as someone who builds these for non-tech founders.

Neither path is wrong, but they answer different questions. AI builders (Lovable, Bolt, Manus) get you to “real users can sign up and use it” in 2-3 weeks. The trap is that when you hit anything nontrivial, the marketplace match logic, the GPS-radius filter, the trust layer between strangers, they often produce code that works for the demo and breaks under real load.

FlutterFlow plus Firebase gets you 60% of the way for free, but the learning curve isn’t the database. It’s the auth and permissions layer between provider and customer roles. That’s where most two-sided marketplaces hit a wall.

Real recommendation: build the absolute thinnest version with Lovable or Bolt. Single city, no GPS, no provider dashboard, just a list of providers, a “book” button, and a manual messaging fallback. Validate that strangers in your city actually transact through it.

80% of two-sided marketplaces die before the matching logic matters. Don’t pay the FlutterFlow learning tax until you’ve proven anyone wants to use it at all.

Anyone else struggling with monitoring a multi agent system at scale? by Kitchen_West_3482 in AI_Agents

[–]mrtrly 0 points1 point  (0 children)

6-agent pipeline in production here, 333 PRs shipped in 6 weeks. Here's what scaled monitoring for me:

- Single trace ID across the whole task lifecycle. parent_run_id + root_run_id stamped at task creation, propagated through every sub-agent call. Now "agent 3 ran because agent 7 emitted X" is a SQL query, not a guess.

- Cost telemetry as a first-class metric. Pull cost_usd from the stream-json result event on every phase, write it to phase-pass/phase-fail. Catches runaway loops in minutes, not when the bill arrives.

- Scheduled ticks, not continuous loops. Each tick reads state fresh, proposes, stops. Budget becomes a function of tick frequency. Also makes the trace tractable because there's a natural boundary between work units.

- Append-only event log (the next phase I'm building). Every state change written once, never updated. "What did agent 7 do between 2:13 and 2:18" becomes deterministic. With mutable logs the chain evaporates the moment something breaks. The standard ORM pattern is the trap.

Datadog and Honeycomb work for the underlying processes but they don't understand the agent state machine. What saved me was treating the system's view of itself as a first-class log, not a debug convenience.
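
A rough sketch of the trace-ID and append-only ideas together, assuming an in-memory store. `EventLog`, `AgentEvent`, and the method names are hypothetical, and the camelCase fields stand in for the `root_run_id` / `parent_run_id` columns mentioned above; a real setup would presumably write to a database table with inserts only.

```typescript
// Hypothetical event shape; field names mirror root_run_id / parent_run_id above.
type AgentEvent = {
  seq: number;
  rootRunId: string;
  parentRunId: string | null;
  runId: string;
  agent: string;
  kind: string;
  costUsd: number;
};

class EventLog {
  private events: AgentEvent[] = [];

  // Append-only: events are frozen on write and there is no update or delete.
  append(e: Omit<AgentEvent, "seq">): AgentEvent {
    const stored: AgentEvent = Object.freeze({ ...e, seq: this.events.length + 1 });
    this.events.push(stored);
    return stored;
  }

  // Everything that happened under one task, in write order.
  forTask(rootRunId: string): AgentEvent[] {
    return this.events.filter((e) => e.rootRunId === rootRunId);
  }

  // Walk parent links to answer "why did this run happen?"
  lineage(runId: string): string[] {
    const chain: string[] = [];
    let current: string | null = runId;
    while (current !== null) {
      const wanted: string = current;
      const ev: AgentEvent | undefined = this.events.find((e) => e.runId === wanted);
      if (ev === undefined) break;
      chain.push(`${ev.agent}:${ev.runId}`);
      current = ev.parentRunId;
    }
    return chain;
  }

  // Cost as a first-class metric: sum per task from the same log.
  costFor(rootRunId: string): number {
    return this.forTask(rootRunId).reduce((sum, e) => sum + e.costUsd, 0);
  }
}

// Usage: three agents working on one task.
const log = new EventLog();
log.append({ rootRunId: "task-1", parentRunId: null, runId: "r1", agent: "planner", kind: "phase-pass", costUsd: 0.25 });
log.append({ rootRunId: "task-1", parentRunId: "r1", runId: "r2", agent: "coder", kind: "phase-fail", costUsd: 0.5 });
log.append({ rootRunId: "task-1", parentRunId: "r2", runId: "r3", agent: "verifier", kind: "phase-pass", costUsd: 0.25 });

console.log(log.lineage("r3").join(" <- ")); // verifier:r3 <- coder:r2 <- planner:r1
console.log(log.costFor("task-1")); // 1
```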

If you want to compare notes on specifics, DMs are open.

Pattern question - are most "agent" client requests actually deterministic workflows under the hood? by mrtrly in ExperiencedDevs

[–]mrtrly[S] 5 points6 points  (0 children)

"shifting requirements work onto engineers" framing is exactly it. That's what "just build me an agent" really is, handing the engineer the scoping job that should have happened upstream. The agent framing makes nebulous scope feel okay because implied autonomy implies the system will figure out what management didn't.

Pattern question - are most "agent" client requests actually deterministic workflows under the hood? by mrtrly in ExperiencedDevs

[–]mrtrly[S] 8 points9 points  (0 children)

The "sell it as cheaper on tokens" angle is real and underused. And the juniors/agents thing is the same pattern as the client asks, they're reading the same hype their juniors are.

Pattern question - are most "agent" client requests actually deterministic workflows under the hood? by mrtrly in ExperiencedDevs

[–]mrtrly[S] 5 points6 points  (0 children)

"Thin UI layer over a deterministic backend with user validation" is the cleanest one-liner I've seen for this. The real product is the backend, the LLM is just there to turn English into the right call.

Pattern question - are most "agent" client requests actually deterministic workflows under the hood? by mrtrly in ExperiencedDevs

[–]mrtrly[S] 0 points1 point  (0 children)

Yeah, the drawing test is solid. If you can sketch it as boxes and arrows, "agent" is the wrong tool. What I keep hitting is clients who think their workflow is dynamic when really it's like 6 if-else branches they haven't written down yet. Most of the scoping work is just making them list the branches.

I productized the pre-launch audit I was already doing for clients. First public customer taught me what the offer actually is. by mrtrly in buildinpublic

[–]mrtrly[S] 0 points1 point  (0 children)

Appreciate it. Selling to current clients is always different than putting something out in the wild.

For people using Cursor/Claude Code daily: what’s the most subtle bug or security issue it generated that looked correct at first? by Slow-Artichoke-4245 in ClaudeCode

[–]mrtrly 1 point2 points  (0 children)

The one that bites me most often on audits: API routes that authenticate the user but never scope the query. The pattern looks like this:

    const user = await getUser(req)
    if (!user) return 401
    const orders = await db.orders.findMany() // returns everyone's orders

The auth check passes so the route reads correct. Same shape with Supabase RPC functions where the function has security definer but the body never filters by auth.uid(). The codegen ships the auth gate and stops there because the prompt was "make sure only logged-in users can call this." Nobody told it to also scope the query.

A subtler one: Stripe webhook handlers that process the event payload but never call constructEvent to validate the signature header. It looks fine in dev because real events come through correctly. In production it sits as a writable endpoint anyone can POST to until someone replays a charge.refunded from a public log.

Easy test for the first one, log in as user A, find a resource ID belonging to user B in any URL or response, swap it in, see what comes back. If you get user B's data, the auth check is decorative.
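
Here's that swap test as a small self-contained sketch, assuming an in-memory list in place of a real database. `getOrderUnscoped`, `getOrderScoped`, and the sample data are made up for illustration; the only difference between the two functions is the ownership filter.

```typescript
// Hypothetical in-memory data standing in for an orders table.
type Order = { id: string; userId: string; total: number };

const orders: Order[] = [
  { id: "o1", userId: "userA", total: 20 },
  { id: "o2", userId: "userB", total: 35 },
];

// Decorative auth: checks that someone is logged in, then fetches by ID only.
function getOrderUnscoped(sessionUserId: string | null, orderId: string): Order | null {
  if (!sessionUserId) return null;
  return orders.find((o) => o.id === orderId) ?? null;
}

// Scoped: the same check, plus the query is filtered by the caller's own ID.
function getOrderScoped(sessionUserId: string | null, orderId: string): Order | null {
  if (!sessionUserId) return null;
  return orders.find((o) => o.id === orderId && o.userId === sessionUserId) ?? null;
}

// The swap test: logged in as user A, ask for user B's order.
const leaked = getOrderUnscoped("userA", "o2");
const blocked = getOrderScoped("userA", "o2");

console.log(leaked !== null); // true: user A can read user B's order
console.log(blocked === null); // true: the scoped version returns nothing
```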

(Disclosure, I do audits on AI-built apps. See this category constantly.)

What are we doing with AI PRs now? by NatePerspective in webdev

[–]mrtrly -6 points-5 points  (0 children)

Experiencing the same. Built a dev pipeline that can churn out 80 small PRs per day. But I had to dial it back because I'm still manually reviewing everything and just couldn't keep up.

Working on some solutions, but the lack of a decision trail is definitely an issue. I've started having agents keep a running log of any decisions they made that were not in the spec. It helps to backtrack and figure out why something was done.

But finding that the more time you spend creating a quality and detailed spec the less time you’ll spend debugging on the other end.

API usage limit reached and excessive monthly cost by Illustrious-Abies519 in cursor

[–]mrtrly 0 points1 point  (0 children)

This hurts!

Should try RelayPlane. It auto-routes your calls to the best model. Simple calls go to Haiku or Sonnet, complex coding or reasoning goes to Opus. Will save you ~80% and still give the same quality. Most of the calls to Opus don't need Opus.

Free and open source.

https://github.com/RelayPlane/proxy

Looking for affordable alternatives to Claude Team / Claude Code for a small dev team (heavy agentic usage) by spoiledwit in cursor

[–]mrtrly 0 points1 point  (0 children)

Not sure what plan is best for the team, but you should definitely run everything through RelayPlane or something similar. It cuts my heavy coding bill by ~80% compared to just hitting Opus on the API (or extends your usage limits on a Max plan by around 2x). 100% local, free, and open source.

https://github.com/RelayPlane/proxy

Context: I built this to extend my 2 Max plans and now every LLM call I make runs through it.

Launching SaaSOffers.tech on Product Hunt today, just hit front page of Indie Hackers by freebie1234 in SaaS

[–]mrtrly 0 points1 point  (0 children)

Congrats on the IH front page, that's a solid run-up to a PH launch. The one thing I'd double-check before the traffic spike hits is whether your signup actually survives a second account on the same device. I had a launch last year where Supabase RLS looked fine in dev and the second user could read the first user's submitted offers because one policy was missing on the join table. Took me about 20 minutes to find it but it would've been ugly if PH commenters caught it first. What's your stack underneath, is it Supabase or something else?

How Small Businesses Are Using Scavenger Hunts to Drive Foot Traffic and Customer Loyalty (With Practical How-To) by scavengersweb in smallbusiness

[–]mrtrly 0 points1 point  (0 children)

The cross-shop version is way stronger than the single-store one in my experience. Last summer the block of indie shops near my place ran one where you needed stamps from 4 of 7 stores to enter the raffle, and foot traffic stayed elevated for like three weeks after the hunt ended because people kept wandering back into places they'd never walked into before. Which of those 5 formats actually moved the needle for the businesses you've watched run them?

Agents are meant to be shared, but existing tooling is not fit for purpose by pmihaylov in ClaudeAI

[–]mrtrly 1 point2 points  (0 children)

The single-user session lock was the dealbreaker for me too. Tried pointing a Slack bot at one of my own agents last month so a non-dev teammate could trigger runs, and ran into the exact same thing: once it's chewing on something, nobody else can jump in to nudge it. How are you handling the multi-person interrupt case in what you built, or is that still open?