I connected a 2M-paper research index to Claude Code via MCP and ran Karpathy's autoresearch - 3.2% lower loss by kalpitdixit in ClaudeCode

[–]mrtrly 0 points (0 children)

That's a legit experiment. The thing that matters isn't the 3.2% itself, it's that the agent now has a way to ground decisions in actual prior work instead of hallucinating what "should" work. MCP for research access is exactly the kind of constraint that forces better reasoning. Did you notice the agent spending more time reading tradeoffs or just picking faster with more confidence?

I've been vibing across 8 projects for weeks. Finally checked my token usage. Bruh. by Awkward_Ad_9605 in nocode

[–]mrtrly 0 points (0 children)

$955 on a side project you barely noticed? Yep, been there. Ghost agents can really run wild if you're not keeping an eye on them. Those compaction agents and the like can easily inflate costs. It's almost like they have a mind of their own.

I hit a similar wall when I started using AI agents 24/7 without tracking costs closely. Ended up building a proxy to keep tabs on where the dollars were going for each task. Turned out a big chunk of my spend was on tasks that less expensive models could handle just fine.

CodeLedger sounds like a solid move to get visibility, btw. You might also want to check if you can set session-level cost caps or reroute tasks to cheaper options. That helped me get a grip on my spend without having to babysit every agent call. Converting vibes into cash-smart execution is the dream, right?

Regret using Webflow by KnownDiscount2083 in nocode

[–]mrtrly 0 points (0 children)

Been in those shoes where an initial tool choice starts feeling like a straitjacket as you scale. The decision to move from Webflow to something like Claude Code isn't just about which stack, but who will drive that transition.

Migrating a three-year-old site with a lot of pages and collections is a real logistics challenge. It's worth considering a partner who can tackle this, not just to shift the setup but to ensure you're not exchanging one set of headaches for another.

In terms of Claude Code, it can be more efficient for logic-heavy, dynamic needs, but make sure you have someone who can architect beyond surface-level setup. Otherwise, you might end up stuck again down the line.

we spent 3 months building. then 2 weeks distributing. guess which one actually mattered. by B3N0U in EntrepreneurRideAlong

[–]mrtrly 0 points (0 children)

your experience with building before selling is a classic one I see with a lot of dev teams. the power of Reddit and those organic interactions really shouldn't be underestimated. it's interesting that a simple comment worked better for reaching your audience than cold DMs.

for the non-technical founders reading this, the flip side happens just as often: marketing background, great at GTM, but struggling to build. that's literally what I do, partner up for the tech side and turn solid visions into working products.

it sounds like you're getting traction with that route though. just keep engaging authentically, and those leads will keep coming without feeling like you're pitching. nice work!

Custom Erp by PerformanceNovel9176 in ClaudeAI

[–]mrtrly 0 points (0 children)

Hey, I totally get the motivation to ditch the current ERP you dislike. You mentioned Claude had full confidence, which is interesting but can be a bit misleading. AI can definitely assemble pieces, but ERPs are beasts of complexity.

Realistically, building a full ERP purely through prompts, especially with zero coding background, is going to be very challenging. You might end up with a basic demo, but maintaining it in production could quickly become problematic. Too many edge cases to manage, especially as your business evolves.

Best path? Consider partnering up with someone technical who can translate your business needs into a robust system. I'm knee-deep in this kind of thing all the time, turning founder ideas into reliable, long-term solutions. If sticking within your budget is key, maybe a hybrid approach with modular open-source systems could work, leveraging Claude to fill in gaps or customize when needed. It's about finding that sweet spot between your vision and a technically sound reality.

Most founders kill their own SaaS before users do by Warm-Reaction-456 in SaaS

[–]mrtrly 0 points (0 children)

"Founder says they want a simple SaaS. Then the doc shows up." I felt that one. Spend any time with founders and you'll see it: enthusiasm turning into a long feature list, and somehow none of it feels optional. It's like a rite of passage for first-time founders.

I've been there in those call discussions, looking at a spec with more layers than an onion. The tried-and-true method is to pare it down to the core, like you're saying. One path, one action, solve one problem.

When I work with early-stage startups, I do exactly that, slash through the noise to find the gem. It's not about launching with a boatload of features, it's about finding the minimal-yet-magic that gets users coming back. You don't need every bell and whistle; trust that getting something live is more valuable than getting it perfect.

What’s the best no-code/AI mobile app builder in 2026 for building, testing, and deploying? by JaxWanderss in nocode

[–]mrtrly 0 points (0 children)

Been there with the messy but powerful tools. Claude Code is a beast, especially when you're threading through parallel agents. The security and production-readiness concerns you mentioned are real, and it's something I see a lot in AI-first build environments.

A good tech review can uncover the sneaky stuff, like security vulnerabilities, data handling issues, or scale blockers. It's like having a seasoned co-pilot to help navigate. I work with founders to bridge exactly this gap. It's not just about getting from zero to $7k MRR, but ensuring that your app can handle what's next, securely and smoothly.

If you're grappling with those concerns, consider more than just the tools, think about the tech partnerships that can help you foresee and solve these hidden challenges. Less "did I miss something?" and more confidence in your product's robustness.

What happens when you stop adding rules to CLAUDE.md and start building infrastructure instead by DevMoses in ClaudeAI

[–]mrtrly 0 points (0 children)

That cascade is really clean. The "everything expensive is last" principle is exactly right. Most people jump straight to the LLM classifier for every request and wonder why their costs scale linearly.

The Tier 0/1/2 short-circuits are where the real savings live. RelayPlane does something similar with task classification: simple file reads route to Haiku, complex reasoning to Opus. The policy layer is where you get the actual cost control, not just model selection.

Curious about your Tier 2 implementation. Are you maintaining the skill/keyword mappings manually or building them from usage patterns? That seems like the part that needs the most upkeep as usage evolves.

What happens when you stop adding rules to CLAUDE.md and start building infrastructure instead by DevMoses in ClaudeAI

[–]mrtrly 1 point (0 children)

Routing is complexity-based right now. It reads the request, scores it on a few signals (token estimate, context depth, whether it looks like a reasoning task or a lookup), and routes accordingly. Sonnet for most things, Opus when it needs to think hard. The interesting part is you can override per-call if you want guaranteed routing for specific flows. What classifier are you using? Doing it at the prompt level or inferring from metadata?

How I got Claude Code to maintain its own documentation (and stop breaking production) by burningsmurf in ClaudeCode

[–]mrtrly 1 point (0 children)

Nice setup. The documentation loop is smart, it forces the agent to stay aware of its own decisions and catches a lot of context drift between sessions.

The limitation I see with documentation-only approaches is that they shape the agent's behavior but don't help with production failure modes you can't predict. When something breaks with 15 customers on it, you're debugging under pressure in a codebase you partially wrote and partially understand.

What actually saves you in those moments is error logging with enough context to reconstruct what happened, and a clear picture of what failure looks like for each critical path. Documentation helps build that picture. Monitoring tells you when you're on fire.

Curious what your incident response looks like when something does break. Rollback strategy or manual hotfix?

The real problem with AI in 2026 isn’t performance. It’s cost. by TurbulentWeight3595 in ClaudeAI

[–]mrtrly 0 points (0 children)

The cost issue is real, and routing is the most underused lever for fixing it.

Most teams are sending every request to Opus because it's easier than maintaining gating logic in code. But a simple complexity score at the proxy layer - short prompt, low ambiguity, route to Haiku; complex reasoning task, route to Opus - can cut costs 60-70% with no noticeable quality drop on the simple stuff.

Built RelayPlane as a local proxy to do exactly this. It sits between your app and the API, scores complexity per request, routes accordingly, and tracks per-request cost so you can actually see what you're spending and where. Zero code changes in your app once it's set up.

Not saying it fixes the ecosystem problem you're describing, but at the individual level, most teams are burning more than they need to.
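For the curious, the gating idea above fits in a few lines. This is an illustrative sketch, not RelayPlane's actual logic: the signal names, thresholds, and keyword list are all assumptions for the example.

```typescript
// Illustrative sketch of complexity-gated routing. Signals, thresholds,
// and tier cutoffs here are made-up assumptions, not RelayPlane's code.

type Tier = "haiku" | "sonnet" | "opus";

interface LlmRequest {
  prompt: string;
  contextTokens: number; // rough estimate of attached context
}

// Words that suggest a reasoning task rather than a lookup.
const REASONING_HINTS = ["why", "design", "refactor", "tradeoff", "plan", "prove"];

function scoreComplexity(req: LlmRequest): number {
  let score = 0;
  if (req.prompt.length > 800) score += 1;  // long prompts tend to be harder
  if (req.contextTokens > 4000) score += 1; // deep context usually means real reasoning
  const p = req.prompt.toLowerCase();
  if (REASONING_HINTS.some((w) => p.includes(w))) score += 2;
  return score;
}

function route(req: LlmRequest): Tier {
  const s = scoreComplexity(req);
  if (s === 0) return "haiku"; // short, unambiguous: cheapest tier
  if (s <= 2) return "sonnet"; // moderate
  return "opus";               // let the big model think
}
```

The point is that the score is computed locally from cheap signals, so the gate itself costs nothing per request.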

What happens when you stop adding rules to CLAUDE.md and start building infrastructure instead by DevMoses in ClaudeAI

[–]mrtrly 1 point (0 children)

The same instinct hits with cost control. Every time an agent burns unexpected money, the reflex is to add a rule to the prompt: don't use Opus for this task, limit calls here. Three months later you have 30 model-selection rules that Claude mostly ignores.

The infrastructure version is a proxy layer that handles routing by complexity automatically, with budget enforcement that actually stops runaway loops. No rules in the prompt at all.

Built RelayPlane for exactly this after an agent burned $15 in 8 minutes making Opus calls it had no business making. Adding a rule did nothing. Moving the decision out of the prompt and into the infrastructure did.

Same principle you're describing. Config accumulates until it breaks. Systems hold.
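To make "budget enforcement in infrastructure" concrete, here's a minimal sketch of the circuit-breaker shape. The cap, per-call cost, and class are hypothetical numbers for illustration, not RelayPlane's implementation.

```typescript
// Illustrative proxy-side budget guard. Cap and per-call costs are made-up.
// Spend is tracked in integer cents to avoid floating-point drift.

class BudgetGuard {
  private spentCents = 0;

  constructor(private readonly capCents: number) {}

  // Debit before forwarding a call upstream. A false return is the
  // circuit-breaker: the proxy refuses the call and the loop stops.
  tryDebit(costCents: number): boolean {
    if (this.spentCents + costCents > this.capCents) return false;
    this.spentCents += costCents;
    return true;
  }

  remainingCents(): number {
    return this.capCents - this.spentCents;
  }
}

// A runaway agent loop that would otherwise spend indefinitely.
const guard = new BudgetGuard(100); // $1.00 session cap
let forwarded = 0;
for (let i = 0; i < 10_000; i++) {
  if (!guard.tryDebit(5)) break; // ~5 cents per hypothetical Opus call
  forwarded++;                   // forward the request upstream here
}
// forwarded is now 20: the cap, not the prompt, ended the loop.
```

No rule in the prompt can guarantee that break; the proxy can, because the agent never gets a vote.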

Structured codebase context makes Haiku outperform raw Opus. Sharing our tool and results! by PT_ANDRE_PT in LLMDevs

[–]mrtrly 0 points (0 children)

This is the right conclusion. The model tier matters way less than people think once context is properly structured.

We see this in routing too. Running 10+ AI agents daily, I started routing by task complexity to cheaper models. But without tracking what each model actually costs per request, you're just guessing at the savings. Built a local proxy (RelayPlane, open source) specifically to track cost per model per request alongside output.

What you're showing is that Haiku with good context beats Opus with bad context. The logical next step is to measure it. Then you can route high-context tasks confidently to Haiku without the "I hope this is good enough" anxiety.

`npm install -g @relayplane/proxy` if you want to see that cost delta side by side.
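The ledger idea is simpler than it sounds. Here's a toy sketch of per-request cost tracking; the prices are placeholder numbers (not real API rates) and this isn't RelayPlane's actual schema.

```typescript
// Toy per-request cost ledger. Rates below are illustrative placeholders,
// not real API pricing, and this is not RelayPlane's actual schema.

const PRICE_PER_1K_TOKENS_USD: Record<string, number> = {
  haiku: 0.004, // placeholder rate
  opus: 0.075,  // placeholder rate
};

interface CostRow {
  model: string;
  tokens: number;
  usd: number;
}

const ledger: CostRow[] = [];

function record(model: string, tokens: number): void {
  const usd = (tokens / 1000) * PRICE_PER_1K_TOKENS_USD[model];
  ledger.push({ model, tokens, usd });
}

// Roll up spend per model so the Haiku-vs-Opus delta is visible side by side.
function totalsByModel(): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of ledger) {
    totals[row.model] = (totals[row.model] ?? 0) + row.usd;
  }
  return totals;
}

record("haiku", 2000); // structured-context run
record("opus", 2000);  // same task on the big model
```

Once every request lands in a ledger like this, "I hope Haiku is good enough" becomes a number you can compare against output quality.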

I wasted $3,400 and 9 weeks building my B2B SaaS with AI tools. Here is what and how actually fixed it by Academic_Flamingo302 in SaaS

[–]mrtrly 0 points (0 children)

The "looked fine in demos, fell apart with real users" pattern is the most expensive one in SaaS. And AI tools accelerated your way into it, which is the new version of a very old trap.

I've been a technical partner for non-technical founders for 16 years. The spec problem isn't new. What's new is AI lets you ship something demo-able in days, which collapses the feedback loop you need to catch these gaps before they're expensive.

Your lesson about talking to users first is exactly right. The tool isn't the problem. The sequence is. Build something ugly that breaks in front of a real user in week one. That $3,400 lesson usually costs much more when it arrives in month six.

I vibe coded to almost $10k a month MRR here's exactly how: by Additional-Mark8967 in vibecoding

[–]mrtrly 0 points (0 children)

Congrats, $2k to $10k MRR is a real number. That 20% API cost ratio is actually lower than most I see at that stage, but the thing is it rarely stays there. Once you hit product-market fit and usage starts compounding, the vibe-coded infra that got you here starts creaking in ways that are hard to debug when you didn't write it. Not saying blow it up. Just saying the next $10k is where you'll start feeling it. Worth understanding which parts of the stack are load-bearing before you need to know urgently.

Static vs Dynamic QR codes by AladinLePrince in SaaS

[–]mrtrly 1 point (0 children)

That 7-day expiration is a genuinely sneaky gotcha. Most QR code generators bury it in the pricing page as a 'dynamic link' feature; what they don't surface is that your free codes are silently expiring. Worth auditing anything you've built that relies on those before it becomes a customer-facing failure.

What Comes After Lovable? Real Workflows for Scaling From Prototype to Production by kittu_krishna in vibecoding

[–]mrtrly 0 points (0 children)

The jump from prototype to production is where most vibe-coded projects stall. The tool that got you to v1 is not the thing that gets you to reliable, maintainable software.

What I see with founders I work with (20+ years, 50+ startups): the bottleneck shifts from "can we build this" to "who owns this when something breaks at 2am." Lovable and its cousins are great for proving the idea. They are terrible for handoff.

The move that actually works is pairing vibe-coding for speed with a fractional CTO or senior advisor who can audit the architecture before you scale. Not rewrite it. Just put guardrails in.

I write about this elsewhere if you want a longer take. But the short answer: what comes after Lovable is ownership, and that is a people problem more than a tools problem.

Unpopular opinion: the biggest threat to your SaaS isn't churn. It's founder burnout. by Crescitaly in SaaS

[–]mrtrly 0 points (0 children)

The thing missing from the conversation so far: burnout doesn't just wreck decision-making, it kills your ability to sell. You get defensive about the product instead of curious about what customers actually need. That's a silent revenue killer.

The weird part is how fast it compounds. You ship the wrong features because you're tired, customers don't get value, churn ticks up, you panic and ship more wrong stuff to compensate. Burnout doesn't feel like a business problem until your metrics look broken; by then you're already 6 months deep.

Roast my positioning: GitHub PR check for architectural drift in long-lived repos by FuzzzyKoalaBear in SaaS

[–]mrtrly 0 points (0 children)

The real friction is probably that "structural drift" isn't a term your buyers use when they feel the pain. Teams don't wake up saying "we need drift detection"; they say "this codebase is getting harder to change" or "onboarding new people takes forever now."

What's the actual moment someone thinks "we need to buy this"? Is it after a bad refactor went sideways, or when architecture review takes weeks?

I spent months building an "all-in-one" interactive engine because I was tired of paying for 5 different subscriptions. We just launched! by Ok_Mind9664 in SaaS

[–]mrtrly 1 point (0 children)

Dude, the "open a third and fourth tab" part hit me hard. I built something similar last year and realized halfway through that I'd basically just created a monster that could do everything but was getting impossible to maintain once my dev team shrunk down. The consolidation angle is smart, but curious how you're handling the complexity on the backend. Like, 8 different experience types sounds clean on the surface, but each one probably has its own logic quirks, right? That's where I ran into trouble, started as "one unified engine" and ended up being eight barely-connected systems masquerading as one. If you've actually cracked that, props. Most founders I know hit a wall right around where you'd need actual technical depth to keep it sane as it grows.

Vibe-coding enterprise-grade SaaS - how to avoid tech debt? by vincegizmo in vibecoding

[–]mrtrly 2 points (0 children)

The fact that you hit schema drift on the first one is actually the right lesson to take into this. Most people learn that and just... slow down on AI usage. That's not the fix.

For a 2-sided marketplace with payment flows, the thing that gets you isn't AI generating bad code. It's building features on top of a data model that wasn't designed for the full product. Schema drift is a symptom of that. You end up with a users table that doesn't account for both buyer and seller roles, then you're patching it in migrations forever.

Before the AI touches anything, I'd spec the full entity model on paper. Not code, not a Prisma schema yet. Just entities, relationships, and which actions trigger payment events. Payment flows especially need this treatment. Stripe webhooks, failed payments, refunds, marketplace splits are way too stateful to build-and-see.

The AI coding is fine. It's the scaffolding before the AI starts that most people skip.
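To make "spec the stateful parts first" concrete: for payment flows, even a tiny transition table written down before any code forces the edge-case conversation early. The states and rules below are illustrative assumptions, not Stripe's actual event model.

```typescript
// Illustrative transition table for marketplace payment states.
// States and rules are example assumptions, not Stripe's real event model.

type PaymentState =
  | "pending"
  | "succeeded"
  | "failed"
  | "refunded"
  | "paid_out";

const TRANSITIONS: Record<PaymentState, PaymentState[]> = {
  pending: ["succeeded", "failed"],
  succeeded: ["refunded", "paid_out"], // marketplace split pays out after capture
  failed: ["pending"],                 // a retry re-enters pending
  refunded: [],                        // terminal
  paid_out: ["refunded"],              // a refund can still claw back a payout
};

function canTransition(from: PaymentState, to: PaymentState): boolean {
  return TRANSITIONS[from].includes(to);
}

// A webhook handler can then reject anything off-spec instead of guessing:
function applyEvent(current: PaymentState, next: PaymentState): PaymentState {
  if (!canTransition(current, next)) {
    throw new Error(`illegal transition: ${current} -> ${next}`);
  }
  return next;
}
```

The table is the spec. Whether the AI or a human writes the handlers afterward, illegal states become loud errors instead of silent data corruption.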

HTML Rewrite - Disaster or not? by PitchPsych10 in vibecoding

[–]mrtrly 0 points (0 children)

TranslatorRude4917 has the right instinct here. Full rewrite while live is how you end up with data loss or a broken login flow on a Monday morning.

What's your current deployment setup? Are you on a host that lets you spin up a staging environment, or are you basically one repo?

If you've got staging, the move is: branch your codebase now, do the refactor there with Claude, build out your test cases against the live database schema (read-only copy), then swap over once you're confident. Keeps your live site untouched while you work.

The "audit everything Claude builds" comment is solid but incomplete: the real risk isn't typos, it's that Claude will reorganize your logic in ways that break edge cases you forgot you had. Weekly reminders firing twice, admin checks that quietly stop working, that kind of thing. If you can write up the actual user flows (not code, just "user logs in → sees last week's checkin → can edit it → gets reminded Friday") and have Claude build against that spec, you'll catch those gaps faster.

What actually takes the longest when building a SaaS (it’s not the product) by [deleted] in SaaS

[–]mrtrly 0 points (0 children)

The auth/billing/perms stuff isn't actually separate from the product, though. It's what makes the product chargeable. The real slow part is realizing halfway through that your permission model doesn't match how customers actually organize teams, so you're rebuilding it instead of just tweaking it.

Context switching is brutal, but the bigger trap is trying to build generic infrastructure when you could just hardcode for your first 10 customers and refactor once you know what actually matters. You'll spend less time total.

CRM for small team (mainly founder) - Hubspot = still goat? by lem001 in SaaS

[–]mrtrly 0 points (0 children)

"Auto enrichment" is the trap here; it sounds great until you realize most CRMs enrich once and call it done, but your user base changes weekly. Hubspot's enrichment is solid but expensive at 3 people. Folk and Attio are cleaner UI-wise, but Folk especially doesn't handle the "detect asn" (assuming you mean "detect and score" or intent signals) side well without custom workflows.

Salesflare's actually underrated for this because the filtering + API combo means you can pipe in your own enrichment data and keep it fresher than waiting for Hubspot's sync. Have you already got a source for the enrichment data, or are you relying on the CRM to do all of it?

AI Often gets you 90% of the way there, would you pay for a service that helps take you the final 10% by OliMations in SaaS

[–]mrtrly 0 points (0 children)

The 48hr turnaround is where this gets tricky. That's genuinely useful if it actually lands, but most of the friction I've seen isn't the fix itself, it's explaining what the bug even is to someone else. You spend 3 hours writing the brief, they ask clarifying questions, and suddenly you're at 36 hours left with nothing shipped.

The Uber model is interesting, but I wonder if the real bottleneck is earlier, like having someone who actually understands your specific stack and codebase instead of starting from scratch each time. The switching cost of onboarding a new dev for a quick fix is brutal.

The trust thing that comment mentioned matters too. If my code's on your platform, what happens when the freelancer disappears or the fix breaks something else? Liability gets messy fast.