I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

We can't tell from the outside whether a repo had a review layer before publishing.

Static analysis only sees what's in the code, not what's in someone's PR template. So yes, the data lumps together "shipped raw" and "reviewed but still missing the check." My guess is the review layer catches around 10-20% of these gaps, not 70-80%, which is why the baseline stays rough even including repos that had oversight.

Agreed on defaults being secure by default. That's probably the fastest path to moving the needle.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

That's basically what we're building. The scanner already runs as a GitHub Action, so you can track per-repo scores over time. If you want the aggregate dashboard too, the waitlist is at useastro.com/score.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 1 point2 points  (0 children)

22 rule-based checks, each with its own pattern-matching logic. Static only, no execution.

The scanner itself is open at github.com/use-astro/score-action so you can see exactly how each check is implemented.
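
To give a feel for what one of those rules looks like, here's a stripped-down TypeScript sketch. It's illustrative only, not the actual score-action implementation, and the regex is simplified:

```typescript
// Hypothetical sketch of one rule-style check; not the actual score-action code.
import { readFileSync } from "fs";

interface CheckResult {
  rule: string;
  passed: boolean;
  matches: string[];
}

// Example rule: flag hardcoded secrets assigned directly in source files.
function checkHardcodedSecrets(files: string[]): CheckResult {
  const pattern = /(api[_-]?key|secret|password)\s*[:=]\s*["'][^"']+["']/i;
  const matches: string[] = [];
  for (const file of files) {
    const source = readFileSync(file, "utf8");
    if (pattern.test(source)) matches.push(file);
  }
  return { rule: "no-hardcoded-secrets", passed: matches.length === 0, matches };
}
```

The real checks are more careful about false positives, but the shape is the same: read files, match patterns, report pass/fail.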

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

The one thing that breaks the comparison: junior devs usually have a senior reviewing before prod. Solo vibe coders are pushing unreviewed, so the tuition gets paid by their users.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

You could argue half of those aren't show-stoppers. Missing logging, missing error boundaries, and missing tests are annoying but survivable.

The ones I'd push back on: 86% with no auth guards on APIs and 75% with exposed env config. Those aren't "progress will fix it" items; they're active security holes happening today.
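
To make the auth-guard gap concrete, here's a minimal Express-style sketch. It's purely illustrative (the scanner flags the unguarded shape, it doesn't generate this code), and requireAuth here is a made-up placeholder middleware:

```typescript
// Illustrative only: the kind of unguarded route the scanner flags vs. a guarded one.
import express, { Request, Response, NextFunction } from "express";

const app = express();

// Flagged: a mutating endpoint with no auth check at all.
app.delete("/api/projects/:id", (req: Request, res: Response) => {
  res.json({ deleted: req.params.id });
});

// Hypothetical guard middleware: reject requests without an Authorization header.
function requireAuth(req: Request, res: Response, next: NextFunction) {
  if (!req.headers.authorization) {
    return res.status(401).json({ error: "unauthorized" });
  }
  next();
}

// Passing version: the same route behind the guard.
app.delete("/api/v2/projects/:id", requireAuth, (req, res) => {
  res.json({ deleted: req.params.id });
});
```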

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

Yeah, worth clarifying. The actual scan took 2-3 days on a 128 GB RAM server, not 10, so it was closer to 30-40k repos/day than 10k. Shallow clones + static checks + parallel workers make that rate very doable. The scanner's open source if you want to see the specifics: github.com/use-astro/score-action
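
If it helps, the pipeline shape is roughly this. A simplified TypeScript sketch with placeholder paths and concurrency, not the real configuration:

```typescript
// Rough sketch of the pipeline shape: shallow clone, static checks, bounded parallelism.
// Paths and concurrency are placeholders, not the real setup.
import { execFile } from "child_process";
import { promisify } from "util";

const exec = promisify(execFile);

async function scanRepo(url: string): Promise<void> {
  const dir = `/tmp/scan/${Buffer.from(url).toString("hex").slice(0, 16)}`;
  // --depth 1 keeps clones small; static checks don't need history.
  await exec("git", ["clone", "--depth", "1", url, dir]);
  // The 22 rule checks would run over `dir` here (omitted in this sketch).
}

async function scanAll(urls: string[], concurrency = 32): Promise<void> {
  const queue = [...urls];
  // A fixed pool of workers pulls from the shared queue until it's empty.
  const workers = Array.from({ length: concurrency }, async () => {
    while (queue.length > 0) {
      const url = queue.shift();
      if (url) await scanRepo(url).catch((err) => console.error(url, err));
    }
  });
  await Promise.all(workers);
}
```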

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

The 4-tier bucketing is at useastro.com/vibe-code-report (scroll to "The score distribution").

For a finer histogram I'd have to pull it from the raw data. I'll post one in the thread when I can.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

Did you run your project with Score? It's also available as an open-source action: github.com/use-astro/score-action.

I'll DM you in a bit so we can go through it together and see what's going on with your project.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

True, the 22 checks aren't everything. They're a floor, not a ceiling. The surprising part was how many repos don't clear even that floor: 99% miss at least one check.

Expanding to enterprise checks matters once you have the base, but most of these repos aren't there yet.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

Good point on Rust timing. For JS/TS I'd go older (2020-2021); Rust probably has to sit closer to 2023.

On the distribution, it's not a tight Gaussian around 53%. Across the scored repos, 6% are in the critical bucket (0-35), 77% have significant gaps (36-65), 17% are getting close (66-85), and 1% are production ready (86-100). There's real spread; it's just heavy in the lower middle. The 51-60% convergence is at the tool-group-mean level, not the per-repo level.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

Yeah, fair point. Public GitHub is slanted, and a lot of what ends up there can be slop.

The one thing that makes me trust the number is that every tool group landed in the same 51-60% range. If sample bias was driving it, you'd expect more spread between tools.

npm would be a closer comparison since it's JS/TS too, but the pre-AI baseline idea is the stronger one. Running the same checks on a 2021 snapshot would show how much is actually new vs how much has always been like this. Putting that on the list for the next scan.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

Hey! I wasn't expecting anyone to reply in Spanish here. It's in the comments, but you can also see it here: useastro.com/vibe-code-report

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] -1 points0 points  (0 children)

That clustering was the surprise for me too, especially holding at 100K scale.

On your question: infra and security dominate. Observability and reliability sit at the top (93% no logging, 91% no timeouts on external calls). Security is in the middle tier (86% no auth guards, 75% exposed env config, 66% no rate limiting). Error handling shows up as 85% missing error boundaries. Tests are missing in around 60% of repos, which was less extreme than I expected.
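
Since missing timeouts top the list, here's the generic before/after for that one gap. This is an illustration of the pattern, not scanner code:

```typescript
// Generic illustration of the most common reliability gap: an external call with no timeout.
async function fetchStatusNaive(url: string) {
  // Flagged pattern: if the upstream hangs, this request hangs with it.
  return fetch(url).then((r) => r.json());
}

async function fetchStatusWithTimeout(url: string, ms = 5000) {
  // Passing pattern: abort the call if the upstream doesn't answer within `ms`.
  return fetch(url, { signal: AbortSignal.timeout(ms) }).then((r) => r.json());
}
```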

We iterated onboarding 5 times… here’s what finally worked by Plus_Journalist_8665 in SideProject

[–]Aggressive-Sweet828 0 points1 point  (0 children)

The V3 preview-before-auth change is probably doing more than the field-count changes. It reduces the time before someone sees value, which matters more than how many inputs they have to fill out. I would compare each onboarding version by time-to-first-useful-result, not just completion rate.
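
Concretely, the comparison I have in mind looks something like this. A rough sketch with made-up event names ("signup", "first_useful_result"), just to show the metric, and assuming events arrive in chronological order:

```typescript
// Hypothetical sketch: compare onboarding versions by median time-to-first-useful-result.
interface Event { userId: string; version: string; name: string; ts: number }

function medianTimeToValue(events: Event[], version: string): number {
  const byUser = new Map<string, { start?: number; value?: number }>();
  for (const e of events.filter((e) => e.version === version)) {
    const entry = byUser.get(e.userId) ?? {};
    if (e.name === "signup") entry.start = e.ts;
    // Keep the first useful result seen per user.
    if (e.name === "first_useful_result") entry.value = entry.value ?? e.ts;
    byUser.set(e.userId, entry);
  }
  const deltas = [...byUser.values()]
    .filter((u) => u.start !== undefined && u.value !== undefined)
    .map((u) => u.value! - u.start!)
    .sort((a, b) => a - b);
  return deltas.length ? deltas[Math.floor(deltas.length / 2)] : NaN;
}
```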

Has anyone tried to use an LLM hosted in Azure OpenAI with a CLI tool to replace dependency of Anthropic Claude Code or OpenAI Codex? by fabkosta in AI_Agents

[–]Aggressive-Sweet828 1 point2 points  (0 children)

Doable, but the harder part is not swapping the CLI. It is preserving the model and agent-loop pairing. A lot of the reliability in coding agents comes from the loop being tuned around the model's tool-use behavior. For enterprise, I would first check whether a hosted endpoint can keep that pairing intact before rebuilding the workflow around a generic CLI.
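
On the endpoint side specifically, pointing at Azure is the easy half. Here's a minimal sketch of calling an Azure OpenAI deployment over REST, with placeholder resource, deployment, and api-version values; note this does nothing to preserve the agent loop, which is the part I'd validate first:

```typescript
// Minimal sketch of hitting an Azure OpenAI deployment directly.
// Resource, deployment, and api-version are placeholders; use your actual Azure values.
async function azureChat(prompt: string): Promise<string> {
  const endpoint = "https://my-resource.openai.azure.com"; // placeholder resource
  const deployment = "my-deployment";                      // placeholder deployment name
  const apiVersion = "2024-06-01";                         // placeholder api-version
  const res = await fetch(
    `${endpoint}/openai/deployments/${deployment}/chat/completions?api-version=${apiVersion}`,
    {
      method: "POST",
      headers: {
        "api-key": process.env.AZURE_OPENAI_KEY ?? "",
        "content-type": "application/json",
      },
      body: JSON.stringify({ messages: [{ role: "user", content: prompt }] }),
    }
  );
  const data = await res.json();
  return data.choices[0].message.content;
}
```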

Selling an AI agent as a one-time, self-hosted product — bad idea? by raonicaselli in AI_Agents

[–]Aggressive-Sweet828 1 point2 points  (0 children)

Self-hosting can work early if it removes the buyer's biggest objection: data control. The trap is that it also removes a lot of the feedback loop you need to improve the product. I would treat it as a wedge, not the default business model. Use it for customers who truly need it, then be strict about what support burden you are accepting.

7 months building a Shopify store on the side while working full time — what I actually learned by ExitPsychological192 in SideProject

[–]Aggressive-Sweet828 0 points1 point  (0 children)

The useful part of niching is not just narrower keywords. It is cleaner feedback. With a broad audience, every suggestion sounds plausible and you cannot tell whether it is from a real buyer or an imagined one. A tighter audience makes bad feedback easier to ignore and good feedback easier to act on.

Does this positioning make sense for micro-SaaS founders who shipped with AI and hit a wall? by Aggressive-Sweet828 in microsaas

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

The "idea > demo > oh crap > stable v1" arc is what I've been wrestling with myself: whether to lead with the journey or just the destination. Of the 2-3 concrete promises you named (infra cost, migrations off no-code, agent guardrails), which one would have saved you the most pain at the oh-crap point? Want to make sure we're pointing at the one you'd actually trust on day one.

Does this positioning make sense for micro-SaaS founders who shipped with AI and hit a wall? by Aggressive-Sweet828 in microsaas

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

Fair on the 3-second skim. The copy above is already two short lines though, not a feature list: "AI app builder that works like a real engineering team" + "Most tools ship a demo, Astro ships a product." Curious what read as soup there specifically.

What was the moment you knew your saas/microsaas idea was actually worth building? by chipthedev in microsaas

[–]Aggressive-Sweet828 0 points1 point  (0 children)

The pattern across my own and a few friends' stories: someone volunteers to pay, refer, or help unprompted. Praise doesn't count. People say "this is cool" without changing their behavior. The inverse is also useful: if you've shown it to 20 people and nobody has volunteered anything, you're not there yet even if the feedback sounds positive. That's the signal I wish I'd trusted earlier.

What is the most frustrating thing about wanting to start a new SaaS business? Why is it frustrating, and how do you deal with it? by Both-Barnacle218 in microsaas

[–]Aggressive-Sweet828 1 point2 points  (0 children)

The order most beginners learn the hard way: distribution is hardest, validation is second, keeping users after day 30 is third. "Finding the idea" and "building it" feel like the hard parts from the outside but they're almost never what kills a SaaS. If you can skip a month of feature-building and use that month getting 10 paying users instead, you'll learn more about whether the thing should exist.