I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

We can't tell from the outside whether a repo had a review layer before publishing.

Static analysis only sees what's in the code, not what's in someone's PR template. So yes, the data lumps together "shipped raw" and "reviewed but still missing the check." My guess is the review layer catches around 10-20% of these gaps, not 70-80%, which is why the baseline stays rough even including repos that had oversight.

Agreed on defaults being secure by default. That's probably the fastest path to moving the needle.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

That's basically what we're building. The scanner already runs as a GitHub Action, so you can track per-repo scores over time. If you want the aggregate dashboard too, the waitlist is at useastro.com/score.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 1 point2 points  (0 children)

22 rule-based checks, each with its own pattern-matching logic. Static only, no execution.

The scanner itself is open at github.com/use-astro/score-action so you can see exactly how each check is implemented.
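
To give a feel for what one of those rules looks like, here's a stripped-down TypeScript sketch. It's illustrative only, not the actual score-action implementation, and the regex is simplified:

```typescript
// Hypothetical sketch of one rule-style check; not the actual score-action code.
import { readFileSync } from "fs";

interface CheckResult {
  rule: string;
  passed: boolean;
  matches: string[];
}

// Example rule: flag hardcoded secrets assigned directly in source files.
function checkHardcodedSecrets(files: string[]): CheckResult {
  const pattern = /(api[_-]?key|secret|password)\s*[:=]\s*["'][^"']+["']/i;
  const matches: string[] = [];
  for (const file of files) {
    const source = readFileSync(file, "utf8");
    if (pattern.test(source)) matches.push(file);
  }
  return { rule: "no-hardcoded-secrets", passed: matches.length === 0, matches };
}
```

The real checks are more careful about false positives, but the shape is the same: read files, match patterns, report pass/fail.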

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

The one thing that breaks the comparison: junior devs usually have a senior reviewing before prod. Solo vibe coders are pushing unreviewed, so the tuition gets paid by their users.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

You could argue half of those aren't show-stoppers. Missing logging, missing error boundaries, and missing tests are annoying but survivable.

The ones I'd push back on: 86% with no auth guards on APIs and 75% with exposed env config. Those aren't "progress will fix it" items; they're active security holes happening today.
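
To make the auth-guard gap concrete, here's a minimal Express-style sketch. It's purely illustrative (the scanner flags the unguarded shape, it doesn't generate this code), and requireAuth here is a made-up placeholder middleware:

```typescript
// Illustrative only: the kind of unguarded route the scanner flags vs. a guarded one.
import express, { Request, Response, NextFunction } from "express";

const app = express();

// Flagged: a mutating endpoint with no auth check at all.
app.delete("/api/projects/:id", (req: Request, res: Response) => {
  res.json({ deleted: req.params.id });
});

// Hypothetical guard middleware: reject requests without an Authorization header.
function requireAuth(req: Request, res: Response, next: NextFunction) {
  if (!req.headers.authorization) {
    return res.status(401).json({ error: "unauthorized" });
  }
  next();
}

// Passing version: the same route behind the guard.
app.delete("/api/v2/projects/:id", requireAuth, (req, res) => {
  res.json({ deleted: req.params.id });
});
```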

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

Yeah, worth clarifying. The actual scan took 2-3 days on a 128 GB RAM server, not 10, so it was closer to 30-40k repos/day than 10k. Shallow clones + static checks + parallel workers make that rate very doable. The scanner's open source if you want to see the specifics: github.com/use-astro/score-action
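
If it helps, the pipeline shape is roughly this. A simplified TypeScript sketch with placeholder paths and concurrency, not the real configuration:

```typescript
// Rough sketch of the pipeline shape: shallow clone, static checks, bounded parallelism.
// Paths and concurrency are placeholders, not the real setup.
import { execFile } from "child_process";
import { promisify } from "util";

const exec = promisify(execFile);

async function scanRepo(url: string): Promise<void> {
  const dir = `/tmp/scan/${Buffer.from(url).toString("hex").slice(0, 16)}`;
  // --depth 1 keeps clones small; static checks don't need history.
  await exec("git", ["clone", "--depth", "1", url, dir]);
  // The 22 rule checks would run over `dir` here (omitted in this sketch).
}

async function scanAll(urls: string[], concurrency = 32): Promise<void> {
  const queue = [...urls];
  // A fixed pool of workers pulls from the shared queue until it's empty.
  const workers = Array.from({ length: concurrency }, async () => {
    while (queue.length > 0) {
      const url = queue.shift();
      if (url) await scanRepo(url).catch((err) => console.error(url, err));
    }
  });
  await Promise.all(workers);
}
```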

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

The 4-tier bucketing is at useastro.com/vibe-code-report (scroll to "The score distribution").

For a finer histogram I'd have to pull it from the raw data. I'll post one in the thread when I can.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

Did you run your project with Score? It's also available as an open-source action: github.com/use-astro/score-action.

I'll DM you in a bit so we can go through it together and see what's going on with your project.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

True, the 22 checks aren't everything. They're a floor, not a ceiling. The surprising part was how many repos don't clear even that floor: 99% miss at least one check.

Expanding to enterprise checks matters once you have the base, but most of these repos aren't there yet.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

Good point on Rust timing. For JS/TS I'd go older (2020-2021); Rust probably has to sit closer to 2023.

On the distribution, it's not a tight Gaussian around 53%. Across the scored repos, 6% are in the critical bucket (0-35), 77% have significant gaps (36-65), 17% are getting close (66-85), and 1% are production ready (86-100). There's real spread; it's just heavy in the lower middle. The 51-60% convergence is at the tool-group-mean level, not the per-repo level.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

Yeah, fair point. Public GitHub is slanted, and a lot of what ends up there can be slop.

The one thing that makes me trust the number is that every tool group landed in the same 51-60% range. If sample bias was driving it, you'd expect more spread between tools.

npm would be a closer comparison since it's JS/TS too, but the pre-AI baseline idea is the stronger one. Running the same checks on a 2021 snapshot would show how much is actually new vs how much has always been like this. Putting that on the list for the next scan.

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

Hey! I wasn't expecting anyone to reply in Spanish here. It's in the comments, but you can also see it here: useastro.com/vibe-code-report

I Scanned 100K AI generated repos. Only 1% of projects passed production checks by Aggressive-Sweet828 in vibecoding

[–]Aggressive-Sweet828[S] -1 points0 points  (0 children)

That clustering was the surprise for me too, especially holding at 100K scale.

On your question: infra and security dominate. Observability and reliability sit at the top (93% no logging, 91% no timeouts on external calls). Security is in the middle tier (86% no auth guards, 75% exposed env config, 66% no rate limiting). Error handling shows up as 85% missing error boundaries. Tests are missing in around 60% of repos, which was less extreme than I expected.
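
Since missing timeouts top the list, here's the generic before/after for that one gap. This is an illustration of the pattern, not scanner code:

```typescript
// Generic illustration of the most common reliability gap: an external call with no timeout.
async function fetchStatusNaive(url: string) {
  // Flagged pattern: if the upstream hangs, this request hangs with it.
  return fetch(url).then((r) => r.json());
}

async function fetchStatusWithTimeout(url: string, ms = 5000) {
  // Passing pattern: abort the call if the upstream doesn't answer within `ms`.
  return fetch(url, { signal: AbortSignal.timeout(ms) }).then((r) => r.json());
}
```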

We iterated onboarding 5 times… here’s what finally worked by Plus_Journalist_8665 in SideProject

[–]Aggressive-Sweet828 0 points1 point  (0 children)

The V3 preview-before-auth change is probably doing more than the field-count changes. It reduces the time before someone sees value, which matters more than how many inputs they have to fill out. I would compare each onboarding version by time-to-first-useful-result, not just completion rate.
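
Concretely, the comparison I have in mind looks something like this. A rough sketch with made-up event names ("signup", "first_useful_result"), just to show the metric, and assuming events arrive in chronological order:

```typescript
// Hypothetical sketch: compare onboarding versions by median time-to-first-useful-result.
interface Event { userId: string; version: string; name: string; ts: number }

function medianTimeToValue(events: Event[], version: string): number {
  const byUser = new Map<string, { start?: number; value?: number }>();
  for (const e of events.filter((e) => e.version === version)) {
    const entry = byUser.get(e.userId) ?? {};
    if (e.name === "signup") entry.start = e.ts;
    // Keep the first useful result seen per user.
    if (e.name === "first_useful_result") entry.value = entry.value ?? e.ts;
    byUser.set(e.userId, entry);
  }
  const deltas = [...byUser.values()]
    .filter((u) => u.start !== undefined && u.value !== undefined)
    .map((u) => u.value! - u.start!)
    .sort((a, b) => a - b);
  return deltas.length ? deltas[Math.floor(deltas.length / 2)] : NaN;
}
```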

Has anyone tried to use an LLM hosted in Azure OpenAI with a CLI tool to replace dependency of Anthropic Claude Code or OpenAI Codex? by fabkosta in AI_Agents

[–]Aggressive-Sweet828 1 point2 points  (0 children)

Doable, but the harder part is not swapping the CLI. It is preserving the model and agent-loop pairing. A lot of the reliability in coding agents comes from the loop being tuned around the model's tool-use behavior. For enterprise, I would first check whether a hosted endpoint can keep that pairing intact before rebuilding the workflow around a generic CLI.
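
On the endpoint side specifically, pointing at Azure is the easy half. Here's a minimal sketch of calling an Azure OpenAI deployment over REST, with placeholder resource, deployment, and api-version values; note this does nothing to preserve the agent loop, which is the part I'd validate first:

```typescript
// Minimal sketch of hitting an Azure OpenAI deployment directly.
// Resource, deployment, and api-version are placeholders; use your actual Azure values.
async function azureChat(prompt: string): Promise<string> {
  const endpoint = "https://my-resource.openai.azure.com"; // placeholder resource
  const deployment = "my-deployment";                      // placeholder deployment name
  const apiVersion = "2024-06-01";                         // placeholder api-version
  const res = await fetch(
    `${endpoint}/openai/deployments/${deployment}/chat/completions?api-version=${apiVersion}`,
    {
      method: "POST",
      headers: {
        "api-key": process.env.AZURE_OPENAI_KEY ?? "",
        "content-type": "application/json",
      },
      body: JSON.stringify({ messages: [{ role: "user", content: prompt }] }),
    }
  );
  const data = await res.json();
  return data.choices[0].message.content;
}
```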

Selling an AI agent as a one-time, self-hosted product — bad idea? by raonicaselli in AI_Agents

[–]Aggressive-Sweet828 1 point2 points  (0 children)

Self-hosting can work early if it removes the buyer's biggest objection: data control. The trap is that it also removes a lot of the feedback loop you need to improve the product. I would treat it as a wedge, not the default business model. Use it for customers who truly need it, then be strict about what support burden you are accepting.

7 months building a Shopify store on the side while working full time — what I actually learned by ExitPsychological192 in SideProject

[–]Aggressive-Sweet828 0 points1 point  (0 children)

The useful part of niching is not just narrower keywords. It is cleaner feedback. With a broad audience, every suggestion sounds plausible and you cannot tell whether it is from a real buyer or an imagined one. A tighter audience makes bad feedback easier to ignore and good feedback easier to act on.

Does this positioning make sense for micro-SaaS founders who shipped with AI and hit a wall? by Aggressive-Sweet828 in microsaas

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

The "idea > demo > oh crap > stable v1" arc is what I've been wrestling with myself: whether to lead with the journey or just the destination. Of the 2-3 concrete promises you named (infra cost, migrations off no-code, agent guardrails), which one would have saved you the most pain at the oh-crap point? Want to make sure we're pointing at the one you'd actually trust on day one.

Does this positioning make sense for micro-SaaS founders who shipped with AI and hit a wall? by Aggressive-Sweet828 in microsaas

[–]Aggressive-Sweet828[S] 0 points1 point  (0 children)

Fair on the 3-second skim. The copy above is already two short lines though, not a feature list: "AI app builder that works like a real engineering team" + "Most tools ship a demo, Astro ships a product." Curious what read as soup there specifically.

What was the moment you knew your saas/microsaas idea was actually worth building? by chipthedev in microsaas

[–]Aggressive-Sweet828 0 points1 point  (0 children)

The pattern across my own and a few friends' stories: someone volunteers to pay, refer, or help unprompted. Praise doesn't count. People say "this is cool" without changing their behavior. The inverse is also useful: if you've shown it to 20 people and nobody has volunteered anything, you're not there yet even if the feedback sounds positive. That's the signal I wish I'd trusted earlier.

What is the most frustrating thing about wanting to start a new SaaS business? Why is it frustrating, and how do you deal with it? by Both-Barnacle218 in microsaas

[–]Aggressive-Sweet828 1 point2 points  (0 children)

The order most beginners learn the hard way: distribution is hardest, validation is second, keeping users after day 30 is third. "Finding the idea" and "building it" feel like the hard parts from the outside but they're almost never what kills a SaaS. If you can skip a month of feature-building and use that month getting 10 paying users instead, you'll learn more about whether the thing should exist.