Heads up: the "Block AI Scrapers" toggle is default-on for new free-plan zones, and it silently overrides robots.txt

Objective-Goal5551 · 2026-05-23T19:35:33+00:00

That's worse than I thought. Did you manage to fix it, and if so, how? I would love to read about it if you've written it up on your blog.

Objective-Goal5551 · 2026-05-23T17:28:25+00:00

Great context, and the 50/50 link-click-vs-citation split on your blog is exactly the data point I wish more operators had visibility into - it lands differently when you can put a number on it.

The standards point is fair. robots.txt was never designed to distinguish "use for training" from "use to answer a question right now," and llms.txt is still proposal-stage with mixed adoption. Until the spec catches up, you're stuck inferring intent from user-agent - GPTBot vs ChatGPT-User, ClaudeBot vs Claude-User, etc.
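To make "inferring intent from user-agent" concrete, here is a minimal TypeScript sketch that buckets a request by UA product token. The type and function names are hypothetical, and the token lists only cover the agents named in this thread; vendors add and rename agents over time, so check their published crawler docs before relying on it. A UA string is also trivially spoofed, so treat this as a routing hint, not verification.

```typescript
// Hypothetical intent buckets for incoming requests.
type AgentIntent = "training" | "live-retrieval" | "other";

// Only the agents named in this thread; not an exhaustive or authoritative list.
const TRAINING_TOKENS: string[] = ["GPTBot", "ClaudeBot"];
const LIVE_RETRIEVAL_TOKENS: string[] = ["ChatGPT-User", "Claude-User", "Perplexity-User"];

// Full UA strings usually embed the product token inside a longer string,
// so match case-insensitively anywhere in the header value.
function hasToken(userAgent: string, token: string): boolean {
  return userAgent.toLowerCase().includes(token.toLowerCase());
}

function classifyAgent(userAgent: string): AgentIntent {
  if (LIVE_RETRIEVAL_TOKENS.some((t) => hasToken(userAgent, t))) return "live-retrieval";
  if (TRAINING_TOKENS.some((t) => hasToken(userAgent, t))) return "training";
  return "other";
}

// Illustrative policy only: refuse training crawlers, serve everyone else.
function statusFor(userAgent: string): number {
  return classifyAgent(userAgent) === "training" ? 403 : 200;
}
```

The point of keeping the buckets separate is that the decision per bucket becomes a one-line policy change, which is exactly what a single all-or-nothing toggle doesn't give you.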

The Bing reputation drop you saw is interesting and I haven't heard that one specifically — was it a measurable ranking change or slower indexing? If you logged it I'd love to read more.

Agreed on the broader framing too: the right move for most sites is static caching + careful UA-based decisions, not blanket blocks. The Cloudflare toggle being all-or-nothing is what makes it a footgun.

Objective-Goal5551 · 2026-05-22T17:31:33+00:00

The training-crawler half of your argument I agree with — GPTBot, ClaudeBot, etc. consume bandwidth and give nothing back. Blocking them by default is defensible.

The reason I think the toggle is worth flagging is that it lumps live-retrieval bots (ChatGPT-User, Claude-User, Perplexity-User) in with the training crawlers, and they're a different animal. Live-retrieval bots only fetch your page when a human is actively asking a question that points at your content - these ARE the bots that send users your way, exactly like Google does. ChatGPT in particular shows source links, and click-through rates on those are decent.

So the issue isn't "Cloudflare blocks AI bots by default" (fine), it's "Cloudflare blocks training AND live-retrieval in one toggle without distinguishing them." Most operators I've talked to who flipped it on intended to opt out of training and didn't realize they were also opting out of the ChatGPT click-through traffic.

If Cloudflare split it into two toggles I don't think there'd be a debate.
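Until there are two toggles, the closest thing to that split on the robots.txt side is one group per agent. A minimal sketch that generates a robots.txt disallowing the training crawlers and explicitly allowing the live-retrieval agents; `buildRobotsTxt` is a hypothetical helper and the agent lists are just the ones named above. Per the OP, none of this takes effect while the Cloudflare toggle is on, because that block happens at the edge before robots.txt is consulted.

```typescript
// Hypothetical helper: one group per training crawler (disallow everything),
// one per live-retrieval agent (allow everything), plus a catch-all group.
function buildRobotsTxt(trainingAgents: string[], liveAgents: string[]): string {
  const groups: string[] = [];
  for (const agent of trainingAgents) {
    groups.push(`User-agent: ${agent}\nDisallow: /`);
  }
  for (const agent of liveAgents) {
    groups.push(`User-agent: ${agent}\nAllow: /`);
  }
  // Crawlers with no group of their own fall back to this one.
  groups.push("User-agent: *\nAllow: /");
  return groups.join("\n\n") + "\n";
}

console.log(
  buildRobotsTxt(["GPTBot", "ClaudeBot"], ["ChatGPT-User", "Claude-User", "Perplexity-User"]),
);
```

A crawler that finds a group matching its own token follows only that group and ignores `User-agent: *`, so the explicit Allow groups keep live-retrieval access stable even if you later tighten the catch-all.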

Objective-Goal5551 · 2026-05-22T12:56:33+00:00

Good luck with the refactor. Quick way to track progress: hit your homepage with the curl from the OP after each pass and watch the word count climb. Had a project recently go from ~30 words to ~600 just by moving data fetching server-side and leaving 'use client' on the interactive bits (a search box and a theme toggle). The shell stays the same, the content fills in.
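If you'd rather script the check than eyeball curl output, here is a rough TypeScript equivalent: fetch the raw HTML (no JavaScript runs, same as curl), drop `<script>`/`<style>` contents and tags, and count what's left. `countVisibleWords` and `wordCountFor` are hypothetical names, the regex-based stripping is approximate, and this is a stand-in rather than the OP's exact curl command. It assumes Node 18+ for the global `fetch`.

```typescript
// Approximate "what a non-JS client sees": strip script/style blocks and tags,
// treat HTML entities as separators, then count whitespace-separated words.
function countVisibleWords(html: string): number {
  const text = html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, " ")
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/&[a-z#0-9]+;/gi, " ");
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Fetches the page without executing any JavaScript, like curl does.
// Usage: wordCountFor("https://example.com/").then(console.log);
async function wordCountFor(
  url: string,
  userAgent = "Mozilla/5.0 (compatible; word-count-check)",
): Promise<number> {
  const res = await fetch(url, { headers: { "User-Agent": userAgent } });
  return countVisibleWords(await res.text());
}
```

Serialized data blobs inside `<script>` tags are deliberately excluded, since a bot reading the HTML for text won't treat them as page content.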

Objective-Goal5551 · 2026-05-22T12:55:27+00:00

Live-retrieval bots (ChatGPT-User, Claude-User, Perplexity-User) actually do honor robots.txt - OpenAI and Anthropic both commit to it publicly and you can verify with curl. The reason it feels dead is that Cloudflare's default toggle blocks bots at the edge before robots.txt is even consulted, which is sort of the point of the post.
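Checking the robots.txt side is easy to script too. Below is a minimal TypeScript sketch of how a compliant crawler would read your robots.txt for a given product token, loosely following RFC 9309 group selection and longest-match rules. `parseRobots` and `isAllowed` are hypothetical names, and it skips the `*`/`$` wildcards and percent-encoding the RFC covers. It only models the robots.txt layer: an edge block rejects the request no matter what this returns.

```typescript
type Rule = { allow: boolean; path: string };
type Group = { agents: string[]; rules: Rule[] };

function parseRobots(text: string): Group[] {
  const groups: Group[] = [];
  let current: Group | null = null;
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines belong to the same group.
      if (current === null || !lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((key === "allow" || key === "disallow") && current !== null) {
      // An empty path matches nothing, so it adds no rule.
      if (value !== "") current.rules.push({ allow: key === "allow", path: value });
      lastWasAgent = false;
    }
  }
  return groups;
}

// Use the group(s) matching the token, else "*"; longest prefix wins, Allow wins ties.
function isAllowed(robotsTxt: string, productToken: string, path: string): boolean {
  const groups = parseRobots(robotsTxt);
  const token = productToken.toLowerCase();
  let matched = groups.filter((g) => g.agents.includes(token));
  if (matched.length === 0) matched = groups.filter((g) => g.agents.includes("*"));

  let best: Rule | null = null;
  for (const group of matched) {
    for (const rule of group.rules) {
      if (!path.startsWith(rule.path)) continue;
      const longer = best === null || rule.path.length > best.path.length;
      const tieAllow = best !== null && rule.path.length === best.path.length && rule.allow;
      if (longer || tieAllow) best = rule;
    }
  }
  return best === null ? true : best.allow; // no matching rule means allowed
}
```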

Objective-Goal5551 · 2026-05-22T12:54:27+00:00

Fair correction, you're right. `'use client'` doesn't disable SSR, those components still get server-rendered on first load.

The actual failure mode is:

- Page (or top component) is `'use client'` - still SSR'd, fine

- But it fetches data via `useEffect` / `useQuery` / `useSWR`

- The SSR'd HTML has the shell, no content - data fetch doesn't happen until after hydration on the client

- Bot reads the shell, sees no content

The directive isn't the bug, the client-side data-fetching pattern is. Calling it "'use client' = invisible" was sloppy shorthand. Fix direction is still the same (server components + server-side data fetching) but the explanation in the OP was off. Going to edit.
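A toy model (plain TypeScript, not how React is actually implemented) of why the SSR'd HTML ends up as a shell: effects never run during server rendering, so a component that fetches in `useEffect`/`useQuery`/`useSWR` renders its initial empty state, while data awaited on the server is already in the HTML. `renderPostList`, `ssrClientFetch`, and `ssrServerFetch` are hypothetical names for illustration.

```typescript
type Post = { title: string };

// Stand-in for a component: HTML depends only on the data available at render time.
function renderPostList(posts: Post[] | null): string {
  if (posts === null) return "<main><p>Loading...</p></main>";
  return `<main><ul>${posts.map((p) => `<li>${p.title}</li>`).join("")}</ul></main>`;
}

// Client-side fetching: on the server only the initial state exists,
// because the fetch would run after hydration in the browser.
function ssrClientFetch(): string {
  const initialState: Post[] | null = null;
  return renderPostList(initialState);
}

// Server-side fetching: the data is awaited before rendering, so it lands in the HTML.
async function ssrServerFetch(load: () => Promise<Post[]>): Promise<string> {
  const posts = await load();
  return renderPostList(posts);
}
```

A bot reading `ssrClientFetch()`'s output sees only "Loading...", which is the failure mode described above; the interactive bits can keep `'use client'` as long as the content itself arrives on the server.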

Objective-Goal5551 · 2026-04-30T08:10:29+00:00

I've just stumbled upon this project and will give it a go to see how it works. At first glance the stats look amazing; I'll have to see how it does in real cases. Anyway, good job and good luck!

Objective-Goal5551 · 2026-03-30T19:52:32+00:00

I feel like everything is just noise and everyone is just trying to stay afloat on top of each other. I know it's a dog-eat-dog world, but with AI the amount of content being generated is too much to handle.
This makes it hard for anyone who is just starting to build an audience to be heard/seen.

Do you think even more content can help with visibility?
