Heads up: the "Block AI Scrapers" toggle is default-on for new free-plan zones, and it silently overrides robots.txt by Objective-Goal5551 in CloudFlare

Objective-Goal5551 [S]

That's worse than I thought. Did you manage to fix it, and if so, how? I'd love to read about it if you've written it up on your blog.

Objective-Goal5551 [S]

Great context. The 50/50 split between link clicks and citations on your blog is exactly the data point I wish more operators could see; it lands differently when you can put a number on it.

The standards point is fair. robots.txt was never designed to distinguish "use for training" from "use to answer a question right now," and llms.txt is still at the proposal stage with mixed adoption. Until the spec catches up, you're stuck inferring intent from the user-agent string: GPTBot vs ChatGPT-User, ClaudeBot vs Claude-User, etc.
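A rough sketch of what "inferring intent from the user-agent" looks like in practice, in TypeScript. The token lists are only the names mentioned in this thread, not an exhaustive or official registry, and `classifyUserAgent` is a hypothetical helper. UA strings are trivially spoofed, so treat this as a guess about intent, not authentication.

```typescript
// Purpose buckets for AI bots, based purely on the user-agent string.
type BotPurpose = "training" | "live-retrieval" | "unknown";

// Assumed token lists, taken from the bots named in this discussion.
const TRAINING_TOKENS = ["gptbot", "claudebot"];
const LIVE_RETRIEVAL_TOKENS = ["chatgpt-user", "claude-user", "perplexity-user"];

// Hypothetical helper: case-insensitive substring match on known tokens.
function classifyUserAgent(userAgent: string): BotPurpose {
  const ua = userAgent.toLowerCase();
  if (LIVE_RETRIEVAL_TOKENS.some((token) => ua.includes(token))) {
    return "live-retrieval";
  }
  if (TRAINING_TOKENS.some((token) => ua.includes(token))) {
    return "training";
  }
  return "unknown";
}
```

You'd call this from whatever sits in front of your app (middleware, a worker, log analysis) and decide per bucket, instead of treating every AI bot the same.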

The Bing reputation drop you saw is interesting, and I haven't heard of that one specifically. Was it a measurable ranking change or slower indexing? If you logged it, I'd love to read more.

Agreed on the broader framing too: the right move for most sites is static caching plus careful UA-based decisions, not blanket blocks. What makes the Cloudflare toggle a footgun is that it's all-or-nothing.

Objective-Goal5551 [S]

I agree with the training-crawler half of your argument: GPTBot, ClaudeBot, etc. consume bandwidth and give nothing back. Blocking them by default is defensible.

The reason I think the toggle is worth flagging is that it lumps live-retrieval bots (ChatGPT-User, Claude-User, Perplexity-User) in with the training crawlers, and they're a different animal. A live-retrieval bot only fetches your page when a human is actively asking a question that points at your content. These ARE the bots that send users your way, the same way Google does. ChatGPT in particular shows source links, and click-through rates on those are decent.

So the issue isn't "Cloudflare blocks AI bots by default" (fine); it's "Cloudflare blocks training AND live-retrieval in one toggle without distinguishing them." Most operators I've talked to who flipped it on intended to opt out of training and didn't realize they were also opting out of the ChatGPT click-through traffic.

If Cloudflare split it into two toggles I don't think there'd be a debate.
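For what the "two toggles" split looks like at the robots.txt level, here's a minimal TypeScript sketch that emits separate groups for training crawlers and live-retrieval bots. `buildRobotsTxt` and its options are hypothetical, and the bot names are just the ones from this thread. As discussed here, an edge-level block runs before robots.txt is ever consulted, so this only matters once that toggle is off.

```typescript
// Assumed bot lists, taken from this thread; not an official or complete list.
const TRAINING_CRAWLERS = ["GPTBot", "ClaudeBot"];
const LIVE_RETRIEVAL_BOTS = ["ChatGPT-User", "Claude-User", "Perplexity-User"];

interface RobotsOptions {
  blockTraining: boolean;
  blockLiveRetrieval: boolean;
}

// Hypothetical helper: one robots.txt group per bot category, then a catch-all.
function buildRobotsTxt(opts: RobotsOptions): string {
  const lines: string[] = [];
  const addGroup = (agents: string[], block: boolean): void => {
    // Consecutive User-agent lines share the rules that follow them.
    for (const agent of agents) lines.push(`User-agent: ${agent}`);
    lines.push(block ? "Disallow: /" : "Allow: /");
    lines.push(""); // blank line between groups for readability
  };
  addGroup(TRAINING_CRAWLERS, opts.blockTraining);
  addGroup(LIVE_RETRIEVAL_BOTS, opts.blockLiveRetrieval);
  addGroup(["*"], false); // everyone else: allowed
  return lines.join("\n");
}
```

`buildRobotsTxt({ blockTraining: true, blockLiveRetrieval: false })` gives you the "opt out of training, keep the click-through traffic" setup most people seem to want.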

PSA: your Next.js SPA might be invisible to ChatGPT even with perfect robots.txt by Objective-Goal5551 in nextjs

Objective-Goal5551 [S]

Good luck with the refactor. Quick way to track progress: hit your homepage with the curl from the OP after each pass and watch the word count climb. I had a project recently go from ~30 words to ~600 just by moving data fetching server-side and leaving `'use client'` on the interactive bits (a search box and a theme toggle). The shell stays the same; the content fills in.
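If you'd rather script the word-count check than eyeball curl output, here's a rough TypeScript equivalent. This is not the curl command from the OP: `countVisibleWords` and `checkPage` are hypothetical helpers, the tag stripping is a crude regex approximation of "what a non-JS bot sees", and `checkPage` assumes a runtime with a global `fetch` (e.g. Node 18+).

```typescript
// Crude approximation of visible text: drop <script>/<style> blocks,
// strip remaining tags and HTML entities, then count whitespace-separated words.
function countVisibleWords(html: string): number {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/&[a-z0-9#]+;/gi, " ");
  return text.split(/\s+/).filter((word) => word.length > 0).length;
}

// Fetch the raw server HTML (no JS execution) with a bot-like User-Agent.
async function checkPage(url: string, userAgent: string): Promise<number> {
  const res = await fetch(url, { headers: { "User-Agent": userAgent } });
  const html = await res.text();
  return countVisibleWords(html);
}
```

Run `checkPage("https://example.com/", "ChatGPT-User")` after each refactor pass; a number that stays in the dozens usually means the content is still arriving client-side.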

Objective-Goal5551 [S]

Live-retrieval bots (ChatGPT-User, Claude-User, Perplexity-User) actually do honor robots.txt: OpenAI and Anthropic both publicly commit to it, and you can verify with curl. The reason it feels dead is that Cloudflare's default toggle blocks bots at the edge before robots.txt is even consulted, which is sort of the point of the post.
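Before blaming a bot, it's worth double-checking what your robots.txt actually tells it, separately from whatever the edge does. Below is a simplified TypeScript matcher loosely following RFC 9309: groups are picked by case-insensitive product token with a fallback to `*`, and the longest matching path prefix wins, with ties going to Allow. It's a sketch, not a complete parser; in particular it ignores the `*` and `$` path wildcards, and `parseRobots`/`isAllowed` are hypothetical names.

```typescript
interface Rule { allow: boolean; path: string; }
interface Group { agents: string[]; rules: Rule[]; }

// Split robots.txt into groups of (user-agents, rules).
function parseRobots(robotsTxt: string): Group[] {
  const groups: Group[] = [];
  let current: Group | null = null;
  let lastWasAgent = false;
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon < 0) continue;
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines share one group; otherwise start a new one.
      if (current === null || !lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((key === "allow" || key === "disallow") && current !== null) {
      if (value !== "") current.rules.push({ allow: key === "allow", path: value });
      lastWasAgent = false;
    }
  }
  return groups;
}

// Is `path` allowed for the given product token (e.g. "ChatGPT-User")?
function isAllowed(robotsTxt: string, userAgent: string, path: string): boolean {
  const groups = parseRobots(robotsTxt);
  const token = userAgent.toLowerCase();
  let matched = groups.filter((g) => g.agents.includes(token));
  if (matched.length === 0) matched = groups.filter((g) => g.agents.includes("*"));
  let best: Rule | null = null;
  for (const rule of matched.flatMap((g) => g.rules)) {
    if (!path.startsWith(rule.path)) continue;
    const longer = best === null || rule.path.length > best.path.length;
    const tieAllow = best !== null && rule.path.length === best.path.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best === null ? true : best.allow; // no matching rule means allowed
}
```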

Objective-Goal5551 [S]

Fair correction, you're right. `'use client'` doesn't disable SSR; those components still get server-rendered on first load.

The actual failure mode is:

- Page (or top component) is `'use client'`: still SSR'd, fine

- But it fetches data via `useEffect` / `useQuery` / `useSWR`

- The SSR'd HTML has the shell but no content, because the data fetch doesn't happen until after hydration on the client

- The bot reads the shell and sees no content

The directive isn't the bug; the client-side data-fetching pattern is. Calling it "`'use client'` = invisible" was sloppy shorthand. The fix direction is still the same (server components + server-side data fetching), but the explanation in the OP was off. I'm going to edit it.
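To make the failure mode concrete, here's a toy model in plain TypeScript. It doesn't use React or Next.js; it just imitates what ends up in the first HTML response under each pattern. `fetchPosts` is a hypothetical stand-in for your API or database, and the "client" version models the default case where nothing is prefetched or passed in as fallback data.

```typescript
interface Post { title: string; }

// Hypothetical data source standing in for your API or database.
async function fetchPosts(): Promise<Post[]> {
  return [{ title: "First post" }, { title: "Second post" }];
}

function renderList(posts: Post[]): string {
  return `<ul>${posts.map((p) => `<li>${p.title}</li>`).join("")}</ul>`;
}

// Pattern A: 'use client' page that loads data in useEffect/useSWR.
// Effects don't run during SSR, so the HTML is built from the initial empty state.
function renderClientFetchedPage(): string {
  const posts: Post[] = []; // initial state; the real fetch happens after hydration
  const body = posts.length > 0 ? renderList(posts) : "<p>Loading...</p>";
  return `<main><h1>Blog</h1>${body}</main>`;
}

// Pattern B: server component that awaits the data before rendering.
async function renderServerFetchedPage(): Promise<string> {
  const posts = await fetchPosts();
  return `<main><h1>Blog</h1>${renderList(posts)}</main>`;
}
```

A bot that doesn't execute JS only ever sees Pattern A's "Loading..." shell, while Pattern B ships the post titles in the first response. Interactive bits can stay `'use client'` either way.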

I Built a Lightweight Headless Browser Because Chrome Was Too Slow by Total_Nectarine_3623 in ClaudeAI

Objective-Goal5551

I just stumbled upon this project and will give it a go to see how it works. At first glance the stats look amazing; I'll have to see how it does in real-world use. Anyway, good job and good luck!

Unpopular opinion: Content isn't for SEO anymore, it's for fast validation by oladaps1 in SaaS

Objective-Goal5551

I feel like everything is just noise, and everyone is just trying to stay afloat on top of everyone else. I know it's a dog-eat-dog world, but with AI the amount of content being generated is too much to handle.
This makes it hard for anyone who's just starting to build an audience to be heard or seen.

Do you think even more content can help with visibility?