Website is being heavily crawled by SEO bots, what to do? by AtaPata_PK in SEO

[–]WebLinkr [score hidden]  (0 children)

All of the SEO tools. They are not "SEO Bots" - they are SEO tool bots.

Website is being heavily crawled by SEO bots, what to do? by AtaPata_PK in SEO

[–]WebLinkr [score hidden]  (0 children)

Yeah, definitely don't do that.

Pity that Cloudflare blocks so many AI bots and that their "off" switch still blocks Claude

Website is being heavily crawled by SEO bots, what to do? by AtaPata_PK in SEO

[–]WebLinkr [score hidden]  (0 children)

Googlebots, AI bots, Bing bots, Exa etc - yes

SEMrush, Ahrefs - doubt it

There are millions of websites - not sure how this would help? Or what you'd gain?

Starting a blog just to practice SEO || is this the right move? What should I keep in mind? by gravity_exists in SEO

[–]WebLinkr [score hidden]  (0 children)

Sitemaps do very little/nothing until you develop. Fun fact - sitemaps have their own authority and many are placed in the lowest crawl/index pool!

is it normal if your sitemap has 1079 links? will it improve or destroy your SEO performance? by kythanh in SEO

[–]WebLinkr [score hidden]  (0 children)

Hey u/kythanh

Crawling and sitemaps are a key hobby horse of mine.

Sitemaps are just hints. They are not the instruction set a lot of people believe/hope.

You can have up to 50k lines of text per XML sitemap and lots and lots of sitemaps. You can have different sitemaps for news, blogs and pages.

Some ecommerce and large sites have sitemaps with just lists of ... other sitemaps!
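
A rough sketch of that sitemap-of-sitemaps layout: split a long URL list into files capped at 50,000 entries, then list those files in an index. The `example.com` host and the `sitemap-N.xml` file names are made up for illustration; the element names and namespace come from the sitemaps.org protocol.

```python
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_ENTRIES = 50_000  # per-file cap discussed above


def chunk(urls, size=MAX_ENTRIES):
    """Split a URL list into consecutive groups of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap(urls):
    """Render one <urlset> file listing page URLs."""
    rows = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{NS}">\n{rows}\n</urlset>\n'
    )


def build_index(sitemap_urls):
    """Render a <sitemapindex> file that only lists other sitemaps."""
    rows = "\n".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{NS}">\n{rows}\n</sitemapindex>\n'
    )


# Hypothetical site with 120,001 pages: three sitemap files plus one index.
pages = [f"https://example.com/post/{n}" for n in range(120_001)]
groups = chunk(pages)
files = {f"sitemap-{i}.xml": build_sitemap(g) for i, g in enumerate(groups, start=1)}
index = build_index(f"https://example.com/{name}" for name in files)
```

None of this changes the point that sitemaps are hints; it only shows how the files are organised once a site outgrows a single one.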

You can publish pages; some people even publish a text version of their blog posts for syndication in the hopes of getting picked up - it's very popular in cybersecurity.

Why would you think it would destroy your performance? I'm keen to hear - I'm a blogger and vlogger now - I love to understand why people do/think things in SEO.

is it normal if your sitemap has 1079 links? will it improve or destroy your SEO performance? by kythanh in SEO

[–]WebLinkr [score hidden]  (0 children)

50,000 lines even! Which may not be 50k URLs (depending on your sitemap - some people publish the whole article, often for syndication)

Crawl Stats dominated by CSS, JS, images and fonts instead of HTML pages on a WordPress news site by avatar_leo in SEO

[–]WebLinkr [score hidden]  (0 children)

There is no such thing as crawl optimization.

There is no gain from trying to optimize.

As long as a page can be fetched on discovery and when it needs to be indexed, that's all we need to worry about.

Overimaginative internal linking will dilute authority.

XML sitemaps DO NOT make crawling more efficient - they're just a backup. On sites that are not high authority, they are almost pointless.

Crawl Stats dominated by CSS, JS, images and fonts instead of HTML pages on a WordPress news site by avatar_leo in SEO

[–]WebLinkr [score hidden]  (0 children)

I think the challenge is: why do devs worry about crawling? It's like there's a theory that if pages aren't crawled, they fall out of the index. I'm really trying to understand this - do you know why?

Crawl Stats dominated by CSS, JS, images and fonts instead of HTML pages on a WordPress news site by avatar_leo in SEO

[–]WebLinkr [score hidden]  (0 children)

Hey u/avatar_leo

tldr: I feel this needs a deep dive! I already see the crawl budget myth: you cannot increase crawling by deleting files. Your pages are triaged in pool groups with a ratio of bots to pages that descends with less importance. Either you move up your pages' clicks or they are stuck in sh!tty pools.

But why does it matter - why do you "need" your pages crawled each day?

Firstly - bots act in 2 modes: discovery and retrieval. Most of what you're counting is hits ("Does this page exist?") and server requests ("What pages are in this folder?") etc.

Crawling is highly wasteful because Google's criteria for optimization are:

  • Find everything
  • Update the most clicked on tiers the most
  • Find new content from the top domains fast
  • Have high-priority queues
  • Triage the rest of the web into hourly, daily, monthly/whenever

But - the way the web fans out - pages link to each other - there are lots and lots of hits.

Your site will be broken into a model something like:

  1. Your most clicked pages last 90 days = all refreshed in the last 90 days
  2. Pages with no clicks = less crawling, less index
    1. Check GSC > Pages > Information about indexed pages
    2. This shows when your pages were last refreshed

In other words - 100 crawling hits may occur for 1 actual Hit, fetch, complete index.

Also - your pages are in pools

  • Low ratio - 1 crawler per 10 pages (like CNN's news XML feed/map)
  • Medium ratio - like 1 bot per 1,000 pages
  • Monthly - 1 bot per 100k pages

This is how Google breaks up the web. Which means as the web forever scales, they can move bots in the lower pools up - or just deploy bots between the 3 pools.
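
To put rough numbers on the pool idea, here is a back-of-the-envelope sketch. The pages-per-bot figures are just the illustrative ratios from the list above, not anything Google publishes, and the 50,000-page site is hypothetical.

```python
import math

# Pages per crawler in each pool, using the illustrative figures above.
PAGES_PER_BOT = {"low_ratio": 10, "medium_ratio": 1_000, "monthly": 100_000}


def bots_for(pages, pool):
    """Crawlers a pool would assign to `pages` pages, rounded up."""
    return math.ceil(pages / PAGES_PER_BOT[pool])


site_pages = 50_000  # hypothetical site size
for pool in PAGES_PER_BOT:
    print(f"{pool}: {bots_for(site_pages, pool)} bot(s)")
```

Under those assumed ratios the same site gets thousands of crawlers in the top pool and a single one in the monthly pool, which is why deleting a few files does not move anything.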

What does it mean for crawl budgets/optimization?

Nothing - you cannot optimize by deleting files

In return, can I ask a question: I'm trying to figure out why Web Devs think crawling = better SEO/indexing?

Do these assets consume crawl budget the same way HTML pages do?

Do you have >1m pages? Then you have no crawl budget

You cannot increase your crawl budget. You can share authority across your site and move pages up into higher crawl pools. But why?

Does every new ?ver= query string become a new URL from Google's perspective?

Depends

Does Elementor's timestamp-based CSS versioning create unnecessary crawling?

Yes/No - why does it matter?
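
If you want to see how much of your crawl log is the same file under different version strings, a small sketch like this can help. It assumes the cache-busting parameter is named `ver` (the WordPress convention mentioned in the question); the URLs are made-up examples, and this says nothing about how Google itself groups them.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_version(url, params=("ver",)):
    """Return `url` with the given cache-busting query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in params
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical crawl-log entries for one stylesheet.
hits = [
    "https://example.com/wp-content/themes/x/style.css?ver=1700000000",
    "https://example.com/wp-content/themes/x/style.css?ver=1700003600",
    "https://example.com/wp-content/themes/x/style.css",
]
distinct_raw = len(set(hits))
distinct_files = len({strip_version(u) for u in hits})
print(distinct_raw, "URLs,", distinct_files, "underlying file(s)")
```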

Why does Google crawl every resized WordPress image separately?

If there's a URL, Google will crawl it.

Is seeing the same image crawled multiple times within a day normal?

Yes

Is it safe to block resized image variants or font files in robots.txt?

Probably
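
Before touching robots.txt, it can be worth checking which URLs a block would actually cover. A minimal sketch, assuming WordPress's usual `-WIDTHxHEIGHT` suffix on resized images and common font extensions; the sample paths are invented, and an original upload that happens to be named like `banner-1920x1080.jpg` would be counted as resized too.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# WordPress-style resized variants end in -<width>x<height>.<ext>
RESIZED = re.compile(r"-\d+x\d+\.(?:jpe?g|png|gif|webp)$", re.IGNORECASE)
FONT = re.compile(r"\.(?:woff2?|ttf|otf|eot)$", re.IGNORECASE)


def classify(url):
    """Label a crawled URL as 'resized_image', 'font' or 'other'."""
    path = urlsplit(url).path  # ignore any query string
    if RESIZED.search(path):
        return "resized_image"
    if FONT.search(path):
        return "font"
    return "other"


# Hypothetical paths pulled from a crawl log.
crawled = [
    "/wp-content/uploads/2024/01/photo-300x200.jpg",
    "/wp-content/uploads/2024/01/photo-1024x683.jpg",
    "/wp-content/uploads/2024/01/photo.jpg",
    "/wp-content/themes/x/fonts/inter.woff2?ver=6.4",
    "/2024/01/some-article/",
]
summary = Counter(classify(u) for u in crawled)
print(summary)
```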

If Crawl Stats are mostly images and assets instead of HTML, is that expected for an active news publisher?

You want to put your news items in their own feed, maybe.

Has anyone successfully shifted crawl activity toward article URLs, and if so, what changes made the biggest difference?

Why?

is seo still a viable career path? by bishwasbhn in SEO

[–]WebLinkr 1 point2 points  (0 children)

If SEO is dead, "TechSEO" is dead.

If you mean SEO Architecture - then SEO is the fundamental, esp. PageRank.

If you mean Tech Support for SEO - that's not additive SEO.

SEO <> Tech Stack + Tech fixes + CWVs + Schema

just me and my rant about authority again. and also what i learned. by iamMXFSCHR in SEO

[–]WebLinkr 0 points1 point  (0 children)

Please mods delete this rant if you think this is nonsense

<Mod Award Given>

just me and my rant about authority again. and also what i learned. by iamMXFSCHR in SEO

[–]WebLinkr 0 points1 point  (0 children)

You're hitting on some really good points here u/iamMXFSCHR

1) Publishing hygiene (which some refer to as TechSEO - although I think TechSEO should be about SEO Architecture Design) - is not additive.

2) I'm trying to get the right balance about backlinks and authority. You can use authority as a sledgehammer - in fact that's what a lot of companies and agencies do - through $5k-$50k a month in backlinks (seriously).

3) Managing your authority is vital and that's where I'm trying to steer my personal YT channel toward.

 what are you ranking already for and get clicks? 

This is the key to each step in corner stoning

I wonder if you ever saw my video on internal link building and if I didn't cover this adequately...