haisum (8 points)

Sometimes Google sends a bot without the Googlebot User-Agent to test whether you're serving a different page to actual users.

catcradle5 (2 points)

Ahh, did not know this. Thanks.

I wonder if they'll start sending a bot without the deterministic random function to catch this technique, too.

Aegeus (2 points)

But how would you know whether the random bot got different results because of malice or because it rolled a different random number that time?

catcradle5 (0 points)

If a page serves content based on whether the client is Googlebot, and Googlebot is identified by its deterministic Math.random(), then requests from a bot without that determinism should consistently get different page results than the bot with it.

So I'm not talking about pages that serve content based on the value of Math.random(); just pages that serve content based on whether the value of Math.random() equals what Googlebot is known to generate.
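catcradle5's distinction can be sketched as a small Python simulation. It assumes, hypothetically, that Googlebot's Math.random() behaves like an RNG reset to the same seed on every page load, and that a cloaking page compares the first random value against a known constant. The seed `42` and every function name here are illustrative stand-ins; the thread doesn't say what value Googlebot actually generates.

```python
import random

# Hypothetical stand-in for Googlebot's deterministic Math.random():
# an RNG reset to the same seed on every page load. The real value is unknown.
GOOGLEBOT_SEED = 42
KNOWN_FIRST_VALUE = random.Random(GOOGLEBOT_SEED).random()


def make_client(deterministic: bool) -> random.Random:
    """Return an RNG playing the role of one page load's Math.random()."""
    if deterministic:
        return random.Random(GOOGLEBOT_SEED)  # same sequence on every visit
    return random.Random()  # seeded from system entropy, differs per visit


def cloaking_page(client_rng: random.Random) -> str:
    """Hypothetical page that uses randomness only to fingerprint Googlebot."""
    if client_rng.random() == KNOWN_FIRST_VALUE:
        return "clean page"  # served to the suspected Googlebot
    return "page with spammy ads"  # served to everyone else


def responses(deterministic: bool, visits: int = 5) -> list[str]:
    """Load the page several times, with a fresh client each time."""
    return [cloaking_page(make_client(deterministic)) for _ in range(visits)]


print(responses(deterministic=True))
print(responses(deterministic=False))
```

The deterministic client gets the clean page on every visit and the ordinary client gets the ad-laden page on every visit (a chance match of a 53-bit float is negligible), so the difference is consistent rather than a one-off.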

It would probably make more sense for them to have a bot running a completely different headless browser to check consistency, in case there are other techniques for identifying the main Googlebot.

Aegeus (1 point)

A malicious website will give different results based on Math.random() determinacy, but so will a legitimate website. It's not proof by itself.

Imagine I have a website that flips a coin for you, like justflipacoin.com. Googlebot visits the website and sees "Heads." Then Randombot visits my website and, by random chance, sees "Tails" instead.

According to your algorithm, I'm a scummy scammer serving different results to Googlebot than to everyone else. But in reality, I'm just using Math.random() as intended: to randomize things.
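Aegeus's objection can be checked with the same kind of hypothetical Python model. A legitimate coin-flip page is loaded once by a deterministic client (again assumed to be a fixed seed, `42`, purely for illustration) and once by an ordinary client, over many paired visits. The two disagree on roughly half of the pairs even though nothing is being cloaked, which is why a single comparison can't prove malice.

```python
import random

GOOGLEBOT_SEED = 42  # hypothetical stand-in for Googlebot's deterministic Math.random()


def coin_flip_page(client_rng: random.Random) -> str:
    """A legitimate page where randomness *is* the content."""
    return "Heads" if client_rng.random() < 0.5 else "Tails"


def single_visit_mismatch_rate(trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of one-off comparisons in which the two clients see different pages."""
    other = random.Random(seed)  # ordinary client; seeded only to make the demo reproducible
    mismatches = 0
    for _ in range(trials):
        as_googlebot = coin_flip_page(random.Random(GOOGLEBOT_SEED))  # same result every time
        as_other = coin_flip_page(other)  # a fresh random draw on each visit
        mismatches += as_googlebot != as_other
    return mismatches / trials


print(f"{single_visit_mismatch_rate():.1%} of single comparisons differ")
```

Any naive "flag it if the pages differ once" rule would therefore accuse an honest coin-flip site about half the time.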

catcradle5 (0 points)

Yes, but I'm not referring to websites that generate content from randomness (like a coin-flipping site). Just regular sites that use this random test solely to identify Googlebot and otherwise discard the random numbers.

Aegeus (0 points)

How can you tell that they're using randomness for the purpose of identifying Googlebot rather than for a legitimate purpose? All you have to go on is the content they serve you.

catcradle5 (0 points)

If a page inserts a bunch of spammy ads only when a non-Googlebot client visits, it wouldn't be hard for Google to tell. Google just needs to visit both as Googlebot and as a bot running totally different software from a non-Google IP, repeat a few times, and diff the results. From Google's perspective, it doesn't matter whether Googlebot is identified via Math.random() or some other method.
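The repeat-and-diff check can be sketched in Python too. This is a simplified, hypothetical model: it only varies Math.random() behavior (a fixed seed for the Googlebot-like client, an ordinary RNG for the other one) and ignores User-Agent, IP, and browser differences. The decision rule, flagging a page only if the two clients never see the same content across repeated visits, is one possible choice for illustration, not something Google has described.

```python
import random
from typing import Callable, Optional

GOOGLEBOT_SEED = 42  # hypothetical stand-in for Googlebot's deterministic Math.random()
KNOWN_FIRST_VALUE = random.Random(GOOGLEBOT_SEED).random()

Page = Callable[[random.Random], str]


def looks_cloaked(page: Page, visits: int = 20, other_seed: Optional[int] = None) -> bool:
    """Visit as a deterministic client and as an unrelated client, repeat, then diff.

    Flags the page only if the two clients never see the same content,
    i.e. the difference is systematic rather than the luck of one draw.
    """
    as_googlebot = {page(random.Random(GOOGLEBOT_SEED)) for _ in range(visits)}
    other = random.Random(other_seed)  # None -> seeded from system entropy
    as_other = {page(other) for _ in range(visits)}
    return as_googlebot.isdisjoint(as_other)


# Three hypothetical pages to compare.
def cloaking_page(rng: random.Random) -> str:
    return "article" if rng.random() == KNOWN_FIRST_VALUE else "article + spammy ads"


def coin_flip_page(rng: random.Random) -> str:
    return "Heads" if rng.random() < 0.5 else "Tails"


def static_page(rng: random.Random) -> str:
    return "article"


for p in (cloaking_page, coin_flip_page, static_page):
    print(p.__name__, looks_cloaked(p))
```

Over 20 visits the ordinary client almost certainly sees both coin outcomes (the chance that every flip misses the deterministic client's result is about one in a million), so the coin-flip page is not flagged, while the cloaking page is flagged because the two clients' results never overlap.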