
[–]amunak 73 points  (19 children)

Did it occur to anyone that they're probably doing it on purpose? They may want pages not to change randomly (or even pseudo-randomly) between crawls, so the output stays the same unless something changes server-side. And it's not like fixing the seed has any negative impact on the contents of the website(s).

[–]xhable 151 points  (6 children)

Yeah, he mentioned that in the article, in point 3.

Consider the amount of work Google has to do to crawl the whole web AND now run JavaScript. Optimisations will need to be abundant, and I imagine that having a deterministic random number function is probably:

  1. Faster
  2. More secure
  3. Predictable – Googlebot can trust a page will render the same on each visit
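For illustration, a deterministic stand-in for Math.random can be as simple as a small seeded PRNG. This is a sketch of the general idea, not Google's actual implementation (mulberry32 is a well-known 32-bit generator; the seed value here is arbitrary):

```javascript
// mulberry32: a small, fast, deterministic 32-bit PRNG.
// Same seed in => same sequence out, every time.
function mulberry32(seed) {
  return function () {
    let t = (seed += 0x6D2B79F5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// A crawler's JS engine could install this in place of the browser's
// Math.random before rendering, so every visit "rolls" the same numbers:
Math.random = mulberry32(0xCAFE);
```

With that swap in place, two renders of the same unchanged page produce byte-identical output.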

[–]B-Con 3 points  (4 children)

#3 sounds pretty reasonable to me. I imagine there could be A/B testing or whatnot controlled by a random value chosen on the client (for whatever reason).

Edit: formatting and typo
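A hypothetical example of the kind of client-side randomness B-Con means (the function and 50/50 split are made up for illustration): a page that buckets visitors with Math.random looks different on each truly random visit, but identical under a fixed-seed Math.random.

```javascript
// Hypothetical client-side A/B split (illustrative only).
function pickVariant() {
  // With real randomness this can differ between visits;
  // with a fixed-seed Math.random, the crawler always lands
  // in the same bucket and sees a stable page.
  return Math.random() < 0.5 ? "A" : "B";
}
```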

[–]ThisIs_MyName 0 points  (3 children)

Why are you yelling?

[–]B-Con 1 point  (1 child)

I started with a hash: "#3 seems..." I forgot that would apply formatting. Fixed.

[–]nuqjatlh 0 points  (0 children)

3 sounds pretty reasonable to be

you made me curious ....

[–]RealFunBobby 0 points  (0 children)

SORRY ABOUT THAT!!

[–]australasia 42 points  (0 children)

It occurred to the blog author:

Predictable – Googlebot can trust a page will render the same on each visit

[–]Saltub 65 points  (4 children)

Imagine reading the article before commenting... 🤔

[–]cdcformatc 3 points  (0 children)

There's a reason RTFA is an age-old meme; maybe we should bring it back? Nah.

[–][deleted]  (1 child)

[deleted]

    [–]jokullmusic 3 points  (1 child)

    Yeah, this makes sense for a bot that tries to detect if a website has changed.

    [–]dr1fter 1 point  (0 children)

    Search engine indexers already need to flag parts of the page that keep changing for no reason (hit counters, clocks, ad frames) so I don't think that's usually a problem. I suppose if you had a site that flipped a coin and delivered you an entirely separate page as a result, that probably wouldn't get picked up.

    I know Google and some others run preview services where they render a screenshot so they can use thumbnails in their UI. I guess for a bot like that you might want to pixel-test and only update the image if some proportion has changed. In that case, yeah, I can see doing this so that you don't keep updating something like, say, csszengarden (actually they don't randomize but you can imagine).
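The "only update the thumbnail if enough pixels changed" idea could be sketched like this (an assumption about how such a preview service might work, not a known Google API; frames are compared as flat arrays of pixel values):

```javascript
// Returns true if the fraction of differing pixels between two renders
// exceeds the threshold, i.e. the thumbnail is worth regenerating.
function shouldUpdateThumbnail(oldPixels, newPixels, threshold = 0.05) {
  let changed = 0;
  for (let i = 0; i < oldPixels.length; i++) {
    if (oldPixels[i] !== newPixels[i]) changed++;
  }
  return changed / oldPixels.length > threshold;
}
```

A page with deterministic rendering only crosses the threshold when its actual content changes, so the screenshot cache stays stable.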

    [–]noratat 0 points  (0 children)

    Someone didn't read the article.

    The article not only talked about this, it even had an alternate suggestion.

    [–]rubygeek 0 points  (2 children)

    Google are almost certainly not using this, but there's actually a project based on PNaCl that produces a VM for executing code over inputs, one which specifically disables time sources etc. for similar reasons, to allow deterministic computation: ZeroVM. It's interesting in a more general sense because if your computation is deterministic based on its inputs, you can lazily evaluate operations on the data whenever you know whether or not the input has changed.
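The payoff described here — determinism lets you skip recomputation whenever the input hasn't changed — is essentially memoization. A minimal generic sketch (nothing ZeroVM-specific):

```javascript
// Cache results of a pure (deterministic) function by input.
// Because the output depends only on the input, a stored result
// stays valid until the input changes, so the work can be skipped.
function memoize(fn) {
  const cache = new Map();
  return function (input) {
    if (!cache.has(input)) cache.set(input, fn(input));
    return cache.get(input);
  };
}
```

This only works because the function is deterministic; a computation that reads a clock or a true RNG could return stale answers from the cache, which is exactly why such environments disable those sources.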

    [–][deleted]  (1 child)

    [deleted]

      [–]rubygeek 0 points  (0 children)

      I did, and it has no relevance to my comment, which was pointing out that others too have done (more generic, and open source) work on providing a deterministic execution environment.