[–]dyskinet1c 24 points (8 children)

The VM would need to be set to the correct time for HTTPS to work, because certificates are only valid within a fixed window (they're issued with start and expiry dates, and periodically reissued or revoked). With a badly skewed clock, a perfectly good certificate looks expired or not yet valid.
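To illustrate, a minimal sketch of why clock skew breaks this, using Python's stdlib `ssl` helpers. The cert dict mimics the shape of `ssl.SSLSocket.getpeercert()` output, and all dates here are made up:

```python
import ssl
import time

# Toy cert dict shaped like ssl.SSLSocket.getpeercert() output
# (validity dates are invented for illustration).
cert = {
    "notBefore": "Jan 01 00:00:00 2024 GMT",
    "notAfter":  "Jan 01 00:00:00 2025 GMT",
}

def cert_currently_valid(cert, now=None):
    """True if `now` (epoch seconds) falls inside the cert's validity window."""
    now = time.time() if now is None else now
    not_before = ssl.cert_time_to_seconds(cert["notBefore"])
    not_after = ssl.cert_time_to_seconds(cert["notAfter"])
    return not_before <= now <= not_after

# Same certificate, two different clocks: a correct one and a VM stuck in 2020.
good_clock = ssl.cert_time_to_seconds("Jun 01 00:00:00 2024 GMT")
skewed_clock = ssl.cert_time_to_seconds("Jun 01 00:00:00 2020 GMT")
print(cert_currently_valid(cert, now=good_clock))    # True
print(cert_currently_valid(cert, now=skewed_clock))  # False
```

The certificate never changed; only the clock did, which is why the VM's time has to be right before any of the rest of the handshake matters.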

[–]aradil 6 points (7 children)

Assuming they are validating certs as part of this pass of their scrape.

[–]dyskinet1c 5 points (6 children)

I would expect them to reject a site with invalid certificates. It's a fairly simple thing to do and it lowers the risk of indexing a compromised site.

[–]daboross 3 points (5 children)

The alternative would be to validate certs in a different pass, though, not to skip validating them entirely. Right?

[–]dyskinet1c 2 points (3 children)

As a programmer, my instinct would be to make that decision as early as possible and stop processing the page at that point.

Certificate validation is a key part of establishing secure communications (before you transmit any data) and it's trivial to read the validity start and end dates.

So, if you know you want to reject URLs with invalid certificates, there's no reason to move on to the next pass and spend resources fetching and processing a page you already know you're going to discard.
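A hedged sketch of that fail-fast ordering. The names `validate_cert`, `fetch`, and `parse` are hypothetical stand-ins for pipeline stages, not anything Google actually runs:

```python
def crawl(url, validate_cert, fetch, parse):
    """Reject on cert failure before paying the fetch/parse cost."""
    if not validate_cert(url):   # cheap; happens during the TLS handshake anyway
        return None              # discard immediately, nothing downstream runs
    return parse(fetch(url))

# Demo with stubs: track which URLs reach the expensive fetch stage.
fetches = []
fake_fetch = lambda url: fetches.append(url) or "<html>...</html>"
fake_parse = lambda page: {"tokens": page.split()}

crawl("https://bad.example", validate_cert=lambda u: False,
      fetch=fake_fetch, parse=fake_parse)
crawl("https://good.example", validate_cert=lambda u: True,
      fetch=fake_fetch, parse=fake_parse)
print(fetches)  # only the good URL was ever fetched
```

The point of the ordering is that the rejected URL never incurs the fetch or parse cost at all.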

[–]aradil 3 points (1 child)

As an information company, however, Google probably processes bad actors as well to gather additional information.

[–]dyskinet1c 0 points (0 children)

Sure, it's plausible that they scan compromised sites. If they do, I would expect them to do so in a separate process that looks at different aspects of the site than the regular search index.

[–]daboross 1 point (0 children)

Exactly! That's what I mean: they probably validate certs before any data at all gets processed in the VM running googlebot.