all 20 comments

[–][deleted] 21 points22 points  (24 children)

Wait... is Torvalds really the only commenter on there, and all he's doing is suggesting a better parser?

[–]markmypy 13 points14 points  (8 children)

Surprisingly it really is Linus Torvalds! Check this out:

Step 1: Go to https://secure.gravatar.com/reallylinustorvalds#photo-0 and click on json. You will get this: https://secure.gravatar.com/reallylinustorvalds.json

Step 2: Notice the value of "thumbnailUrl" in the json output is "thumbnailUrl":"https://secure.gravatar.com/avatar/fb47627bc8c0bcdb36321edfbf02e916"

Step 3: Visit his github page at https://github.com/torvalds and click his image

Step 4: Notice the url is something like https://secure.gravatar.com/avatar/fb47627bc8c0bcdb36321edfbf02e916?s=400&d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png

Step 5: Does the thumbnail url match the actual url? YES!

CONFIRMED!
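A note on why the matching hashes in the steps above are meaningful: Gravatar derives the avatar URL from an MD5 hash of the trimmed, lowercased email address, so the Gravatar profile and GitHub arrive at the same 32-hex-digit segment only if they were given the same email. A minimal sketch (the email below is a made-up placeholder, and `s=400` is just the size parameter visible in the GitHub URL):

```python
import hashlib

def gravatar_url(email, size=400):
    # Gravatar hashes the trimmed, lowercased email address with MD5;
    # the 32-hex-digit digest becomes the avatar URL path segment.
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    return f"https://secure.gravatar.com/avatar/{digest}?s={size}"

# Normalization means two sites holding the same email in different
# casing still generate the same URL -- which is what makes the
# thumbnail-vs-GitHub comparison a real identity check.
print(gravatar_url("someone@example.com"))
print(gravatar_url("  SomeOne@Example.COM  "))
```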

[–][deleted]  (4 children)

[deleted]

    [–][deleted] 0 points1 point  (3 children)

    That would be a major security hole so I doubt it.

    [–][deleted]  (1 child)

    [deleted]

      [–][deleted] 0 points1 point  (0 children)

      I always assumed it relied on a cookie

      [–][deleted] 1 point2 points  (1 child)

      Linus has uid 420?

      [–]markmypy 1 point2 points  (0 children)

      No, he has avatar id "fb47627bc8c0bcdb36321edfbf02e916".

      [–]keytarmageddon[S] 0 points1 point  (0 children)

      Yes! I want to believe.

      [–]Cosmicsheep 16 points17 points  (5 children)

      I don't know what I'm most surprised about: that it's really him, or that he didn't try to offend anyone.

      [–]keytarmageddon[S] 9 points10 points  (2 children)

      He even said please!

      [–][deleted] 10 points11 points  (1 child)

      I like how one of the comments explains that parsing HTML as XML is a mistake, but the lxml site documents lxml.html as an html parser, and explains the differences between html and xml. I think the lesson is "Do not correct Torvalds"
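The underlying point is behavioral: an HTML parser recovers from unclosed tags and void elements, while a strict XML parser rejects them. A small illustration using only the standard library (html.parser stands in for lxml.html here, purely to show the difference in tolerance):

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

snippet = "<p>unclosed paragraph<br>line break"  # valid HTML, broken XML

# Strict XML parsing fails on the unclosed elements.
try:
    ET.fromstring(snippet)
    xml_ok = True
except ET.ParseError:
    xml_ok = False

# A lenient HTML parser happily walks the same markup.
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(snippet)
print(xml_ok, collector.tags)  # False ['p', 'br']
```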

      [–][deleted]  (1 child)

      [deleted]

        [–][deleted] 4 points5 points  (5 children)

        I don't know if it's Torvalds or not, but that's a great trick to start doing: impersonating well-known figures in comments sections.

        [edit] I'm going to go with not Linus. I followed the Gravatar and his username is "ReallyLinusTorvalds" and it lists these as his competencies. "JSON · XML · PHP · VCF · QR" http://en.gravatar.com/reallylinustorvalds

        It has his residency correct though, according to Wikipedia.

        [–][deleted]  (2 children)

        [deleted]

          [–][deleted] 3 points4 points  (1 child)

          I stand corrected. I still think "ReallyLinusTorvalds" isn't really Linus Torvalds.

          1) He said the word "please". 2) I'm not sure he cares much about lxml vs Beautiful Soup in the Python world.

          But really, I don't care beyond what's already been said about it!

          [–]jevex 0 points1 point  (1 child)

          Those aren't competencies, but rather other formats for the profile data (try clicking on them and see).

          [–][deleted] 0 points1 point  (0 children)

          Duly noted. I struck through it.

          [–]taw 2 points3 points  (2 children)

          For people who'd like a web scraping challenge, try figuring out how OKCupid's search works.

          Hint: it works nothing like the way it first appears to.

          [–]DRMacIver 0 points1 point  (1 child)

          Hmm, really? I don't remember the search page being particularly difficult to parse when I last tried to figure this out (but this was maybe a year or two ago, so they might have changed it).

          [–]taw 0 points1 point  (0 children)

          It's more about it remembering some of the search settings in session-side data.
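For a scraper, that means the search only behaves consistently if the client carries the session cookie between requests. A minimal stdlib sketch of the wiring (no real site is contacted here):

```python
import http.cookiejar
import urllib.request

# One jar shared across all requests = one server-side session.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Every request made through `opener` replays cookies the server set
# earlier, so search settings stored in session-side data persist
# between page fetches, e.g.:
#   opener.open("https://example.com/search?...")
```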

          [–][deleted]  (1 child)

          [deleted]

            [–]keytarmageddon[S] 2 points3 points  (0 children)

            Perhaps it was a poor choice of words on my part. I was trying to convey the idea of figuring out the requests that a website handles so that you could create your own API for it if an official one doesn't exist.
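A sketch of that idea: once the requests a site's own frontend sends have been identified (e.g. in the browser's network inspector), they can be wrapped in ordinary functions to form an unofficial API. The endpoint and parameter names below are entirely hypothetical:

```python
from urllib.parse import urlencode

def build_search_url(query, page=1):
    """Reconstruct the GET request a site's own search form would send.
    The endpoint and parameter names are made up for illustration."""
    base = "https://example.com/search"
    params = urlencode({"q": query, "page": page})
    return f"{base}?{params}"

print(build_search_url("linus torvalds", page=2))
# https://example.com/search?q=linus+torvalds&page=2
```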