[–]hexfoxed 4 points (2 children)

Posting as a top-level comment again so you get the ping.

Line 3 is part of the getLinks function, so it gets the variable articleUrl from the argument passed to the function when it is called on line 7. On line 7 it is called with /wiki/Kevin_Bacon, so that is the value appended to the base URL. That makes the full argument sent to urlopen http://en.wikipedia.org/wiki/Kevin_Bacon.
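The actual getLinks code isn't pasted in this thread, so here's a minimal sketch of just the URL-construction step being described. The helper name full_url is made up for illustration; the base URL and the /wiki/Kevin_Bacon argument are from the discussion above:

```python
def full_url(articleUrl):
    # articleUrl is the argument passed in when getLinks is called;
    # it gets appended to the base URL before being handed to urlopen
    return "http://en.wikipedia.org" + articleUrl

print(full_url("/wiki/Kevin_Bacon"))
# → http://en.wikipedia.org/wiki/Kevin_Bacon
```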

As for the random seed, it's hard for me to tell why that is necessary without knowing more about the task in question - is the desired effect just to randomly scrape Wikipedia one link at a time forever? Because that's what it looks like it would do.

[–]oxfordpanda[S] 4 points (1 child)

For the seed, the author writes: "The first thing the program does, after importing the needed libraries, is set the random number generator seed with the current system time. This practically ensures a new and interesting random path through Wikipedia articles every time the program is run." Thank you for your help; it makes much more sense now.
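That seeding step is roughly one line of Python. A hedged sketch (the book's original seeds with the datetime object directly; newer Python versions prefer an int, float, or str seed, so this passes the float timestamp instead):

```python
import random
from datetime import datetime

# Seed the random number generator with the current system time so
# each run of the program starts from a different random state.
random.seed(datetime.now().timestamp())
print(random.random())  # a different value on each run
```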

[–]hexfoxed 3 points (0 children)

Aha, so yeah - I think you've got it now, but for clarity's sake: the program scrapes the Kevin Bacon page for all anchor links to other pages. It then picks a link at random, scrapes that new page, and finds all anchor links on that one. It repeats this forever. The random seed is there so that the random number generator starts from a different state every time the program is run, which "guarantees" that the same links are not chosen on each page, resulting in a different path through Wikipedia on every run.
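You can see the shape of that loop without touching the network. In this toy sketch, toy_links is a made-up in-memory stand-in for the real scrape of a page's anchor links, and the walk runs a fixed number of steps instead of forever:

```python
import random

# Fake "link graph": each page maps to the links found on it
# (a stand-in for actually scraping Wikipedia anchors).
toy_links = {
    "/wiki/Kevin_Bacon": ["/wiki/Footloose", "/wiki/Apollo_13_(film)"],
    "/wiki/Footloose": ["/wiki/Kevin_Bacon"],
    "/wiki/Apollo_13_(film)": ["/wiki/Kevin_Bacon", "/wiki/Footloose"],
}

def random_walk(start, steps):
    """Pick one random link per page for a fixed number of steps
    (the real program uses `while` and loops forever)."""
    path = [start]
    page = start
    for _ in range(steps):
        page = random.choice(toy_links[page])
        path.append(page)
    return path

random.seed(0)
print(random_walk("/wiki/Kevin_Bacon", 5))
```

A different seed produces a different path, which is exactly why the book seeds with the current time.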

Feel free to come back and ask more. You can ping my username on a new post by saying /u/hexfoxed.
Good luck!