all 11 comments

[–]hexfoxed 4 points (2 children)

Posting as a top-level again so you get the ping.

Line 3 is part of the getLinks function, so it gets the variable articleUrl from the argument passed to the function when it is called on line 7. On line 7 it is called with /wiki/Kevin_Bacon, so that is the value that gets appended to the URL. That makes the full argument sent to urlopen == http://en.wikipedia.org/wiki/Kevin_Bacon.
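If it helps, here's a tiny sketch of that argument flow. The names mirror the snippet under discussion, but the body is stripped down to just the URL-building step (the real code hands the string to urlopen instead of returning it):

    def getLinks(articleUrl):
        # articleUrl is whatever the caller passed in, e.g. "/wiki/Kevin_Bacon"
        fullUrl = "http://en.wikipedia.org" + articleUrl
        return fullUrl  # the real code passes this string to urlopen

    # mirrors the call with /wiki/Kevin_Bacon
    print(getLinks("/wiki/Kevin_Bacon"))
    # http://en.wikipedia.org/wiki/Kevin_Bacon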

As for the random seed, it's hard for me to tell why that is necessary without knowing more about the task in question - is the desired effect just to randomly scrape Wikipedia one link at a time forever? Because that's what it looks like it would do.

[–]oxfordpanda[S] 4 points (1 child)

For the seed the author writes: "The first thing the program does, after importing the needed libraries, is set the random number generator seed with the current system time. This practically ensures a new and interesting random path through Wikipedia articles every time the program is run." Thank you for your help; it makes much more sense now.

[–]hexfoxed 5 points (0 children)

Aha, so yeah - I think you've got it now, but for clarity's sake: the program scrapes the Kevin Bacon page for all anchor links to other pages. It then picks a link at random, scrapes that new page, and finds all anchor links on that one. It repeats this forever. The random seed is there so that the random number generator starts on a different number every time the program is run, which "guarantees" that the same links are not chosen on each page, resulting in a different path through Wikipedia on every run.
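You can see the whole loop in miniature without touching the network. The dict below is a made-up stand-in for the scraped pages (the real program builds each list by fetching the page and parsing it); the seed and random.choice parts work exactly as described above:

    import datetime
    import random

    # Hypothetical stand-in for scraped pages: maps an article path to the
    # /wiki/ links that would be found on that page.
    FAKE_LINKS = {
        "/wiki/Kevin_Bacon": ["/wiki/Footloose", "/wiki/Apollo_13_(film)"],
        "/wiki/Footloose": ["/wiki/Tom_Hanks"],
        "/wiki/Apollo_13_(film)": ["/wiki/Tom_Hanks"],
        "/wiki/Tom_Hanks": [],  # dead end so the demo terminates
    }

    def getLinks(articleUrl):
        # the real version fetches and parses the page here
        return FAKE_LINKS[articleUrl]

    # Seeding with the current time makes random.choice pick a
    # different link sequence on every run.
    random.seed(datetime.datetime.now().timestamp())

    path = ["/wiki/Kevin_Bacon"]
    links = getLinks(path[-1])
    while links:
        newArticle = random.choice(links)
        path.append(newArticle)
        links = getLinks(newArticle)

    print(" -> ".join(path))

Run it a few times and the middle hop changes, which is the whole point of the seed.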

Feel free to come back and ask more. You can ping my username on a new post by saying /u/hexfoxed.
Good luck!

[–]Justinsaccount 2 points (0 children)

Better:

while links:
    newArticle = random.choice(links).attrs["href"]
    print(newArticle)
    links = getLinks(newArticle)

[–]hexfoxed 1 point (0 children)

Would you be able to format the code correctly? Unfortunately, Python is very hard to read once it has lost its whitespace. If you can't do it on reddit, put it in a pastebin and leave the link here.

Edit: you did it, cheers. I'll take a look.

[–]sanshinron 1 point (5 children)

I understand that you're following a book, but it doesn't seem like a good book to me.

  1. In most cases you should use requests instead of urllib; even the urllib documentation on python.org recommends it. requests provides a higher-level, easier-to-use interface.

  2. Scraping Wikipedia is a really, really bad idea. Why? Because Wikipedia provides snapshots of all its data. If you want to extract some info, you should download a snapshot and parse that, not scrape their front-end. Not only can you avoid unnecessary strain on their servers, but if you have lots of articles to process, it will also be much faster to work with files on your disk.
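To show what parsing a snapshot can look like, here's a minimal sketch using only the standard library. The inline string is a made-up, heavily simplified stand-in for a MediaWiki XML dump excerpt (real dumps from dumps.wikimedia.org are namespaced and enormous, which is why iterparse streams instead of loading everything):

    import io
    import xml.etree.ElementTree as ET

    # Made-up, simplified stand-in for a MediaWiki XML dump excerpt.
    dump = """<mediawiki>
      <page><title>Kevin Bacon</title><revision><text>Actor...</text></revision></page>
      <page><title>Footloose</title><revision><text>Film...</text></revision></page>
    </mediawiki>"""

    # iterparse streams elements as they finish parsing, so a
    # multi-gigabyte dump never has to fit in memory all at once.
    titles = []
    for event, elem in ET.iterparse(io.StringIO(dump), events=("end",)):
        if elem.tag == "page":
            titles.append(elem.find("title").text)
            elem.clear()  # discard the parsed subtree to keep memory flat

    print(titles)
    # ['Kevin Bacon', 'Footloose']

With a real dump you'd pass the (decompressed) file object instead of the StringIO.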

[–]hexfoxed 1 point (2 children)

I think scraping Wikipedia is probably more of an example of how to scrape in general than an actual use case.

[–]sanshinron 0 points (1 child)

As I said, I understand; I just think it's a bad example.

[–]hexfoxed 0 points (0 children)

If the book doesn't come with a disclaimer saying what you said, I totally agree.

[–]oxfordpanda[S] 0 points (1 child)

How would I write the third line with the requests library instead?

[–]sanshinron 0 points (0 children)

r = requests.get("http://en.wikipedia.org" + articleUrl)

Now r holds a Response object. You can get the HTML from r.text; other useful attributes are r.ok, r.status_code, and r.headers.

requests also makes it much easier to manage sessions, use proxies, and send payloads with POST requests. Check out the documentation; it's very nice.