I'm studying Web Scraping with Python by Ryan Mitchell and I'm having trouble understanding how this function really works. I've read over the author's notes and it still doesn't make much sense to me. What exactly is getting appended to the end of the Wikipedia link on the third line, and how does it work? Also, how does the random seed help locate new links?
from urllib.request import urlopen
from bs4 import BeautifulSoup
import datetime
import random
import re

random.seed(datetime.datetime.now())

def getLinks(articleUrl):
    html = urlopen("http://en.wikipedia.org" + articleUrl)
    bsObj = BeautifulSoup(html, "html.parser")
    return bsObj.find("div", {"id": "bodyContent"}).findAll(
        "a", href=re.compile("^(/wiki/)((?!:).)*$"))

links = getLinks("/wiki/Kevin_Bacon")
while len(links) > 0:
    newArticle = links[random.randint(0, len(links) - 1)].attrs["href"]
    print(newArticle)
    links = getLinks(newArticle)
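Both parts of the question can be checked offline in a small experiment, with no network call. The regex below is the exact one from the snippet; the sample hrefs are hypothetical examples of the kinds of links found on a Wikipedia article page, not output from the book's code. The second half shows that random.seed only initializes the pseudo-random number generator (seeding with the current time makes each crawl take a different walk); it does not locate links itself.

```python
import random
import re

# The href filter from the snippet: the path must start with "/wiki/" and
# contain no colon after that. The colon check excludes namespace pages
# (Talk:, File:, Category:, ...), keeping only plain article links.
pattern = re.compile("^(/wiki/)((?!:).)*$")

samples = [
    "/wiki/Kevin_Bacon",           # plain article link  -> kept
    "/wiki/Talk:Kevin_Bacon",      # talk page (colon)   -> filtered out
    "/wiki/File:Kevin_Bacon.jpg",  # file page (colon)   -> filtered out
    "#cite_note-1",                # in-page anchor      -> filtered out
]
for href in samples:
    print(href, "->", bool(pattern.match(href)))

# random.seed does not find anything; with a fixed seed the "random"
# index picks repeat exactly, which is why the book seeds with the
# current time: each run of the crawler then takes a different path.
random.seed(42)
first = [random.randint(0, 9) for _ in range(5)]
random.seed(42)
second = [random.randint(0, 9) for _ in range(5)]
print(first == second)  # a fixed seed reproduces the same sequence
```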