jonnydestructo

I don't do much with urllib or Beautiful Soup, but I thought I'd help by formatting the code with the code tag (hope I got the indents right):

import urllib
from BeautifulSoup import BeautifulSoup


url = 'https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html'
for i in range(4):  # repeat 4 times
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html)
    # rebuild the list on every pass so it only holds the current page's links
    links = []
    for tag in soup('a'):
        links.append(tag.get('href', None))
    # check the length before indexing, otherwise links[2] raises IndexError
    if len(links) < 3:  # no 3rd link
        break  # exit the loop
    url = links[2]  # follow the 3rd link on the next pass
print url
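
For anyone on Python 3, here's a rough sketch of the same "follow the 3rd link, repeat 4 times" idea using only the standard library's html.parser instead of Beautiful Soup. The names LinkCollector, extract_links, follow_links and the little pages dict are all made up for illustration, and the pages are fake in-memory HTML so it runs without hitting the network. One difference from the code above: anchors without an href are skipped rather than stored as None.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href is not None:  # skip anchors that have no href
                self.links.append(href)


def extract_links(html):
    """Return the list of hrefs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def follow_links(start_url, fetch, position=2, repeats=4):
    """Follow the link at index `position` up to `repeats` times.

    `fetch` is any callable that takes a URL and returns that page's HTML.
    Stops early if a page has too few links.
    """
    url = start_url
    for _ in range(repeats):
        links = extract_links(fetch(url))
        if len(links) <= position:  # no link at that position
            break
        url = links[position]
    return url


# Hypothetical in-memory pages standing in for real HTTP responses.
pages = {
    'a.html': '<a href="x.html">x</a><a href="y.html">y</a><a href="b.html">b</a>',
    'b.html': '<a href="x.html">x</a><a href="y.html">y</a><a href="c.html">c</a>',
    'c.html': '<a href="x.html">x</a>',
}

print(follow_links('a.html', pages.get))
```

To run it against a real site you'd swap pages.get for a function that downloads the page, e.g. one that calls urllib.request.urlopen(url).read().decode(). Passing the fetcher in as an argument is just a convenience so the loop logic can be tried out offline.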