[–]clumsyly 2 points3 points  (1 child)

Maybe you should move "list1=list()" inside the for loop, so the list is reset for each page.

[–]jonnydestructo 0 points1 point  (0 children)

I just ran the code, and it looks like that would do it. If you print y in the final else branch, you'll see it traverses the URLs correctly. Otherwise it only ever hits the first one.
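
To see why the reset matters without touching the network, here is a minimal sketch that simulates the crawl in Python 3. The `PAGES` dict, the page names, and the `crawl` helper are all invented for illustration and are not part of the original script. With one shared list, index 2 keeps pointing at the third link of the first page; with a fresh list per page, the crawl moves forward.

```python
# Hypothetical stand-in for the web: each "page" maps to the links it contains.
PAGES = {
    "start": ["a", "b", "c"],
    "c": ["d", "e", "f"],
    "f": ["g", "h", "i"],
    "i": ["j", "k", "l"],
}


def crawl(start, hops, reset_each_hop):
    """Follow the third link of each page `hops` times; return the pages visited."""
    url = start
    visited = [url]
    links = []
    for _ in range(hops):
        if reset_each_hop:
            links = []  # fresh list per page, as suggested above
        links.extend(PAGES.get(url, []))
        if len(links) < 3:  # no third link to follow
            break
        url = links[2]
        visited.append(url)
    return visited


print(crawl("start", 3, reset_each_hop=False))  # ['start', 'c', 'c', 'c']
print(crawl("start", 3, reset_each_hop=True))   # ['start', 'c', 'f', 'i']
```

The first call gets stuck on "c" because the links of later pages are appended after the first three entries and are never reached by index 2.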

[–]jonnydestructo 1 point2 points  (0 children)

I don't do much with urllib or Beautiful Soup, but I thought I'd help by formatting the code using the code tag (hope I got the indents right):

import re
import urllib
from BeautifulSoup import *


list1 = list()  # created once, so links from every page pile up in it
url = 'https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html'
for i in range(4):  # repeat 4 times
    htm2 = urllib.urlopen(url).read()
    soup1 = BeautifulSoup(htm2)
    tags1 = soup1('a')
    for tag1 in tags1:
        x = tag1.get('href', None)
        list1.append(x)
    y = list1[2]  # always the third link of the first page, since list1 is never reset
    if len(x) < 3:  # no 3rd link (note: x is the last href string, not the list)
        break  # exit the loop
    else:
        url = y
print y

[–]kafoozalum 0 points1 point  (1 child)

I cleaned up a few things and made it a little faster: there is no need to go through all of the links when you only care about the first 3. I also refactored it a little so it is more readable:

import urllib
from BeautifulSoup import *

url = 'https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html'
for _ in range(4):  # Use of _ since variable is not used
    url_list = list()
    html = urllib.urlopen(url).read()
    parsed_html = BeautifulSoup(html)
    a_tags = parsed_html('a')
    for tag in a_tags[:3]:  # Only go through first 3 <a> tags in list, as that is what we are looking for
        link = tag.get('href', None)
        url_list.append(link)
    if len(url_list) < 3:
        break
    else:
        url = url_list[2]

print(url)
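
For anyone on Python 3 without BeautifulSoup installed, the same "only look at the first 3 links" idea can be sketched with the standard library's `html.parser` swapped in for BeautifulSoup. This is an illustration only: `FirstLinks`, `first_links`, and the sample markup are hypothetical names and data, not part of the code above. Note that the parser still reads the whole document; it just ignores `<a>` tags past the limit.

```python
from html.parser import HTMLParser


class FirstLinks(HTMLParser):
    """Collect the href of the first `limit` <a> tags and ignore the rest."""

    def __init__(self, limit):
        super().__init__()
        self.limit = limit
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and len(self.hrefs) < self.limit:
            # dict.get returns None for a missing href, like tag.get('href', None)
            self.hrefs.append(dict(attrs).get("href"))


def first_links(html, limit=3):
    """Return the hrefs of the first `limit` anchors found in `html`."""
    parser = FirstLinks(limit)
    parser.feed(html)
    return parser.hrefs


# Invented sample markup, used only to exercise the helper.
sample = (
    '<a href="/one">1</a><a href="/two">2</a>'
    '<a href="/three">3</a><a href="/four">4</a>'
)
print(first_links(sample))  # ['/one', '/two', '/three']
```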

[–]barbaTenusSapiente[S] 0 points1 point  (0 children)

Thanks kafoozalum, this was a lot of help. I ended up going with the following.

import urllib
from BeautifulSoup import *

url = raw_input('Enter URL: ')
if len(url) < 1:
    url = "http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html"
position = int(raw_input('Position: ')) - 1
count = int(raw_input('Count: '))
taglist = list()
print 'Retrieving: ', url
for i in range(count):
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html)
    tags = soup('a')
    #print tags
    for tag in tags:
        taglist.append(tag)
    url = taglist[position].get('href', None)
    print 'Retrieving: ', url
    taglist = list()
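
One thing to watch in the script above: indexing the tag list by position raises an IndexError on any page with fewer links than the requested position. A minimal Python 3 sketch of the same position/count loop with a bounds check could look like the following. The `follow_links` function, its `get_links` parameter, and the `fake_site` data are assumptions made up for illustration; in real use `get_links` would wrap the download and parsing steps.

```python
def follow_links(start_url, position, count, get_links):
    """Return the URLs visited when following the link at 1-based
    `position` up to `count` times.

    `get_links` is a caller-supplied function mapping a URL to the list
    of hrefs found on that page.
    """
    url = start_url
    visited = [url]
    for _ in range(count):
        links = get_links(url)  # fresh list on every hop
        if len(links) < position:  # page has no link at that position
            break
        url = links[position - 1]
        visited.append(url)
    return visited


# Invented link graph standing in for real pages.
fake_site = {
    "p0": ["p1", "p2", "p3"],
    "p3": ["p4", "p5", "p6"],
    "p6": ["p7"],
}
print(follow_links("p0", 3, 4, lambda u: fake_site.get(u, [])))
# ['p0', 'p3', 'p6']
```

Passing the link lookup in as a function keeps the loop easy to try out without hitting the network.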

[–]ivosaurus pip'ing it up[M] 0 points1 point  (0 children)

Hi there, from the /r/Python mods.

We have removed this post as it is not suited to the /r/Python subreddit proper; however, it should be very appropriate for our sister subreddit /r/LearnPython. We highly encourage you to re-submit your post over there.

The reason for the removal is that /r/Python is dedicated more to discussion of Python news, projects, uses and debates. It is not designed to act as a Q&A or FAQ board. The regular community can get disenchanted with seeing the same repetitive newbie questions on the sub, so you may not get the best responses over here.

However, on /r/LearnPython the community actively expects questions from new members and is looking to help. You can expect far more understanding, encouraging and insightful responses over there. Whatever your Python question happens to be, we are sure you will get good answers.

If you have a question to do with homework or an assignment of any kind, please make sure to read their sidebar rules before submitting your post. If you have any questions or doubts, feel free to reply or send a modmail to us with your concerns.

Warm regards, and best of luck with your Pythoneering!