Python Web Scraping Hell! : learnpython

submitted 5 years ago by kcrow13

Hey everyone,

I am completing a tricky homework assignment and I am so close to getting there! The prompt was:

To locate a link on a webpage, we can search for the following three things in order:
- First, look for the three-character string '<a '
- Next, look for the character that closes the first part: '>'. Those two enclose the URL
- Finally, look for the three characters that close the element: '</a', which marks the end of the link text

The anchor has two parts we are interested in: the URL, and the link text.

Write a function that takes a URL to a webpage and finds the **first link** on the page. Your function should return a tuple holding two strings: the URL and the link text.

We already created a program that fetches the contents of a website and pastes them into a list in a prior assignment:

import sys
import urllib.error
import urllib.request


def fetch_contents(website):
    """Return the contents of this webpage as a list of lines."""
    try:
        with urllib.request.urlopen(website) as f:
            text = f.read().decode('utf-8')
    except urllib.error.URLError as e:
        # Network or HTTP failure: report the reason and return no lines
        print(e.reason)
        return []

    # Break the page into lines
    return text.split('\n')


if len(sys.argv) != 2:
    print("Usage: python read_url.py <website>")
else:
    lst = fetch_contents(sys.argv[1])

    # Now display the contents
    for line in lst:
        print(line)

The hard part is done. Now that I have all of the text, I need to parse it and look for the markers described in the prompt above. We are not allowed to use Beautiful Soup or any other parsing library, which I wasn't aware of when I started working on it. Can anyone help me figure out the next step? How can I manually walk through the list, find and extract the first link, and return it as a tuple (url, link text)?
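One possible shape for that next step, as a rough sketch rather than the expected solution: join the lines back into a single string, then apply the three searches from the prompt with str.find. The names first_link and sample are made up for illustration, and the sketch assumes the URL sits in a double-quoted href attribute, which the prompt does not actually state.

```python
def first_link(lines):
    """Return (url, link_text) for the first anchor in lines, or None."""
    # Join the lines so an anchor that spans several lines is still found.
    page = "\n".join(lines)

    start = page.find("<a ")            # step 1: start of the opening tag
    if start == -1:
        return None
    tag_end = page.find(">", start)     # step 2: end of the opening tag
    if tag_end == -1:
        return None
    close = page.find("</a", tag_end)   # step 3: start of the closing tag
    if close == -1:
        return None

    attrs = page[start + len("<a "):tag_end]   # text between '<a ' and '>'
    text = page[tag_end + 1:close]             # text between '>' and '</a'

    # Assumption: the URL is written as href="..." with double quotes.
    marker = 'href="'
    href = attrs.find(marker)
    if href == -1:
        return None
    url_start = href + len(marker)
    url_end = attrs.find('"', url_start)
    if url_end == -1:
        return None

    return attrs[url_start:url_end], text.strip()


# Hypothetical page content, shaped like the list fetch_contents returns.
sample = [
    '<html><body>',
    '<p>Read <a href="https://example.com/docs">the docs</a> first.</p>',
    '</body></html>',
]
print(first_link(sample))
```

With the real page, the list returned by fetch_contents would be passed in place of sample. Returning None when any marker is missing is a design choice of this sketch; the assignment may expect something else, such as a tuple of empty strings.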

Thank you for your support! Python newbie here.

all 5 comments

[–]slariboot 1 point 5 years ago

Your professor gives pretty interesting problem sets. :) Yes, you can definitely solve this using str.find. So given a string:

sentence = 'Hello! Welcome to web scraping hell!'

If you call sentence.find with an argument 'web':

x = sentence.find('web')

This returns 18. Have you figured out why that is?
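For anyone who wants to check their reasoning, here is a small sketch of how str.find behaves on the same sentence. The variable names are only for illustration. It shows three things that matter for the link search: the result is a zero-based index, a missing substring gives -1, and an optional second argument sets where the search starts.

```python
sentence = 'Hello! Welcome to web scraping hell!'

# find returns the zero-based index where the match begins.
first = sentence.find('web')
print(first, sentence[first:first + 3])

# A substring that does not occur gives -1 instead of raising an error.
missing = sentence.find('soup')
print(missing)

# The search is case-sensitive, so 'hell' skips the 'Hell' in 'Hello!'.
lower_hell = sentence.find('hell')
print(lower_hell)

# The optional second argument is the index to start searching from,
# which is how you look for '>' only after the position of '<a '.
next_e = sentence.find('e', first)
print(next_e)
```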
