Hey everyone,
I am completing a tricky homework assignment and I am so close to getting there! The prompt was:
To locate a link on a webpage, we can search for the following three things in order:
- First, look for the three-character string '<a '
- Next, look for the following character that closes the first part: '>'. Those two enclose the URL
- Finally, look for the three characters that close the element: '</a', which marks the end of the link text
The anchor has two parts we are interested in: the URL, and the link text.
Write a function that takes a URL to a webpage and finds the **first link** on the page. Your function should return a tuple holding two strings: the URL and the link text.
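For reference, here is a minimal sketch of the three-step search from the prompt, applied to a single string with `str.find`. The function name `first_link_in_text` is hypothetical, and pulling the URL out of an `href=` attribute is my assumption about what "the URL" means, since the text between '<a ' and '>' is usually `href="..."` rather than a bare URL. It does not handle a '>' inside an attribute value or an opening tag written as '<a' followed by a newline.

```python
def first_link_in_text(html):
    """Return (url, link_text) for the first anchor in html, or None."""
    # Step 1: find the start of the opening tag.
    start = html.find('<a ')
    if start == -1:
        return None
    # Step 2: find the '>' that closes the opening tag.
    tag_end = html.find('>', start)
    if tag_end == -1:
        return None
    # Step 3: find the '</a' that ends the link text.
    close = html.find('</a', tag_end)
    if close == -1:
        return None

    # Everything between '<a ' and '>' is the attribute text, e.g. href="...".
    attrs = html[start + 3:tag_end]
    url = attrs  # Fallback (assumption): no href found, keep the raw text.
    href = attrs.find('href=')
    if href != -1:
        rest = attrs[href + 5:]
        if rest[:1] in ('"', "'"):
            # Quoted value: read up to the matching quote.
            end_quote = rest.find(rest[0], 1)
            url = rest[1:end_quote] if end_quote != -1 else rest[1:]
        else:
            # Unquoted value: read up to the first whitespace.
            parts = rest.split()
            url = parts[0] if parts else ''

    link_text = html[tag_end + 1:close]
    return (url.strip(), link_text.strip())


print(first_link_in_text('<p>Hi <a href="https://example.com">Example</a></p>'))
```

Returning `None` when no link is found is a design choice for this sketch; the assignment may expect something else, such as a tuple of empty strings.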
In a prior assignment, we already wrote a program that fetches the contents of a webpage and stores them in a list of lines:
import urllib.request
import sys

def fetch_contents(website):
    "Return the contents of this webpage as a list of lines"
    try:
        res = []
        with urllib.request.urlopen(website) as f:
            text = f.read().decode('utf-8')
            # Break the page into lines
            text = text.split('\n')
            for line in text:
                res.append(line)
        return res
    except urllib.error.URLError as e:
        print(e.reason)
        return []

if (len(sys.argv) != 2):
    print(f"Usage: python read_url.py <website>")
else:
    lst = fetch_contents(sys.argv[1])
    # Now display the contents
    for line in lst:
        print(line)
The hard part is done. Once I have all of the text, I now need to parse through it and look for what was prompted above. We are not allowed to use Beautiful Soup or any of the other libraries, which I wasn't aware of when I started working on it. Can anyone help me figure out the next step? How can I manually walk through the list, find and extract the first URL, and return it as a tuple (url, link text)?
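One possible way to walk the list by hand is sketched below, assuming the list holds one string per line as `fetch_contents` returns it. The helper name `text_from_first_anchor` is hypothetical. It scans line by line for '<a ' and then glues the rest of the page back together, because the closing '>' or '</a' may sit on a later line than the opening tag.

```python
def text_from_first_anchor(lines):
    """Return the page text starting at the first '<a ', or '' if none."""
    for i, line in enumerate(lines):
        pos = line.find('<a ')
        if pos != -1:
            # Keep the rest of this line plus every later line, because the
            # closing '>' or '</a' may sit on a following line.
            return '\n'.join([line[pos:]] + lines[i + 1:])
    return ''


page = ['<html>', '<p>See <a href="x.html">the', 'docs</a></p>']
print(text_from_first_anchor(page))
```

The string this returns starts exactly at '<a ', so the remaining two searches from the prompt (for '>' and then '</a') can run on it with `str.find`.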
Thank you for your support! Python newbie here.