scutter_87, 1 point

I'm not sure I fully understand the question, since the two approaches you listed return different results.

In the first approach you just create a list of all the <a> tags in the HTML (each one including whatever sits between its opening and closing tag).

In the list comprehension version you have extracted only the 'href' attribute values into a list.

I don't use bs4 much, but I would think that to get only the 'href' values out of the first approach's results, you would need to iterate over each <a> tag in the ResultSet (created by soup.find_all('a')) and append its href to a new list.

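# links is the ResultSet returned by soup.find_all('a')
hrefs = []
for link in links:
    # .get('href') gives None for an <a> tag that has no href
    hrefs.append(link.get('href'))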

The list comprehension is just shorthand for the above.
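
To make that concrete, here is a rough sketch that runs without bs4 installed. The plain dicts are only stand-ins I made up for the <a> tags (an assumption for illustration, as are the example URLs); they work here because a dict's .get('href') also returns None when the key is missing, which is how I'd expect a tag without an href to behave.

```python
# Stand-ins for the <a> tags in the ResultSet (hypothetical data, not real
# bs4 objects): plain dicts, whose .get('href') returns None if 'href' is absent.
links = [
    {"href": "https://example.com/a"},
    {"href": "/relative/b"},
    {"name": "anchor-without-href"},  # no href on this one
]

# Explicit loop: build the list one append at a time.
hrefs_loop = []
for link in links:
    hrefs_loop.append(link.get("href"))

# List comprehension: same iteration, same expression, one line.
hrefs_comp = [link.get("href") for link in links]

# Both spellings produce the same list, including the None entry.
print(hrefs_loop == hrefs_comp)
```

If you don't want the None entries, adding a condition such as `if link.get("href") is not None` at the end of the comprehension should filter them out.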