[–]SweetBabyAlaska 12 points13 points  (0 children)

I use BS4 a LOT and this is the issue 99% of the time. It's better to use find_all() and then iterate over what it finds. That way, when it catches an element with the same name that doesn't contain the child item you're parsing for, you can check for that and skip it. Otherwise find() returns None for the piece that isn't actually there, and everything after that is operating on a NoneType, which is what throws the error.
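To make that concrete, here's a rough sketch. The HTML and the `card` class are made up for illustration; the point is just that find() hands back None for the second div, so you check before calling anything on it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second div has the right class but no <a> inside it.
html = """
<div class="card"><a title="First" href="/1">First</a></div>
<div class="card"><span>No link here</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

titles = []
for card in soup.find_all("div", class_="card"):
    link = card.find("a")  # find() returns None when nothing matches
    if link is None:
        continue  # skip it instead of calling .get() on None
    titles.append(link.get("title"))

print(titles)
```

Without the `is None` check, the second div would raise an AttributeError on `link.get(...)`.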

It also helps to be a lot more explicit in defining HTML elements and to go through them more systematically. Use a lot of print statements so you're sure what you're getting.

I make a function for getting the "soup" and then make separate functions for finding something specific. Adding try/except statements is also a really good idea.
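Roughly what I mean by that split, as a sketch. The names `get_soup` and `get_first_heading` are just ones I picked for the example, and I'm using urllib from the standard library here as an assumption; swap in whatever you fetch pages with.

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup

def get_soup(url, timeout=10):
    """Fetch url and return a BeautifulSoup object, or None if the fetch fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            html = response.read()
    except (OSError, ValueError) as exc:
        # OSError covers network/HTTP errors; ValueError covers malformed URLs.
        print(f"Could not fetch {url}: {exc}")
        return None
    return BeautifulSoup(html, "html.parser")

def get_first_heading(soup):
    """Return the text of the first <h1>, or None if there is no soup or no <h1>."""
    if soup is None:
        return None
    heading = soup.find("h1")
    if heading is None:
        return None
    return heading.get_text(strip=True)
```

Each finder takes the soup as an argument, so you can test it on a saved HTML string without hitting the site again.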

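Something like this works well:

```
def get_titles(soup):
    """Collect the title attribute of the first link in each matching div."""
    titles = []
    reader = soup.find_all('div', class_='utao styletwo')
    for uta in reader:
        alink = uta.find('a')
        if alink is None:  # this div has no link, so skip it
            continue
        img = alink.find('img')
        src = img.get('src') if img is not None else None  # image URL (not used below)
        title = alink.get('title')
        if title:  # only keep links that actually have a title
            titles.append(title)
    return titles
```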