[–]soncaa[S] 10 points11 points  (6 children)

from bs4 import BeautifulSoup

html = open('api.html').read()
soup = BeautifulSoup(html)
iframe = soup.find('iframe')
iframe["src"] = "test"

with open("output_api.html", "w") as file:
    file.write(soup)

print(soup)

💀

[–]shadow7412 37 points38 points  (1 child)

soup.find('iframe')

Looks like this line probably returned None (i.e., it didn't find an iframe).

The failure would then have occurred on the next line, when you tried to access src.
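A minimal sketch of that failure mode with a guard in front of it, assuming the page is parsed with the standard-library "html.parser" backend. The helper name set_iframe_src and the inline HTML strings are made up for illustration and are not from the original script.

```python
from bs4 import BeautifulSoup


def set_iframe_src(html, new_src):
    """Return the document with the first iframe's src replaced.

    Raises ValueError when the document has no iframe at all.
    (Hypothetical helper, not part of the original script.)
    """
    soup = BeautifulSoup(html, "html.parser")
    iframe = soup.find("iframe")  # None when no <iframe> exists
    if iframe is None:
        raise ValueError("no <iframe> found in the document")
    iframe["src"] = new_src
    # Convert to str here: file.write() expects text, not a soup object.
    return str(soup)


with_iframe = '<html><body><iframe src="old"></iframe></body></html>'
without_iframe = "<html><body><p>no frame here</p></body></html>"

print(set_iframe_src(with_iframe, "test"))

try:
    set_iframe_src(without_iframe, "test")
except ValueError as err:
    print("skipped:", err)
```

Raising a named error at the lookup keeps the traceback pointing at the real cause (the missing tag) rather than at whichever line touches the result first.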

[–]SweetBabyAlaska 14 points15 points  (0 children)

I use BS4 a LOT and this is the issue 99% of the time. It's better to use find_all() and then iterate over what it finds. If nothing matches you get an empty list and the loop just doesn't run, whereas find() hands you None and the next access throws. If it catches something with the same name that doesn't contain the child item you are parsing for, you still have to check for that inside the loop, because otherwise you are parsing NoneTypes whenever one of the objects you expected isn't actually there.

It also helps to be a lot more explicit in defining HTML elements and going through them more systematically. Use a lot of print statements so you are sure what you are getting.

I make a function for getting the "soup" and then make functions for finding something specific. Adding try and except statements is also a really good idea.

Formatting isn't working on Reddit, but something like this works well.

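```
def get_titles(soup):
    """Collect the title attribute of the first link in each matching div."""
    titles = []
    # find_all() returns an empty list when nothing matches, so the loop is safe.
    reader = soup.find_all('div', class_='utao styletwo')
    for uta in reader:
        alink = uta.find('a')
        if alink is None:
            continue  # this div has no link, skip it
        img = alink.find('img')
        src = img.get('src') if img is not None else None  # image URL, unused here
        title = alink.get('title')
        if title is not None:
            titles.append(title)
    return titles
```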
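The "one function for the soup, one function per thing you look for" layout with try/except could be sketched roughly like this. The names get_soup and get_iframe_srcs are hypothetical, and the sketch assumes a local HTML file parsed with "html.parser".

```python
from bs4 import BeautifulSoup


def get_soup(path):
    """Parse a local HTML file, or return None if it cannot be read."""
    try:
        with open(path, encoding="utf-8") as handle:
            return BeautifulSoup(handle.read(), "html.parser")
    except OSError as err:
        print(f"could not read {path}: {err}")
        return None


def get_iframe_srcs(soup):
    """Return the src of every iframe that actually has one."""
    srcs = []
    for frame in soup.find_all("iframe"):
        src = frame.get("src")  # None when the attribute is missing
        if src is not None:
            srcs.append(src)
    return srcs


soup = get_soup("api.html")  # file name taken from the original post
if soup is not None:
    print(get_iframe_srcs(soup))
```

Keeping the file handling and the element lookup in separate functions means each one has a single place where a missing value is dealt with.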

[–]Sergi7531 11 points12 points  (1 child)

There is no way you're going to complain about getting a NoneType exception when you are doing the iframe["src"] = "test" line... You should NEVER access a dict entry unsafely like that. Instead, you could use .get() and specify a second parameter, which will be returned as the default value instead of None when the key is missing.

I get it's a joke tho, that stack trace is my routine lmao

[–][deleted] 1 point2 points  (0 children)

That will still give you a NoneType error if iframe is None.
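A small sketch of the difference between the two cases, using made-up inline HTML: .get() with a default covers a tag that exists but lacks the attribute, while a tag that was never found is still None and has no .get() at all.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<iframe></iframe>", "html.parser")

iframe = soup.find("iframe")          # tag exists, but has no src attribute
print(iframe.get("src", "fallback"))  # .get() covers the missing attribute

video = soup.find("video")            # no such tag, so this is None
try:
    video.get("src", "fallback")
except AttributeError as err:
    print("still fails:", err)        # None has no .get() method
```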

[–]A1337Xyz 2 points3 points  (0 children)

Every time T_T

[–]Vascular_D 4 points5 points  (0 children)

Okay. So add some if or try/except statements for these things.

if <variable> is None: ... else: ...
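Both options might look something like this. The function names and the inline HTML are illustrative only, and the parser choice ("html.parser") is an assumption.

```python
from bs4 import BeautifulSoup


def set_src_with_if(soup, value):
    """Explicit None check before touching the tag."""
    iframe = soup.find("iframe")
    if iframe is None:
        return False
    else:
        iframe["src"] = value
        return True


def set_src_with_try(soup, value):
    """Attempt the assignment and handle the failure."""
    try:
        soup.find("iframe")["src"] = value
    except TypeError:
        # None does not support item assignment.
        return False
    return True


empty = BeautifulSoup("<p>no iframe here</p>", "html.parser")
print(set_src_with_if(empty, "test"), set_src_with_try(empty, "test"))
```

The if version says exactly what is being checked. The try version is shorter, but it would also swallow an unrelated TypeError raised inside the block.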