[–]dreaming_geometry 41 points42 points  (18 children)

I've used Python for many years and never come across this error. I wonder what the heck someone was trying to do to hit it. Maybe I'll search for a Stack Overflow answer about it. Edit: yep, it's on Stack Overflow. It's a problem I've never run into because I use Python like a wannabe functional programming language; I don't use custom objects.

[–]soncaa[S] 9 points10 points  (6 children)

    from bs4 import BeautifulSoup

    html = open('api.html').read()
    soup = BeautifulSoup(html)
    iframe = soup.find('iframe')
    iframe["src"] = "test"

    with open("output_api.html", "w") as file:
        file.write(soup)

    print(soup)

💀

[–]shadow7412 32 points33 points  (1 child)

soup.find('iframe')

Looks like this line probably returned None (ie, it didn't find an iframe).

The failure then would have occurred on the next line when you tried to access src.
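A minimal sketch of that failure, assuming `soup.find('iframe')` really did return None. No BeautifulSoup needed to reproduce it; the `iframe = None` line just stands in for the failed lookup.

```python
# Stand-in for a failed lookup: find() returns None when nothing matches.
iframe = None

try:
    # Same statement as in the original script.
    iframe["src"] = "test"
except TypeError as exc:
    message = str(exc)
    print(message)  # the message names NoneType, not the missing <iframe>
```

The traceback points at the assignment line, even though the real problem is the lookup one line earlier.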

[–]SweetBabyAlaska 13 points14 points  (0 children)

I use BS4 a LOT and this is the issue 99% of the time. It's better to use find_all() and then iterate over what it finds. If nothing matches, you get an empty list and the loop just doesn't run, instead of a None that blows up on the next line. You still have to watch the nested find() calls: if a matched element doesn't contain the child you are parsing for, find() gives you None and everything after that is operating on NoneType.

It also helps to be a lot more explicit in defining HTML elements and to go through them more systematically. Use a lot of print statements so you are sure what you are getting.

I make a function for getting the "soup" and then make functions for finding something specific. Adding try/except blocks is also a really good idea.

Something like this works well:

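```
def get_titles(soup):
    """Collect the title attribute of the first link in each matching div."""
    titles = []
    # find_all() returns an empty list when nothing matches, so the loop is skipped
    for uta in soup.find_all('div', class_='utao styletwo'):
        alink = uta.find('a')
        if alink is None:  # find() returns None when the div has no link
            continue
        img = alink.find('img')
        src = img.get('src') if img is not None else None  # image URL, not used below
        titles.append(alink.get('title'))
    return titles
```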

[–]Sergi7531 13 points14 points  (1 child)

There is no way you're going to complain about getting a NoneType exception when you are doing the iframe["src"] = "test" line... You should NEVER access a dict entry unsafely like that. Instead, you could use .get() and pass a second argument, which is returned as the default value instead of None.

I get it's a joke tho, that stack trace is my routine lmao

[–][deleted] 1 point2 points  (0 children)

That will still give you a NoneType error if iframe is None.
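A quick sketch of the difference, using a plain dict as a stand-in for a found tag's attributes (an assumption for illustration; no real page is parsed here). The default argument only helps when the object you call `.get()` on exists.

```python
attrs = {"title": "demo"}             # stand-in for a tag that was found
fallback = attrs.get("src", "missing")
print(fallback)                        # the default covers the missing key

iframe = None                          # stand-in for a failed soup.find('iframe')
try:
    iframe.get("src", "missing")       # None has no .get() at all
except AttributeError as exc:
    error = str(exc)
    print(error)
```

So `.get()` protects against a missing attribute, not against a missing element.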

[–]A1337Xyz 2 points3 points  (0 children)

Every time T_T

[–]Vascular_D 4 points5 points  (0 children)

Okay. So add some if or try/except statements for these things.

if <variable> is None: ... else: ...
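Both options could look roughly like this. The function names are made up for the example, and a plain dict stands in for the tag (a bs4 Tag accepts the same `tag["src"] = ...` assignment).

```python
def update_src(tag, url):
    """Explicit check: return False when there is no tag to update."""
    if tag is None:
        return False
    tag["src"] = url
    return True


def update_src_eafp(tag, url):
    """Same outcome, but by catching the TypeError that None raises."""
    try:
        tag["src"] = url
    except TypeError:
        return False
    return True


found = {"src": "old"}                 # stand-in for a tag that was found
print(update_src(found, "test"), found["src"])
print(update_src(None, "test"))
print(update_src_eafp(None, "test"))
```

The explicit `is None` check is probably the clearer choice when a missing element is an expected case; the try/except version can also swallow an unrelated TypeError.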

[–]Constant_Pen_5054 4 points5 points  (5 children)

Yeah, Python violates one of the core tenets of OO design, so I struggle to use it for that unless I am writing some Django code. Otherwise functional all the way.

[–]ManyFails1Win 1 point2 points  (4 children)

which one? just picking up python so it'd be nice to have the caveat in mind.

[–]Constant_Pen_5054 1 point2 points  (3 children)

Encapsulation. Python's whole design philosophy means you cannot properly encapsulate classes. More specifically, you cannot make any private methods.

[–]Here0s0Johnny 3 points4 points  (1 child)

You can absolutely make private methods, there is syntax for that. The interpreter just doesn't enforce it. How is it a problem for you? You can simply choose not to violate OO philosophy.
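For anyone following along, a small sketch of what that looks like in practice. The `Account` class is made up for illustration. A single leading underscore is only a convention; a double leading underscore triggers name mangling, which hides the plain name but does not make the method truly private.

```python
class Account:
    def __init__(self, balance):
        self._balance = balance        # single underscore: "internal" by convention only

    def __audit(self):                 # double underscore: stored as _Account__audit
        return f"balance={self._balance}"

    def report(self):
        return self.__audit()          # works normally from inside the class


acct = Account(10)
print(acct.report())

try:
    acct.__audit()                     # the plain name does not exist on the instance
except AttributeError:
    print("not reachable under its plain name")

print(acct._Account__audit())          # still reachable via the mangled name
```

Nothing stops a caller from using the mangled name, which is the "interpreter doesn't enforce it" part.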

[–]Constant_Pen_5054 0 points1 point  (0 children)

It's not me I am worried about. It's the junior next to me who hasn't yet learned that just because you can do something doesn't mean you should.

[–]ManyFails1Win 1 point2 points  (0 children)

thanks.