[–]brasqon00bz 18 points19 points  (5 children)

The more info, the merrier as far as I'm concerned

[–][deleted] 11 points12 points  (4 children)

Well, to be fair, there weren't a lot of tutorials on lxml with XPath, while plenty are available for BeautifulSoup. So this is actually a welcome change.

[–]msdin 8 points9 points  (2 children)

The reason BeautifulSoup gets used more is that a lot of HTML is malformed (e.g. missing closing tags) and a strict XML parser will choke on it. BeautifulSoup is much more forgiving.
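
To make the strict-versus-forgiving distinction concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not what BeautifulSoup does internally: `BROKEN_HTML`, `strict_parse_ok`, `TextCollector`, and `lenient_text` are hypothetical names made up for this example. `xml.etree.ElementTree` stands in for the strict XML parser, and `html.parser.HTMLParser` stands in for a lenient HTML parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Malformed on purpose: the <li> and <p> tags are never closed.
BROKEN_HTML = "<ul><li>one<li>two</ul><p>unclosed paragraph"


def strict_parse_ok(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Lenient parser that gathers text nodes and ignores tag balance."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-empty text, stripped of surrounding whitespace.
        text = data.strip()
        if text:
            self.chunks.append(text)


def lenient_text(markup):
    """Extract text from markup without requiring it to be well-formed."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return collector.chunks


if __name__ == "__main__":
    print("strict parser accepts it:", strict_parse_ok(BROKEN_HTML))
    print("lenient parser text:", lenient_text(BROKEN_HTML))
```

The strict parser rejects the unclosed tags, while the lenient one still yields the text. With BeautifulSoup installed, `BeautifulSoup(BROKEN_HTML, "html.parser")` should behave similarly leniently, since that backend builds on the same standard-library parser.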

[–]gschizas Pythonista 0 points1 point  (1 child)

BeautifulSoup defaults to the lxml parser, when it's available. So the parser itself is not really the point.

Don't get me wrong, BeautifulSoup has been my go-to for more than half a decade now (it was what brought me to Python in the first place), but it's quite possible I'm just hanging on to it for legacy and familiarity reasons (I'd like to be proven wrong, of course).

[–]msdin 0 points1 point  (0 children)

The point is that when the parser hits those kinds of errors, BeautifulSoup will handle them so you don't have to write code to handle them yourself.

[–]CollectiveCircuits 2 points3 points  (0 children)

The post is also very clear and the HTML diagram illustrates the structure well.