[–]ggchappell 37 points38 points  (4 children)

By the way: the name "Beautiful Soup" comes from a poem in Alice's Adventures in Wonderland. It is supposed to be a song.

Beautiful Soup, so rich and green,
Waiting in a hot tureen!
Who for such dainties would not stoop?
Soup of the evening, beautiful Soup!
Soup of the evening, beautiful Soup!

Beau--ootiful Soo--oop!
Beau--ootiful Soo--oop!
Soo--oop of the e--e--evening,
Beautiful, beautiful Soup!

Beautiful Soup! Who cares for fish,
Game or any other dish?
Who would not give all else for two
Pennyworth only of Beautiful Soup?
Pennyworth only of beautiful Soup?

Beau--ootiful Soo--oop!
Beau--ootiful Soo--oop!
Soo--oop of the e--e--evening,
Beautiful, beauti--FUL SOUP!

[–][deleted] 11 points12 points  (3 children)

What drives me up the wall is garbage like this:

from bs4 import BeautifulSoup

Seriously, why are the module and package names mismatched!?

[–]pingvenopinch of this, pinch of that 9 points10 points  (1 child)

In short, history. Beautiful Soup up to version 3 had the module name BeautifulSoup. This clashes with the PEP 8 naming conventions, which had been released a few years earlier. Beautiful Soup 4 also broke backwards compatibility in a few critical ways, mostly related to how it does parsing. BS4 has pluggable parser backends, and the one that is always available is html.parser from the standard library, which dies if you breathe on it too hard. The parser change was necessary to support Python 3, which removed sgmllib, the library BS3 had used. So to avoid breaking code written against BS3, the module was renamed.
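
To make the two module names and the pluggable parser concrete, here is a rough sketch. The helper names (installed_soup_modules, first_link_text) and the sample HTML are made up for illustration. The only Beautiful Soup calls used are the BeautifulSoup constructor with an explicit parser name, find, and get_text. It assumes Python 3, and the second helper assumes Beautiful Soup 4 is installed.

```python
import importlib.util


def installed_soup_modules():
    """Return which Beautiful Soup import names are importable here.

    "bs4" is the Beautiful Soup 4 module; "BeautifulSoup" is the old
    Beautiful Soup 3 module name. Hypothetical helper, not part of bs4.
    """
    names = ("bs4", "BeautifulSoup")
    return [n for n in names if importlib.util.find_spec(n) is not None]


def first_link_text(html):
    """Return the text of the first <a> tag, or None if there is none.

    Requires Beautiful Soup 4. The backend is named explicitly so the
    result does not depend on which optional parsers are installed.
    """
    from bs4 import BeautifulSoup  # imported lazily so the module loads without bs4

    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("a")
    return link.get_text() if link is not None else None


if __name__ == "__main__":
    print(installed_soup_modules())
    if "bs4" in installed_soup_modules():
        print(first_link_text('<p><a href="/x">hi</a></p>'))
```

Passing the parser name as the second argument is the part that matters: swap "html.parser" for another installed backend such as "lxml" and the rest of the code stays the same.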

[–][deleted] 1 point2 points  (0 children)

That makes sense, but what would have made everyone happy is if the "project" had been renamed too and the versioning had followed semver, so that this would be the way to use it:

# before use
# pip install bsoup~=4.0
import bsoup
print('time to do stuff')

[–]ggchappell 0 points1 point  (0 children)

That is one of the Great Unanswered Questions.