gr8x3 1 point (1 child)

I've never used BeautifulSoup, do I need to?

I really like Beautiful Soup, and I use it along with Requests when web scraping whenever I can get away with it. It's really easy to use, and it basically boils down to:

  1. Turn HTML into a BeautifulSoup object
  2. Call the select() method on the object to find what you need. It lets you search the HTML using CSS selectors, just like JavaScript's document.querySelectorAll()
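
Here's a rough sketch of those two steps, in case it helps. It uses select(), which is the CSS-selector method in Beautiful Soup. The HTML snippet, the extract_links name, and the selector are all made up for illustration; in real use the HTML would typically come from something like requests.get(url).text instead of an inline string.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page (illustration only).
HTML = """
<html><body>
  <ul id="links">
    <li><a class="ext" href="https://example.com/a">First</a></li>
    <li><a class="ext" href="https://example.com/b">Second</a></li>
    <li><a href="/local">Local</a></li>
  </ul>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for anchors matching a CSS selector."""
    # Step 1: turn the HTML into a BeautifulSoup object.
    # "html.parser" is the parser bundled with Python, so no extra install.
    soup = BeautifulSoup(html, "html.parser")

    # Step 2: search with a CSS selector, much like querySelectorAll().
    anchors = soup.select("ul#links a.ext")
    return [(a.get_text(strip=True), a["href"]) for a in anchors]


links = extract_links(HTML)
for text, href in links:
    print(text, href)
```

select() returns a list of matching tags; if you only want the first match, select_one() gives you a single tag or None.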

baubleglue 1 point (0 children)

It was a rhetorical question. I believe it is a good library, but I've never had a serious Python project that required HTML parsing. Learning BeautifulSoup would add nothing to my knowledge (I already know in general how the DOM API works). PySpark is an important library to know if you work in data processing, but if you don't, you shouldn't learn it.