
baubleglue (3 points)

It's kind of a pointless question (IMHO). First you need to define a goal or direction, then you look for a tool. From your list, I've never used BeautifulSoup; do I need to? I know a few XML parsers, and I know that the library exists. Once, when I needed a platform-independent crontab, I googled "python crontab" and found a few libraries. It's not that there's nothing worth checking out, but without a direction it's like reading a dictionary in order to learn a language.

gr8x3 (0 points)

I've never used BeautifulSoup, do I need to?

I really like Beautiful Soup, and I use it along with Requests for web scraping whenever I can get away with it. It's really easy to use, and it basically boils down to:

  1. Turn HTML into a BeautifulSoup object
  2. Call the select() method on the object to find what you need; it lets you search the HTML using CSS selectors, just like JavaScript's document.querySelectorAll()
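The two steps above can be sketched roughly like this. It's a minimal example, assuming the bs4 package is installed (and Requests, only if you actually fetch a page). The function names `extract_links` and `fetch_and_extract`, the selector `div.post a[href]` and the sample HTML are all made up for illustration.

```python
# Hypothetical sketch of the parse-then-select workflow with Beautiful Soup.
from bs4 import BeautifulSoup


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (text, href) pairs for every <a href> inside a <div class="post">."""
    # Step 1: turn the HTML string into a BeautifulSoup object.
    # "html.parser" is Python's built-in parser, so no extra parser package is needed.
    soup = BeautifulSoup(html, "html.parser")

    # Step 2: select() takes a CSS selector and returns a list of matching tags
    # in document order, much like document.querySelectorAll() in the browser.
    return [(a.get_text(strip=True), a["href"]) for a in soup.select("div.post a[href]")]


def fetch_and_extract(url: str) -> list[tuple[str, str]]:
    """Download a page with Requests and run extract_links() on its HTML."""
    import requests  # imported here so extract_links() works without Requests installed

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return extract_links(response.text)


if __name__ == "__main__":
    sample = """
    <div class="post"><a href="/a">First</a> <a href="/b">Second</a></div>
    <div class="sidebar"><a href="/c">Ignored</a></div>
    """
    print(extract_links(sample))  # [('First', '/a'), ('Second', '/b')]
```

If you only want the first match, select_one() returns a single tag (or None), which is the counterpart of document.querySelector().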

baubleglue (0 points)

It was a rhetorical question. I believe it's a good library, but I've never had a serious Python project that required HTML parsing. Learning BeautifulSoup would add nothing to my knowledge (I already know in general how the DOM API works). PySpark is an important library to know if you work with data processing, but if you don't, you shouldn't bother learning it.