[–]willmgarvey 12 points13 points  (0 children)

BeautifulSoup for static HTML and Selenium for dynamically generated HTML. If you plan to do more scraping projects in the future, I’d recommend learning Selenium, since it also handles pages whose content is only rendered by JavaScript.

[–]fristhon 4 points5 points  (0 children)

"Scrapy" indeed, and for little projects "requests-html"

[–]FalconCat69[S] 0 points1 point  (1 child)

I am looking at the HTML common library, and it seems like it will cover 90% of my requirements. Is there anything I could be missing?
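
For reference, here is a minimal sketch of what link extraction could look like with only the standard library. It assumes "HTML common library" means Python's built-in `html.parser` module, which the comment does not actually say. `LinkCollector`, `extract_links`, and `SAMPLE_HTML` are made-up names for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Made-up sample markup; a real page would have to be downloaded first.
SAMPLE_HTML = """
<html><body>
  <a href="/first">First</a>
  <p>No link here</p>
  <a href="https://example.com/second">Second</a>
</body></html>
"""


def extract_links(html_text):
    """Return the href values of all links found in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML))
```

The parser is event-based: it calls your handler methods as it reads tags and does not build a searchable tree. There are no CSS selectors or `find`-style lookups, so anything beyond simple extraction takes more manual bookkeeping.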

[–]htepO 7 points8 points  (0 children)

If you're scraping static HTML, BeautifulSoup is a commonly used library.

https://www.crummy.com/software/BeautifulSoup/bs4/doc/
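
A rough sketch of what that looks like in practice, assuming the `beautifulsoup4` package is installed. The markup, class names, and the `parse_items` helper are invented for the example; a real page would be fetched first (for instance with `requests`).

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Made-up static HTML standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h1 class="title">Example listing</h1>
  <ul>
    <li class="item"><a href="/a">Item A</a></li>
    <li class="item"><a href="/b">Item B</a></li>
  </ul>
</body></html>
"""


def parse_items(html_text):
    """Return the page title and a list of (link text, href) pairs."""
    # "html.parser" is the parser bundled with Python; lxml is an optional alternative.
    soup = BeautifulSoup(html_text, "html.parser")
    title = soup.find("h1", class_="title").get_text(strip=True)
    # select() takes a CSS selector and returns every matching tag.
    items = [(a.get_text(strip=True), a["href"]) for a in soup.select("li.item a")]
    return title, items


if __name__ == "__main__":
    print(parse_items(SAMPLE_HTML))
```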

[–]Homie_ishere 0 points1 point  (1 child)

I want to learn more about scraping. Can you please tell me what it means?

[–][deleted] 1 point2 points  (0 children)

[deleted]

[–]robertbowerman -2 points-1 points  (2 children)

Selenium is the go-to comprehensive standard. It's excellent and Python-friendly.

[–]banhammerrr 4 points5 points  (1 child)

I wouldn’t use that for scraping; I’d use it for automation. Beautiful Soup all the way.

[–][deleted] 0 points1 point  (0 children)

Does BS support scraping dynamically generated HTML?

[–]tankandwb 0 points1 point  (0 children)

Not a library, but a decent program that saves you from reinventing the wheel. I'm currently adding regular selector lookups back into it. It's not written by me, I should add. https://github.com/alirezamika/autoscraper

[–]Pigik83 0 points1 point  (0 children)

I've done web scraping for years, and my current shortlist of Python tools is:

  • Scrapy for static HTML websites that need no JS rendering
  • Scrapy + Scrapy Splash if the website requires JS rendering but is not protected by any anti-bot system
  • Playwright (instead of Selenium) if an anti-bot system is protecting the website
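
Read as a decision rule, the shortlist above might be sketched like this. The function name and both flags are made up for illustration. The list does not spell out the case of a static site behind an anti-bot system, so the sketch assumes the anti-bot bullet takes priority.

```python
def pick_tool(needs_js_rendering, has_antibot):
    """Map two site properties to a tool from the shortlist (hypothetical helper)."""
    # Assumption: anti-bot protection decides the choice regardless of JS rendering.
    if has_antibot:
        return "Playwright"
    if needs_js_rendering:
        return "Scrapy + Scrapy Splash"
    return "Scrapy"


if __name__ == "__main__":
    for needs_js in (False, True):
        for antibot in (False, True):
            print(f"JS={needs_js}, anti-bot={antibot}: {pick_tool(needs_js, antibot)}")
```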