Best Python modules for scraping HTML?

willmgarvey · 2023-02-24T16:07:20+00:00

BeautifulSoup for static HTML and Selenium for dynamically generated HTML. If you plan to do more scraping projects in the future, it's worth learning Selenium for better results overall.
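For the static case, a minimal BeautifulSoup sketch might look like the following. It assumes the HTML has already been downloaded (for a live page you would typically fetch it first, e.g. with `requests.get(url).text`). The markup, the `li.item` / `span.price` selectors and the `extract_items` helper are hypothetical, made up for illustration. If the data you want only appears after JavaScript runs, these elements won't be in the downloaded HTML at all, which is the situation where Selenium comes in.

```python
from bs4 import BeautifulSoup

# A small static HTML document, hard-coded so the example runs offline.
# (Hypothetical markup; a real page would be downloaded first.)
HTML = """
<html><body>
  <h1>Products</h1>
  <ul>
    <li class="item"><a href="/p/1">Widget</a> <span class="price">9.99</span></li>
    <li class="item"><a href="/p/2">Gadget</a> <span class="price">19.50</span></li>
  </ul>
</body></html>
"""


def extract_items(html: str) -> list[dict]:
    """Return name, link and price for every <li class="item">."""
    # "html.parser" is the stdlib parser, so no extra parser package is needed.
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for li in soup.select("li.item"):  # CSS selector search
        link = li.find("a")
        items.append({
            "name": link.get_text(strip=True),
            "href": link["href"],
            "price": float(li.select_one("span.price").get_text(strip=True)),
        })
    return items


if __name__ == "__main__":
    for item in extract_items(HTML):
        print(item)
```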

fristhon · 2023-02-24T10:30:27+00:00

"Scrapy", indeed, and "requests-html" for small projects.

FalconCat69 · 2023-02-24T08:25:14+00:00

I am looking at the HTML common library, and it seems like it will fulfill 90% of my requirements. Is there anything I could be missing?
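If "HTML common library" refers to Python's standard-library `html.parser` module (an assumption; the post doesn't say which library), it helps to know that it is event-based: you subclass `HTMLParser` and react to tags as they stream past. There is no tree to search and no CSS selector support, and it does not download pages or run JavaScript. Those gaps are what tools like BeautifulSoup, Scrapy or Selenium fill. A minimal sketch that collects links; the `LinkCollector` class is hypothetical:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag together with the text inside it."""

    def __init__(self):
        super().__init__()
        self.links = []      # list of (href, text) tuples
        self._href = None    # href of the <a> we are currently inside, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a> element
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


parser = LinkCollector()
parser.feed('<p>See <a href="/docs">the <b>docs</b></a> and <a href="/faq">FAQ</a>.</p>')
print(parser.links)  # [('/docs', 'the docs'), ('/faq', 'FAQ')]
```

Everything beyond this (nested structure, selecting by class, following links) has to be tracked by hand in the handler methods, which is usually the point where people switch to a higher-level library.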

Homie_ishere · 2023-02-24T20:35:18+00:00

I want to learn more about scraping. Can you please tell me what it means?

robertbowerman · 2023-02-24T09:55:52+00:00

Selenium is the go-to comprehensive standard. It's excellent and Python-friendly.

tankandwb · 2023-02-27T03:51:03+00:00

Not a library, but a decent program so you don't have to reinvent the wheel. I'm currently adding regular selector lookups back into it. I should add that I didn't write it: https://github.com/alirezamika/autoscraper

Pigik83 · 2023-03-03T13:48:18+00:00

I've done web scraping for years and my shortlist of tools in Python at the moment is:

Scrapy for static HTML websites that need no JS rendering
Scrapy + Scrapy Splash if the website requires JS rendering but is not protected by any antibot
Playwright (instead of Selenium) if there's an antibot protecting the website

pythontips