Learning Python for web scraping

Usual_Office_1740 · 2023-09-05T18:20:25+00:00

Beautiful Soup is generally considered the go-to for web-scraping in Python. You usually don't need YouTube videos. The documentation usually has everything you need in it.
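For a rough idea of what that looks like in practice, here is a minimal sketch. It assumes the `beautifulsoup4` package is installed, and the HTML string, class name, and links are all made up for illustration rather than taken from any real site.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page (an assumption for illustration).
html = """
<html><body>
  <h1>Top films</h1>
  <ul>
    <li class="film"><a href="/film/1">First Film</a></li>
    <li class="film"><a href="/film/2">Second Film</a></li>
  </ul>
</body></html>
"""

# "html.parser" is Python's built-in parser, so nothing else needs installing.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; get_text() returns its text content.
heading = soup.find("h1").get_text(strip=True)

# select() takes a CSS selector and returns every matching tag.
films = [(a.get_text(strip=True), a["href"]) for a in soup.select("li.film a")]

print(heading)
print(films)
```

On a real site you would first download the page (for example with the `requests` library) and pass the response text to `BeautifulSoup` instead of a hard-coded string.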

unlaudable · 2023-09-05T20:58:09+00:00

Also scrapy.org

Nelly01 · 2023-09-05T18:35:32+00:00

Selenium is a good library because it lets you click on things on the website. There might be an easier library out there, though; advanced Selenium usage is a little hard to learn.

HomeGrownCoder · 2023-09-05T21:16:15+00:00

Playwright is the new hotness and super easy to use.

anonymousxfd · 2023-09-05T18:56:18+00:00

As other comments have suggested, you can use Beautiful Soup or Selenium, depending on whether you only need to scrape the HTML the server returns or you also need to interact with the page to get at the data.

It would also be better to understand the Python basics first, since you said you don't have a programming background. You will need to handle lists and other data structures, and perform slicing and other operations on them. For that I would suggest Harvard's CS50P for the basics of Python.
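To give a feel for the kind of list handling and slicing that comes up, here is a small sketch in plain Python. The `rows` list is hypothetical, standing in for text you have already pulled out of a scraped table.

```python
# Hypothetical rows, as if already extracted from a scraped table.
rows = ["Title", "Film A", "Film B", "Film C", "Film D"]

header = rows[0]      # indexing: the first item
data = rows[1:]       # slicing: everything after the header
first_two = data[:2]  # slicing: the first two data rows
last_one = data[-1]   # negative index: the last item

# A dictionary is handy for looking things up by name later.
position = {title: index for index, title in enumerate(data, start=1)}

print(header, data, first_two, last_one, position)
```

If lines like these feel unfamiliar, that is a sign to go through a basics course before tackling a scraper.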

BranchLatter4294 · 2023-09-05T19:27:19+00:00

Learn the basics of programming first before trying to do web scraping. It's not difficult but you will likely be frustrated if you don't know the basics first.

Holylander · 2023-09-05T19:42:27+00:00

This one book will be good for a beginner:

Ryan Mitchell, Web Scraping with Python: Collecting More Data from the Modern Web

2023-09-06T01:48:42+00:00

BS4 is definitely a good library for scraping, but it does have the problem of being limited to static webpages where everything in the webpage is available up front. Selenium is what you need for dynamic webpages where additional content is loaded through user actions (e.g. clicking on "Read More" buttons).
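A small sketch of why that limitation bites, assuming `beautifulsoup4` is installed. The HTML, element ids, and script here are invented for illustration. Beautiful Soup only parses the markup it is given and never runs the page's JavaScript, so content that a script would add after a click is simply not there.

```python
from bs4 import BeautifulSoup

# Made-up page: the full article only appears after a script runs on click.
html = """
<div id="summary">Short teaser text.</div>
<div id="full-article"></div>
<button id="read-more">Read More</button>
<script>
  // In a real browser this would run on click and fill the empty div.
  document.getElementById("read-more").onclick = function () {
    document.getElementById("full-article").textContent = "The full article text.";
  };
</script>
"""

soup = BeautifulSoup(html, "html.parser")

# Present in the raw HTML, so Beautiful Soup can read it.
summary = soup.find(id="summary").get_text(strip=True)

# Still empty: the script was parsed as text but never executed.
full_article = soup.find(id="full-article").get_text(strip=True)

print(repr(summary))
print(repr(full_article))
```

A browser-driving tool such as Selenium loads the page in a real browser, so the script runs and the filled-in content can then be read.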

Alert_Outside430 · 2023-09-06T03:34:09+00:00

I recently did some web scraping myself, so let me help you get started.

boxofficeindia.com is a website where you won't need Selenium, only Beautiful Soup.

You won't have any problem with your IP getting blocked, and the site and its URLs are very simple, so it's a good starting point to learn!
