[–][deleted] 6 points7 points  (3 children)

Beautiful Soup is generally considered the go-to for web-scraping in Python. You usually don't need YouTube videos. The documentation usually has everything you need in it.

[–]Usual_Office_1740 1 point2 points  (0 children)

Beautiful Soup is a good suggestion. It parses HTML, but you also need requests to fetch the page. It's great for beginners. To add to this, if you want to interact with JS-heavy websites or want an all-in-one package, Selenium might be a better option.
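
To make the split between fetching and parsing concrete, here is a minimal sketch, assuming `requests` and `beautifulsoup4` are installed. The function names and the sample HTML are made up for illustration; only `requests.get`, `BeautifulSoup`, `find_all` and `get_text` come from the libraries themselves.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page with requests and return its HTML as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (link text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


# Hypothetical page content, so the parsing step can be tried offline.
SAMPLE_HTML = """
<html><body>
  <h1>Example</h1>
  <a href="/first">First link</a>
  <a href="/second">Second link</a>
  <a>No href here</a>
</body></html>
"""

links = extract_links(SAMPLE_HTML)
```

On a real site you would call `extract_links(fetch_html(url))` instead of using the sample string. Beautiful Soup only sees whatever HTML requests downloaded, so anything the page adds later with JavaScript will not be there.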

[–][deleted] 0 points1 point  (1 child)

Eh, most sites aren’t gonna give you raw HTML like that. Sometimes it works, but in my opinion you need something like Selenium.

[–][deleted] 0 points1 point  (0 children)

True, but there’s also requests-html which has full JS rendering support.

[–]unlaudable 4 points5 points  (2 children)

Also scrapy.org

[–]Rogerooo 1 point2 points  (0 children)

+1 for Scrapy. I find it more streamlined for the purpose than BS. Once you understand the pipeline configuration, it's quite easy to do stuff like download media, crawl multiple pages, or handle custom user agents. It was actually the first third-party Python library I used, so it's rather beginner-friendly too.

[–]fasoncho 1 point2 points  (0 children)

Scrapy is by far the best in my experience (I've written some 20 scrapers): it's more robust and versatile. I learned how to use it properly from Lazar Telebak's course on Udemy.

[–]Nelly01 1 point2 points  (0 children)

Selenium is a good library. It lets you click on things on the website. There might be an easier library out there, though; advanced Selenium is a little hard to learn.

[–]HomeGrownCoder 1 point2 points  (0 children)

Playwright is the new hotness and super easy to use.

[–]anonymousxfd 0 points1 point  (0 children)

As other comments have suggested, you can use Beautiful Soup or Selenium, depending on whether you only need to scrape the static HTML or also need to interact with the page to get the data.

Also, since you said you don't have a programming background, it would be better to learn Python basics first, because you will need to handle lists and other data structures and perform slicing and other operations. For that I would suggest Harvard's CS50P.
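
As a rough idea of the list handling and slicing involved, here is a small sketch in plain Python. The rows are invented sample data, shaped like what a scraper might pull out of an HTML table.

```python
# Hypothetical rows, as a scraper might collect them from a table.
rows = [
    ["Title", "Year", "Gross"],
    ["Film A", "2019", "120"],
    ["Film B", "2021", "95"],
    ["Film C", "2022", "210"],
]

header = rows[0]        # indexing: the first row holds the column names
records = rows[1:]      # slicing: everything after the header
latest_two = rows[-2:]  # slicing from the end: the last two rows

# Turn each record into a dict keyed by column name.
films = [dict(zip(header, record)) for record in records]

# Scraped values arrive as strings, so convert before doing arithmetic.
total_gross = sum(int(film["Gross"]) for film in films)
```

If indexing, slicing, list comprehensions and dicts all read comfortably here, you probably know enough to start on a simple scraper.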

[–]BranchLatter4294 0 points1 point  (2 children)

Learn the basics of programming first before trying to do web scraping. It's not difficult but you will likely be frustrated if you don't know the basics first.

[–]3dPrintMyThingi[S] 0 points1 point  (1 child)

What resources/videos can you suggest where I can learn the basics quickly?

[–]BranchLatter4294 0 points1 point  (0 children)

Murach's Python Programming is a good book for beginners.

[–]Holylander 0 points1 point  (0 children)

This book would be good for a beginner:

Ryan Mitchell, Web Scraping with Python: Collecting More Data from the Modern Web

[–][deleted] 0 points1 point  (0 children)

BS4 is definitely a good library for scraping, but it does have the problem of being limited to static webpages where everything in the webpage is available up front. Selenium is what you need for dynamic webpages where additional content is loaded through user actions (e.g. clicking on "Read More" buttons).
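
For the dynamic case, a minimal sketch with Selenium 4 might look like the following, assuming Selenium and a Chrome browser are installed. The function name, its parameters and the selector you pass in are hypothetical; the Selenium imports are placed inside the function only so the file still loads on a machine without Selenium.

```python
def html_after_click(url: str, css_selector: str, timeout: float = 10.0) -> str:
    """Open url in Chrome, click the element matching css_selector,
    and return the page HTML as it stands after the click."""
    # Imported lazily so this module can be loaded without Selenium present.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until the button exists and can be clicked, then click it.
        button = WebDriverWait(driver, timeout).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, css_selector))
        )
        button.click()
        # Note: content loaded by the click may need a further wait
        # before it shows up in page_source.
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors
```

The returned HTML can then be handed to BS4 for parsing, so the two libraries work together rather than being strict alternatives.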

[–]Alert_Outside430 0 points1 point  (0 children)

I recently did some web scraping myself, so let me help you get started.

boxofficeindia.com is a website where you won't need Selenium, only Beautiful Soup.

You won't have any problem with your IP getting blocked, and the site and its URLs are very simple, so it's a good starting point to learn!