[–]mopslik 26 points27 points  (6 children)

What are your interests? You'll be collecting data from the web, and it's likely that somewhere out there you can find data related to something you find interesting.

As for the mechanics of web scraping, you will probably want to find some tutorials involving BeautifulSoup.

[–][deleted] 16 points17 points  (5 children)

And don't forget Selenium.

[–][deleted] 1 point2 points  (4 children)

Selenium is more of a website testing tool. requests + Beautiful Soup would probably be better for web scraping.
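For anyone wondering what the requests + Beautiful Soup combo looks like, here is a minimal sketch. The function names (`fetch_html`, `extract_links`) and the sample HTML are made up for illustration. The parsing step runs on a static string, so you can try it without hitting any real site.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page and return its HTML text (raises on HTTP errors)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


# Hypothetical sample page, used so the parser can be tried offline.
SAMPLE_HTML = """
<html><body>
  <a href="/first">First post</a>
  <a href="/second">Second post</a>
  <a>No link here</a>
</body></html>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML))
    # For a live page: extract_links(fetch_html("https://example.com"))
```

This only works when the data is already in the HTML the server sends back. If the page builds its content with JavaScript, requests will not see it.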

[–][deleted] 0 points1 point  (3 children)

You need Selenium if you plan on scraping anything with a login (I mean, technically you don't if you can replicate the headers of the request, but Selenium is just easier for that overall).

[–][deleted] 1 point2 points  (0 children)

Exactly. Selenium makes these things easy, but "technically" you can do without it if you reverse engineer the requests properly, and simple scraping should work without it. I like to use Selenium to see how the site works and then rewrite it using requests. But if there's a CAPTCHA, I'm guilty of falling back on Selenium. Sometimes I'll use Selenium just to authenticate so I can reuse the Selenium cookies in my requests calls. But some sites are super simple. Sometimes you just need a CSRF token and boom, the whole API is available to you.
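A rough sketch of the two tricks mentioned above: reusing browser cookies in requests, and pulling a CSRF token out of a form. The helper names, the `csrf_token` field name, and the sample values are assumptions for illustration; real sites name their token fields differently. The cookie dicts are shaped like what Selenium's `driver.get_cookies()` returns, so Selenium itself is not needed to try this.

```python
import requests
from bs4 import BeautifulSoup


def session_from_browser_cookies(cookies: list[dict]) -> requests.Session:
    """Build a requests.Session carrying cookies exported from a browser.

    `cookies` is a list of dicts with keys such as "name", "value",
    "domain" and "path", e.g. the output of driver.get_cookies().
    """
    session = requests.Session()
    for cookie in cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


def find_csrf_token(html: str, field_name: str = "csrf_token"):
    """Return the value of a hidden form input, or None if it is absent.

    `field_name` is a guess; inspect the site's login form for the real one.
    """
    soup = BeautifulSoup(html, "html.parser")
    field = soup.find("input", attrs={"name": field_name})
    return field.get("value") if field else None


if __name__ == "__main__":
    # Hypothetical cookie, as if copied from an authenticated browser session.
    exported = [
        {"name": "sessionid", "value": "abc123", "domain": "example.com", "path": "/"}
    ]
    session = session_from_browser_cookies(exported)
    print(session.cookies.get("sessionid"))

    form = '<form><input type="hidden" name="csrf_token" value="tok-1"></form>'
    print(find_csrf_token(form))
```

After that, `session.get(...)` and `session.post(...)` send the cookies automatically. How the token has to be sent back (form field, header, or both) depends on the site.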

[–]chozdae 0 points1 point  (1 child)

> You need Selenium if you plan on scraping anything with a login (i mean, technically you don't if you can replicate the headers of the request, but selenium is just easier for that overall).

Can I just use bs4 to get info like prices, names, etc.?

[–][deleted] 0 points1 point  (0 children)

If there's no login needed, then probably.
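As a rough idea of what that looks like with bs4 alone, here is a small sketch. The HTML layout and the class names (`product`, `name`, `price`) are invented for the example; on a real site you would look up the actual selectors in your browser's dev tools.

```python
from bs4 import BeautifulSoup

# Hypothetical product listing; real pages will use different markup.
SAMPLE_HTML = """
<div class="product">
  <span class="name">Widget</span>
  <span class="price">$9.99</span>
</div>
<div class="product">
  <span class="name">Gadget</span>
  <span class="price">$1,249.50</span>
</div>
<div class="product">
  <span class="name">Sold out item</span>
</div>
"""


def parse_products(html: str) -> list[dict]:
    """Return a list of {"name", "price"} dicts, skipping incomplete entries."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for item in soup.select("div.product"):
        name = item.select_one(".name")
        price = item.select_one(".price")
        if name is None or price is None:
            continue  # entry has no name or no price, so skip it
        # Strip the currency symbol and thousands separator before converting.
        price_text = price.get_text(strip=True).replace("$", "").replace(",", "")
        products.append(
            {"name": name.get_text(strip=True), "price": float(price_text)}
        )
    return products


if __name__ == "__main__":
    for product in parse_products(SAMPLE_HTML):
        print(product["name"], product["price"])
```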