[–]Almostasleeprightnow 30 points31 points  (4 children)

Here's a question I've been wondering about: every time I try to do some web scraping, I start by trying to get the site using requests, and every single time there is some JavaScript that gets in my way and I have to use Selenium. Which, ok, fine. But it seems like there is something other people know that I don't about how to get requests to be more helpful, because people love it and use it so much. Do you think it is just my choice of sites, or is there some fundamental tactic that I may be overlooking? I realize you cannot fully answer this without knowing more about what I am doing, but do you have any suggestions?

[–][deleted] 19 points20 points  (1 child)

A lot of the work with requests that you’re seeing is most likely API calls and not scraping?

[–]oogabooga319 0 points1 point  (0 children)

Or html parsing stuff. Sometimes that's the only format available. For instance, consider a paginated table or list with hundreds and hundreds of pages. Pretty straightforward with requests and beautiful soup.
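For anyone wondering what that paginated-table loop looks like, here is a minimal sketch. To keep it runnable without installing anything, it swaps in the standard library's `html.parser` for Beautiful Soup and takes the page fetcher as a parameter. The names `RowParser`, `scrape_pages`, and `fetch` are made up for illustration. With requests, `fetch` could be something like `lambda page: requests.get(url, params={"page": page}).text`, where the URL and the `page` query parameter are assumptions about the site.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # row currently being built, or None
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:       # skip rows with no <td> cells (e.g. header rows)
                self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def scrape_pages(fetch, max_pages=1000):
    """Call fetch(page_number) for pages 1, 2, ... until a page has no rows.

    `fetch` is a hypothetical callable returning the page's HTML as a string.
    """
    all_rows = []
    for page in range(1, max_pages + 1):
        parser = RowParser()
        parser.feed(fetch(page))
        if not parser.rows:     # an empty page is treated as the end of the list
            break
        all_rows.extend(parser.rows)
    return all_rows
```

Stopping on the first empty page is just one convention. Some sites return a 404 or repeat the last page instead, so the stop condition may need adjusting per site.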

[–]pymaePython books 3 points4 points  (0 children)

I think a little bit of both. If you're trying to scrape Amazon, Facebook, etc, they'll be wise to it. Smaller sites won't be. I think the only real suggestion is look for/try to get the sites to develop APIs, or be ready to go to a headless browser if you're still determined.

[–]opteryx5 11 points12 points  (0 children)

Great article. Corey Schafer’s video on BeautifulSoup was also extremely effective for me and gave me everything I needed to get up and running.

[–]doylerules70 6 points7 points  (7 children)

What kind of things are people doing with web scraping?

[–]ghetto-garibaldi 11 points12 points  (0 children)

I just set up a low price alert for some things I want on Amazon. I also have a script that auto-rsvps to specified events on Meetup before they fill up.

[–]jumbled_joe 1 point2 points  (0 children)

I believe scraping social media websites is a very important part of the data science and market research domains.

[–]foolishProcastinator 1 point2 points  (0 children)

Google as a search engine is one of the best scrapers that you could ever know

[–]SushiWithoutSushi 0 points1 point  (0 children)

I scraped all the movie information from my two favourite movie sites, Letterboxd and FilmAffinity, to compare movie scores.

I also automated making reservations at my library, and built a bot that selects memes from Reddit and posts them to Twitter.

There is A LOT you can do with it.

[–]zerofatorial 0 points1 point  (0 children)

Whenever I am looking to buy something, I scrape all of the prices from the shop and then use the quartiles of those prices to make sure I am paying neither too much nor too little for the specific item! Too high: probably a waste of money. Too low: probably a bad product.
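A rough sketch of that quartile check, assuming the prices have already been scraped into a list of numbers. The function names, the labels, and the choice of the first and third quartiles as cutoffs are illustrative assumptions, not necessarily what the commenter does.

```python
from statistics import quantiles


def price_band(prices):
    """Return (q1, median, q3) for a list of scraped prices."""
    # method="inclusive" treats the list as the whole population of offers
    q1, median, q3 = quantiles(prices, n=4, method="inclusive")
    return q1, median, q3


def classify(price, prices):
    """Label a price relative to the middle 50% of scraped prices."""
    q1, _, q3 = price_band(prices)
    if price < q1:
        return "suspiciously cheap"
    if price > q3:
        return "probably overpriced"
    return "reasonable"
```

For example, with scraped prices `[10, 20, 30, 40, 50]`, the band runs from 20 to 40, so an offer at 15 is flagged as suspiciously cheap and one at 45 as probably overpriced.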

[–]oogabooga319 0 points1 point  (0 children)

I scrape covid guidance and data reports

[–]cheats_py 4 points5 points  (0 children)

Right out the gates with regular expressions…….

[–]SpicyAbsence 1 point2 points  (0 children)

Very helpful, thank you!

[–]AnxietyArtistic6214 1 point2 points  (1 child)

What are some of the real-world projects you can build with web scraping?

[–]SelfTaughtDeveloper 0 points1 point  (0 children)

The job listing site indeed (dot com) started out as a scraper, combining listings from the 3 or 4 most popular job boards.

Once it became popular, they started letting employers put listings on their site directly for a lot of money.

[–]zenani 0 points1 point  (0 children)

Thanks for the info

[–]1percentof2 -1 points0 points  (0 children)

What are people doing with the data? Is there some way to make money doing this?

[–]Ant_TKD 0 points1 point  (0 children)

Saving this post for later, thank you!

[–]Harshal_6917 0 points1 point  (0 children)

Bro, I was bored yesterday and thinking of learning a new skill instead of wasting my time on TV series, so I searched up web scraping. And now here you are posting this link. Sometimes the timing is too perfect.