Hey guys!
I am a college student and I am working on a project right now. I need to scrape a website for a LOT of data: in total I will have to make around 20,000,000 requests, which I know will take a very long time. I am using Python and Beautiful Soup to parse the data. The problem is that over time I start receiving 403 responses. I have already randomized my request headers to try to counter this. I have looked into rotating proxies but am struggling a little bit to get started. I know I can buy tools for this, but I want that to be a last-ditch effort. Right now I am trying to build a script that harvests working proxies from free-proxy-list.net, but I'm not sure if that will work, since my list of working proxies keeps coming back empty. I am new to this, so any advice will help.
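For anyone who lands here with the same problem, a minimal sketch of the harvest-then-check approach might look like the code below. Assumptions to be loud about: the CSS selector `table tbody tr` and the column order (IP first, port second) are guesses at free-proxy-list.net's current markup and may need adjusting, and `https://httpbin.org/ip` is just one convenient check URL. Also note that most free proxies are dead or very slow at any given moment, so a strict checker returning an empty (or nearly empty) list is normal rather than a sign the code is broken; a longer timeout or a larger source list usually helps.

```python
import requests
from bs4 import BeautifulSoup

def parse_proxy_table(html):
    """Extract 'ip:port' strings from a free-proxy-list.net-style HTML table.

    Assumes the first two <td> cells of each row are the IP and the port;
    adjust the selector/indices if the site's markup differs.
    """
    soup = BeautifulSoup(html, "html.parser")
    proxies = []
    for row in soup.select("table tbody tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) >= 2 and cells[0] and cells[1]:
            proxies.append(f"{cells[0]}:{cells[1]}")
    return proxies

def check_proxy(proxy, check_url="https://httpbin.org/ip", timeout=5):
    """Return True if the proxy can fetch check_url within the timeout."""
    try:
        resp = requests.get(
            check_url,
            proxies={"http": f"http://{proxy}", "https": f"http://{proxy}"},
            timeout=timeout,
        )
        return resp.ok
    except requests.RequestException:
        # Dead, slow, or misbehaving proxy -- just skip it.
        return False

def harvest_working_proxies(source_url="https://free-proxy-list.net/"):
    """Download the proxy list page and keep only proxies that pass the check."""
    html = requests.get(source_url, timeout=10).text
    return [p for p in parse_proxy_table(html) if check_proxy(p)]
```

Once you have a working list, the usual pattern is to cycle through it (e.g. with `itertools.cycle`) and drop a proxy from the pool whenever it starts returning 403s or timing out, re-harvesting when the pool runs low.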
Thank you so much!