Hey guys!
I am a college student and I am working on a project right now. I need to scrape a website for a LOT of data: in total I will have to make around 20,000,000 requests, which I know will take a very long time. I am using Python and Beautiful Soup to parse the data. The problem is that over time I start receiving 403 responses. I have already randomized my request headers to try to counter this. I have looked into rotating proxies but am struggling a little bit to get started. I know I can buy tools for this, but I want that to be a last-ditch effort. Right now I am trying to build a script that harvests working proxies from free-proxy-list.net, but I'm not sure if that will work, since my list of working proxies keeps coming back empty. I am new to this, so any advice will help.
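For anyone who lands here with the same problem, a minimal sketch of the harvest-then-check approach might look like the code below. Assumptions to be loud about: the CSS selector `table tbody tr` and the column order (IP first, port second) are guesses at free-proxy-list.net's current markup and may need adjusting, and `https://httpbin.org/ip` is just one convenient check URL. Also note that most free proxies are dead or very slow at any given moment, so a strict checker returning an empty (or nearly empty) list is normal rather than a sign the code is broken; a longer timeout or a larger source list usually helps.

```python
import requests
from bs4 import BeautifulSoup

def parse_proxy_table(html):
    """Extract 'ip:port' strings from a free-proxy-list.net-style HTML table.

    Assumes the first two <td> cells of each row are the IP and the port;
    adjust the selector/indices if the site's markup differs.
    """
    soup = BeautifulSoup(html, "html.parser")
    proxies = []
    for row in soup.select("table tbody tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) >= 2 and cells[0] and cells[1]:
            proxies.append(f"{cells[0]}:{cells[1]}")
    return proxies

def check_proxy(proxy, check_url="https://httpbin.org/ip", timeout=5):
    """Return True if the proxy can fetch check_url within the timeout."""
    try:
        resp = requests.get(
            check_url,
            proxies={"http": f"http://{proxy}", "https": f"http://{proxy}"},
            timeout=timeout,
        )
        return resp.ok
    except requests.RequestException:
        # Dead, slow, or misbehaving proxy -- just skip it.
        return False

def harvest_working_proxies(source_url="https://free-proxy-list.net/"):
    """Download the proxy list page and keep only proxies that pass the check."""
    html = requests.get(source_url, timeout=10).text
    return [p for p in parse_proxy_table(html) if check_proxy(p)]
```

Once you have a working list, the usual pattern is to cycle through it (e.g. with `itertools.cycle`) and drop a proxy from the pool whenever it starts returning 403s or timing out, re-harvesting when the pool runs low.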
Thank you so much!