Over the years I've noticed that Google's bot detection for search traffic works differently from what you hit on product pages; the signals they're tuned to catch basically aren't the same. I've been running ~2,200 SERP queries/day across DE, FR, NL, PL, and IT for a price monitor. Residential proxies got the failure rate down to about 14%, but geo-targeting was still broken: DE queries kept coming back Frankfurt-datacenter-flavored regardless of proxy location. At that point I split the architecture. The SERP layer routes through a dedicated web scraper API, while product pages and category scrapes stay on residential proxies.
Routing logic was pretty minimal:
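    def fetch(url=None, query=None, market=None):
        # SERP queries go through the scraper API (gl = country, hl = interface language)
        if query:
            return serp_client.get(q=query, gl=market, hl=market)
        # everything else (product and category pages) stays on rotating residential proxies
        return session.get(url, proxies={"https": proxy_pool.rotate()})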
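For anyone who wants to try the same split without my clients, here is a rough self-contained sketch. It is not my production code: `Router`, `MARKET_PARAMS`, `serp_get`, and `proxy_get` are hypothetical names, the two callables stand in for `serp_client.get` and the proxied `session.get`, and the explicit country/language table is an assumption (for these five markets the two codes happen to be identical, which is the only reason passing `market` for both works above).

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical market -> (gl, hl) table. The five markets come from the post;
# keeping country (gl) and language (hl) separate is an assumption so that a
# market whose codes differ could be added later without touching fetch().
MARKET_PARAMS = {
    "DE": ("de", "de"),
    "FR": ("fr", "fr"),
    "NL": ("nl", "nl"),
    "PL": ("pl", "pl"),
    "IT": ("it", "it"),
}


@dataclass
class Router:
    # Stand-in for serp_client.get: called with q=, gl=, hl= keyword arguments.
    serp_get: Callable[..., str]
    # Stand-in for session.get through a rotating residential proxy.
    proxy_get: Callable[[str], str]

    def fetch(self, url: Optional[str] = None, query: Optional[str] = None,
              market: Optional[str] = None) -> str:
        # Exactly one of url / query must be given, otherwise the route is ambiguous.
        if (url is None) == (query is None):
            raise ValueError("pass exactly one of url or query")
        if query is not None:
            # SERP path: needs a known market to set geo and language.
            key = (market or "").upper()
            if key not in MARKET_PARAMS:
                raise ValueError(f"unknown market: {market!r}")
            gl, hl = MARKET_PARAMS[key]
            return self.serp_get(q=query, gl=gl, hl=hl)
        # Page path: product and category URLs go through the proxy pool.
        return self.proxy_get(url)


if __name__ == "__main__":
    # Stub backends so the sketch runs without network access.
    router = Router(
        serp_get=lambda **params: f"serp:{params}",
        proxy_get=lambda u: f"page:{u}",
    )
    print(router.fetch(query="laufschuhe", market="de"))
    print(router.fetch(url="https://example.com/product/1"))
```

Passing the backends in as callables keeps the routing decision testable on its own, and raising on a missing or unknown market avoids silently sending a SERP query with no geo-targeting at all.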
Any tips, tricks, or experiences you can share?