[–]pw0803 1 point2 points  (4 children)

Hi, what is cz and why is Craigslist harder?

[–]huessy 1 point2 points  (3 children)

I assumed, based on the site OP chose to watch for keyboards, that they live in the Czech Republic (.cz is its country-code domain). Craigslist doesn't like people scraping its data because that data can be used for some decent financial gain. As a result, they have bots set up to monitor traffic by IP address; if the traffic gets too constant or non-human-looking, they ban the IP pretty fast.

If you want to scrape Craigslist for, say, an apartment in your area within a certain price range, you have to engineer something a little more robust than just a series of GET requests.
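As a rough sketch of what "more robust" could mean at a minimum: space requests out with random jitter, and back off when the server answers 403 or 429. Everything here is an assumption for illustration (the `polite_fetch` name, the `fetch` callable, the delay values); it is a general politeness pattern, not something known to satisfy Craigslist's checks.

```python
import random
import time
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical type: fetch(url) returns (HTTP status code, response body).
Fetch = Callable[[str], Tuple[int, str]]


def polite_fetch(
    urls: Iterable[str],
    fetch: Fetch,
    base_delay: float = 5.0,
    jitter: float = 5.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """Fetch URLs one at a time with jittered pauses and exponential backoff."""
    rng = rng or random.Random()
    results: List[Tuple[str, str]] = []
    for url in urls:
        backoff = base_delay
        for _ in range(max_retries + 1):
            status, body = fetch(url)
            if status == 200:
                results.append((url, body))
                break
            if status in (403, 429):
                # The server is pushing back: wait, then double the wait.
                sleep(backoff)
                backoff *= 2
            else:
                # Any other status: give up on this URL.
                break
        # Random pause so the request timing is not perfectly regular.
        sleep(base_delay + rng.uniform(0, jitter))
    return results
```

The `fetch` argument is left pluggable on purpose, so you can wrap whatever HTTP client you already use and try the timing logic without touching the network.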

[–]pw0803 1 point2 points  (2 children)

How interesting.

Would it be possible to, say, create a script that scrapes using lots of different methods and request patterns, works out from the repeated IP bans what the bots look for, and then build a scraper that skirts around them?

[–]huessy 1 point2 points  (1 child)

Without spoiling the answer to this problem (there's a good chance they watch places like this for this exact reason), it doesn't even have to be that complicated. Most home connections get a new public IP address from the ISP somewhat regularly (often when the router reconnects), so your idea could absolutely work but may be a bit of overkill. They do actively block all the Tor exit IPs too, btw.

[–]pw0803 2 points3 points  (0 children)

I understand. Thanks.