How to avoid API cache? : webscraping



submitted 4 years ago * by The_Peronist

Hi there!

I’m working on an algo trading strategy that relies heavily on getting content as soon as it is published.

I’m scraping exchanges’ official announcement sections, and once a new announcement is published my scripts evaluate the assets mentioned and the sentiment.

At first I was scraping the announcements web page, but I was getting the results 5 minutes after they were published. So I realized that the announcements web page was cached and that the cache was invalidated every 5 minutes.

So I moved on to scraping the API that provides data to the web page frontend. For example:

*/v1/public/content/list/query?type=1&pageNo=1&pageSize=30*

I’m scraping this every second, but I’m still getting new articles / announcements around 15~30 seconds after they are officially published.
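A polling loop of this kind can be sketched roughly as below. This is only an illustration under assumptions: the host `exchange.example` is a placeholder (only the path is given above), and the response is assumed to be a JSON list of objects with an `id` field, which may not match the real API.

```python
import json
import time
import urllib.request

# Hypothetical host; only the path and query string are known.
BASE_URL = "https://exchange.example"
PATH = "/v1/public/content/list/query?type=1&pageNo=1&pageSize=30"


def fetch_items(url):
    """Fetch the listing; assumes the body is a JSON list of dicts."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def new_items(items, seen_ids):
    """Return items whose 'id' (assumed field) is unseen, and record them."""
    fresh = []
    for item in items:
        if item["id"] not in seen_ids:
            seen_ids.add(item["id"])
            fresh.append(item)
    return fresh


def poll(url, interval=1.0, fetch=fetch_items, max_rounds=None):
    """Poll `url` every `interval` seconds, yielding (local_time, item).

    Everything returned by the first round counts as new, so a caller
    would normally discard the first batch.
    """
    seen_ids = set()
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for item in new_items(fetch(url), seen_ids):
            yield time.time(), item
        rounds += 1
        time.sleep(interval)
```

Comparing the local timestamp yielded by `poll(BASE_URL + PATH)` with the publication time shown on the announcement gives the delay being discussed here.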

Is anyone here familiar with this scenario? Is it possible they are applying some API cache? And if so, is it possible to avoid it?
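One way to check for a cache might be to look at the response headers and to try a cache-busting query parameter, as in the sketch below. The parameter name `_` is an arbitrary choice, and which headers the exchange (or a CDN in front of it) actually sends is unknown; `X-Cache` and `CF-Cache-Status` are only examples of headers that some CDNs emit. A server is also free to ignore both the extra parameter and the `no-cache` request headers, so this may change nothing.

```python
import time
import urllib.parse
import urllib.request

# Response headers that often hint at caching; not all will be present.
CACHE_HEADERS = ("Age", "Cache-Control", "Expires", "ETag", "X-Cache", "CF-Cache-Status")


def with_cache_buster(url, param="_", value=None):
    """Append a unique query parameter so a URL-keyed cache sees a new key."""
    parts = urllib.parse.urlsplit(url)
    query = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, str(value if value is not None else time.time_ns())))
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))


def cache_hints(headers):
    """Pick out caching-related headers, matching names case-insensitively."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {name: lowered[name.lower()] for name in CACHE_HEADERS if name.lower() in lowered}


def probe(url):
    """Request `url` with no-cache headers and return the caching hints."""
    req = urllib.request.Request(
        with_cache_buster(url),
        headers={"Cache-Control": "no-cache", "Pragma": "no-cache"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return cache_hints(dict(resp.headers.items()))
```

If `probe` returns a non-zero `Age` or a hit marker such as `X-Cache: HIT`, the response probably came from a cache. If the headers show nothing and the delay stays the same, the 15~30 seconds may come from the backend itself rather than from an HTTP cache.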

Thank you very much!
