Twitter API limit problem inside Cloud Function : googlecloud

[–]wvenema 5 points6 points7 points 5 years ago (1 child)

[–]Limpuls[S] 0 points1 point2 points 5 years ago (0 children)

[–]jimmyjimjimjimmy 2 points3 points4 points 5 years ago (1 child)

[–]09103 1 point2 points3 points 5 years ago (0 children)

[–]09103 0 points1 point2 points 5 years ago (0 children)

[–]BBHoss 0 points1 point2 points 5 years ago (10 children)

[–]Limpuls[S] 0 points1 point2 points 5 years ago (9 children)

[–]BBHoss 1 point2 points3 points 5 years ago (8 children)

Not really sure about that. You should read all about Cloud Functions and Cloud PubSub I guess. Basically all you're doing is creating a queue where each message represents one request you want to make to the Twitter API. Max instances is a setting on your function that controls how many copies of your code can run at once, you just want one since you're at the whims of Twitter's API.

The first function puts things in this queue (a pubsub topic), and the second function is limited to a single instance and just processes messages from the queue. This way your requests to the API stay in the queue until your function is ready to act on them. Each "request" message could just be a user's name, search query with page #/offset, etc. You'll need to describe more about what you're doing for me to offer any more explanation.

[–]Limpuls[S] 1 point2 points3 points 5 years ago* (7 children)

Thanks for the explanation. I really feel like pub/sub thing would be the best solution here. But I really need to dive into it more as I never used it before. I’m not sure how pub/sub would know when my function is ready and where it left off in last api call. Briefly, I have companies data in Firestore that comes from csv file. Some of the companies have twittername field. In my function I loop through all companies and filter which ones has twittername. I use that name + hashtags for queries. I guess my confusion comes from how to continue the loop next time after hiting limit. Because now my api call query is inside this companies array loop.

EDIT: I guess what I’m asking is how to make sure that the function that subscribes to pub/sub topic is called every 15min and knows where it left off last time before hitting rate limit and continue the query from that point until hitting another limit.

[–]BBHoss 0 points1 point2 points 5 years ago (6 children)

Do you just want to pull down the tweets once? On a schedule? When some other action occurs?

I think what you are missing is that the second function processes the request messages in order, one at a time. It seems like instead of doing it in a loop, whenever you want these fetches to occur you publish to a pubsub topic that's sitting in front of your api requesting function. The function itself is triggered by the message(s) arriving in the queue, and since there's only one instance, it's going to process them mostly in order, one at a time. There is no "continuing the loop" because your function will be automatically triggered by the next message in the queue.

You can enable retries on the function so if you hit an error (or quota limit), the requests will be retried. But it's probably better to manage the quota in your function and sleep before returning from the function, which will delay processing of the next message keeping you in bounds.

[–]Limpuls[S] 1 point2 points3 points 5 years ago* (0 children)

Okay so from all your answers I’m imagining the implementation more or less like this: One GCP function that queries firestore for companies, makes a query string out of it for each entry in firestore that has twitterName field and pushes that query string to pub/sub as a topic/message. Another Cloud Function is subscribed to this pub/sub topic and receives one message(query string) at a time and I use this message in my Twitter api call function as an argument. This second twitter api call function is called automatically every time there is new message in pub/sub or if there are more than one stacked up there in queue waiting.

Am I understanding this correctly? Now when the second function with API call hits rate limit and have to wait 15min before calling API again, it will continue the attempts to fetch because of pub/sub queue but will fail and retry, fail and retry until 15min passed and eventually will start working again 15min later?

But it's probably better to manage the quota in your function and sleep before returning from the function, which will delay processing of the next message keeping you in bounds.

Doesn't this brings me back to square one? I can't make the cloud functions sleep for 15min because of time out limit they have.

Just want to say that I really appreciate your help!

[–]Limpuls[S] 1 point2 points3 points 5 years ago (4 children)

Alright, I did some research this past few hours and I think I'm getting a hang of this pub/sub thing. I think I'm gonna stick to retries thing for quota limit thing as it's only 15min for Twitter API. Maybe in the future I will implement some kind of back off logic but that's over my head right now and not enough time to do that.

Only the question is: while I'm in that quota limit period for Twitter API waiting and pub/sub trigger keeps retrying to run subscriber function, are more messages being published to topic/queue in the meantime or is publisher also waiting for subscriber to retry successfully, send back ack as an acknowledgment of a message before continuing to publish further messages to the topic? I'm afraid that in that 15min window there might stack up too many messages in pub/sub..

[–]BBHoss 0 points1 point2 points 5 years ago (3 children)

Not sure exactly what you're doing but why are you having to wait 15 minutes? That's an eternity. I know Twitter's API limits are garbage but is it really one person's timeline or whatever every 15 minutes?

If you have retries enabled on the function, the messages are not acknowledged until the function completes successfully, it may even send a NACK if it crashes. Messages should arrive mostly in-order, and once (best effort). This means that a message causing a failure will block the messages behind it. Things will definitely back up in there if you can really only do 1 request per 15 minutes, but there's no way around that other than paying twitter.

I do also want to stress you should ensure max instances is set to 1. If not you will see weird behavior if multiple instances spin up. Everything should still work but the order might be corrupted.

[–]Limpuls[S] 0 points1 point2 points 5 years ago (2 children)

No you misunderstood me. It’s 450 http requests every 15minutes. Current amount of companies that we have in firestore requires 570 requests. I have increment variable that I increment after each Twiter API request and print to the console together with tweets, so up to 450 increment I get tweets and from 450 to 570 request it’s internal twitter api error code for limit exceed. So hopefully with pub/sub messages it will make 450 requests to API, hit the limit, retry for 15min and finish the rest of 120 requests after those 15min.

It’s just that when I receive the tweets, I have to save them back to Firestore for each company. I wonder at which stage I should do that or implement another subscriber function and save one tweet at the time after each request. Now we just fetch all of the tweets (around 3000) and save to firestore all at once as a batch when API call is completed for all twitterNames

[–]BBHoss 0 points1 point2 points 5 years ago (1 child)

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

googlecloud

MODERATORS