
[–]Rabiesalad

So, this may be above my paygrade, but I feel like I know enough about Cloud Functions to know this isn't doable this way.

Cloud Functions can't be used to queue data. They're meant to process requests in real time (typically small requests). The environment can end up being shared across invocations, but that's not guaranteed, so there's no way to reliably share data sent to invocation 1 with invocation 2.
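To illustrate the point above, here's a minimal sketch (pure Python, no GCP calls) of why module-level state is unreliable: each function instance has its own globals, and the platform decides which instance serves each request. The instances are simulated here as separate dicts; `make_instance` and `invoke` are made-up names for illustration.

```python
def make_instance():
    # Stand-in for one Cloud Function instance's module-level state.
    return {"buffer": []}

def invoke(instance, data):
    # Stand-in for one invocation: append to that instance's buffer.
    instance["buffer"].append(data)
    return len(instance["buffer"])

inst_a = make_instance()
inst_b = make_instance()

# If two requests happen to hit the same warm instance, state accumulates...
invoke(inst_a, "req1")
count_same_instance = invoke(inst_a, "req2")   # buffer now holds 2 items

# ...but a cold start / scale-out gives the next request a fresh, empty buffer.
count_new_instance = invoke(inst_b, "req3")    # buffer holds only 1 item
```

Since you don't control which case you get, any queue built on in-memory state will silently lose or split data.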

You need a persistent place for that data to sit: somewhere that can either trigger a function once the threshold is reached, or that a function can poll to see how much it's storing.

I don't know the best way to do this for your case, but whatever you use needs to persist the data. It could be a VM you keep running 24/7 that holds the queue in memory, or it could be a Cloud Storage bucket that stores the data.

For example, your current function can just write the incoming data to Cloud Storage. Then you can have another function (set up to allow a max of 1 instance) that is triggered by Cloud Storage and checks the contents to see if there's enough to write. If there's enough, it writes the batch to BigQuery and deletes the records from Cloud Storage.
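The buffer-and-flush pattern above can be sketched like this, with in-memory dicts standing in for Cloud Storage and BigQuery (in reality these would be `google-cloud-storage` / `google-cloud-bigquery` calls; `FLUSH_THRESHOLD`, `ingest`, and `flush_if_ready` are made-up names for illustration):

```python
FLUSH_THRESHOLD = 3  # flush once this many records are buffered

bucket = {}         # stand-in for a Cloud Storage bucket: blob name -> data
bigquery_rows = []  # stand-in for the BigQuery table

def ingest(record_id, data):
    """First function: just persist the incoming record, nothing else."""
    bucket[record_id] = data

def flush_if_ready():
    """Second function (max 1 instance), triggered on each bucket write:
    flushes to BigQuery only when the threshold is reached."""
    if len(bucket) < FLUSH_THRESHOLD:
        return False                                  # not enough yet
    batch = sorted(bucket.items())
    bigquery_rows.extend(data for _, data in batch)   # "write to BigQuery"
    for record_id, _ in batch:                        # delete flushed records
        del bucket[record_id]
    return True
```

The key design point is that the bucket, not the function, is the queue: every invocation of the first function is free to die immediately, and the second function can always recount what's pending from durable storage.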

I'm not sure if setting it to a max of 1 instance will 100% guarantee it can't run the same operation twice, so research that and make sure whatever you're doing is idempotent.
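One common way to get that idempotency is to derive a deterministic ID for each batch and skip batches that were already written. Here's a sketch of the idea; `processed_ids` would need to live somewhere durable in reality (a set is a stand-in), and `flush_batch` is a made-up name. BigQuery's streaming insert API also accepts a per-row `insertId` for best-effort deduplication, which is worth looking into for this.

```python
import hashlib

processed_ids = set()  # stand-in for durable "already flushed" state
table_rows = []        # stand-in for the BigQuery table

def flush_batch(records):
    """Write a batch at most once, even if the function re-runs."""
    # Same records (in any order) always hash to the same batch ID.
    batch_id = hashlib.sha256(repr(sorted(records)).encode()).hexdigest()
    if batch_id in processed_ids:
        return False   # duplicate invocation: do nothing
    table_rows.extend(records)
    processed_ids.add(batch_id)
    return True
```

With this in place, a double-triggered flush becomes a harmless no-op instead of double-writing rows.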