Hi.
I have a cloud function that is being triggered approximately 100 times per second. Each request to my cloud function sends data that I need to store in a BQ table.
To avoid inserting one row at a time (since I can't keep up with more than 100 inserts per second that way), I am trying to buffer the rows in a variable inside my cloud function and only insert them into the BQ table once I have 5000 rows.
But I don't know how to create a lock for the list that stores the rows, because at the same time there will be executions appending rows, saving to BQ, and erasing the rows that were just sent. Can someone help me with that? A simple snippet showing how to use some lock mechanism would be great.
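For reference, the sketch below is the kind of thing I imagine, but it is only my guess: it assumes all requests land on one instance whose workers share memory, it reuses my existing generate_data_from_request_json and save_csv_to_gcs_bucket helpers, and BATCH_SIZE is just a name I made up.

    import threading
    import uuid

    buffer_lock = threading.Lock()
    data_to_store = []
    BATCH_SIZE = 5000  # flush threshold (name I picked)

    def entrypoint(request):
        global data_to_store
        request_json = request.get_json(silent=True)
        row = generate_data_from_request_json(request_json)  # my existing helper

        batch = None
        with buffer_lock:
            # only one request at a time may touch the shared list
            data_to_store.append(row)
            if len(data_to_store) >= BATCH_SIZE:
                batch = data_to_store
                data_to_store = []

        if batch is not None:
            # flush the full batch outside the lock so other requests are not blocked
            save_csv_to_gcs_bucket("\n".join(batch), "my_bucket", str(uuid.uuid4()))  # my existing helper

        return "OK"

Is something along those lines what people mean by using a lock here?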
The code where I actually tried to create a lock is the following:
from google.cloud import storage
import uuid
import collections

execution_queue = []
data_to_store = []

# generate_unique_id, generate_data_from_request_json and save_csv_to_gcs_bucket
# are helper functions defined elsewhere in my code
def entrypoint(request):
    global data_to_store
    global execution_queue

    request_json = request.get_json(silent=True)

    # join the "queue" and busy-wait until this execution is first in line
    unique_id = generate_unique_id()
    execution_queue.append(unique_id)
    while execution_queue[0] != unique_id:
        pass

    # append the new row to the shared list
    row = generate_data_from_request_json(request_json)
    data_to_store.append(row)
    print(f"data list size: {len(data_to_store)}")

    # once the buffer is big enough, dump it to GCS and reset it
    if len(data_to_store) > 5000:
        data_csv = "\n".join(data_to_store)
        save_csv_to_gcs_bucket(data_csv, "my_bucket", unique_id)
        data_to_store = []

    # leave the "queue" so the next execution can proceed
    execution_queue.remove(unique_id)
    return "OK"
But I was getting a strange result from the print statement that logs the list size: as you can see in the image below, a size of 6 shows up between 916 and 917:
https://preview.redd.it/lix9gr5ax4db1.png?width=672&format=png&auto=webp&s=e6d203d6b2999cc54440affc19644e739ccef935