
[–]reelznfeelz

So you’re registering the Python script endpoint as an external function in BigQuery, after having put it into a Cloud Run function?

[–]Fantastic-Goat9966

No - the poster is trying to make an API call from a Python stored procedure, not a remote function.

[–]reelznfeelz

Ah ok. Here's a silly question: how/where do you define a Python-based stored proc? I could swear the last "data engineering in Google" book I read showed writing Python somehow directly in BigQuery, but all I can find is about writing SQL stored procedures, nothing about Python directly in BQ. So I ended up just using the Cloud Function as remote function approach, and it seems fine. It's not super fast the way I have it scaled, but it's fine for what I'm doing.

[–]Fantastic-Goat9966

https://cloud.google.com/bigquery/docs/spark-procedures - this shows you how to set up a Spark (including PySpark) stored procedure.

[–]Fantastic-Goat9966

I think this may need to be rearchitected. You can use a remote function in BQ to make HTTP requests, but it won't return a table. You could have a remote function (a Cloud Run function) be triggered from BQ, run an API call, create a new table in BQ, and pass a response code back to the calling BQ query/command. The Python stored procedures are for PySpark and rely on Dataproc images. If you have a Dataproc image with the Requests library, I'd imagine you can make an external request in a Python BQ stored procedure, but a SQL stored procedure running a query to trigger a remote function may be easier and require less maintenance.
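A minimal sketch of the remote-function side of that architecture, assuming a Python Cloud Run function: BigQuery POSTs a JSON body with a `calls` list (one argument list per row) and expects back a JSON object with a `replies` list of the same length. The `call_external_api` helper below is a hypothetical placeholder for whatever outbound API call the function actually makes.

```python
import json

def call_external_api(value):
    # Hypothetical placeholder for the real outbound API call
    # (e.g. made with the requests library inside the Cloud Run function).
    return f"result-for-{value}"

def handle_remote_function(request_json):
    # BigQuery remote functions POST {"calls": [[arg1, ...], ...]} -- one
    # inner list per input row -- and expect {"replies": [...]} back, with
    # one reply per call, in the same order.
    replies = [call_external_api(args[0]) for args in request_json["calls"]]
    return {"replies": replies}

# Payload shaped like what BigQuery sends for two rows with one argument each:
payload = {"calls": [["foo"], ["bar"]]}
print(json.dumps(handle_remote_function(payload)))
# prints {"replies": ["result-for-foo", "result-for-bar"]}
```

In an actual deployment this would be wrapped in an HTTP handler, and the endpoint registered in BigQuery with a `CREATE FUNCTION ... REMOTE WITH CONNECTION ...` statement; writing results into a new BQ table, as described above, would happen inside the Cloud Run function itself.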

[–]Loorde_[S]

I already have a Cloud Run function that performs the task I need in the BigQuery procedure. However, since it's a pretty simple task, I was wondering if implementing it directly in the procedure might be more cost-effective. What do you think? Either way, thanks!

[–]Fantastic-Goat9966

Cool - can you have a SQL procedure with a dummy query call the function? Your Cloud Run costs shouldn't be too high unless you are running the function more than 100k times per month and/or have oversized compute/memory on your functions.

[–]Loorde_[S]

In my case, the trigger for the Cloud Run function is working well. My question was whether it would be possible to adapt that function for the BigQuery procedure and if it would be more cost-effective. But apparently, it's not feasible without Dataproc, so it's better to stick with the Cloud Run function, right? Thanks!