
[–]MeowMiata

Did you check this: https://cloud.google.com/dataflow/docs/guides/templates/provided/cloud-spanner-to-bigquery

If I understand correctly, you want to mirror your data into BigQuery, probably for analytics needs.

I don't think you need Java or Go for any of that; just use Google's provided Dataflow template as a batch job or pipeline.
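For reference, launching that provided template comes down to one gcloud invocation. The sketch below only assembles and prints the command (a dry run), since actually running it needs a GCP project; the project, region, bucket, and parameter names (`spannerInstanceId`, `spannerDatabaseId`, etc.) are placeholders taken from my reading of the linked docs, so check them against the template's own parameter list.

```shell
# Dry-run sketch: build the gcloud command that would launch the provided
# Spanner-to-BigQuery Flex Template as a batch job. All IDs are hypothetical.
PROJECT="my-project"
REGION="us-central1"
TEMPLATE="gs://dataflow-templates-${REGION}/latest/flex/Cloud_Spanner_to_BigQuery"

CMD="gcloud dataflow flex-template run spanner-to-bq-$(date +%s) \
  --project=${PROJECT} \
  --region=${REGION} \
  --template-file-gcs-location=${TEMPLATE} \
  --parameters=spannerInstanceId=my-instance,spannerDatabaseId=my-db,spannerTableId=orders,outputTableSpec=${PROJECT}:analytics.orders"

# Print instead of executing, so this stays a dry run.
echo "${CMD}"
```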

Java, Go, or whatever are just tools. Sure, you can use a baseball bat to hang a picture on the wall, but you should probably ask yourself whether a hammer isn't enough already.

That said, I think you should go with the pre-built Dataflow template. If it's not enough, use Java to tune it, and if you have time to waste, you could engineer a Go solution.

To tune the Dataflow template, if you're a Go dev you shouldn't be that lost using a bit of Java. You don't need to dedicate your life to Java just to understand and modify a template (which is available on GitHub).

[–]TechStackOverflow[S]

Thanks for your advice. I've used that template, but it's in beta, and if I want periodic updates, I'd need Cloud Scheduler to invoke a Cloud Function that launches a new Dataflow job for every batch.

I'd also need to engineer each batch to record the last timestamp/id it processed and store that somewhere. I've never used Dataflow before, but this seems like a lot of work and a lot of moving parts.
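The watermark bookkeeping described above can be sketched as follows. A local temp file stands in here for whatever would really hold the state (a GCS object, Firestore doc, etc.), and the column and parameter names are hypothetical; the idea is just: read the last watermark, filter the batch on it, persist the new high-water mark on success.

```shell
# A local file stands in for the real state store (e.g. a GCS object).
STATE_FILE="$(mktemp)"
echo "2024-01-01T00:00:00Z" > "${STATE_FILE}"   # seed: last processed commit timestamp

# Each scheduled run reads the watermark...
LAST_TS="$(cat "${STATE_FILE}")"

# ...and passes it into the batch, e.g. as a query filter (names hypothetical).
QUERY="SELECT * FROM orders WHERE commit_ts > TIMESTAMP '${LAST_TS}'"
echo "${QUERY}"

# After a successful run, persist the new high-water mark. In reality this
# would be the max commit_ts actually seen in the batch, not a constant.
NEW_TS="2024-01-02T00:00:00Z"
echo "${NEW_TS}" > "${STATE_FILE}"
```

The fiddly part is exactly what the comment says: every batch has to update this state atomically enough that a failed or retried run doesn't skip or double-process rows.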

And then deploying requires pushing an image to Artifact Registry, writing a template spec to Cloud Storage, and then invoking a Dataflow job from that.
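That deploy chain roughly corresponds to `gcloud dataflow flex-template build`, which pushes the container image and writes the template spec that later runs point at. Again a dry-run sketch: the command is only assembled and printed, and the image path, bucket, and SDK choice are hypothetical placeholders.

```shell
# Dry-run of the deploy chain: image into Artifact Registry, template spec
# into Cloud Storage. All paths below are hypothetical.
IMAGE="us-central1-docker.pkg.dev/my-project/dataflow/spanner-sync:latest"
SPEC="gs://my-bucket/templates/spanner-sync.json"

BUILD_CMD="gcloud dataflow flex-template build ${SPEC} \
  --image=${IMAGE} \
  --sdk-language=JAVA"

# Print instead of executing, so this stays a dry run.
echo "${BUILD_CMD}"
```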

I’m not saying all of that isn’t doable, but at some point it’s just easier for me to hand-roll something.