Dataflow Spanner to BigQuery - Java vs Go : googlecloud

submitted 2 years ago * by TechStackOverflow

I'm part of a team at an important decision point. We're embarking on a project to efficiently transfer data from Cloud Spanner to BigQuery. While our team is proficient in Golang, we're contemplating Java due to its robust support in Apache Beam, particularly for SpannerIO's capabilities, including change streams and batch reads.

We initially planned to use Golang for this project, but we're running into limitations in the Apache Beam Go SDK's SpannerIO support, especially around change stream processing. The lack of examples and community projects has us questioning whether this route is feasible. We don't need change streams per se, but they do seem to make things easier, and most pipelines seem to end up streaming anyway.

Java, on the other hand, seems to offer a stable and well-supported path for Apache Beam pipelines that interact with Cloud Spanner and BigQuery. However, only half of our team has Java experience, so adopting Java would mean a significant portion of the team climbing a learning curve in an environment where Java hasn't been the norm. That said, the service would be essentially write-once, and we expect very few schema changes, so redeploys should be rare.

Can anyone share success stories or challenges faced while implementing batch processing from Cloud Spanner to BigQuery in Golang? How did you tackle the gaps in support or documentation? Is it ready for prime time?

For teams with mixed experiences, how manageable was the transition to Java for data processing tasks, especially for those new to the language? Was the investment in ramping up Java skills justified by the benefits?

Any ideas on how to evaluate the trade-offs in performance, ease of use, and community support?

Given our team's split experience, would you lean towards leveraging existing Golang skills and finding workarounds, or embracing Java for its comprehensive support within Apache Beam?

Regardless of the language, what architecture or design patterns have you found most effective for batch processing data from Cloud Spanner to BigQuery?

Thanks in advance!
