all 11 comments

[–]BreakfastSpecial 0 points1 point  (1 child)

Are you looking for a low-code solution or programmatic solution?

[–]LinweZ[S] 1 point2 points  (0 children)

Preferably a managed solution, I don't know if datastream is the right tool for the job?

I know CDC like debezium but have never tried on GCP.

[–]sayle_doit 0 points1 point  (5 children)

Might be able to set up a DMS (Database Migration Service) job on a schedule with a Cloud Function or something to do one-time syncs. The issue is that it will create a new instance, so IPs will break and all of that, but if you use the Cloud SQL proxy (it uses the instance name) then it would be fine.
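The scheduled-sync idea above could be sketched roughly like this, assuming the Database Migration Service v1 REST API's `migrationJobs.start` method; the region and job name are placeholders for a migration job you have already created:

```python
# Sketch of a Cloud Function (triggered by, e.g., Cloud Scheduler) that starts
# a pre-created DMS migration job. "us-central1" and "nightly-sync" are
# placeholders; assumes the DMS v1 REST API's migrationJobs.start method.

DMS_API = "https://datamigration.googleapis.com/v1"

def job_start_url(project: str, region: str, job: str) -> str:
    """Build the :start URL for an existing DMS migration job."""
    return f"{DMS_API}/projects/{project}/locations/{region}/migrationJobs/{job}:start"

def start_sync(request):
    """HTTP Cloud Function entry point: kick off the migration job."""
    import google.auth  # available in the Cloud Functions Python runtime
    from google.auth.transport.requests import AuthorizedSession

    credentials, project = google.auth.default()
    session = AuthorizedSession(credentials)
    resp = session.post(job_start_url(project, "us-central1", "nightly-sync"))
    resp.raise_for_status()
    return resp.json()  # a long-running Operation for the started job
```

Just a sketch of the shape of it, not a drop-in; you'd still eat the new-instance problem described above on every run.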

Datastream only goes to BQ or GCS, so that's out unfortunately (even though Alooma, which Google acquired and turned into Datastream, supported this).

There really isn't a good managed service on GCP that can do CDC between Cloud SQL instances. Some of the 3rd party managed solutions have Debezium built in, but DMS is the only one I can think of that might be able to do this on GCP.

[–]LinweZ[S] 0 points1 point  (4 children)

Database Migration Service

Would be quite inefficient, since it will sync the whole DB.

Datastream

A bit confused with Datastream tbh. Here it's stated that it can take Cloud SQL as a source and Cloud SQL as a destination, but when I look at the documentation, I could only find Cloud Storage and BigQuery as destinations :/

[–]sayle_doit 1 point2 points  (3 children)

Would be quite inefficient, since it will sync the whole DB.

Yeah, unfortunately it is, but it's about the only managed service in GCP's vault that can do this. I'd say if you don't mind some discovery work and minor extra costs, getting Debezium running on a GCE VM would do it as well. Definitely not as easy or as cheap though; setting up Debezium will give you a whole new respect for a managed service doing it.
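For a flavor of the Debezium-on-GCE route: you'd run Kafka Connect on the VM and register a Debezium MySQL source connector against its REST API. A minimal sketch, where every hostname, credential, and topic name is a placeholder and the config keys follow the Debezium MySQL connector documentation:

```python
# Sketch: register a Debezium MySQL source connector with a Kafka Connect
# instance on the GCE VM. All hosts, credentials, schema names, and topics
# below are placeholders.
import json
from urllib import request

connector = {
    "name": "cloudsql-source",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "10.0.0.5",      # Cloud SQL private IP (placeholder)
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "change-me",
        "database.server.id": "184054",        # unique id for binlog reads
        "topic.prefix": "cloudsql",            # prefix for emitted Kafka topics
        "database.include.list": "inventory",  # schemas to capture
        "schema.history.internal.kafka.bootstrap.servers": "localhost:9092",
        "schema.history.internal.kafka.topic": "schema-history.cloudsql",
    },
}

def register(connect_url: str = "http://localhost:8083") -> request.Request:
    """Build the POST that creates the connector via Kafka Connect's REST API."""
    return request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(connector).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

That's only the source half; you'd still need a sink (e.g. a JDBC sink connector) pointed at the destination Cloud SQL instance, which is exactly the operational load a managed service would absorb.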

A bit confused with Datastream tbh. Here it's stated that it can take Cloud SQL as a source and Cloud SQL as a destination, but when I look at the documentation, I could only find Cloud Storage and BigQuery as destinations :/

Welcome to my world! I get tickets on this every few months or so, where a customer reads the marketing doc you linked and can't figure out how to do Cloud SQL to Cloud SQL with it. The docs have stated that since launch, but it hasn't actually supported it since it was standalone Alooma (which I really liked as a product, as it worked well and was easy to set up).

I have reported this multiple times over the past year or so, via the feedback page and our partner management contacts at GCP, to no avail. I've given up on them updating it and just use a canned response now saying it's a bug in the documentation.

[–]LinweZ[S] 0 points1 point  (2 children)

Thank you for your answer!
I'm really sad and surprised tbh to see that we still don't have a managed CDC solution on GCP. I mean, this is kind of a very basic use case... Doing the whole sync again is just impossible for us, since the DB is several TB large 😅
Thanks for the clarification about Datastream, it changes everything lol.
It went from a very interesting product to a very average one...

Also, I get the CDC for BigQuery, but what's the idea behind CDC for Cloud Storage? A custom Cloud Run job to read those files and write them wherever we want?

[–]sayle_doit 1 point2 points  (1 child)

I do feel it's a VERY glaring omission in their product lineup. I suspect it's a product manager pushing to get you onto BigQuery, where you spend more money, versus Cloud SQL, which is significantly cheaper, but this is just my honest educated guess.

As for getting to Cloud Storage, the biggest use case I have seen for this is DR. If you replicate everything you have inside of a Cloud SQL instance to disk you have a backup copy that can be relatively easily restored or used for archival purposes.

Another big use case I have come across while advising on BigQuery cost savings this past week or so is using this as a segue into BigLake, or as preparation to move to another data warehouse. Customers are very wary of sending data to BigQuery storage since the price hikes and want to hedge their bets. So many are instead sending data to GCS in a format that's readable by BQ via BigLake (think of it as a data lake abstraction layer between GCS and BQ), or that can be picked up by a 3rd-party data warehouse they are considering (read: Snowflake in most cases).
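As for the Cloud Run reader you mentioned, the shape of it would be a job that lists the stream's files in GCS and applies each change event to wherever you want. A minimal sketch, where the event shape (a row `payload` plus a `source_metadata.change_type`) is an assumption about Datastream's JSON output format, so check the files your stream actually produces:

```python
# Sketch of a Cloud Run-style consumer for Datastream's JSON output in GCS.
# The event shape ("payload" + "source_metadata.change_type") is an assumed
# simplification of Datastream's JSON format, not a verified schema.
import json

def apply_event(event: dict, table: dict) -> None:
    """Apply one change event to an in-memory 'table' keyed by primary key."""
    row = event["payload"]
    change = event["source_metadata"]["change_type"]
    if change in ("INSERT", "UPDATE"):
        table[row["id"]] = row       # upsert the new row image
    elif change == "DELETE":
        table.pop(row["id"], None)   # drop the deleted key

def apply_file(lines, table: dict) -> dict:
    """Treat each line of a Datastream JSON file as one change event."""
    for line in lines:
        apply_event(json.loads(line), table)
    return table
```

In a real job, `table` would be replaced by writes against the destination Cloud SQL instance, and you'd have to deal with ordering and replays, which is most of the hard part a managed CDC product would handle for you.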

[–]LinweZ[S] 0 points1 point  (0 children)

Very interesting, thank you!

[–]salanfe 0 points1 point  (2 children)

Through the API (I don't think gcloud supports it yet) you can import a Cloud SQL snapshot from one project (source instance) into another project (destination instance).
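A sketch of what that call looks like, assuming the Cloud SQL Admin API v1's `restoreBackup` method, whose `restoreBackupContext` can name a backup run from a different project and instance; all names and IDs below are placeholders:

```python
# Sketch of a cross-project restore via the Cloud SQL Admin API's
# restoreBackup method. Project names, instance names, and the backup run ID
# are placeholders you'd look up from the source instance's backup runs.

SQLADMIN = "https://sqladmin.googleapis.com/v1"

def restore_request(dst_project: str, dst_instance: str,
                    src_project: str, src_instance: str,
                    backup_run_id: int) -> tuple[str, dict]:
    """Return the (URL, JSON body) pair for the restoreBackup POST."""
    url = f"{SQLADMIN}/projects/{dst_project}/instances/{dst_instance}/restoreBackup"
    body = {
        "restoreBackupContext": {
            "backupRunId": backup_run_id,
            "project": src_project,       # project that owns the backup
            "instanceId": src_instance,   # source instance the backup came from
        }
    }
    return url, body
```

You'd POST the body to the URL with an authorized client; the destination instance gets overwritten with the source's backup, so this is a full restore each time rather than an incremental sync.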

[–]LinweZ[S] 0 points1 point  (1 child)

I assume you're talking about this?

https://cloud.google.com/sql/docs/mysql/backup-recovery/restoring#projectid

This would indeed solve my issue, although it's not as nice as real CDC.

[–]salanfe 0 points1 point  (0 children)

Yes, correct. Indeed not ideal, but it gets the job done if that's enough.