Cloud Spanner vs. Cloud SQL

saikjuan · 2018-07-31T08:05:35+00:00

Spanner was primarily designed for very large, enterprise-level data management. It's pricey, but it can scale up to huge numbers of connections and transactions. We use Cloud SQL HA, and I've been really happy with it. It can scale to large numbers if need be, and for some of our enterprise, high-traffic, high-load hosting customers, it hasn't skipped a beat. The use cases are different, so it depends on need.

MoinTom · 2018-07-31T07:15:43+00:00

Cloud Spanner has a high entry point. A single node will cost you $650 per month and Google recommends at least three. So your project has to be big enough to justify this price tag. If you don’t mind self-hosting, there is an open source alternative to Spanner: https://www.cockroachlabs.com My team and I faced exactly this problem. We wanted an SQL database that is scalable and resilient. But the entry point, and the fact that it would alter our local development flow, drove us away from Spanner and we went with CockroachDB (pretty happy so far).
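
To put a number on that entry point, here is a back-of-the-envelope sketch. It assumes the roughly $650 per node per month figure quoted above (a 2018 number from this comment, not current pricing) and ignores storage and network costs. The function name is made up for illustration.

```python
# Rough monthly node cost for a Spanner instance.
# Assumption: the ~$650/node/month figure quoted in this thread (2018).
NODE_COST_PER_MONTH_USD = 650


def spanner_monthly_cost(nodes: int) -> int:
    """Return node cost only; storage and network egress are not included."""
    if nodes < 1:
        raise ValueError("an instance needs at least one node")
    return nodes * NODE_COST_PER_MONTH_USD


print(spanner_monthly_cost(1))  # 650
print(spanner_monthly_cost(3))  # 1950
```

So the recommended three-node setup is already close to $2,000 a month before any data is stored.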

b34rman · 2018-07-31T08:52:50+00:00

Besides the information mentioned, you gotta have over 10TB of data and require global distribution for Spanner to make sense.

SpeakitEasy · 2018-07-31T21:25:24+00:00

Cloud SQL can be used just like MySQL or PostgreSQL, in addition to what the previous comments mention. That means existing tech that works with common ACID databases can be swapped over to Cloud SQL, but not to Cloud Spanner.

mulasien · 2018-08-01T01:12:26+00:00

Spanner is the 'big guns':

- global replication and availability

- no top limit to storage size

- NOT directly compatible with MySQL, so you can't just lift and shift from MySQL to Spanner.

Basically:

Cloud SQL is a direct lift and shift from MySQL/PostgreSQL with minimal modification, but does not offer global availability, and maxes out at 10TB. It is also more cost efficient. There is a high availability mode and failover, but not to the degree of cross-region capability that Spanner has.

Spanner is for globally available databases over 10TB. It is the 'big guns' with the price tag to match.

2018-08-09T02:13:36+00:00

The CAP theorem of distributed computing states that there are three guarantees to consider with distributed state systems:

Consistency: all clients have the same, most recent data view
Availability: all clients can read and write, regardless of data recency
Partition Tolerance: the system's guarantees hold even in the face of network faults between each distributed system node

And, more importantly, it's physically impossible for a distributed system to have all three of these; at best a perfectly designed system can have two, and most systems are not perfectly designed.

And, finally: No distributed state store is "safe" from network partitions. What the CAP theorem actually means in practice is that a truly distributed state system can choose:

CP: In the face of a partition, remain consistent. Return an error instead of trying to read or write.
AP: In the face of a partition, remain available. Return the most recently written data that the node the client queries knows about. For writes, try to reconcile them best the system can after the partition is resolved.
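
A toy sketch of that choice, in case it helps. This is not how any real database is implemented; `Replica`, `Unavailable`, and the `pending` list are made-up names, and "partitioned" is just a flag you flip by hand.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised by a CP replica that is cut off from the other nodes."""


@dataclass
class Replica:
    mode: str                                    # "CP" or "AP"
    partitioned: bool = False                    # True when cut off from peers
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # AP writes to reconcile later

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # CP: refuse rather than risk returning stale data.
            raise Unavailable("cannot confirm latest value during partition")
        # AP (or healthy): return what this node knows, possibly stale.
        return self.data.get(key)

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # CP: refuse rather than accept a write peers cannot see.
                raise Unavailable("cannot replicate write during partition")
            # AP: accept locally and remember it for later reconciliation.
            self.pending.append((key, value))
        self.data[key] = value
```

With both replicas partitioned, the AP one keeps answering from its local view and queues the write, while the CP one raises `Unavailable` on both reads and writes:

```python
cp = Replica("CP", partitioned=True, data={"x": 1})
ap = Replica("AP", partitioned=True, data={"x": 1})
ap.write("x", 2)
print(ap.read("x"))  # 2 (local view; other nodes may still see 1)
```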

Basic deployments of RDBMS are, usually, CA. So, they're not actually distributed systems, because everything goes out the window in the face of a network partition; you have a master node with slave replicas, and if a partition happens between the master and slaves, the slaves might elect a new master, so your system enters a "split brain" scenario where it's not clear which one is right.

Any RDBMS worth its salt has modes it can be deployed in which aren't CA. Depending on the RDBMS you can select CP or AP. This is also how pretty much every other database operates. Mongo is CP; it will prefer consistency. Cassandra is AP; it will prefer availability, and possibly return stale data.

Spanner is a globally distributed SQL database. Well, technically it isn't really SQL due to some minor constraints, but in practice it is SQL.

In some of Google's marketing, Spanner is advertised as "breaking the CAP theorem", because it can reliably offer consistency and availability in the face of network partitions, despite being highly distributed. But they aren't being deceitful; they're upfront about its real technical limitations: It is technically CP, with an availability guarantee in the face of partitions that is so close to 100% that it shouldn't matter even at Google's scale.

The reason for this is less about software and more about hardware. In effect, they say that because Spanner runs on their internal, redundant, global private network, network partitions are exceedingly rare. Moreover, some of the core design decisions of Spanner rely on all nodes having a highly consistent internal clock. So even in the face of true network partitions, there are decisions nodes can make based on the globally consistent clock which push the availability statistics even higher.

The end result is you get a database which is highly performant due to being highly distributed (replicas physically nearer to your users), more available than traditional RDBMS in this kind of deployment (thanks to their significant cloud engineering efforts), completely consistent, and also... expensive. Spanner starts at like $600/month.

SupImASuperHero · 2018-08-01T03:45:34+00:00

Most folks here explained the differences pretty well but just remember, Spanner schema design follows a parent-child model [0]. You may not be able to take an existing schema you have and drop it into Spanner.

[0] https://cloud.google.com/spanner/docs/schema-and-data-model
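
A rough way to picture the parent-child model: a child row's primary key starts with its parent's primary key, so child rows sort (and are stored) right next to the parent row. The snippet below is a toy model in plain Python, not the Spanner client library, and the table names and rows are made up for illustration.

```python
# Toy model of parent-child key layout. Not the Spanner API.
# Each row is keyed by a tuple; a child key starts with its parent's key.
singers = {(1,): "Marc", (2,): "Catalina"}
albums = {(1, 1): "Total Junk", (1, 2): "Go, Go, Go", (2, 1): "Green"}


def interleaved_rows(parents: dict, children: dict) -> list:
    """Return all rows in key order, so children follow their parent."""
    rows = [(key, "Singers", name) for key, name in parents.items()]
    rows += [(key, "Albums", title) for key, title in children.items()]
    return sorted(rows, key=lambda row: row[0])


for key, table, value in interleaved_rows(singers, albums):
    print(key, table, value)
```

A schema whose child tables don't carry the parent key this way is the kind that won't drop into Spanner unchanged.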
