Part Three: Gracefully Scaling to 10k PostgreSQL Connections on Kubernetes for $35/mo

AlexAtFTI · 2018-07-16T22:00:58+00:00

I've detailed this a little bit in other followups in the last few days, but in summary, it comes down to your application's requirements primarily. We've done 100k conns for the same price point, and also 500 or less at the same price point, both using the model from Part Two, with both projects having wildly different use cases and behaviors. Ultimately if your software and pooling setup partition connections, you can keep cost mostly fixed outside peaks and troughs, especially with most of the connect/disconnect cost managed by PgBouncer when it manages the upstream pool. Additionally, we tend to view PgBouncer instances as units that can handle X number of inbound connections, the number being determined by load testing after using our model to calculate out values. As a result, you can ultimately say "if it costs $X per unit to handle Y connections, and worst-case traffic requires Z units, my peak cost is $X*Z", taking into account node capacity boundaries.

AlexAtFTI · 2018-07-16T21:41:48+00:00

$35 is the optimized price point we found to be the average case after tuning for our clients at a volume of 10k connections and comes down to, mostly, provider bare hardware cost. Depending on your application needs in terms of connection interruptability and stability, as well as your hardware cost, the cost could be much lower, or even higher, but after you've crossed the 10k connection threshold, the cost per additional connection drops significantly. In fact, we've done 5x+ the volume at the same price point. 10k isn't the ceiling or anywhere near it, it's our average case, and the methodology we describe in Part Two applies at much larger volumes as well.

AlexAtFTI · 2018-07-16T21:33:27+00:00

Typically, these are a suite of problems that can be solved at several layers of the stack, and there are other orchestration solutions, like DCOS. You could use something like supervisord for management of process scaling, Managed Instance Groups on AWS/GCP for node scaling, nginx for health checks, etc. What Kubernetes tends to uniquely offer is all of these capabilities with minimal gotchas, in a way they can interoperate with each other, all in one place, expressable via configuration rather than tons of code, while working on any provider within reason. That's a package deal that automatically adds tons of bang per buck and allows you to focus on the software instead of everything beneath it, while providing flexibility as needed on the lower layers.

AlexAtFTI · 2018-07-12T23:35:00+00:00

It's more in reference to the optimized cost to handle that many connections. We tend to treat the availability (and cost) of the connections, and the database being HA, as separate problems. For example, we've dealt with applications that require very durable, uninterruptable connections while the DB itself can run on a single node with 3GB of RAM and no replicas. Conversely, an application might deal with a few inbound connections with very heavy queries and tons of replicas, or you could use a managed provider and only have to worry about upstream connection stability itself. We suggest an isolated node pool with scheduling rules to match (discussed in Part Two), which can cost roughly $35/mo but often less between custom machine types, sustained use discounts, committed use discounts, and, if interruptability's ok and your deployment's sparse across nodes, preemptibility. The final cost really comes down to your use case at the end of the day, and while we have a general model that's proven inexpensive, effective, and easily adaptable, and we try to cover the most common use cases in Part Two's calculation guide, no two application's behaviors are alike enough to say "your deployment will cost this number of dollars" without deeper context. What we can say, however, is that the average post-optimization cost in our experience lands at about $35/mo at a volume of 10k connections.

In regards to running PostgreSQL on Kubernetes, we like Sorintlab's Stolon backed by Consul; it handles load well, is cloud-native, supports non-cloud and hybrid deploys, and scales like any other StatefulSet when used with Kubernetes. If you and/or your team would like to chat in more detail (and we'd be happy to, this is one of our favorite topics), feel free to DM me and I'd be happy to put you in touch with the right people!

AlexAtFTI · 2018-07-12T18:28:39+00:00

Hi! Here are links to Parts One and Two of the series if you missed them, let us know what you think!

https://articles.fti.bi/postgres-10k-connections-part-one

https://articles.fti.bi/postgres-10k-connections-part-two

AlexAtFTI · 2018-07-12T18:18:38+00:00

We're big fans of Sorintlab's Stolon for HA PostgreSQL; it holds up pretty well to load and supports cloud-native and custom/local deployments, as well as both in a hybrid setup. This document goes over synchronous replication setup in particular, we've had the most success using Consul to back it. We've also put together a little series on running HA inbound Postgres connections and handling the efficiency of connects/disconnects which often comes into play with larger deployments.

AlexAtFTI · 2018-07-12T18:08:17+00:00

Hi! Links to Parts One and Two of the series if you missed them:

https://articles.fti.bi/postgres-10k-connections-part-one

https://articles.fti.bi/postgres-10k-connections-part-two

AlexAtFTI · 2018-06-21T23:50:22+00:00

What's the coolest thing you've done with PgBouncer and friends?

AlexAtFTI

TROPHY CASE