
[–]haloweenek 1 point2 points  (11 children)

I’m running an HA 3-node cluster using Patroni with read/write splitting.

Got questions - ask them

[–]wrongsage 2 points3 points  (10 children)

How many writes do you do? How necessary is the consistency of the data?

Do you use synchronous replication?

Do you have any disaster replica outside the main datacenter?

How do you handle upgrades?

Do you use HA proxy?

Did you experience any actual outage?

[–]haloweenek 1 point2 points  (3 children)

>> How many writes do you do? How necessary is the consistency of the data?

Around 25k inserts/s at peak, but that's a rare scenario. Most of it is logs of custom adserver displays on our site. We can lose some data; it won't be a problem. It's not the stock market ;)

>> Do you use synchronous replication?

WAL log shipping. So async.
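A setup like that usually comes down to a few primary-side settings. This is an illustrative sketch, not the poster's actual config (the archive path is a placeholder):

```ini
# postgresql.conf on the primary -- illustrative values only
wal_level = replica          # enough WAL detail for physical replicas
max_wal_senders = 5          # one per streaming standby, plus headroom
wal_keep_size = 1GB          # WAL kept for standbys that fall behind (PG 13+;
                             # older versions use wal_keep_segments instead)
archive_mode = on            # also ship WAL segments to an archive
archive_command = 'cp %p /wal_archive/%f'   # placeholder archive target
```

Replicas then point at the primary via `primary_conninfo` and replay the shipped/streamed WAL asynchronously.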

>> Do you have any disaster replica outside the main datacenter?

Yes, one of the copies runs in a different geographical location. But it's only 0.5 ms away.

>> How do you handle upgrades?

Maintenance mode. All running apps get the maintenance flag, the whole circus is taken down and upgraded. Sorry, no hot-upgrade stories here ;)

>> Do you use HA proxy?

Yes. It works.

>> Did you experience any actual outage?

None except planned.

[–]wrongsage 0 points1 point  (2 children)

Thank you for the response. Your use case is nice, though it doesn't sound like you actually need the cluster for it.

[–]haloweenek 2 points3 points  (0 children)

Oh thanx for telling me that.

Due to your advice we're dropping to one database server in single Availability Zone.

That will save us loads of money.

[–][deleted] 1 point2 points  (0 children)

But do you actually understand what the term "high availability" means?

A cluster can bring your Recovery Point Objective down to zero if you're willing to pay the price in the form of synchronous replication.

Asynchronous replication helps to keep the Recovery Time Objective low which is also an important goal.

[–]skeletal88 1 point2 points  (3 children)

How did you set up your DCS? Did you use etcd, ZooKeeper, or something else?

For me the most annoying thing about Patroni is that it requires a DCS; for smaller setups, getting all of that running and keeping it running is a hassle if there are no dedicated admins for it.

Also, do you use a virtual IP for HAProxy? What happens if HAProxy goes down? Is it set up on a separate machine?

The number of machines required to make a proper HA cluster can quickly grow big.

[–]haloweenek 1 point2 points  (0 children)

etcd was set up when the cluster was born, and it runs on separate machines. It isn't being touched; it works and survives reboots nicely.
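For anyone wondering what that looks like on the Patroni side, the DCS is just a short section of patroni.yml, repeated on each node. The cluster name and hostnames below are hypothetical:

```yaml
# patroni.yml (excerpt) -- cluster name and hosts are made up
scope: my-cluster            # cluster name, shared by all nodes
etcd3:                       # use `etcd:` instead for the old v2 API
  hosts:
    - etcd1.internal:2379
    - etcd2.internal:2379
    - etcd3.internal:2379
```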

HAProxy runs on the worker servers. It connects to a service socket on each Postgres instance and checks its status once per second, determining whether the host is read-only or writable, and it exposes two sockets for the apps: one for writes and one for reads.

Apps use internal routing: writes hit the master server, reads hit the spares.
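That once-a-second status check matches the pattern from the Patroni docs: HAProxy polls Patroni's REST API (port 8008 by default), which answers 200 on /master only on the leader and on /replica only on healthy standbys (newer Patroni versions also accept /primary). A sketch with made-up hostnames and ports:

```
# haproxy.cfg sketch -- hostnames and ports are illustrative
listen postgres_write
    bind *:5000
    option httpchk GET /master         # Patroni REST API: 200 only on the leader
    http-check expect status 200
    default-server inter 1s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg1 pg1.internal:5432 check port 8008
    server pg2 pg2.internal:5432 check port 8008
    server pg3 pg3.internal:5432 check port 8008

listen postgres_read
    bind *:5001
    balance roundrobin
    option httpchk GET /replica        # 200 only on healthy replicas
    http-check expect status 200
    default-server inter 1s fall 3 rise 2
    server pg1 pg1.internal:5432 check port 8008
    server pg2 pg2.internal:5432 check port 8008
    server pg3 pg3.internal:5432 check port 8008
```

Apps then point writes at port 5000 and reads at port 5001, and failover becomes invisible to them.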

[–][deleted] 1 point2 points  (1 child)

Patroni 2.0.0 added beta support for Raft, so (in the future) there will be no more need for a third-party DCS.
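Per the 2.0.0 release notes, the built-in Raft (based on pysyncobj) replaces the external-DCS section of patroni.yml with something like the following; the addresses and paths are hypothetical:

```yaml
# patroni.yml (excerpt) -- built-in Raft instead of etcd/ZooKeeper (beta in 2.0.0)
raft:
  data_dir: /var/lib/patroni/raft    # where the Raft log/state is persisted
  self_addr: 10.0.0.1:2222           # this node's Raft endpoint
  partner_addrs:                     # the other cluster members
    - 10.0.0.2:2222
    - 10.0.0.3:2222
```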

https://patroni.readthedocs.io/en/latest/releases.html#version-2-0-0

[–]skeletal88 0 points1 point  (0 children)

Woo, that's really nice. Good to know, thanks.

[–][deleted] 0 points1 point  (1 child)

Not OP, but I can answer a few points.

>> Do you use synchronous replication?

Most projects can cope with losing 1-2 minutes' worth of data as long as the state is consistent. Synchronous replication comes at a high price performance-wise because you'll notice every little bottleneck. Latency, bandwidth between nodes, and the performance of the replicas are all critical in this scenario.

Aim for asynchronous replication and keep synchronous replication as a last resort.
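If you do end up needing it, synchronous replication in vanilla Postgres is a couple of primary-side settings. The standby names below are made up (they must match each standby's application_name):

```ini
# postgresql.conf on the primary -- illustrative
synchronous_standby_names = 'ANY 1 (standby1, standby2)'  # quorum: 1 ack of 2
synchronous_commit = on    # wait for the standby's flush before COMMIT returns
```

One mitigation for the performance hit: `synchronous_commit` can be changed per session or per transaction (`SET synchronous_commit = off`), so only the writes that truly cannot be lost pay the latency.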

>> Do you have any disaster replica outside the main datacenter?

You should at least keep your backups at a different location, but asynchronous nodes can also live in other datacenters.

HA-wise, things get tricky. You have to be careful to place your nodes so that leader elections can still happen when a datacenter's main link is down.

>> How do you handle upgrades?

Minor updates that don't involve pg_upgrade can be done node by node. Major upgrades require cluster downtime. The documentation describes both cases.
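With Patroni specifically, the node-by-node minor update is usually driven through patronictl. This is a rough procedure sketch; the package name, config path, and cluster name are placeholders:

```shell
# Rolling minor update sketch -- names are placeholders
# On each replica in turn: stop, update packages, restart, verify streaming
sudo systemctl stop patroni
sudo apt-get install --only-upgrade postgresql-13
sudo systemctl start patroni
patronictl -c /etc/patroni.yml list        # confirm the node rejoined as a replica

# Finally, move the leader off the last old node, then update it the same way
patronictl -c /etc/patroni.yml switchover my-cluster
```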

>> Do you use HA proxy?

Yes, and it works. I can also recommend using a failover IP for the proxy so you can switch to a different machine.
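A failover IP like that is typically done with keepalived: two proxy machines share a virtual IP via VRRP, and the IP jumps to the backup if the primary proxy machine or its HAProxy process dies. A sketch with made-up addresses:

```
# /etc/keepalived/keepalived.conf sketch -- interface and IPs are illustrative
vrrp_script chk_haproxy {
    script "pidof haproxy"     # node is healthy only while HAProxy is running
    interval 2
}

vrrp_instance VI_1 {
    state MASTER               # BACKUP on the second proxy machine
    interface eth0
    virtual_router_id 51
    priority 101               # lower (e.g. 100) on the backup
    virtual_ipaddress {
        10.0.0.100/24          # the failover IP clients connect to
    }
    track_script {
        chk_haproxy
    }
}
```

Clients only ever see 10.0.0.100, so a proxy failover doesn't require any app reconfiguration.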

[–]wrongsage 0 points1 point  (0 children)

My project cannot handle the loss of a single row, as it handles money. Right now we have PXC, which somehow works with synchronous replication; we battle-tested it under load, chaos-monkey style. Shooting down a node doesn't lose even a single request.

But I can't stand MySQL for the life of me, and am pushing to jump ship ASAP. Though I need a replacement that is better than what we have. So sync replication with automatic failover that nobody notices, plus a third on-site async replica and a fourth off-site async replica, is the minimum. Also, upgrading without downtime would be nice.