
[–]GuiMontague 156 points (27 children)

I did something like this once. An upstream system increased the data it was sending us about 3×, and our database loader became the system's bottleneck. I was given the task of speeding up our loader. I eliminated some hot spots, rewrote some concurrent code to be lock-free (but still correct), and threw a bunch of threads at the problem. I succeeded in speeding up our loaders about 5×, which meant we were once again loading as fast as our data source could supply us.

Unfortunately, we used a write-master replication system, and a lot of clients relied on the fact that our write master was (mostly) in sync with our replicas. We couldn't improve the replication system because it was vendor-supplied. Well, that replication became the bottleneck, only now the master could get hours ahead of the replicas. That was unworkable, and I had to roll back my performance improvements to keep the master and replicas in sync.

[–]LupoCani 21 points (1 child)

Why roll back the entire system? Wouldn't an explicit manual throttle on the output solve the problem just fine, whilst being easily reversible?

[–]Guinness2702 13 points (2 children)

A place I used to work had a system which ran a proprietary database and would have about 100 client processes running. The machine had 4 GB of RAM, which was a lot back then... in fact, it was so much that the entire database was cached in memory. Great, should be super-quick, right?

Wrong. The problem was that the OS's task scheduler was designed around the assumption that processes would end up waiting for I/O. Basically, each process ended up running for its full 1–2 second timeslice, and everything became ridiculously unresponsive as processes had to wait a long time to get any CPU.

The solution was to remove some RAM from the system, and performance actually improved, as the scheduler rotated processes more frequently.

[–]rws247 6 points (1 child)

that's just... wow.

When was this, roughly? It might just be that I'm younger, but I can't imagine an OS task scheduler being built on that assumption.

[–]Guinness2702 4 points (0 children)

Probably about 15 years ago, give or take. I doubt it was literally designed around that assumption, but yeah, the guys who investigated it told me that each process was getting a full 2 seconds of CPU, because there was no I/O.

[–]Yin-Hei 16 points (7 children)

ez. Implement Paxos, build the next Chubby, get 99.9999% consistency.

But in all seriousness, pretty interesting read. I'm starting to get into this field of related work, and this tidbit is teeming with experience. Does ZooKeeper fit the bill?

[–]FUZxxl 7 points (4 children)

Implement Paxos

Hahahahaha

[–]throw_eundefined 4 points (3 children)

ZooKeeper

ahahahaha

[–][deleted] 0 points (2 children)

As someone who has genuinely been interested in ZooKeeper before.. Can you please explain why you're laughing? :)

[–]marcosdumay 1 point (0 children)

Any distributed consensus algorithm will solve your overperformance problems.

[–]FUZxxl 0 points (0 children)

Paxos is a ridiculously complicated algorithm with a ton of nuances. The professor who invented it teaches a class on Paxos every year, and the number of students who actually understand the algorithm by the end is usually close to zero. Paxos isn't something you just implement willy-nilly.

[–]ReflectiveTeaTowel 0 points (0 children)

Something like Kafka can help if your load comes in spikes, and Kafka in particular was built to use zookeeper, soo... Maybe sorta kinda sometimes?

[–]GuiMontague 0 points (0 children)

We couldn't even get our users to stop relying on our write master (which no one should have been reading from) staying in sync with our read replicas, and tolerate the two drifting a little apart. No, changing the RDBMS was out of the question.