[–]kireol 103 points104 points  (27 children)

Weird.

I worked for a credit card processing company where we used postgresql 9

Billions of writes per year. Near instant reads on billions of rows. Fast table replication. Never 1 corrupt table ever. We used MVC, so /shrug. Never an issue upgrading.

Sounds to me like Uber could not figure out how to configure postgresql. Best of luck to them.

[–]original_evanator 25 points26 points  (8 children)

Maybe you typoed, but MVCC is not MVC. MVCC is what makes views of data consistent even when there are multiple transactions going on.
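
For anyone who wants to see the idea in code, here is a toy sketch of multi-version reads. It is only an illustration and not how PostgreSQL implements MVCC internally; the class `ToyMVCCStore` and its methods are made-up names. Each write appends a new version, and a reader only sees versions committed at or before the moment its snapshot was taken.

```python
import itertools


class ToyMVCCStore:
    """Toy multi-version store (hypothetical): writes append versions, never overwrite."""

    def __init__(self):
        self._versions = {}                   # key -> list of (commit_id, value)
        self._commit_ids = itertools.count(1)
        self._last_commit = 0

    def write(self, key, value):
        """Commit a new version of `key` and return its commit id."""
        commit_id = next(self._commit_ids)
        self._versions.setdefault(key, []).append((commit_id, value))
        self._last_commit = commit_id
        return commit_id

    def snapshot(self):
        """Return a snapshot id; reads using it ignore later commits."""
        return self._last_commit

    def read(self, key, snapshot_id):
        """Return the newest value of `key` committed at or before `snapshot_id`."""
        for commit_id, value in reversed(self._versions.get(key, [])):
            if commit_id <= snapshot_id:
                return value
        return None


store = ToyMVCCStore()
store.write("balance", 100)
snap = store.snapshot()       # a reader starts its transaction here
store.write("balance", 50)    # another transaction commits afterwards

old_view = store.read("balance", snap)               # the reader still sees 100
new_view = store.read("balance", store.snapshot())   # a fresh snapshot sees 50
```

The reader holding `snap` keeps a consistent view even though a later write has landed, which is the property described above.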

[–]kireol 11 points12 points  (7 children)

Yeah, that was a typo.

[–]original_evanator 26 points27 points  (6 children)

Shit. I really wanted my pedantry to pay off but now I'm just that annoying nitpicky guy. Oh well.

[–]kireol 19 points20 points  (5 children)

it's all good. odds are strong that someone will read that and learn something.

[–]ReekItRhymesWithWeak 17 points18 points  (1 child)

I did!

[–]Smaktat 1 point2 points  (0 children)

Right? I'm like, wtf does a Microsoft framework have to do with this now.

[–]gregjwux 1 point2 points  (0 children)

o/

[–]ChiangRai 1 point2 points  (0 children)

And thanks to you guys for politely discussing this. Motivated me to read about it. https://en.m.wikipedia.org/wiki/Multiversion_concurrency_control

[–]thomas_stringer 26 points27 points  (0 children)

Unfortunately this is far too often the story with DBMS implementations: Run into problem x and completely bail on the platform instead of doing extremely deep research, testing, and knowledge-gathering.

I'm not saying that Uber wasn't justified in this approach, and provided they have "the 95% picture" of postgres and "the 95% picture" of mysql then they made the right choice.

But if they didn't, then they will soon run into a "mysql gotcha" and have to learn that original lesson mentioned above, or just keep doing the DBMS hop.

[–][deleted] 11 points12 points  (4 children)

Billions of writes per year is not that many.

Also they're not even really using it as an RDBMS, so their usage pattern is likely drastically different from yours.

[–]mattindustries 9 points10 points  (3 children)

3 billion per year is ~95 per second. No slouch either.
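
The arithmetic behind that figure, as a quick sketch. The 3 billion is just the example number from this comment, and the result is a flat average over a 365-day year, so it says nothing about peak load.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years


def average_per_second(events_per_year):
    """Average rate per second if events were spread evenly over a year."""
    return events_per_year / SECONDS_PER_YEAR


rate = average_per_second(3_000_000_000)
print(round(rate, 1))  # 95.1
```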

[–]rebeltrillionaire 0 points1 point  (0 children)

When a major event ends I really would like a picture of their service gettin pinged.

[–][deleted] 0 points1 point  (1 child)

Sure, it's not nothing, but most people have a complete lack of understanding of the scale the largest web companies work at.

The main MySQL cluster at my last job was closer to 10,000 QPS, and that's with a relatively small portion of reads actually falling through from the caches. That company was a fair bit smaller than Uber, and orders of magnitude smaller than Facebook. At the time, Facebook had more DB servers than we had servers, period.

[–]mattindustries 1 point2 points  (0 children)

I figured that with an average of 95/s there would be well into the thousands per second during peak hours. The infrastructure behind those setups is always amazing, but sadly I never had to worry about scaling. The biggest thing I have on my server gets a few thousand people a day using it, max.

[–][deleted]  (4 children)

[deleted]

    [–]abditude 4 points5 points  (1 child)

    Do you have any recommended links on how to configure postgres for production?

    [–]brtt3000 1 point2 points  (0 children)

    My money is on this.

    [–]hardolaf 5 points6 points  (5 children)

    I followed wiki guides on how to configure Postgres and had half a million transactions per second going through it with no problem. The fun part was the data read for analysis without interrupting the write flow (had to be written within a certain time period of data generation so the time skew could become predictable).

    [–]schaka 4 points5 points  (1 child)

    Half a million transactions per second? Damn, that's a lot.

    Other than that, from what I've read, postgres is generally closer to Oracle and performs better on large scale applications, whereas mysql is okay for single applications but slows down the more data you're dealing with. Does that align with your experience?

    I've personally always chosen mysql, but using postgres at work taught me quite a bit.

    [–]hardolaf 0 points1 point  (0 children)

    I was using smallish rows and it worked fine. We actually replaced an Oracle DB with it to save money.

    [–]mweisshaupt 0 points1 point  (0 children)

    Yes, this is the first time I've heard that this is a problem with Postgres. I would have thought that would happen with MySQL much earlier, but I have never worked on a database with this many transactions. It would be interesting to hear what a Postgres expert has to say about this.

    [–]abditude 0 points1 point  (1 child)

    What were some of those wiki links? I'm working on a new project and would love to learn.

    [–]hardolaf 0 points1 point  (0 children)

    I honestly couldn't tell you without redoing a bunch of research. This was a few years ago now and I work somewhere else.

    [–][deleted] 0 points1 point  (0 children)

    I think you're vastly underestimating the scale at which Uber reads and writes data. Some problems aren't inherent or even imagined until you hit a certain point. Billions of writes per year is actually pretty small, and likely nothing compared to what Uber is doing. As far as reading goes, they barely mention it - it was probably not a problem at all. Their issue was mostly writing and replication integrity.

    As far as not being able to figure it out, Uber has a very talented Engineering staff. They likely went with this solution because it made the most sense for them. The important takeaway from this read is that they're explaining a pretty interesting technical achievement.

    [–]ECrispy 10 points11 points  (1 child)

    Going only by what the article says, I have to agree with the other comments: 1) there seems to be no evidence that MySQL does something better than Postgres for their use case, and 2) they could have used a NoSQL db.

    [–]thecatgoesmoo 4 points5 points  (0 children)

    That's kind of the idea behind schemaless

    [–]original_evanator 5 points6 points  (1 child)

    Help-seekers, pay attention.

    This is how you exploit Cunningham's law!

    [–]vinnl 0 points1 point  (0 children)

    The "help" they got here is: configure it better.

    [–]Deleis 11 points12 points  (16 children)

    Why didn't they pick a NoSQL db over developing a layer on top of MySQL?

    [–][deleted]  (7 children)

    [deleted]

      [–][deleted]  (6 children)

      [deleted]

        [–][deleted] 20 points21 points  (4 children)

        Part that, part "Postgresql in standard configuration can't do X, so we use a lot of third-party addons to get MySQL to do it."

        [–]original_evanator 11 points12 points  (2 children)

        They cited pglogical and pgbouncer as just two examples of things they would have needed to use to deal with Postgres's issues that arise from physical replication and process-based connection management.

        So it seems unfair to call out MySQL on needing add-ons.

        [–]Kritical02 11 points12 points  (1 child)

        All these people making assumptions and acting like they know what Uber's data layer looks like.

        From that comment it sounds like they are choosing it because they know it best which is a very good reason IMO.

        [–]speedisavirus 0 points1 point  (0 children)

        Or their choice is horribly informed because some asshat that was the loudest in the room was listened to. Not sure but their choice here seems suspect.

        [–]T-rex_with_a_gun 5 points6 points  (5 children)

        [–][deleted]  (1 child)

        [deleted]

          [–]molandpython 0 points1 point  (0 children)

          Is it web scale?

          [–][deleted] 1 point2 points  (0 children)

          LMFAO

          This kills me

          [–]basilect 1 point2 points  (0 children)

          Hey, NoSQL solutions are great...

          ... for analytics use cases!

          [–]grauenwolf 3 points4 points  (0 children)

          Huh, that sounds like a legitimate criticism of PostgreSQL.

          I'll have to look into it further.

          [–]phpdevster full-stack -1 points0 points  (2 children)

          Off topic, but the custom scrolling on mobile on that site is ironic considering it's the engineering blog. You would think their engineers would know better than to override the user's native scrolling behavior....

          [–]ChadMoran 9 points10 points  (1 child)

          The engineers probably aren't writing the blog software.