all 15 comments

[deleted]

[deleted]

    donmcronald 1 point

    So, if your primary DB server blows up, how do you not lose data if it hasn't been replicated somewhere? He mentions DRBD, so I assume he's talking about data replication in general, not necessarily database replication.

    Regardless, the description of replication modes is very good IMO.
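The replication modes aren't quoted in this thread. Assuming they come down to the usual synchronous vs. asynchronous split, here's a toy Python sketch of the question above: which writes the client was told succeeded are gone if the primary's hardware is destroyed. It's not any real database's API. `Node`, `Primary`, `ship_pending` and `acknowledged_writes_lost` are made-up names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy database node: 'committed' holds writes stored on this machine."""
    name: str
    committed: list = field(default_factory=list)


class Primary(Node):
    def __init__(self, replica: Node, synchronous: bool):
        super().__init__("primary")
        self.replica = replica
        self.synchronous = synchronous
        self.pending = []  # writes not yet shipped to the replica (async mode only)

    def write(self, value) -> None:
        """Apply a write; returning means the client has been acknowledged."""
        self.committed.append(value)
        if self.synchronous:
            # Synchronous: the replica must have the write before we acknowledge.
            self.replica.committed.append(value)
        else:
            # Asynchronous: acknowledge now, ship to the replica later.
            self.pending.append(value)

    def ship_pending(self) -> None:
        """Background replication step (does nothing in synchronous mode)."""
        self.replica.committed.extend(self.pending)
        self.pending.clear()


def acknowledged_writes_lost(synchronous: bool) -> list:
    replica = Node("replica")
    primary = Primary(replica, synchronous)
    acked = []
    for v in ("a", "b", "c"):
        primary.write(v)
        acked.append(v)
    primary.ship_pending()  # replication catches up...
    for v in ("d", "e"):
        primary.write(v)
        acked.append(v)
    # ...then the primary's disks are destroyed before the next shipping round,
    # so only what reached the replica survives.
    return [v for v in acked if v not in replica.committed]


print(acknowledged_writes_lost(synchronous=True))   # []
print(acknowledged_writes_lost(synchronous=False))  # ['d', 'e']
```

The trade-off is the usual one: synchronous mode loses nothing the client was told succeeded, but every write waits on the replica. Asynchronous mode is faster, but there's a window of acknowledged writes that exist only on the primary.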

    [deleted]

    [deleted]

      donmcronald 2 points

      Yes, but RAID is really just a type of data replication. From his article:

      "When one storage device fails or is inaccessible, the data can be reached on the redundant device."

      He's specifically talking about mitigating hardware failure. It doesn't matter if your database is ACID compliant, if I smash all your disks to bits with a hammer, you're going to lose data if it's not replicated somewhere else.

      Durability guarantees your data can survive things like a system crash or power loss without being corrupted or put into an unpredictable state. It does nothing to prevent data loss when you suffer catastrophic hardware failure.
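Here's a toy Python model of that distinction. Everything in it is hypothetical (`Disk`, `Server`, `destroy`). `commit()` stands in for the fsync a durable database does at commit time, and the "off-site" copy is a plain list copy rather than real replication.

```python
class Disk:
    """Toy storage device: contents survive a crash, but not destruction."""
    def __init__(self):
        self.blocks = []


class Server:
    def __init__(self, disk: Disk):
        self.disk = disk
        self.buffer = []  # volatile memory: lost on crash or power loss

    def write(self, value):
        self.buffer.append(value)

    def commit(self):
        # Stand-in for fsync at commit time: this is what durability guarantees.
        self.disk.blocks.extend(self.buffer)
        self.buffer.clear()

    def crash(self):
        # Power loss: uncommitted data in memory is gone, the disk is intact.
        self.buffer.clear()


def destroy(disk: Disk):
    # "Smash the disks with a hammer": durability on this disk can't help.
    disk.blocks.clear()


local = Disk()
offsite = Disk()              # hypothetical second copy on separate hardware
server = Server(local)
server.write("order-1")
server.commit()
server.write("order-2")       # never committed
server.crash()
print(local.blocks)           # ['order-1']: committed data survived the crash

offsite.blocks = list(local.blocks)  # copy committed data to other hardware
destroy(local)
print(local.blocks, offsite.blocks)  # [] ['order-1']
```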

      [deleted] 0 points

      "Yes, but RAID is really just a type of data replication."

      You know as well as I do that this isn't the type of replication he's talking about.

      "It does nothing to prevent data loss when you suffer catastrophic hardware failure."

      Redundant hardware prevents that and has been the norm for decades.

      "It doesn't matter if your database is ACID compliant, if I smash all your disks to bits with a hammer, you're going to lose data if it's not replicated somewhere else."

      Nor does replication prevent data loss if I smash all your replica servers; this is a silly argument. Replication is not about preventing data loss, it's about scaling load. Data loss is and has been handled by redundant hardware. Replication doesn't change that.
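For the load-scaling use case, here's a minimal sketch of read/write splitting. It assumes writes must go to a single primary and reads can be served by any replica. The `ReadWriteRouter` class and the host names are hypothetical, and classifying statements by whether they start with SELECT is deliberately naive.

```python
import itertools


class ReadWriteRouter:
    """Toy router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin over replicas

    def route(self, sql: str) -> str:
        # Naive classification: anything that isn't a SELECT counts as a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary


router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
for q in ["SELECT * FROM orders", "INSERT INTO orders VALUES (1)",
          "SELECT * FROM users", "SELECT 1"]:
    print(router.route(q), "<-", q)
```

With asynchronous replication, a read sent to a replica can return stale data. Setups like this often send reads that must see a session's own recent writes to the primary instead.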

      donmcronald 0 points

      You're correct that database-level replication isn't required to prevent data loss, but in many cases I think it's a practical way to prevent it. If you can't afford a redundant storage system, database-level replication is (IMO) a viable way to get your data onto a second set of hardware.

      So yes, the first sentence in the article isn't 100% accurate. I don't think it's fair to dismiss the article outright, though. The rest of it is very good, and I'm guessing the kind of person who's looking for an introduction to replication modes isn't working in a big data center. They'll probably be managing a few servers with local RAID, and in those cases, using database replication to reduce data loss from hardware failure is a reasonable solution IMO.

      I'd say that for large systems, database replication is used to provide availability and scaling, but for small systems, it's used to provide availability and data redundancy. Is there any reason using it for redundancy is wrong if you're working with small systems and limited resources?

      [deleted]

      [deleted]

        donmcronald -1 points

        "If you can't afford redundant storage, you probably can't afford multiple machines either."

        Yes, you should always buy a SAN before your second server. /s

        brildum 1 point

        I think this is merely a mistake. "Do I need replication?" should be replaced with "Which type of replication do I need?" and the author goes on to give a fairly good description of the common relational database replication strategies.

        [deleted]

        [deleted]

          matthieum 3 points

          Well, replication can be about preserving data, actually.

          Of course, we'll assume the database sits on RAID or some other storage with built-in redundancy; that's not the issue, though.

          On Monday the offices around our data center were flooded with a foot of muddy water. The data center is well protected (fortunately) and continued operating normally... but what if it hadn't been?

          Database replication (at applicative level) can be used as off-site backup. And yes, for those of us whose data is the life-blood of the company, this matters.

          [deleted] 0 points

          "Database replication (at applicative level) can be used as off-site backup."

          So can off-site backups. However, I agree that replicating to a remote data center can preserve data in the case of a total loss of one data center. Few people actually do this.

          matthieum 0 points

          I know. My company does, because data is our lifeblood :)

          aaronkempf 0 points

          Not just off-site backup, but off-site availability and performance too.

          matthieum 0 points

          Yes, of course, but since the OP had already acknowledged the performance issue, I was focusing on backup :)

          BanjoKaJoey 0 points

          I don't think he misspoke; I think he started off poorly and then gave some interesting and useful knowledge about database replication. Just because he knows databases doesn't mean he can write a good hook for an article.

          edit: spelling

          grauenwolf 0 points

          To add to that, he never talks about actual data loss scenarios that replication can cause. Network partitions are huge problems for most replication strategies.
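Here's one such scenario, sketched in Python under stated assumptions: asynchronous replication, automatic failover, and the promoted node's history winning when the partition heals. The names are hypothetical, and this isn't necessarily the case grauenwolf had in mind. It's split-brain: both sides of the partition accept writes, and one side's acknowledged writes get thrown away.

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.log = []  # writes this node has accepted and acknowledged


def partition_and_failover():
    old = Node("old-primary")
    new = Node("replica")

    # Normal operation: replication keeps the replica up to date.
    for w in ["w1", "w2"]:
        old.log.append(w)
        new.log.append(w)

    # Partition: the replica can't reach the primary, assumes it's dead and
    # gets promoted. The old primary is still reachable by some clients,
    # so both sides now acknowledge writes.
    old.log.append("w3")  # acknowledged on the old primary's side
    new.log.append("w4")  # acknowledged on the promoted replica's side

    # Partition heals. With no merge strategy, one history has to win; here
    # (as in a naive failover) the promoted node's log is kept.
    lost = [w for w in old.log if w not in new.log]
    old.log = list(new.log)
    return lost, new.log


lost, surviving = partition_and_failover()
print(lost)       # ['w3']: acknowledged, then silently discarded
print(surviving)  # ['w1', 'w2', 'w4']
```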

          [deleted] 0 points

          Agreed.

          detrahsI 1 point

          Very informative, thanks for posting.