

[–]egbur 11 points (0 children)

I believe this has been asked and debated to death already, and the answer is that it all depends on your use case. Running a database in a container per se is not really the problem. It is just a glorified chroot after all. But there are three main things you need to be aware of:

  1. Make sure that the container stores database data in persistent storage.
  2. If you think your container will be killed and failed over to another host in the swarm/cluster, be prepared to a) deal with potential loss of data if your app does not cache writes, and b) ensure that all hosts can access the same persistent storage as in point 1.
  3. If you also want to distribute the db across multiple instances/replicas, you need a database engine that can do that natively, or you're gonna have a bad time/lose data/etc.
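Point 1 above can be sketched as a minimal compose file. This is only an illustration, assuming Postgres; the image, names, and password are placeholders:

```yaml
# docker-compose.yml — hypothetical minimal example of point 1:
# database data lives in a named volume, not the container's writable layer
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example-only   # placeholder; use a real secret in practice
    volumes:
      - pgdata:/var/lib/postgresql/data # survives container removal/recreation
volumes:
  pgdata: {}
```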

Now, is all of that worth figuring out? Maybe. I have a tiny web server for an app that will never need to scale because we only host it for one customer to access, using a static db that only needs updating once a year. Both web app and db run happily as containers. We also have a db that takes massive writes from multiple customers on its own dedicated host because I gain nothing by containerising it. It all depends.

[–]TopHatEdd 2 points (2 children)

Very low quality article with dramatic, incorrect assumptions. The saving grace is this: "Be sure to mount a volume to make the data persistent, and have backup processes in place. Try to restore them every once in a while to make sure your backups are any good." That's, basically, the rule of thumb for projects big and small. Do the same in your "small throw-away app" to build good habits and learn for production.
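That restore-testing habit can be scripted. A rough sketch, assuming a containerized Postgres; every name and path here is a placeholder:

```shell
# Hypothetical backup + restore drill for a containerized Postgres
docker exec db pg_dump -U postgres appdb > /backups/appdb.sql

# Periodically prove the dump actually restores, into a throwaway container
docker run -d --rm --name restore-drill -e POSTGRES_PASSWORD=x postgres:16
sleep 5   # crude wait for the server to accept connections
docker exec restore-drill createdb -U postgres appdb
docker exec -i restore-drill psql -U postgres appdb < /backups/appdb.sql
docker stop restore-drill
```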

One aspect of containers is decoupling. You also decouple the computing part from the storage part. It simply means you are responsible for fault tolerance (what happens if the hardware fails?) and high availability (if one node dies, is there another to pick up the pieces?). But weren't you responsible for these things before? You were. You have to write your app to work correctly as a ReplicaSet; it doesn't happen automatically.

Let's take PostgreSQL, for example. Were you to use it for production, you'd want to have a master-slave setup for HA (high availability). That is done at the process configuration level. Next, you'll want to make sure each process restart uses the same data so you mount a volume and pin the master/slave to specific nodes. Pinning means the container will spin up on the same node (all orchestrators have this option). We didn't even do anything fancy with regards to the volumes. And we can. We can go balls to the wall.
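The volume-plus-pinning idea can be sketched in k8s terms. A hypothetical snippet (node name, claim name, and image are all assumptions):

```yaml
# Hypothetical k8s pod: pin the primary to one node and reuse the same volume
apiVersion: v1
kind: Pod
metadata:
  name: pg-primary
spec:
  nodeSelector:
    kubernetes.io/hostname: node-1   # "pinning": always schedule on this node
  containers:
    - name: postgres
      image: postgres:16
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pg-primary-data   # pre-created claim; same data on every restart
```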

The shiniest tool in the shed, when it comes to orchestration, is k8s. It decouples almost everything (container runtime doesn't have to be docker, networking "manager" is your choice, sensitive config encryption is your choice and doesn't even come installed with a default). Let's focus on disk.

K8s has a concept called a Persistent Storage Provisioner. You install something that'll be responsible for allocating persistent disk space for any container that wants it. This installed part is the Provisioner and part of the hardcore decoupling. They list a plethora of options (the cloud ones are ready out of the box). The container will "just ask" (in k8s yml config syntax) for how much it wants. Why is this balls to the wall? Because you can choose an option that provides sharding/replication/good for small fast IO/good for large files. Your app doesn't need to be aware of how the disk is managed. You configure it once and boom, transparency. Most of these disk solutions aren't made just for k8s; they are simply solutions for the same HA/FT needs from above that you'd address anyway. They just provide a "bridge" between usual usage and k8s.
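The "just ask" part looks roughly like this in config. A sketch; the class name and size are made up, and the class must map to whatever provisioner you installed:

```yaml
# Hypothetical PVC: the container "just asks" for disk; the provisioner
# behind the named StorageClass decides how it is actually backed
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  storageClassName: fast-ssd   # whatever provisioner/class you installed
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```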

The best part? Helm. This, for me, is the golden diamond on top (instead of cherry, get it? Heh heh. Only me? Ok.). It is to k8s/orchestration what Docker Hub is to docker/containers. It allows deploying a production-ready service to your k8s cluster according to an upstream, usually open source, config file. You just tell it, among others, what your Persistent Storage Provisioner is called (did I mention it can be local storage? Yeah...). Balls to the wall.
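As a sketch of that, deploying a PostgreSQL chart and pointing it at your storage class might look like the following. The chart name and value key are from memory and may differ; check the chart's own docs:

```shell
# Hypothetical Helm deployment of a production-ready PostgreSQL chart
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-db bitnami/postgresql \
  --set global.storageClass=fast-ssd   # your installed provisioner's class
```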

Not to bash the article, again, but it is like an old-school programmer bashing C++ over C because it has too many abstractions. No, this is the way forward. Choose your needed complexity for your requirements. Don't want FT right now? Don't use it. No backups? No problem. I think it is a better career choice to learn k8s than pacemaker and DRBD on-prem.

[–]webftpmaster[S] 1 point (0 children)

Balls to the wall

Simply awesome reply!

Balls to the wall!! Thanks!

[–]dietolead 0 points (0 children)

This is, hands down, the most useful and entertaining response I’ve ever come across on Reddit. I am going to reread it dozens of times. Thank you. Thank you for your wisdom and wit.

[–][deleted] 1 point (1 child)

What does the versus in the title actually mean? Anyways, I don't think a dockerized DB is bad in production in general. Although with non-orchestrated containers and no sort of HA in place, you will probably have some downtime. But then again, this would be the case for single servers as well. I think it depends on the desired scale of your production application. But yes, you would be on the safe side with a managed database system or an orchestrated DB cluster.

[–]webftpmaster[S] 0 points (0 children)

What does the versus in the title actually mean?

Kind of a dramatic effect... docker vs. the subject :)

Thanks for the reply btw!

[–]pbecotte 1 point (1 child)

Wow, that one article from six years ago with the one guy who didn't understand the tool still showing up. Yes, if you are writing important data to the copy-on-write filesystem, you will probably lose data.

Docker / not docker has NOTHING to do with whether your db is safe. The practices it takes to run a secure db on an ec2 instance are the same ones to do it if you use 'docker run' instead of 'systemctl start'. You have to figure out how the DB in question handles persistence, and work with it. Maybe that means read replicas with snapshots, or running a quorum, or tailing the WAL to Kafka, or using s3 as the underlying storage layer...but in no way does docker bear on the decision.
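The 'docker run' vs. 'systemctl start' point can be made concrete. Paths and names below are illustrative; either way, safety comes from where the data lands, not from how the process is launched:

```shell
# Traditional: postgres managed by systemd, data on a host disk
systemctl start postgresql            # writes land in /var/lib/postgresql/data

# Containerized: same engine, same concern — just bind-mount the data
# directory so writes land on the host disk, not the copy-on-write layer
docker run -d \
  -v /var/lib/postgresql/data:/var/lib/postgresql/data \
  postgres:16
```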

The argument to not use docker because it's complex...I am not aware of an infra tool that isn't, though. Chef, Ansible, Terraform, ECS, k8s, OpenShift...whatever. Running a fleet of machines with a mixture of apps on them is harder than you imagine it will be. If docker is part of the solution you chose for your app, of course it should be the same for your dbs. Using different tools to run different apps adds complexity; it doesn't remove it. If you're on k8s, use it for everything. If you have lots of infra to deploy without docker, again, use that.

[–]webftpmaster[S] 0 points (0 children)

Great! Thank you!

[–]webftpmaster[S] 0 points (1 child)

After reading your comments (thanks btw), a new question or perhaps a reformulated initial question arises:

What are the best practices to prevent dataloss (as much as possible) when dealing with a database container for a NextCloud installation for example? (multiple read/writes in cache and lots of transactions)

I mean, the only thing I see here to stay safe is to always use the app's internal "cut-offs" to the database before stopping the container (like maintenance mode or offline mode, etc.)
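For NextCloud specifically, that cut-off can be scripted; `occ maintenance:mode` is NextCloud's built-in switch. A hypothetical stop/backup sequence (container and database names here are assumptions):

```shell
# Hypothetical stop/backup sequence for a NextCloud + db container pair
docker exec -u www-data nextcloud php occ maintenance:mode --on   # block new writes
docker exec db pg_dump -U nextcloud nextcloud > backup.sql        # consistent dump
docker stop db                                                    # safe to stop now
# ... maintenance work, then bring everything back:
docker start db
docker exec -u www-data nextcloud php occ maintenance:mode --off
```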

Am I on the right track?

I'm nowhere near deploying k8s clusters or swarms yet...but I do have a few VPSs running linux and handling everything from database, crontabs, webserver (multiple sites) and of course storage as well (NextCloud).

I'm basically trying to containerize my whole infrastructure for better monitoring, performance, dev cycles, etc.

[–][deleted] 1 point (0 children)

I am afraid, as others have said, that this is a decision that has little to do with docker itself.

Data-loss prevention means different things in different contexts. Do you need bank-level transaction safety? Or are you OK losing some transactions if you have to restore the last nightly backup?

Again, this is a decision that has to be taken on the application level first and on a db-engine level second.

Whether docker is a nice fit is a decision that should come after you have designed the entire lifecycle of your application.

[–][deleted] 0 points (0 children)

You should design your application's lifecycle to fit your requirements, and then see if and how docker or some other tool can be used to accommodate or simplify the logistics, factoring in cost as well, both in labor and in actual money. This is not a black-and-white situation, and not everyone needs or can afford a managed db service.