
I'd like to see a post/writeup (or even an essay!) on how to do the magic "backing services" (http://12factor.net/backing-services) with Docker. And that's where I find Deis and other Docker orchestration systems lacking very much.

Sure, you can run MySQL in Docker, but it's a far cry from running it on native xfs with aligned partitions and whatever fancy configuration you feel like. And since docker containers are very reusable, whereas backing data should by default be persistent, my impression is that it's too easy to accidentally remove a docker container.



At the expense of sounding like I'm just plugging my own company, this is what we're working on in the open-source project Flocker (https://github.com/ClusterHQ/flocker). We think that data services like databases, queues, key-value stores, and anything else with state should be able to run inside docker containers too. Yes, you can already run a database in a docker container, but from an ops perspective this is a nightmare and very far from what you want to be able to do in a production system. We would love your feedback on what we're building, and even more for you to get involved. Flocker is licensed under Apache 2.0.


You say that running a database in a Docker container is "a nightmare and very far from what you want to be able to do in a production system," but you do not explain why. Perhaps you might substantiate such a claim?


This blog post about the problems running databases in container-based PaaS is a good starting point: http://blog.lusis.org/blog/2014/06/22/feedback-on-paas-reali...

Generally, you want to be able to answer these questions when it comes to operating your databases:

What are the failure points?
What is the impact of each failure point?
What are the SINGLE points of failure?
What is my recovery pattern?
What is my upgrade experience?
What is the operational overhead in the applications running ON the product?
What is my DR strategy?
What is my HA strategy?

Neither pure Docker nor any other tool that we are aware of in the docker/container ecosystem provides really good answers to these questions when it comes to databases. That is what I mean when I say that running databases in containers is a nightmare. It is possible today, for sure, but it is extremely complex operationally, and that's why it is so rare to see databases running in containers in production today.


Nothing is stopping you from running MySQL in Docker on native xfs with aligned partitions: bind-mount whatever partition you want into the container by defining a volume in Docker.

This will be persistent, and will survive when you destroy the container. I use this to e.g. share a /home directory between the dozen experimental dev containers I use to run my various projects - each container ensures I keep track of the exact dependencies for each individual project, while I get to have a nice "comfortable" swiss-army-knife container with my dev tools and all my project files.

I also run a number of database containers which use volumes where I bind mount host directories to ensure persistence so I can wipe and rebuild the containers themselves without worrying about touching data.
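To make this concrete, here is a minimal sketch of the pattern described above (the image tag and host paths are illustrative, and it assumes a running Docker daemon):

```shell
# Bind-mount a directory from the host (which can live on an xfs
# partition, aligned however you like) into the container's data path.
docker run -d --name mysql-prod \
  -v /mnt/xfs-data/mysql:/var/lib/mysql \
  mysql:5.6

# Wipe and rebuild the container; the data directory on the host
# is left untouched.
docker rm -f mysql-prod
docker run -d --name mysql-prod \
  -v /mnt/xfs-data/mysql:/var/lib/mysql \
  mysql:5.6
```

The container is disposable; the bind-mounted host directory is not.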


Last time I tried was around 0.8, and aufs was not happy with xfs. I haven't tried having /var/lib/docker on ext4 and "/volumes" on xfs, I'll try if the need arises.

I run a few MongoDBs with volumes, but I'm not confident that I won't accidentally start two with the same volume, or that someone won't accidentally delete the volume, or .. or .. or.

As I've written to a sibling comment, I don't consider it a hard problem, but it hasn't been taken care of .. yet!


Am intrigued (and, again, showing my current early-stage understanding of LXCs), can you link the same data store container to multiple application containers? As in, have both a beta application and production application pulling data from the same core DB?

And do you simply define the container as a volume to ensure it stays persistent? That was the feeling I got from the docs, but again, might just be flagging how little I know at the minute...


docker run --name=my-data -v /host/data:/container/data data-container

docker run --volumes-from=my-data app-beta-container

docker run --volumes-from=my-data app-prod-container

That would share the data store; however, the real way you'd do this would be:

docker run --name=my-data -v /host/data:/container/data data-image

docker run --volumes-from my-data --name my-database database-image

docker run --link=my-database beta-app

docker run --link=my-database prod-app

Doing --link will allow those two containers to network-communicate and you should only be communicating with your database over the network anyways.
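For reference, a hedged sketch of what --link actually gives the app container (the exact variable names depend on the alias you choose and on the ports the database image EXPOSEs - 3306 here is just an example):

```shell
# Link with an explicit alias "db"; Docker injects the database's
# address as environment variables and an /etc/hosts entry.
docker run --link=my-database:db beta-app env
# would show variables along the lines of:
#   DB_PORT=tcp://172.17.0.2:3306
#   DB_PORT_3306_TCP_ADDR=172.17.0.2
#   DB_PORT_3306_TCP_PORT=3306
# and the app can simply connect to the hostname "db".
```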


Yes, you can definitely share volumes between any number of containers, it's a common usage pattern.

You don't need to make a container persistent: Docker, by design, will never remove anything unless you explicitly ask it to. If you want to separate the lifecycle of a directory within your container, so that it stays behind after you explicitly remove the container, or to share it between containers - that's when volumes are useful.
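A small sketch of that lifecycle separation, using a bind-mounted host directory (paths are illustrative, and this assumes a running Docker daemon):

```shell
# Write into a bind-mounted volume, then remove the container.
docker run --name writer -v /srv/appdata:/data busybox \
  sh -c 'echo hello > /data/msg'
docker rm writer

# The directory stays behind on the host after the container is gone.
cat /srv/appdata/msg
```

An anonymous volume (plain `-v /data`) behaves the same way - `docker rm` without `-v` leaves it behind under /var/lib/docker - it's just harder to find again afterwards.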


Unfortunately I haven't read enough 12fa, but I know I can address most of your questions with one factoid: Volumes. You are absolutely right that Docker containers are meant to be disposable, and should not contain backing data. That is what Volumes are for. I haven't done enough with volumes to give you a real primer on the use of them, but volumes can run on whatever backing store you want and they are not so intertwined with the container that they would be deleted along with it.

It looks like volumes have evolved significantly since the feature was introduced, you might want these links, sorry I haven't reviewed them myself:

https://docs.docker.com/userguide/dockervolumes/

http://crosbymichael.com/advanced-docker-volumes.html

(I actually do keep my backing data in the containers, we have institutionalized backups where all of the important data is already kept in git anyway, so instance clones are in fact disposable for me even though they have all of the important backing data in them.)


I'm familiar with volumes, and here's how I see the problem:

On a docker host you have the docker daemon, and whatever auxiliary stuff you need to orchestrate either the containers or the host (update docker itself, and so on), you have space for /var/lib/docker, and that's it. Volumes are always somewhere on /host/data. That means you have to make up a scheme and convention, cook up scripts, and add it all to your already quite dynamic mental model.
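For illustration, here is the kind of home-grown convention the paragraph above is describing - one of many possible schemes, with hypothetical paths and image names:

```shell
# Convention: every container's persistent data lives under a
# predictable host path, keyed by container name.
VOLROOT=/srv/volumes
name=my-database

mkdir -p "$VOLROOT/$name"
docker run -d --name "$name" \
  -v "$VOLROOT/$name:/var/lib/mysql" \
  mysql:5.6
```

It works, but every team invents its own variant - which is exactly the gap being pointed out.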

If you go and want to manage volumes, you need something for that. And currently everyone and their cats have their own solutions (because there is one they claim to use and one they use, and one they hack on to use later). I'm not claiming it's a hard problem, just that it's not taken care of yet.

Maybe Flocker will deliver, I haven't checked it since it was posted 5 minutes ago :)


Yeah, the problem isn't that you can't do it. The problem is there are too many ways that mostly do what you need.


Docker's design makes it incredibly easy - or at least, it makes it difficult to treat your backing services as anything other than attached resources - therefore forcing you into some sort of 12factor-esque design. There's no magic to "backing services".



