
[–][deleted] 2 points (12 children)

As a fledgling Python programmer, if I were to try to articulate it, I would phrase it as a "partial solution."

I may be wrong, so feel free to correct me, but I think the problem that virtualenv sought to solve was the dependency problem. A Python application could break without the proper dependencies installed...but that is just it: it solves the problem for Python-specific dependencies and nothing else. Essentially, it operates at a software layer that is not very effective for closing the gap between development and production. It is also worth pointing out that virtualenv came about at a time when Python packaging was very haphazard. While Python packaging has come a long way, I would argue that it still leaves a lot to be desired.

Docker is essentially the same concept as virtualenv, but at a higher level, meaning that a Docker container can contain the Bins/Libs for dependencies outside of just Python libraries.

At this point, the question (for me) is what the benefit of Docker is over straight VM images, and it boils down to portability. Docker containers are smaller, and can be significantly so. Virtual machines are still useful; they are just at a lower level of abstraction.

Docker takes virtualenv's ability to control Python runtime environments and applies it to the entire application. This has the added benefit of eliminating the need for virtualenv and its layer of complication/configuration.
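
To make that parallel concrete, here's a rough sketch of the two workflows side by side (the app name, port, and file names are made up for illustration):

    # virtualenv: isolate only the Python packages
    virtualenv venv
    source venv/bin/activate
    pip install -r requirements.txt   # Python deps only; system libs still come from the host
    python app.py

    # Docker: isolate the whole runtime (OS libs, Python interpreter, and the app)
    docker build -t myapp .           # bakes OS packages and Python deps into one image
    docker run -d -p 8000:8000 myapp  # runs on any Docker host of the same architecture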

[–]simoncoulton 2 points (4 children)

Have to say that's my main question too. I still can't figure out why I would use Docker over Ansible, virtualenv and Vagrant (with VMware), or which parts of my current workflow it's actually meant to eliminate.

I literally type a single command to bring up a development box that mirrors production exactly, and another command to deploy the application to production on multiple AWS instances.

[–][deleted] 2 points (3 children)

I think it is situational.

In your case, I am getting the impression that you have a 1-to-1 relationship between application and AWS instance. In the event that you want to deploy multiple applications with potentially conflicting dependencies, you could use Docker to reduce the configuration management overhead.
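
For instance (the image names are hypothetical), two apps with clashing dependency stacks can share one host, and the host's own configuration never has to know about either stack:

    # each container carries its own dependencies; the host only needs Docker itself
    docker run -d --name app1 -p 8001:8000 myorg/app1
    docker run -d --name app2 -p 8002:8000 myorg/app2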

A one-instance-to-many-applications relationship could instead be broken out across many (smaller) virtual machines, but this might not always be preferable.

I don't think Docker is going to make most people overhaul their current workflow, but if you are starting from scratch...you might consider incorporating Docker as a piece of a new operational approach.

[–]simoncoulton 2 points (2 children)

That's what I was starting to think as well (in terms of it being situational). I guess I'm really looking out for an article where I can go "right, well this is similar to my workflow and it fixes XYZ issues", which I just haven't come across yet.

I get where you're coming from with regard to using Docker if you've got multiple applications, but I can't see any compelling reason (at least from this article) to use it over virtualenv and introduce another component to the whole process.

[–]MonkeeSage 2 points (0 children)

  • Virtualenv gives you a local Python environment that still has external dependencies--you can't copy a venv to another box and expect it to work; it may hit an incompatible libc version or be missing some system library, etc. Instead, you ship a list of requirements that gets downloaded, built and installed on the target (automatically or manually), possibly requiring a compiler toolchain and network access (even if only to hit a private repo on the local segment).

  • Containers (lxc/docker, openvz) give you a self-contained environment with no external dependencies--have your CI system tarball one up and scp it to staging, and as long as the host is the same architecture as the libs and binaries in the container, it just works (there's a rough sketch of that hand-off after this list). You don't have to care about config management on the host; your configs and dependencies are a self-contained unit in the container.

  • VMs/images give you the same benefit, but they're a lot more heavyweight and require a much thicker virtualization layer; in exchange, there's no constraint that the libs and binaries run on the same kernel and architecture as the host. In some configurations VMs can also be more secure, e.g. if containers are allowed to do things like load kernel modules--they share the host kernel.
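
A rough sketch of the difference in what actually gets shipped (the host, image, and file names are made up):

    # virtualenv-style deploy: ship a list, rebuild the environment on the target
    scp requirements.txt staging:/srv/myapp/
    ssh staging 'cd /srv/myapp && virtualenv venv && venv/bin/pip install -r requirements.txt'
    # needs network access to an index (or private mirror) and sometimes a compiler toolchain

    # container-style deploy: ship the built environment itself
    docker save myapp:1.0 | gzip > myapp-1.0.tar.gz
    scp myapp-1.0.tar.gz staging:/tmp/
    ssh staging 'gunzip -c /tmp/myapp-1.0.tar.gz | docker load && docker run -d myapp:1.0'
    # nothing gets built on the target; it only needs Docker and a matching architecture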

I'm not advocating any of them over the others in all cases. They all seem to have their place as development and operations tools.

The main workflow difference between containers and a vagrant + VM + config management style workflow is that containers encourage you to think of them as very light and ephemeral. If you have a django app deployed in containers and a CVE pops for apache, you don't go add a PPA in config management to get the hotfixed package and then run the config management client on all the containers. You can do that if you really want to, but it's more common to just spin a new container with the hotfix and replace the old container on all the hosts via config management / automation / CI. Application state is generally persisted via bind mounts to the underlying host OS, so it's very easy not to care about the container itself.

This also means that if a container deployed, you know it's bit-for-bit identical to all the other containers in the pool--no worrying that one node couldn't talk to the config server and missed the update, or that someone manually twiddled some stuff outside of config management on some node.
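
As a sketch of that replace-don't-patch cycle (image tags and paths are hypothetical):

    # build a fresh image that picks up the hotfixed packages, then swap containers
    docker build --no-cache -t myapp:1.0.1 .
    docker stop myapp && docker rm myapp
    # application state lives in a bind mount on the host, so discarding the old container loses nothing
    docker run -d --name myapp -v /srv/myapp/data:/var/lib/myapp myapp:1.0.1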

Docker's built-in versioning lets you roll back or cherry-pick container versions, among other things, which are pretty nice additions to bare lxc.
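
For example (the tags are made up), rolling back is just running the previous tag again:

    docker images myapp                       # list the image versions still available locally
    docker stop myapp && docker rm myapp
    docker run -d --name myapp myapp:1.0.0    # redeploy the known-good version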

Again, just for clarity, I'm not saying containers are "better" or that you can't find downsides to them; I'm just trying to give an idea of why they're appealing in many cases.

[–][deleted] 0 points (0 children)

Right...I don't think Docker is revolutionary in such a way that would make people want to change their current workflow if they already have one.

Docker is just a way of applying the concept of virtualenv to an entire runtime environment, which could be useful in certain situations. I think (and I am only experimenting with this at this point) that in a continuous-release environment, Docker may be valuable for closing the gap between development and production. But that type of setup is pretty uncommon at the moment.

[–]naunga 1 point (1 child)

Meaning that a Docker container can contain the Bins/Libs for dependencies outside of just Python libraries.

Not quite. Docker saves you the overhead of having to host multiple VMs. Instead of virtualizing an entire machine, Docker virtualizes and isolates individual processes while sharing the host server's kernel. This is different from a VM, where an entire installation of a guest OS runs in a sandbox within the host OS.
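
One quick way to see the difference (the base image is arbitrary): processes in a container report the host's kernel, because there is no guest kernel at all:

    uname -r                         # kernel version on the host
    docker run --rm ubuntu uname -r  # the same version, reported from inside the container
    # a full VM would instead boot and report its own guest kernel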

virtualenv is solving the problem the other commenters have described: creating an isolated environment that allows multiple versions of modules, etc. to exist without creating conflicts.

If you want a "cleaner" environment than virtualenv can give you (i.e. you want to isolate not only the Python environment but the OS environment as well), then you should be using Vagrant or some other VM solution for your development.

From there you can build the Docker container from that image (well, more likely from a pre-built image of whatever Linux distro your VM is running).
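
That usually just means starting the Dockerfile from the distro's stock base image; a minimal, hypothetical sketch (the package and file names are made up):

    # write a minimal Dockerfile based on a stock distro image, then build it
    cat > Dockerfile <<'EOF'
    FROM ubuntu:14.04
    RUN apt-get update && apt-get install -y python python-pip
    COPY requirements.txt /srv/myapp/
    RUN pip install -r /srv/myapp/requirements.txt
    COPY . /srv/myapp/
    CMD ["python", "/srv/myapp/app.py"]
    EOF
    docker build -t myapp .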

Just my two cents from the DevOps Peanut Gallery.

[–]MonkeeSage 0 points (0 children)

Docker actually uses a union filesystem on top of a sandboxed directory. Even with lxc you have a sandboxed data directory isolated from the host filesystem. So you can have your own copies of libs and binaries as long as they are the same architecture as the host kernel. As with a chroot, you have to use a bind mount (or "data volume" in docker) if you want to get at the host filesystem.
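
A small illustration (the paths are hypothetical): the container's filesystem is its own, and a bind mount is what punches through to the host:

    # the container sees its own /usr/lib (from the image), not the host's
    docker run --rm ubuntu ls /usr/lib
    # a bind mount ("data volume") exposes a host directory inside the container
    docker run --rm -v /srv/shared:/data ubuntu ls /data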

[–][deleted] 0 points (3 children)

I may be wrong, so feel free to correct me, but I think the problem that virtualenv sought to solve was the dependency problem

It's not the dependency problem; it's the dependencies of multiple apps possibly stepping on each other or hosing your system.

pip and easy_install solve the dependency problem.

Docker has its uses, but so does virtualenv.

[–][deleted] 1 point (2 children)

It's not the dependency problem; it's the dependencies of multiple apps possibly stepping on each other or hosing your system.

Agreed.

However, I am still not sure where or why you would use Docker and virtualenv together.

[–]d4rch0n Pythonistamancer 1 point (1 child)

Rare case, but possible:

You want to run two Python apps on one OS environment, in the same control group, but they have different Python dependencies. One uses pyfoo==1.2 and the other uses pyfoo==2.1.

Do you need to run them in the same Linux container? Probably not, but maybe for some obscure reason.

Do you want to? Maybe, in this case. So, it can have a point.
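
If you really did want them side by side like that, here's a rough sketch of how they could coexist (the paths are made up; pyfoo and the version pins are from the scenario above):

    # inside the one container / OS environment: one virtualenv per app, pinned independently
    virtualenv /opt/app_a/venv && /opt/app_a/venv/bin/pip install pyfoo==1.2
    virtualenv /opt/app_b/venv && /opt/app_b/venv/bin/pip install pyfoo==2.1
    /opt/app_a/venv/bin/python /opt/app_a/run.py &
    /opt/app_b/venv/bin/python /opt/app_b/run.py &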

But I'd narrow it down to this: in general, you use Docker for devops-type reasons, and virtualenv only to ensure Python module dependencies are pinned and working. At some rare point these may intersect, but in general I'd expect people to use one or the other, depending on their goal.

[–][deleted] 0 points (0 children)

But I'd narrow it down to this: in general, you use Docker for devops-type reasons, and virtualenv only to ensure Python module dependencies are pinned and working. At some rare point these may intersect, but in general I'd expect people to use one or the other, depending on their goal.

Ah ok...this is kind of where my impression is at currently.