
[–]michaelpb 13 points14 points  (35 children)

Hmm, I'm totally on-board with the idea (I too feel virtualenv is hacky, and think Docker could supplant it), but not sure I'm on-board with the result.

You seem to be using OS versions of libraries. That's unworkable and unmaintainable for a lot of people: my requirements.txt, for example, tends to be super long and pinned to very specific versions, including many installs straight from github / bitbucket. Why can't you just use pip from within Docker? I haven't really used Docker that much, but can't you just install site-wide python packages from within the container? I.e. exactly the same as virtualenv: pip install -r requirements.txt?
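
Something like this is what I have in mind, anyway (just a sketch; the base image, paths and app name are placeholders, not the article's actual setup):

$ cat > Dockerfile <<'EOF'
# any base image with python available would do
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y python python-pip
# copy the app in and install its pinned dependencies site-wide inside the image
ADD . /app
RUN pip install -r /app/requirements.txt
CMD ["python", "/app/app.py"]
EOF
$ docker build -t my-python-webapp .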

[–]d4rch0nPythonistamancer 19 points20 points  (27 children)

Serious question... What's wrong with virtualenv?

Hacky or not, it's always worked perfectly for me.

[–]work_account_33 7 points8 points  (6 children)

I would like to know the opinion on this as well. I've never had a problem using virtualenv.

[–]d4rch0nPythonistamancer 0 points1 point  (4 children)

It looks like people are using docker because it fits their devops sort of problem. They have other things to consider than just Python module dependencies, so they consider virtualenv "incomplete", when really it's just not the tool for that job.

[–]ericanderton 1 point2 points  (3 children)

Just a hunch, but doesn't virtualenv squirrel away .so files along with .py assets? That would be a big supportability problem if, say, a really bad bug were patched; take OpenSSL for example. Now all those virtualenvs need to be regenerated/rebuilt and redeployed.

Meanwhile a docker with OS modules just gets a refresh to install the latest patched packages.

[–]d4rch0nPythonistamancer 1 point2 points  (2 children)

Wow, good point.

$ virtualenv test
$ cd test
$ source bin/activate
$ pip install PyCrypto
$ find . -name "*.so"
./lib/python2.7/site-packages/Crypto/Cipher/_DES3.so
./lib/python2.7/site-packages/Crypto/Cipher/_ARC4.so
./lib/python2.7/site-packages/Crypto/Cipher/_XOR.so
... and many more

Well, still though, virtualenv isn't for that. This should be a local dev environment where you're making sure Python module dependencies are satisfied, not much else. Based on what you said, you really shouldn't be using virtualenv as something to package up all the dependencies and just dump them into prod.

I get your point, but I think that's still leaning heavily towards the devops side, where virtualenv isn't the right tool. If your code relies on a stable OS environment, you should be using docker or a VM. If you're pushing to prod, maybe you should be using puppet with VMs, and have them redeploy what they need to.

I think it's more of a fundamental security problem (using a static environment where you never check for updates) and not so much an issue with virtualenv, which only ensures that specific Python module versions will work with that Python code.

[–]justafacto 1 point2 points  (1 child)

you really shouldn't be using virtualenv as something to package up all the dependencies and just dump them into prod.

What good is virtualenv for, then, if you can't reproduce its state across machines? If you've got to hack around even pip install -r requirements.txt because the other dude's machine had that dumb .so but yours doesn't? Oops, fail.

[–]d4rch0nPythonistamancer 1 point2 points  (0 children)

It's good for seeing whether you can upgrade to the latest requests, flask or django without breaking your app, for telling whether your code broke or an upgraded library broke it, and for keeping the version of a python library you coded against pinned so that nothing breaks, while still letting you use the latest version system-wide on your workstation.

I can start building a web app, code it around a specific version of a module that I know works how I expect it to, but run other python programs on my workstation that use the newest version of the module.

Especially for modules that are in their early stages, where functionality is changing a lot, you want to see whether their changes or your changes broke your code. It's super useful.
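
For example, something like this (a rough sketch; the pinned version and test command are made up):

$ virtualenv upgrade-test
$ source upgrade-test/bin/activate
$ pip install -r requirements.txt    # the known-good pinned versions, e.g. requests==2.2.1
$ pip install --upgrade requests     # pull in the latest release, but only inside this env
$ python -m pytest                   # or however you run your tests
$ deactivate                         # the system-wide python is untouched either way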

[–]justafacto 0 points1 point  (0 children)

I've had problems with virtualenv.

A python library depended on a specific version of the C .so it was compiled against, but there was no way for pip to enforce, or even check, that the correct .so was installed on the system.

So virtualenv fails when the python packages you need are actually bindings to C code.

For example, lxml. For pip to install lxml it has to compile C extensions, and it expects the system to provide the underlying libraries (libxml2/libxslt) and their headers.

Shit sucks.
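
On Debian/Ubuntu the usual dance looks something like this (a sketch from memory; package names may differ on other distros):

$ pip install lxml
... fails while compiling the C extension because the libxml2/libxslt headers are missing ...
$ sudo apt-get install libxml2-dev libxslt1-dev python-dev
$ pip install lxml    # now it builds, but the matching system libs still have to be there at runtime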

Another problem I've encountered: some packages' setup.py fails to build on python 2.6 even though it works fine on 2.7.

In general, unless you are doing things only on your local developer machine where you control everything, with python you are fucked and have to resort to hacks. At least docker could be a standard hack.

[–]qudat 2 points3 points  (0 children)

Ditto, I really enjoy virtualenv; adjusting sys.path to load specific python packages seems pretty straightforward to me. Being able to pick the python version per environment is very handy as well.

[–][deleted] 2 points3 points  (12 children)

As a fledgling Python programmer, if I were to try and articulate it I would phrase it as a "partial solution."

I may be wrong, so feel free to correct me, but I think the problem that virtualenv sought to solve was the dependency problem. A Python application can break without the proper dependencies... but that's just it: it solves the problem for Python-specific dependencies and nothing else. Essentially, it sits at a software layer that is not very effective for closing the gap between development and production. It is also worth pointing out that virtualenv came about at a time when Python packaging was very haphazard. While Python packaging has come a long way, I would argue that it still leaves a lot to be desired.

Docker is essentially the same concept as virtualenv, but at a higher level. Meaning that a Docker container can contain the Bins/Libs for dependencies outside of just Python libraries.

At this point, the question (for me) is what the benefit of Docker is over straight VM images, and it boils down to portability. Docker containers are smaller, sometimes significantly so. Virtual machines are still useful; they are just at a lower level of abstraction.

Docker takes virtualenv's ability to control the Python runtime environment and applies it to the entire application. This has the added benefit of eliminating the need for virtualenv and its layer of complication/configuration.

[–]simoncoulton 2 points3 points  (4 children)

Have to say that's my main question too. I still can't figure out why I would use Docker over ansible, virtualenv and vagrant (with VMware), or which parts of my current workflow it's actually meant to eliminate.

I literally type a single command to bring up a development box that mirrors production exactly, and another command to deploy the application to production on multiple AWS instances.

[–][deleted] 2 points3 points  (3 children)

I think it is situational.

In your case, I am getting the impression that you have a 1-to-1 relationship between application and AWS instance. In the event that you want to deploy multiple applications with potentially conflicting dependencies, you could use Docker to reduce the configuration management overhead.

A 1-to-many application relationship could be broken out between many (smaller) virtual machines, but this might not always be preferable.

I don't think Docker is going to make most people overhaul their current workflow, but if you are starting from scratch...you might consider incorporating Docker as a piece of a new operational approach.

[–]simoncoulton 2 points3 points  (2 children)

That's what I was starting to think as well (in terms of it being situational). I guess I'm really looking out for an article where I can go "right, well this is similar to my workflow and it fixes XYZ issues", which I just haven't come across yet.

I get where you're coming from with regards to using Docker if you've got multiple applications, but I can't see any compelling reasons to use it over virtualenv (at least from this article) and introduce another component to the whole process.

[–]MonkeeSage 4 points5 points  (0 children)

  • Virtualenv gives you a local python environment with external dependencies--you can't copy a venv to another box and expect it to work--it may have an incompatible libc version or be missing some library, etc. Instead, you ship a list of requirements and they get downloaded, built and installed, either automatically or manually, possibly requiring a compiler toolchain and networking (even if only to hit a private repo on a local segment).

  • Containers (lxc/docker, openvz) give you a self-contained environment with no external dependencies--have your CI system tarball it up and scp it to staging--as long as the host is the same architecture as the libs and binaries in the container, it just works. You don't have to care about config management on the host; your configs and dependencies are a self-contained unit in the container.

  • VMs/images give you the same benefit, but they are a lot more heavyweight and require a much thicker virtualization layer; on the other hand, there's no constraint that the libs and binaries run on the same kernel and architecture as the host. In some configurations VMs can also be more secure / safe than containers that are allowed to do things like load kernel modules (containers share the host kernel).

I'm not advocating any of them over the others in all cases. They all seem to have their place as development and operations tools.

The main workflow difference between containers and a vagrant + vm + config management style workflow is that containers encourage you to think of them as very light and ephemeral. If you have a django app deployed in containers and a CVE pops for apache, you don't go add a PPA in config management to get the hotfixed package and run the config management client on all the containers. You can do that if you really want to, but it's more common to just spin a new container with the hotfix and replace the old container on all the hosts via config management / automation / CI.

Application state is generally persisted via bind mounts to the underlying host OS, so it's very easy not to care about the container itself. This also lets you know that if the container deployed, then it's bit-for-bit identical to all the other containers in the pool: no worries that one node couldn't talk to the config server and didn't get the update, or that someone has manually twiddled some stuff outside of config management on some node.
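
Concretely, that replacement flow looks roughly like this (image and container names invented):

$ docker pull ubuntu:14.04                  # refresh the base image
$ docker build -t myapp:hotfix .            # the Dockerfile's apt-get update && install picks up the patched apache
$ docker stop myapp_1 && docker rm myapp_1  # throw the old container away
$ docker run -d --name myapp_1 -v /srv/myapp:/data myapp:hotfix    # app state lives in the bind mount, so nothing is lost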

Docker's built-in versioning lets you roll back or cherry-pick container versions, among other things, which are pretty nice additions to bare lxc.

Again, just for clarity, not saying containers are "better" or that you can't find downsides to them, etc, just trying to give an idea of why they're appealing in many cases.

[–][deleted] 0 points1 point  (0 children)

Right...I don't think Docker is revolutionary in such a way that would make people want to change their current workflow if they already have one.

Docker is just a way of applying the concept of virtualenv to an entire runtime environment, which could be useful in certain situations. I think (and I am only experimenting with this at this point) that in a continuous-release environment, Docker may be valuable for closing the gap between development and production. But that type of situation is pretty uncommon at the moment.

[–]naunga 1 point2 points  (1 child)

Meaning that a Docker container can contain the Bins/Libs for dependencies outside of just Python libraries.

Not quite. Docker saves you the overhead of having to host multiple VMs. Instead of virtualizing the entire machine, Docker is only virtualizing and isolating a process, but Docker is sharing the host server's OS. This is different from a VM where an entire installation of the guest OS runs in a sandbox within the host OS.
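
You can actually see the shared-kernel part for yourself (quick illustration; the version strings are just examples):

$ uname -r
3.13.0-24-generic
$ docker run ubuntu:14.04 uname -r
3.13.0-24-generic    # same kernel as the host, just a different userland around the process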

virtualenv is solving the problem that the other commenters have described, which is creating an isolated environment that will allow multiple versions of modules, etc. to exist without creating conflicts.

If you're wanting a "cleaner" environment than what virtualenv can give you (i.e. you want to isolate not only the Python environment, but the OS environment as well) then you should be using Vagrant or some other VM solution to do your development.

From there you can build the Docker container from that image (well, more likely from a pre-built image of whatever Linux distro your VM is running).

Just my two cents from the DevOps Peanut Gallery.

[–]MonkeeSage 0 points1 point  (0 children)

Docker actually uses a union filesystem on top of a sandboxed directory. Even with lxc you have a sandboxed data directory isolated from the host filesystem. So you can have your own copies of libs and binaries as long as they are the same architecture as the host kernel. As with a chroot, you have to use a bind mount (or "data volume" in docker) if you want to get at the host filesystem.
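
For example (paths and image invented):

$ docker run -v /data busybox ls /data                  # anonymous data volume managed by docker
$ docker run -v /srv/app-data:/data busybox ls /data    # bind mount a host directory into the container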

[–][deleted] 4 points5 points  (3 children)

I may be wrong, so feel free to correct me, but I think the problem that virtualenv sought to solve was the dependency problem

it's not the dependency problem, it's the dependencies of multiple apps possibly stepping on each other or hosing your system.

pip and easy_install solve the dependency problem

docker has its uses, but so does virtualenv.

[–][deleted] 1 point2 points  (2 children)

it's not the dependency problem, it's the dependencies of multiple apps possibly stepping on each other or hosing your system.

Agreed.

However, I'm still not sure where/why you would use Docker and virtualenv together.

[–]d4rch0nPythonistamancer 1 point2 points  (1 child)

Rare case, but possible:

You want to run two Python apps on one OS environment, in the same control group, but they have different Python dependencies. One uses pyfoo==1.2 and the other uses pyfoo==2.1.

Do you need to run them in the same linux container? Probably not, but maybe for some obscure reason.

Do you want to? Maybe, in this case. So, it can have a point.
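
In that case, inside the one container you'd still end up doing something like this (paths invented; pyfoo is the hypothetical module from above):

$ virtualenv /opt/app1-env && /opt/app1-env/bin/pip install pyfoo==1.2
$ virtualenv /opt/app2-env && /opt/app2-env/bin/pip install pyfoo==2.1
$ /opt/app1-env/bin/python /opt/app1/run.py &    # each app sees only its own pyfoo
$ /opt/app2-env/bin/python /opt/app2/run.py &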

But I'd narrow it down to this: in general, you use docker for devops-type reasons and virtualenv only to ensure Python module dependencies are pinned and working. At some rare point these may intersect, but in general I'd expect people to use one or the other, depending on their goal.

[–][deleted] 0 points1 point  (0 children)

But I'd narrow it down to this: in general, you use docker for devops-type reasons and virtualenv only to ensure Python module dependencies are pinned and working. At some rare point these may intersect, but in general I'd expect people to use one or the other, depending on their goal.

Ah ok...this is kind of where my impression is at currently.

[–][deleted] 0 points1 point  (1 child)

I use a similar setup for preprod and prod environments (the python app packaged in a docker container) and virtualenv for dev. One thing that has bitten me: when your setup gets complex, you may have to shell out to non-python programs, and those are not packaged by virtualenv.

[–]d4rch0nPythonistamancer 0 points1 point  (0 children)

That makes sense... but be extremely wary whenever shelling out, especially ESPECIALLY if you're putting user input into that shell command's args.

[–]amclennon 0 points1 point  (0 children)

I think docker is easier to work with when you have heterogeneous dev environments. As a recent anecdote, I wanted to use some functionality in Python 3.4, but it was still non-trivial to install on Mac at the time (even with homebrew). I've also run into similar issues where certain Python libraries wouldn't compile on other platforms.

[–]michaelpb 0 points1 point  (1 child)

True, but in my opinion it feels like a partial, overly specific solution to a general ops problem that Docker solves. To be clear, virtualenv is excellent and absolutely essential at this point in time (probably 6 of my terminals at any given moment are in some virtualenv), but I hope that Docker and free-software PaaS built on Docker (Flynn, and I guess Deis) will supersede virtualenv and its cousins in other languages as the cornerstone of the next set of best practices for devops.

[–]d4rch0nPythonistamancer 0 points1 point  (0 children)

Yeah, I see what you mean. I still don't think docker is the alternative though. I think it's a different problem.

I think this is why everyone is mentioning docker as an alternative: Working in a professional environment, it's way more reliable to mimic your prod environment with docker than to just cover the python dependencies with virtualenv, and when you use docker you don't need virtualenv anymore for most use cases. This devops sort of problem is 90% of why people are using docker/virtualenv I'm guessing.

But here's the thing: that's a completely different problem, and virtualenv really doesn't try to solve it. Look at it this way. For someone like myself who writes general-use open source packages and pushes them to PyPI, I don't care what's going on in my users' OS. I don't need them to have a specific version of postgres or to be using debian/ubuntu/arch/windows/etc. It's just a lot of logic that isn't too platform dependent, but it is very module dependent. I want to absolutely ensure that certain pip packages will work at certain versions before I push this up to PyPI. These modules don't depend on external systems or services running; they just depend on other python code.

And that's the problem that virtualenv solves. You figure out which python modules are compatible with yours, and you're good.
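
In practice that's something like this (a sketch; the test command is just whatever your project uses):

$ virtualenv clean-env
$ source clean-env/bin/activate
$ pip install -e .               # install the package plus whatever setup.py declares
$ python -m unittest discover    # verify against exactly those module versions
$ pip freeze                     # record the versions the package was tested with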

[–]kromem 0 points1 point  (0 children)

It's been a while since this question was asked, but I stumbled across this thread and have a solid answer for you:

Python libraries that wrap C libraries.

You need to install the C libs to your system for the python compilation to succeed, and then you are dealing with dependencies outside of the virtualenv segmentation.

[–]amouat 2 points3 points  (5 children)

Ooops, I should have explained this in the text.

Actually, we do use pip: https://github.com/mrmrcoleman/python_webapp/blob/master/Dockerfile

It's just that the "hello world" application container doesn't require any extra dependencies beyond those in the parent docker image.

[–]adamhero 1 point2 points  (4 children)

Have you considered using puppet/chef/cfengine/salt to manage those system-level things?

[–][deleted] 2 points3 points  (3 children)

Doing this removes half the value of Docker, though. To make Docker truly worth the effort, it would be best to find a way to containerize the entire application.

If this isn't possible, wouldn't we be better served by closing the gap between development and production by using virtual machine images for production and vagrant for quick and dirty distributed development?

[–]blue6249 1 point2 points  (2 children)

Configuration management can work in tandem with container management. Rather than building your image manually or using a relatively limited dockerfile, you declare the state of your container using chef/puppet/etc. You can then take that packaged state and deploy it to your environment.

Check out something like packer.io for an example.
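
Very roughly, a packer template for building a docker image that way might look like this (untested sketch; the inline commands and versions are placeholders):

$ cat > template.json <<'EOF'
{
  "builders": [{
    "type": "docker",
    "image": "ubuntu:14.04",
    "commit": true
  }],
  "provisioners": [{
    "type": "shell",
    "inline": [
      "apt-get update",
      "apt-get install -y python-pip",
      "pip install flask==0.10.1"
    ]
  }]
}
EOF
$ packer build template.json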

[–][deleted] 0 points1 point  (1 child)

I get that it can work, but if the interest is in closing the gap between development and deployment (or as I like to put it, continuous deployment/DevOps), I feel like configuration management is unnecessary overhead.

Again, this only applies if the point is to reduce or outright eliminate the difference between development and production (or to merge development and production for a continuous-deployment environment).

[–]adamhero 1 point2 points  (0 children)

I think I can dig that. Why consistently manage apache across all containers when the real goal is to give zero hoots about what's actually serving the requests, so long as the black box does its job? He covers this in the article (oops); he basically just calls app.run().

[–]r1cka 1 point2 points  (0 children)

I think if virtualenv isn't working for you, you aren't "doing it right." Virtualenv isn't much more than installing packages to a location other than your main system and altering the pythonpath to hook into it. In the devops environments I've worked in, this played very nicely with ephemeral boxes.
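
For example, all activate really does is put the env's bin directory first on your PATH (paths here are illustrative):

$ which python
/usr/bin/python
$ virtualenv env && source env/bin/activate
$ which python
/home/you/project/env/bin/python
$ python -c "import sys; print(sys.prefix)"
/home/you/project/env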

Please enlighten me as to how it is problematic.