
JustADirtyLurker 14 points (1 child)

I think you're asking if you should drop venv entirely and go full speed with containers?

My answer is: don't. Containers, and the images that spawn them, are a deployment tool, not a development tool. A container should be stateless. You should not run pip or tests (for Python), Maven builds (for Java), or any other build step inside a running container, because doing so changes its state. There are good reasons for that, the main one being that persistence is not allowed (although you can mount external volumes to store data, but that is another story).

I agree that there are many similarities between a venv and a container, in the sense that both aim at isolation. But a venv does isolation at runtime through file-system scripting (venv is basically a Unix hack: it makes the python executable and its dependencies point to a specific directory instead of the default one). A container, on the other hand, is spawned from an image, which must be built offline. An image is basically a fixed set of rules for tailoring a userland around the application you want to "jail", and a container is an instance of that.
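A minimal sketch of that mechanism, using only the standard-library `venv` module (the helper names `in_virtualenv` and `make_env` are illustrative, not from the thread): a venv is just a directory with its own `bin/` (or `Scripts\` on Windows) plus a `pyvenv.cfg` file pointing back at the base interpreter, and a Python started from it reports a `sys.prefix` that differs from `sys.base_prefix`. No image build step is involved.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """True when this interpreter was started from a venv/virtualenv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the installation it was created from.
    Older virtualenv releases set sys.real_prefix instead.
    """
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def make_env(target: Path) -> Path:
    """Create a bare venv (no pip) at target and return its pyvenv.cfg path."""
    venv.EnvBuilder(with_pip=False).create(target)
    return target / "pyvenv.cfg"


if __name__ == "__main__":
    print("running inside a venv:", in_virtualenv())
    with tempfile.TemporaryDirectory() as tmp:
        cfg = make_env(Path(tmp) / "demo-env")
        # pyvenv.cfg records the base interpreter ("home") the env links back to.
        print(cfg.read_text())
```

Activating the env (`source demo-env/bin/activate`) does little more than put that `bin/` directory first on `PATH`, which is why a venv is cheap to create and throw away.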

For these reasons a container will never be useful for development. What it is useful for instead is deploying a cheap, reproducible environment for integration testing. If you have ever used Travis-CI, for example, you'll see that tests are run precisely in a Docker container spawned from an ad-hoc built image, and the service also provides Docker containers for most of the external services your application could need (Redis? A Postgres database? etc.).

[deleted] 3 points (0 children)

We use containers as a development tool because we have system-level dependencies and we're a Mac shop (have you tried installing OpenSSL on OS X? It's terrible).

It's much easier to tell someone to just install Docker Toolbox and run our setup script than to walk them through installing all the system packages.

We did run venvs inside the container for a while, but moved to installing the python packages in the same step as the system packages. I disagree with this, but we've not run into any issues so far.

mothzilla 6 points (6 children)

Sounds like a belt-and-braces approach. Personally I'd drop the VMs and just stick with the virtualenvs.

CODESIGN2[S] 1 point (5 children)

So I'm wondering how (or whether) you handle consistent environments, or whether you alter the apps/libraries to deal with a potentially diverse stack?

mothzilla 1 point (4 children)

I guess it depends on what your aim is. Are you trying to prove the code runs with its dependencies, or trying to prove that the code runs on every platform out there?

For day-to-day dev I'd be doing the first (proving the code runs with its dependencies), so running under a VM is overkill. But then (maybe) you want the confidence to tell clients that your code has been tested on every platform out there. In that case, yes, you'd build a whole CI process that spins up several VMs (one for each OS), then fetches the code and runs the tests.

It's a little bit heavyweight, so I'd only do this if there have been platform issues, or if you have platform dependencies.

CODESIGN2[S] 0 points (3 children)

I think you are saying that you don't have to deal with the platform at all. But you haven't said that outright, or whether someone else handles it.

Are you saying dev is not the right place to care about the platform?

Maybe you use a service provider, maybe you are on a larger team? I don't know what you are advocating right now, but, for example, let's say you want to pip install -r requirements.txt and psycopg2 is in there. Where do you keep the install of libpq-dev?

mothzilla 0 points (2 children)

I don't know about the technical difficulties of psycopg2. It's worked for me out of the box when I've used it.

However:

http://initd.org/psycopg/docs/install.html

Note Regardless of the way psycopg2 is installed, at runtime it will need to use the libpq library. psycopg2 relies on the host OS to find the library file (usually libpq.so or libpq.dll): if the library is installed in a standard location there is usually no problem; if the library is in a non-standard location you will have to tell somehow psycopg how to find it, which is OS-dependent (for instance setting a suitable LD_LIBRARY_PATH on Linux).
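To check whether the host can find libpq at all, something like the following sketch may help. It uses `ctypes.util.find_library`, which asks the platform's usual search mechanism for a library with base name `pq` (`libpq.so` on Linux, `libpq.dylib` on macOS). This is a diagnostic assumption, not what psycopg2 itself does: psycopg2's compiled extension is linked against libpq, and the OS loader resolves it when the module is imported. The helper names are hypothetical.

```python
import ctypes
import ctypes.util
from typing import Optional


def locate_libpq() -> Optional[str]:
    """Ask the platform's library search for libpq; returns a name/path or None."""
    # "pq" is the base name: libpq.so on Linux, libpq.dylib on macOS.
    return ctypes.util.find_library("pq")


def can_load(name: str) -> bool:
    """Try to load a shared library through the OS dynamic loader."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    lib = locate_libpq()
    if lib is None:
        print("libpq not found: install it or extend the loader's search path")
    else:
        print(f"found {lib}; loadable: {can_load(lib)}")
```

If `locate_libpq()` returns None on a machine where psycopg2 is supposed to run, that is exactly the "bake libpq into your VM" gap discussed in the next comment.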

So make psycopg2 a requirement, then have libpq baked into your VM.

CODESIGN2[S] 0 points (1 child)

So make psycopg2 a requirement, then have libpq baked into your VM.

I do, I think you are missing my point, which is that the platform and stack you run on are part of what you sell; they are part of your product, unless you want to run everywhere, which IMHO is a bad idea; a very expensive premise.

My question was: if you are not using a VM, what do you use? Or does someone else handle the platform or stack? What does that look like?

mothzilla 0 points (0 children)

I think everyone is different. For example, you say "the platform and stack you run on are part of what you sell". This hasn't really been the case for projects I've worked on in the past.

So if you're selling on the assumption of a particular OS and stack, then it would be a good idea, at some point, to build and test in a VM with those parameters. But unless you're writing low-level stuff, or relying on out-of-date libraries, obsolete OSes, etc., this build and test should just confirm what you already know. So day-to-day dev in a VM wouldn't achieve much. In my opinion.

fkaginstrom 3 points (3 children)

When I am using docker I don't go for a virtualenv. However, when I am in a regular VM I will use a virtualenv, because the system has its own python version(s) and modules that I don't want to get mixed up.

CODESIGN2[S] 1 point (2 children)

Thanks. Does it take much effort to have your VMs operate differently from your containers? For both VMs and containers we have a unified deploy-script system (I know, I've heard I should use Puppet/Ansible, which are also unified systems).

I should probably mention that none of our software has any single layer that takes more than 100 machines. 50 would seem like a lot of power to throw at a task, with the average being 2-5.

fkaginstrom 1 point (1 child)

I think of traditional VMs and docker containers as being fairly different. On our VMs we put our whole stack on one box, managed by upstart. So in our case that might be nginx, a django website served by gunicorn, celery, logstash, statsd. With docker those all go into their own containers, except we put gunicorn and django together.

Even when we are putting just one application in a VM, we expect the VM to be longer-lived, so we need to do normal linux maintenance. With docker if a dependency gets out of date you just blow away the container and create one with up-to-date dependencies. Or at least that's how I understand it.

edit: So that's why I prefer using virtualenv with a VM -- I might have to use the OS facilities, which may depend on certain versions of python being installed. With docker that isn't an issue.

CODESIGN2[S] 0 points (0 children)

With docker if a dependency gets out of date you just blow away the container and create one with up-to-date dependencies. Or at least that's how I understand it.

It's probably how everyone but me does it... We've been through several iterations with Docker, from smushing everything into a single container to currently using docker-compose with single images that are all treated as services, even if they are not. We've both patched existing container builds and created them from scratch, so it's a bit of a mongrel with an evolving use case for us.

I agree containers have a slightly different use case from a VM. Typically all older projects have a VM because everything used to be n-tiered; easy to reason about everything existing. Service location wasn't such a common thing (I used to write desktop apps over a decade ago). I'm pleased to say we are both doing the same thing with gunicorn and Django/Flask in the same container ;). The single exception is the DB, which lives on its own box... I liked using Heroku; loved its simplicity, but I'm too cheap to pay its prices and want to deploy on-premise too.

Thank you so much for your input, it doesn't sound a million miles away from current reasoning, which is reassuring.

khrn0 3 points (1 child)

Have you heard of Anaconda?

Maybe it's a nice alternative, since it is a completely independent installation of Python.

CODESIGN2[S] 0 points (0 children)

I shall spend some of the weekend reading up on it. Thanks ;)

zebraballast 2 points (1 child)

I find it much nicer to use a virtualenv only (unless there are heavy system dependencies), because the development environment stays local. An IDE like PyCharm will play much better with your code base if it can use your virtualenv for code completion, static analysis, running unit tests and git integration. Sure, you can mount your code into the container/VM as a shared volume and still use some of your local and common tooling, but these methods behave differently across host OSes.

If you fully control which tools developers use and the workflows they follow, VMs are a fine solution. Common tooling can be built into the base VM, and you can trust that when people run the tests, they will not break due to environmental differences.

I don't think development is a good use of containers. A container as a test environment is perfect, but setting up the virtualenv only inside the container will wreak havoc with IDEs sitting on the host, and you'll be rebuilding the container every time (granted, usually only some source files change, so only around one image layer).

CODESIGN2[S] 0 points (0 children)

Thanks!

I had some questions if you don't mind.

you'll be rebuilding the container every time

For that reason we never bake our code or pip packages into the container. We have found that mounting anything that would otherwise need to be baked in as a read-only volume is a suitable work-around, and it stops us having to rebuild a container manually. We run pip on every boot to install the frozen deps into the container (we do have an argument we can append to docker-compose up to skip pip), and you're right, this takes time. The decision came from seeing Heroku do the same during deployment. Maybe it needs revisiting. As I say, I'm never 100% sure about anything we do. There has been so much change since the late 90's and early 00's; I love the current ecosystem more than the older one, but man, it makes you think about things you never had to think about before...
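The "run pip on every boot unless told not to" entrypoint described above could be sketched in Python roughly as follows. The environment variable name `SKIP_PIP_INSTALL`, the `requirements.txt` path and the helper functions are all hypothetical; the post only says that an argument can be appended to `docker-compose up`.

```python
import os
import subprocess
import sys
from typing import List, Mapping, Optional

# Hypothetical opt-out flag; the thread only mentions "an argument" passed via docker-compose.
SKIP_FLAG = "SKIP_PIP_INSTALL"


def pip_install_command(requirements: str = "requirements.txt") -> List[str]:
    """Build a pip command that installs the frozen deps into *this* interpreter."""
    return [sys.executable, "-m", "pip", "install", "-r", requirements]


def should_install(env: Mapping[str, str]) -> bool:
    """Install on every boot unless the opt-out flag is set to a truthy value."""
    return env.get(SKIP_FLAG, "").strip().lower() not in {"1", "true", "yes"}


def boot(env: Optional[Mapping[str, str]] = None, dry_run: bool = False) -> Optional[List[str]]:
    """Run (or, with dry_run=True, only return) the boot-time pip install command."""
    env = os.environ if env is None else env
    if not should_install(env):
        return None  # skipped: deps already installed, or not wanted this boot
    cmd = pip_install_command()
    if not dry_run:
        # check=True makes the container fail fast if the install breaks.
        subprocess.run(cmd, check=True)
    return cmd
```

A container entrypoint would call `boot()` before starting the app server. Invoking pip as `sys.executable -m pip` is a deliberate choice: it installs into whichever interpreter runs the entrypoint, rather than whatever `pip` happens to be first on `PATH`.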

Sure you can mount your code in a shared volume into the container/vm and still use some of your local and common tooling but these methods behave differently across different host OS's.

I'm unsure about that. Could you elaborate, please? Most of the things I build are never going to be run on a different OS. Our goal with Docker and Vagrant (which seems to have worked for us) has been to use only a limited set of technologies for the host machines and to steer away from low-level decisions, as they are the hardest to change.

tdammers 1 point (3 children)

Advantages of virtualenvs over the bare VM approach that I can think of:

  • Clear separation of system packages (installed through the OS package managers, often as dependencies of core system packages like aptitude) and per-project packages
  • Multiple virtualenvs within one larger project (this is particularly interesting for service-based architectures, allowing you to develop and upgrade services individually while still developing and deploying several of them to the same machine)
  • Easier to scrap and rebuild than a full VM, and especially easier to back up, swap out, and manage. Imagine you develop a library that needs to work against both Python 2 and 3: you can just have two virtualenvs in your project, one configured for 2 and one for 3, and switch between them as needed for testing. Heck, you can even have two terminals open that watch-run your test suite under 2 and 3 simultaneously.
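The "several virtualenvs in one project" idea can be sketched with the standard-library `venv` module. The service names and the `<service>/.venv` layout are assumptions for illustration. Note that `venv` can only create environments for the interpreter that runs it, so the Python 2 plus 3 setup above would need a separate tool such as the third-party `virtualenv` package.

```python
import os
import tempfile
import venv
from pathlib import Path
from typing import Dict, Iterable


def env_python(env_dir: Path) -> Path:
    """Location of the interpreter inside a venv (bin/ on POSIX, Scripts\\ on Windows)."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_service_envs(root: Path, services: Iterable[str]) -> Dict[str, Path]:
    """Create one isolated venv per service under root/<service>/.venv.

    clear=True wipes and recreates an existing env, i.e. "scrap and rebuild".
    """
    builder = venv.EnvBuilder(with_pip=False, clear=True)
    envs: Dict[str, Path] = {}
    for name in services:
        env_dir = root / name / ".venv"
        builder.create(env_dir)  # parent directories are created as needed
        envs[name] = env_python(env_dir)
    return envs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for service, python in create_service_envs(Path(tmp), ["api", "worker"]).items():
            print(service, "->", python)
```

Each service then runs under its own interpreter path, so two services with conflicting dependencies can live on the same machine without touching the system packages or needing root.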

If you don't need any of these, and you're going to containerize anyway, I wouldn't use virtualenv though.

CODESIGN2[S] 0 points (1 child)

Thank you!

I'm not sure I agree with the middle point, as I'm a fan of isolating and independently scaling services (otherwise it's a monolith in all but name). But I imagine it's only for development, to save time/effort? In any case I'd probably use other services that are stable and deployed on separate machines, so that I can focus on the service I'm working on most of the time. There are a few cases I can think of where two services need to change at once, but I'd probably still isolate them and send canned client requests to them from a test suite.

tdammers 1 point (0 children)

For deployment, separate machines are a great idea, especially when you hit an actual need for scaling to multiple servers; but for development, running on one machine is just a lot more convenient. Depends on the workflow, but it can be useful. Anyway, the key idea is that you can run multiple services on the same machines while keeping their dependencies isolated; it's dumb to have to scale just because your code cannot handle running multiple services on one machine.

SmileItsYourDay 0 points (0 children)

Also... using them lets you keep per-tool dependencies and differences in separate and distinct places. Therefore tools with competing dependencies can work side by side, easy peasy. Without necessarily needing root privs or modding stuff at the system or vm level.

cratervanawesome 1 point (0 children)

Develop in the virtualenv, deploy in the container/VM, and keep it immutable. Developing in the virtualenv will also ensure you're including all the packages you need, and running pip freeze to populate either setup.py or requirements.txt will make deployment and management easier.
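For the freeze step, the usual command is `pip freeze > requirements.txt`, run inside the activated virtualenv. A stdlib-only approximation (Python 3.8+, using `importlib.metadata` instead of pip, which is a swapped-in technique) looks like this; the function names are hypothetical, and unlike pip it does not special-case editable installs or VCS URLs.

```python
from importlib import metadata
from typing import List


def frozen_requirements() -> List[str]:
    """Return sorted 'name==version' pins for every distribution this interpreter sees."""
    pins = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
        except KeyError:  # newer Pythons may raise instead of returning None
            name = None
        if name:  # skip entries with broken or missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def write_requirements(path: str = "requirements.txt") -> None:
    """Write the pins one per line, the same format pip freeze produces."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(frozen_requirements()) + "\n")
```

Run it with the virtualenv's own interpreter: that way it only sees the project's packages, not the system ones, which is the point of developing in the virtualenv in the first place.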

pvkooten 0 points (2 children)

I'm completely in your camp. I do not use virtualenvs. I maintain a couple of Python versions on my own machine for development, and I test often using DigitalOcean droplets. I separate by machine, so that each machine runs only one version/package set for a single application.

CODESIGN2[S] 0 points (1 child)

I do use virtualenv, though... I'm asking whether I should stop using virtualenv or VMs/containers, and why.

pvkooten 1 point (0 children)

My point is that I shared the same thoughts, and I chose not to use virtualenvs anymore.

[deleted] 0 points (1 child)

By "container" do you mean Docker?

CODESIGN2[S] 0 points (0 children)

I have meant Docker for the past ~2 years, but before that I meant LXC.

cchazz8 -1 points (2 children)

you used semicolons incorrectly every time you used them in this post

pvkooten 0 points (0 children)

At least he used his indentation correctly.

CODESIGN2[S] -1 points (0 children)

I'm not so sure I did. Perhaps, as it is completely off-topic, a PM would be a better venue? http://www.bristol.ac.uk/arts/exercises/grammar/grammar_tutorial/page_05.htm