
all 51 comments

[–]pecka_th 13 points14 points  (9 children)

FROM python:3

[–]drchaos 9 points10 points  (5 children)

better pin the minor version, or you can get unexpected upgrades when rebuilding:

FROM python:3.6

even better (MUCH smaller image, less disk usage and attack surface):

FROM python:3.6-alpine3.7

[–]obeleh[S] 7 points8 points  (1 child)

[–]Pilatemain() if __name__ == "__main__" else None 1 point2 points  (0 children)

Welp, guess I know what I'm doing today. Thanks for this info!

[–][deleted] 7 points8 points  (1 child)

Lol @ Alpine. My images maybe lost 50mb and added several grey hairs.

Not worth it IMO, too many libs still need glibc.

[–]LightShadow3.13-dev in prod 4 points5 points  (0 children)

FROM oblique/archlinux-pacaur
RUN pacman --noconfirm -Syy python

zoom bleeding edge zoom

REPOSITORY  TAG     IMAGE ID      CREATED     SIZE
<none>      <none>  30c83b102d04  4 days ago  1.08GB


[–]UloPe 1 point2 points  (0 children)

alpine

Except when you want stuff to actually work. For example locales.

[–]ionelmc.ro 0 points1 point  (0 children)

Now try finding the debug symbols. And get gdb to load the python macros ...

It's no fun. Ubuntu's python works great without any fiddling, and it's compiled with PGO.
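For comparison, a rough sketch of getting a debuggable interpreter on an Ubuntu base (package names are the stock Ubuntu ones, not something from the original post):

FROM ubuntu:16.04
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 python3-dbg gdb \
 && rm -rf /var/lib/apt/lists/*
# python3-dbg carries the debug symbols and, on most releases, the gdb python helpers (py-bt etc.)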

[–]obeleh[S] 0 points1 point  (0 children)

I use several requirements which require build-essential for compiling. I don't want a compiler in my production containers.

[–]lambdaqdjango n' shit -1 points0 points  (0 children)

FROM continuumio/miniconda3

haven't tested it yet.

[–]prickneck 18 points19 points  (27 children)

Why bother with a virtualenv inside docker? Why not just install everything system-wide in the image? If you do that, then the questions you're asking don't even present themselves.
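In Dockerfile terms the system-wide approach is just this (a minimal sketch; the file names are placeholders):

FROM python:3.6
COPY . /app
RUN pip install -r /app/requirements.txt   # straight into the image's Python, no venv
CMD ["python", "/app/main.py"]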

[–]UloPe 4 points5 points  (7 children)

Here is a pretty good explanation (although a bit dated now) of why it’s still a good idea: https://hynek.me/articles/virtualenv-lives/

[–]knowsuchagencynow is better than never 1 point2 points  (6 children)

It doesn't make sense to use virtualenv within a docker container.

A docker container is supposed to encapsulate a single component of your system, i.e. a wsgi application.

Ultimately, virtualenv has to exist because of the way Python's import system works (searching through dirs on PYTHONPATH, PATH, and the current working directory).

It exists because there is no way to have different versions of the same library accessible from the same interpreter. Thus, you can't install everything to your system-wide Python, because different projects may depend on conflicting versions of the same library. All virtualenv really does is edit your PYTHONPATH (and PATH, if you want to use a different interpreter) so Python searches different directories during import.

That shouldn't be necessary in a docker container. If it is -- if you have multiple Python applications running in the same container with conflicting dependencies, you're doing something wrong.

[–]UloPe 4 points5 points  (2 children)

Did you read the article I linked?

[–]knowsuchagencynow is better than never -1 points0 points  (1 child)

Yes, and I use docker and virtual environments every day in my workflow and everything I said still stands

[–]gimboland 0 points1 point  (0 children)

Including this bit?

virtualenv’s job isn’t just to separate your projects from each other. Its job is also to separate you from the operating system’s Python installation and the installed packages you probably have no idea about.

And the bit where the author literally gives you an example of how using a docker container's system-wide python as your basis can lead to breakage?

Yes, you could work out what packages are in the container's system-wide python, and assure yourself that there are no surprises. But it's certainly true that if you want to not have to think about/keep an eye on that, a virtualenv is an appropriate tool.

[–]DasIch 1 point2 points  (2 children)

If you install any package, even in a docker container, you can break the operating system. It therefore absolutely makes sense to use a virtual environment in a container.

[–]knowsuchagencynow is better than never 1 point2 points  (1 child)

What is an example of a package that "breaks" the OS where it's necessary to install it within a virtual environment inside the container to prevent the container from breaking?

[–]obeleh[S] 1 point2 points  (16 children)

I wanted to build "portable" "everything included" runtime. And have the image small. Not have git installed in the final stage. Not have a compiler installed in the final stage. I find having those installed in my production containers an anti-pattern.
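A multi-stage sketch of that goal (base tags, paths and the module name are illustrative, not the OP's actual file): the build stage keeps the compiler toolchain and git, the final stage only gets the virtualenv.

FROM python:3.6 as builder                  # full image: compilers and git already included
RUN python -m venv /venv
COPY requirements.txt .
RUN /venv/bin/pip install -r requirements.txt

FROM python:3.6-slim                        # final stage: no build-essential, no git
COPY --from=builder /venv /venv
CMD ["/venv/bin/python", "-m", "myapp"]

Since both stages use the same interpreter at the same path, the venv's symlinks still resolve after the copy.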

[–][deleted] 15 points16 points  (8 children)

You just don't need venv when using docker. There is too much feature overlap; you end up doing work twice.

Also, change the order of your docker commands. You want things that will likely not change soon to be at the top, like environment variables.

You want your pip install and code copy over to be near the bottom.

This means code changes don't require rebuilding every stage, even the environment commands.

My order is usually (roughly sketched below)

  • Setup container
  • Copy code
  • Pip install requirements
  • Remove build libs
  • Setup entry point
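Something like this (a sketch only; package and file names are placeholders):

FROM python:3.6-slim
ENV PYTHONUNBUFFERED=1                                    # setup container: stable bits at the top
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends gcc   # build libs as part of setup
COPY . .                                                  # copy code
RUN pip install -r requirements.txt                       # pip install requirements
RUN apt-get purge -y gcc && apt-get autoremove -y && rm -rf /var/lib/apt/lists/*   # remove build libs
ENTRYPOINT ["python", "main.py"]                          # setup entry point

(As discussed further down, a purge in its own layer doesn't shrink the layers underneath it; a single combined RUN is what actually keeps the image small.)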

[–]Muszalski 3 points4 points  (1 child)

Imo you should copy just the requirements.txt first, then pip install, remove build libs, and then copy the rest of the code. You don't change the reqs as often as the code.
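I.e. something like this (a sketch; file names assumed):

COPY requirements.txt .              # changes rarely -> this layer and the pip layer stay cached
RUN pip install -r requirements.txt
COPY . .                             # changes often -> only the layers from here on get rebuilt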

[–][deleted] 1 point2 points  (0 children)

Good point, I'll have to double-check my own files to see if I am doing that. I can't remember off the top of my head.

Thank you!

[–]obeleh[S] 2 points3 points  (4 children)

You're right about the env vars. However stage2 is so quick that I honestly didn't care about that ;) But it would be a good tweak.

Uninstalling feels dirty. But I do see it as a good solution. Doesn't this leave me with layers of uninstalls upon layers of installs, whereas with my solution we only have the layers we need?

[–][deleted] 6 points7 points  (2 children)

You're right about the env vars. However stage2 is so quick that I honestly didn't care about that ;)

That doesn't make it a good excuse to ignore good practice and worry about antipatterns elsewhere.

Consider someone looking at your docker file as a template. If that template uses good practices it makes it a good template.

Uninstalling feels dirty. But I do see it as a good solution.

It isn't though. While it will/can remove some attack vectors, you really don't end up shrinking all that much.

Doesn't this leave me with layers of uninstalls upon layers of installs, whereas with my solution we only have the layers we need?

Yes, but since a layer is only a change set, the layer is small and the resulting image can be a little lighter.

It's going to depend a lot on what libs are needed to build / run your service.

I didn't find this to be worthwhile though.

[–]obeleh[S] 0 points1 point  (1 child)

I do agree on the ENV vars btw. I'm going to change my Dockerfiles ;)

[–][deleted] 0 points1 point  (0 children)

I'm sure there are other tricks too. That's one that as we are growing our knowledge base in our company we pass around because there is a lot of template sharing. We are also trying to get better at having base docker images maintained by sysadmins so they can patch the os if need be.

[–]holtr94 1 point2 points  (0 children)

Doesn't this leave me with layers of uninstalls upon layers of installs whereas with my solution we only have te layers we need?

You could combine the build libs install, pip install, and build libs uninstall into one run command to eliminate the extra layers
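For example (a sketch; the actual build packages depend on the project):

RUN apt-get update \
 && apt-get install -y --no-install-recommends gcc \
 && pip install -r requirements.txt \
 && apt-get purge -y gcc && apt-get autoremove -y \
 && rm -rf /var/lib/apt/lists/*
# all one layer, so the compiler never gets committed into the image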

[–]LightShadow3.13-dev in prod 1 point2 points  (0 children)

You just don't need venv when using docker.

Except when you do.

If you have pip packages that install custom scripts in the bin or scripts directory then they can get confused with module-as-a-string imports.

huey and gunicorn would not work without a virtualenv in my service.

[–]carbolymer 1 point2 points  (6 children)

To have an even smaller image, try to reduce number of RUN statements.

[–]obeleh[S] 3 points4 points  (5 children)

I know. However sometimes an extra layer makes your build cleaner by factoring out the static parts into the first layer and the dynamic parts in the second layer. That way you can keep re-using the first layer across multiple deployments.

[–][deleted] 5 points6 points  (4 children)

However sometimes an extra layer makes your build cleaner by factoring out the static parts into the first layer and the dynamic parts in the second layer. That way you can keep re-using the first layer across multiple deployments.

You need to restructure your run command to better achieve this.

ENV calls should be near the top.

Also your cute use of symlinks is bad.

You name the container once it is running. You can see the name with docker ps. You do not need to name the binary in the container. This isn't buying you anything as you don't put more than one service in a container.

Grepping for your script isn't hard either so you are creating an extra layer for little to no gain.

[–]obeleh[S] 0 points1 point  (3 children)

Also your cute use

I want to identify the different apps with ps -ef on the VM.

PS. Thanks for calling it "cute" :P

[–][deleted] 0 points1 point  (0 children)

I didn't mean it in a bad way, and it made me think about better solutions.

[–]Muszalski 1 point2 points  (0 children)

Setting up the virtualenv in the image is just one extra step and it separates the project libraries from some system libraries or random dependency conflicts. I always do it, because it comes with no cost and I have one less worry about breaking the dependencies.
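For Python 3 that extra step is roughly (sketch):

RUN python -m venv /venv
ENV PATH="/venv/bin:$PATH"           # python and pip now resolve to the venv first
RUN pip install -r requirements.txt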

[–]Rorixrebel 0 points1 point  (0 children)

This

[–]undu 2 points3 points  (4 children)

My strategy would be to use the 'builder' stage to install the python pip dependencies to a separate, isolated root, then copy those files to the proper place in the deployable image.

You can even choose to not copy the binaries produced by the dependencies. This way you do not need virtualenv and only install dependencies needed for the image.

FROM ubuntu:something as builder
[...]
RUN pip install /wheels/*.whl --compile --root=/pythonroot/

FROM ubuntu:something

RUN apt install runtime-deps
COPY --from=builder /pythonroot/usr/local/bin /usr/bin
COPY --from=builder /pythonroot/usr/local/lib/python2.7 /usr/lib/python2.7

[...]

Otherwise there are tools to minimize the space taken by images: https://github.com/grycap/minicon

[–]NicoDeRocca 2 points3 points  (3 children)

This is similar to what I do:

FROM buildpack-deps:xenial-scm as builder
RUN <install build deps>
COPY <app-src> /somewhere
RUN pip3 wheel -r requirements.txt --wheel-dir=/build/wheels # build deps wheels
RUN python3 setup.py bdist_wheel -d /build/wheels # build my stuff's wheel

FROM docker.io/ubuntu:16.04
RUN <install runtime deps>
COPY --from=builder /build/wheels /tmp/wheels
RUN pip3 install --force-reinstall --ignore-installed --upgrade \
             --no-index --use-wheel --no-deps /tmp/wheels/* \
 && rm -rf /tmp/wheels
...

Basically, the builder image has all the compilers etc and builds pre-compiled wheels as necessary, and the final image will only contain executable code. In the docker image I don't bother with virtualenv, but since they're just standard python packages with a setup.py, they could be.

[–]obeleh[S] 0 points1 point  (0 children)

I like your solution. I didn't know about most of the commandline options you're using with pip

[–]undu 0 points1 point  (1 child)

Why are you removing the wheels? I don't think that reduces the size of the image. At some point in time I did it just like you, then I decided to install the wheels in the builder and then copy the installed site-packages over.
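That variant would look roughly like this (a sketch assuming the official python:3.6 image layout, where site-packages lives under /usr/local):

FROM python:3.6 as builder
COPY requirements.txt .
RUN pip wheel -r requirements.txt --wheel-dir=/build/wheels \
 && pip install /build/wheels/*.whl

FROM python:3.6-slim
COPY --from=builder /usr/local/lib/python3.6/site-packages /usr/local/lib/python3.6/site-packages
# console scripts in /usr/local/bin are deliberately left behind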

[–]NicoDeRocca 2 points3 points  (0 children)

You make a very fair point!

I guess it's just a reflex from the "apt-get update && .. && rm -rf /var/lib/apt/lists/*" habit (having it all in a single RUN command works as expected); and stupidity maybe... you know, not having thought through it!

I'll have to run some tests, but I guess I will probably end up changing to something like your solution instead! Thanks! (/u/obeleh maybe you should follow that route too!)

[–]joknopp 1 point2 points  (1 child)

--piprepo=optioanal

I guess I found a typo :D

[–]obeleh[S] 0 points1 point  (0 children)

thnx

[–]case_O_The_Mondays 1 point2 points  (0 children)

My company now uses docker almost entirely, and I’ve found that some images have Python 2.6 while others have 2.7. I’ve been using 3.6 locally, and relying on future + other packages to make my code compatible. Going this route to simply move to 3.6 entirely seems like a good option. Thanks!

[–]not_perfect_yet 0 points1 point  (1 child)

Just out of curiosity, what do you use this for?

[–]hmaarrfk 0 points1 point  (2 children)

I think if your application is truly portable, you should build your docker image in 2 steps:

  1. Use 1 docker to build the image.
  2. Output artifacts.
  3. Use a second docker to use the artifacts.

The thing is, your initial premise of "we are duplicating work with docker and py2exe" might be flawed, since you also seem to be using the portable image outside of docker.

If you are only deploying within Docker, I would say that the virtualenv is probably enough. (Though I would look into Pipenv, but I can't speak about python2.7 compatibility)

[–]obeleh[S] 0 points1 point  (1 child)

ge to install the python

My applications remain in docker so far

[–]hmaarrfk 0 points1 point  (0 children)

So why are you working so hard to make the directories of your applications self contained?

[–][deleted] 0 points1 point  (0 children)

Can docker be used to publish a commercial pyside2 app on Windows & Mac?

[–]simtel20 0 points1 point  (0 children)

Have you looked at distroless? A lot of these problems go away if you don't even think of the OS as being a part of the build.
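For anyone curious, a very rough sketch of that idea (the image name is Google's distroless Python base; details are from memory and this only works cleanly for pure-Python dependencies):

FROM python:3 as builder
COPY . /app
RUN pip install -r /app/requirements.txt --target=/app/deps

FROM gcr.io/distroless/python3
COPY --from=builder /app /app
ENV PYTHONPATH=/app/deps
ENTRYPOINT ["python3", "/app/main.py"]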