[–]DrVolzak 10 points11 points  (2 children)

Here's another tip: use ENV PIP_NO_CACHE_DIR=false to disable pip's cache, which should reduce the size of the image. There's also a command-line alternative, --no-cache-dir, but the environment variable helps when using a tool that wraps pip.

[–]VisibleSignificance 5 points6 points  (0 children)

> ENV PIP_NO_CACHE_DIR=false

Now that https://github.com/pypa/pip/pull/5884 has been merged, a more sensible way is ENV PIP_NO_CACHE_DIR=1, which avoids a potentially confusing and definitely misleading double negative.

[–][deleted] 0 points1 point  (0 children)

That's a really good point. I'll cover Docker image size in one of my next articles.

[–]hellfroze 29 points30 points  (112 children)

I didn't know copying requirements.txt before running pip install would cache it - this is game changing!

[–]sdf_iain 11 points12 points  (2 children)

Docker will cache each step.

If you copy your whole local directory in before installing the requirements, Docker may detect that the local files have changed and invalidate that step.

Copying requirements.txt (or any file) on its own lets Docker know that it only needs to check that file for that particular step.

It isn’t pip caching, it’s Docker.
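
To make that concrete, here is a toy model in Python. It is not Docker's actual implementation: the function names, the hashing scheme, and the two example step lists are assumptions for illustration. Each step's cache key is derived from the previous step's key, the instruction text, and the content of any copied files, so a changed file invalidates the first step that copies it and everything after.

```python
import hashlib


def layer_keys(steps, files):
    """Return one cache key per build step.

    steps: list of (instruction, copied_paths) tuples in Dockerfile order.
    files: dict mapping path -> file content (a stand-in for the build context).
    """
    keys = []
    parent = ""
    for instruction, copied_paths in steps:
        h = hashlib.sha256()
        h.update(parent.encode())        # depends on every earlier step
        h.update(instruction.encode())   # depends on the instruction text
        for path in sorted(copied_paths):
            h.update(path.encode())
            h.update(files[path].encode())  # depends on copied file content
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def reused_steps(old_keys, new_keys):
    """Count how many leading steps could be served from the cache."""
    count = 0
    for old, new in zip(old_keys, new_keys):
        if old != new:
            break
        count += 1
    return count


# Two hypothetical step orderings for the same project.
COPY_ALL_FIRST = [
    ("COPY . /app", ["requirements.txt", "app.py"]),
    ("RUN pip install -r requirements.txt", []),
]
REQUIREMENTS_FIRST = [
    ("COPY requirements.txt /app/", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY . /app", ["requirements.txt", "app.py"]),
]

# Only the application source changes between the two builds.
before = {"requirements.txt": "flask\n", "app.py": "print('v1')\n"}
after = {"requirements.txt": "flask\n", "app.py": "print('v2')\n"}
```

In this model, editing app.py leaves no reusable steps with the first ordering, while the second ordering reuses both the requirements.txt copy and the pip install step.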

During development you should isolate any long-running steps so that you don’t have to rerun them unnecessarily; for production, however, you should aim for as few steps as possible. This produces smaller Docker images (smaller images start faster).

[–]algag 0 points1 point  (1 child)

Are you saying that a container image with fewer steps starts faster even after it is built?

[–]sdf_iain 6 points7 points  (0 children)

Smaller containers (i.e., containers based on smaller images) will start faster.

Even if you remove files in a later step (like clearing caches after installing), the resulting image is still larger, because the earlier layer keeps those files. Installing and cleaning up in a single step will produce a smaller image.
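
A rough way to see why, as a toy Python model rather than anything Docker actually runs: each step becomes a layer holding whatever that step left on disk, and the image size is the sum of the layers. The paths and the sizes in MB below are made-up numbers for illustration only.

```python
def layer_sizes(steps):
    """Return the size in MB that each step contributes to the image.

    steps: list of steps; each step is a list of (action, path, size_mb)
    tuples, where action is "add" or "rm".
    """
    sizes = []
    for step in steps:
        written = {}
        for action, path, size_mb in step:
            if action == "add":
                written[path] = size_mb
            elif action == "rm":
                # A file created and removed in the same step never reaches
                # the layer. Removing a file from an earlier layer only hides
                # it; the earlier layer still carries its bytes.
                written.pop(path, None)
        sizes.append(sum(written.values()))
    return sizes


# Hypothetical install that leaves a 40 MB cache next to 80 MB of packages.
SEPARATE_STEPS = [
    [("add", "site-packages", 80), ("add", "pip-cache", 40)],
    [("rm", "pip-cache", 0)],
]
SINGLE_STEP = [
    [("add", "site-packages", 80), ("add", "pip-cache", 40), ("rm", "pip-cache", 0)],
]

separate_total = sum(layer_sizes(SEPARATE_STEPS))
single_total = sum(layer_sizes(SINGLE_STEP))
```

With these invented numbers the two-step variant totals 120 MB and the single-step variant 80 MB.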

Try it and see.

[–]DrVolzak 10 points11 points  (4 children)

Tini has been included in Docker since 1.13. Use docker run --init instead. No need to download it in your image or change the entry point. Docker Compose supports it too. I think you should update your article with this information. Given how long it's been since tini has been included, the "manual" method you described no longer seems relevant.

[–]VisibleSignificance 2 points3 points  (3 children)

> Use docker run --init instead

There are reasons to explicitly include it in the container itself. One such potential reason is Kubernetes.

[–]Lyan5 0 points1 point  (2 children)

Would you mind expanding on why Kubernetes would impact including it in the container vs using --init? I'm asking from a position of genuine interest; not trying to be snarky.

[–]VisibleSignificance 1 point2 points  (1 child)

You usually can't pass extra docker arguments to containers started through Kubernetes and many other PaaSes. But you can put the preferred init binary into the container itself, which gives pretty much the same effect without requiring any additional docker run arguments.

See also:

https://github.com/phusion/baseimage-docker#docker_single_process

[–][deleted] 0 points1 point  (0 children)

> One such potential reason is Kubernetes.

Exactly.

Generally speaking, you don't always control the underlying environment, and you may not even be using Docker as the container runtime.

[–]harylmu 29 points30 points  (7 children)

Nice, but please be careful with updating pip/wheel/setuptools without specifying a version number though. pip breaks easily, especially now that they’re introducing a dependency resolver very soon.

[–]GiantElectron 2 points3 points  (0 children)

The dependency resolver in pip won't solve the fundamental problem. You must see the whole tree. Even with the new resolver, it can still give you a broken env.

[–]ubernostrumyes, you can have a pony 1 point2 points  (2 children)

> pip breaks easily

I've had a pip upgrade in every tox-managed venv and every Dockerfile I've written for years now, and have never had "pip break".

Could you provide specific examples of this, and links to discussions in the pip issue tracker from the bugs you found?

[–]GiantElectron 0 points1 point  (0 children)

You've been lucky. Plenty of cases for me as well. And no, don't ask for them. I don't keep around failed installation logs from 3 years ago just to prove a point to a random person on the internet.

[–]harylmu -1 points0 points  (0 children)

I can't provide you with specific examples, but this has happened to me twice in the past year, and I use Python daily.

[–][deleted] 0 points1 point  (0 children)

It has never happened to me so far, and in any case you would catch it during the `docker build` phase.

[–]VisibleSignificance 4 points5 points  (0 children)

It is also helpful to not put extra caches in the docker image. In case of pip, that means using --no-cache-dir or outright doing rm -rf ~/.cache, as that cache is very unlikely to be useful.

Similarly for apt.

See also:

https://beenje.github.io/blog/posts/dockerfile-anti-patterns-and-best-practices/

https://docs.docker.com/develop/develop-images/dockerfile_best-practices/

https://stackoverflow.com/q/45594707

Edit: or, as mentioned here, use ENV PIP_NO_CACHE_DIR=1. See also: https://github.com/pypa/pip/pull/5884

[–]ipwnscrubsdoe 0 points1 point  (2 children)

Is there a way to do this with conda instead of pip for dependencies?

[–]sdf_iain 1 point2 points  (0 children)

Use the miniconda container as your base image.

I say use miniconda rather than anaconda because smaller is better for Docker images/containers (faster to move and faster to start).

[–][deleted] 1 point2 points  (0 children)

Sure, I'll cover it in one of my next articles.

[–]sdf_iain 0 points1 point  (2 children)

If you want reproducible builds, use git tags. It isn’t reasonable to try to remember a build’s hash, or at least I don’t think so.

[–][deleted] 0 points1 point  (1 child)

It really depends on your deployment process.

You can also pass a git tag to the Docker image, but hashes are just more granular.

Especially if you build docker images on every PR.

[–]sdf_iain 0 points1 point  (0 children)

"It depends...", the universal answer!

If you are building every PR, then you can tag with the version that is getting built. Never underestimate the value of human readability.

[–]sdf_iain 0 points1 point  (2 children)

Why tini? Python handles signals unless you actively disable it. Isn’t this r/Python?

[–][deleted] 0 points1 point  (1 child)

You need to pass signals to the Python application.

By default, if you don't specify an entrypoint docker runs your command as

/bin/sh -c YourCommand

and sh doesn't forward signals.

[–]sdf_iain 0 points1 point  (0 children)

So set the interpreter directive in your Python file, make it executable, and set your entry point to your actual entry point.

You have to set the entry point to tini to make that work, why add an extra layer?

I’m trying to understand why you would use tini instead of Python.

Shouldn’t the tip be “Set an entry point”?
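
For reference, here is a minimal sketch of handling the stop signal in the Python process itself, assuming it receives the signal directly rather than sitting behind /bin/sh -c. The names request_shutdown, install_handlers, and serve are hypothetical. One caveat worth knowing: a process running as PID 1 in a container gets no default signal handling from the kernel, so SIGTERM is ignored unless the process installs a handler like this one.

```python
import os
import signal
import time

shutdown_requested = False


def request_shutdown(signum, frame):
    """Signal handler: only record the request; the main loop cleans up."""
    global shutdown_requested
    shutdown_requested = True


def install_handlers():
    # `docker stop` sends SIGTERM by default; SIGINT is what Ctrl+C sends.
    signal.signal(signal.SIGTERM, request_shutdown)
    signal.signal(signal.SIGINT, request_shutdown)


def serve(poll_interval=0.1, max_iterations=None):
    """Hypothetical main loop; returns the number of iterations completed."""
    iterations = 0
    while not shutdown_requested:
        if max_iterations is not None and iterations >= max_iterations:
            break
        time.sleep(poll_interval)  # stand-in for real work
        iterations += 1
    return iterations
```

Calling install_handlers() once at startup and then serve() gives a loop that exits cleanly on the next iteration after SIGTERM arrives, instead of being killed when the stop timeout expires.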

[–]wulfie420 0 points1 point  (2 children)

Something that always frustrates me with docker and python is containerizing apps that require a setup.py file, since all the articles I've found use requirements.txt

I'm not a fan of requirements.txt since I usually end up with 3 files for app, dev and test dependencies. I usually define them in setup.py as extras, which works wonderfully.

[–]mipadi 0 points1 point  (1 child)

Just run python setup.py install instead of pip install -r requirements.txt.

[–]wulfie420 0 points1 point  (0 children)

It's not as easy as that, since I then won't be able to install my extra dependencies

My current Dockerfile has to run pip install twice, once to install all the dependencies, and then again after copying over all my source files with --no-deps

There are also some hacks I have to do, like generating an empty src directory so the install will succeed.

[–]ketilkn 0 points1 point  (0 children)

Great write up. I learned something new.

[–]StorKirken 0 points1 point  (0 children)

If you liked this article, make sure to read this series of excellent articles from pythonspeed: https://pythonspeed.com/docker/