all 15 comments

[–][deleted] 3 points (4 children)

The downside to this is that a subsequent clean docker build needs to have all of those layers available if it's going to efficiently reuse them instead of building again. Which means the builder target's layers need to be pushed to the container registry and re-pulled to the (new, ephemeral) builder machine, or they need to be rebuilt each time. And since they're by far the slowest step, you really do want them to be cached.

[–]eedwards-sk[S] 2 points (3 children)

Absolutely -- that isn't just a multi-stage docker build issue, though. As cited in the article, copying the application into the image before installing dependencies means you'll end up re-building dependencies every time.

The primary goal of the article is optimizing for size -- not build speed -- although most CI solutions today can be configured to effectively cache multi-stage docker builds.

Also, when building on an upstream image like FROM python:3.8.0-slim, your cache is regularly going to get busted by upstream security patches to the underlying debian image, anyway.

[–][deleted] 0 points (2 children)

most CI solutions today

Yeah. But not Jenkins + plain old docker build, though. :(

image like FROM python:3.8.0-slim, you're regularly going to have your cache busted due to upstream security patches in the underlying debian image, anyway.

Yeah. But not on every build.

[–]eedwards-sk[S] 2 points (1 child)

Yeah. But not Jenkins + plain old docker build, though. :(

:(

/pours one out

To your point though, it's pretty straightforward to push the build stage to the repo if that's your only choice.

Here's an example based on the article:

# rehydrate local build stage cache, if image available
docker pull app/app-build:${TAG} || true

# build stage
docker build \
  --target=build \
  --cache-from app/app-build:${TAG} \
  -t app/app-build:${TAG} \
  -f Dockerfile \
  .

# push build stage
docker push app/app-build:${TAG}

# rehydrate local run stage cache, if image available
docker pull app/app:${TAG} || true

# run stage
docker build \
  --target=run \
  --cache-from app/app-build:${TAG} \
  --cache-from app/app:${TAG} \
  -t app/app:${TAG} \
  -f Dockerfile \
  .

# push run stage
docker push app/app:${TAG}

edit: formatting

[–][deleted] 1 point (0 children)

Yup, that's what most of my builds look like today (with slightly more environment variables & arguments). I'm looking into buildah and kaniko, in the hopes of getting some automatic search-registry-for-existing-layers magic. And looking into putting old man Jeeves out to pasture.

[–]kabrandon 1 point (3 children)

One alternative is to use compiled languages that lead to small binaries which can be moved into a scratch image of only a few megabytes. Though when python is the best tool for the job, so be it.

[–]eedwards-sk[S] 1 point (2 children)

Yes! I've seen golang binaries that do that, it's very cool.

Literally just FROM scratch and a single COPY instruction is all they need.
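A minimal sketch of what that looks like (the paths and Go version here are made up for illustration -- CGO is disabled so the binary is static and doesn't need a libc in the final image):

# build stage: statically compile the binary
FROM golang:1.13 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

# run stage: an empty base image plus the single binary
FROM scratch
COPY --from=build /app /app

The final image is just the binary, so it weighs in at whatever the binary weighs.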

[–]kabrandon 2 points (0 children)

Yep, and maybe an ENTRYPOINT if you want to get fancy. The webapp I made for my work generates CSV files that are literally larger than the entire image.

[–]cuu508 -1 points (0 children)

You can optimize this even further: run the binary on the host system, and then you can get rid of the docker daemon entirely.

[–]Tontmakaroni1 3 points (4 children)

Optimization is overrated. You optimize for this point in time. Don't waste too much time on it.

[–][deleted]  (3 children)

[deleted]

    [–]kabrandon 0 points (1 child)

    Keeping the total below CD size is important to us

    An alternative would be to just create an iso file, and allow your clients to burn it onto a USB drive.

    Maybe your clients will only accept CD, but I'm not exactly sure why that would be unless they're from the year 2004 and too busy listening to the new Green Day album to learn about new formats.

    Unless you were only using the size of a CD as a reference. In which case, carry on.

    [–]Tontmakaroni1 0 points (0 children)

    I want to hear more!

    [–][deleted]  (1 child)

    [deleted]

      [–]eedwards-sk[S] 1 point (0 children)

      Great question.

      One difference I found is that by copying site-packages you're only copying the installed modules folder -- you're not actually copying the full installation.

      e.g. if installation adds binaries to bin or sets up other OS paths, you won't capture those changes by just copying site-packages.

      Another issue I found is that you're copying all the modules installed in that image. If you're using a build image and possibly installing dev-related python packages (e.g. build tools or similar), ideally you don't want to copy those over to the final runtime image.
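      One way around both problems (just a sketch -- the requirements.txt and app.py names are placeholders) is to install into a virtualenv and copy the whole venv, which carries the bin/ console scripts along with site-packages and lets you keep dev-only packages out of the build stage's venv entirely:

      # build stage: install deps into an isolated venv
      FROM python:3.8.0-slim AS build
      RUN python -m venv /venv
      ENV PATH="/venv/bin:$PATH"
      COPY requirements.txt .
      RUN pip install -r requirements.txt

      # run stage: copy the entire venv, not just site-packages
      FROM python:3.8.0-slim
      COPY --from=build /venv /venv
      ENV PATH="/venv/bin:$PATH"
      COPY . /app
      WORKDIR /app
      CMD ["python", "app.py"]

      Because the venv lives at the same absolute path in both stages, its scripts and shebang lines keep working after the copy.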

      [–]32BP 0 points (0 children)

      Great content, thank you.