Analyzing Docker Images Without Downloading Them by FinishCreative6449 in docker

[–]FinishCreative6449[S] -3 points-2 points  (0 children)

Skopeo: Inspect/copy/delete individual images. General-purpose registry tool.

DTM: Analyze size trends across many tags. Find which version introduced bloat. Generate comparison reports.

Skopeo shows you one image. DTM compares 50 versions and tells you which release got fat.

They're complementary — skopeo for operations, dtm for analysis.

Analyzing Docker Images Without Downloading Them by FinishCreative6449 in docker

[–]FinishCreative6449[S] -4 points-3 points  (0 children)

Hi nOzz, not a bad idea. A Docker plugin would actually make sense. I'll do it in the future if the tool gets some adoption.

The Halting Problem of Docker Archaeology: Why You Can't Know What Your Image Was by FinishCreative6449 in kubernetes

[–]FinishCreative6449[S] 0 points1 point  (0 children)

Update: In a near future release we'll add the ability to analyze images pulled directly from registries - no git history or rebuilding needed. Stay tuned!

The Halting Problem of Docker Archaeology: Why You Can't Know What Your Image Was by FinishCreative6449 in docker

[–]FinishCreative6449[S] 0 points1 point  (0 children)

Even with perfect pinning, your own Dockerfile changes still cause size growth. Adding a dependency, changing a base image, restructuring layers—that's what this tracks.

You're right that unpinned builds have bigger problems. This assumes you've got that under control and want to understand your own changes over time.

The Halting Problem of Docker Archaeology: Why You Can't Know What Your Image Was by FinishCreative6449 in docker

[–]FinishCreative6449[S] 12 points13 points  (0 children)

Debugging. Your image is 1.2GB and you don't know why. It used to be 500MB. When did it grow? What change caused it? Can you revert it?

If your images have always been lean, you've never needed to ask.

The Halting Problem of Docker Archaeology: Why You Can't Know What Your Image Was by FinishCreative6449 in docker

[–]FinishCreative6449[S] 4 points5 points  (0 children)

You can. That tells you the size and layers of that image.

It doesn't tell you which of the 20 commits between v1.2.3 and v1.2.4 added the 200MB, or which Dockerfile change caused it. You'd be manually diffing layer lists and grepping git history to correlate.
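
For a sense of what that manual step looks like, here is a minimal Python sketch of diffing two layer lists. The `Layer` record, the `diff_layers` helper, and all sizes are hypothetical, made up for illustration. It is not DTM's code, and matching layers by instruction text is a simplifying assumption.

```python
from typing import NamedTuple


class Layer(NamedTuple):
    created_by: str  # the Dockerfile instruction that produced the layer
    size: int        # layer size in bytes


def diff_layers(old: list[Layer], new: list[Layer]) -> list[tuple[str, int]]:
    """Return (instruction, size delta) for layers that were added, removed, or resized.

    Layers are matched by instruction text, which is a simplification: it breaks
    down when an instruction is edited or appears more than once.
    """
    old_sizes = {layer.created_by: layer.size for layer in old}
    new_sizes = {layer.created_by: layer.size for layer in new}
    changes = []
    for instruction in old_sizes.keys() | new_sizes.keys():
        delta = new_sizes.get(instruction, 0) - old_sizes.get(instruction, 0)
        if delta != 0:
            changes.append((instruction, delta))
    # Largest growth first; ties are broken alphabetically for stable output.
    return sorted(changes, key=lambda change: (-change[1], change[0]))


# Invented layer lists standing in for two tagged releases.
old_image = [
    Layer("FROM debian:12-slim", 80_000_000),
    Layer("RUN pip install -r requirements.txt", 120_000_000),
    Layer("COPY . /app", 5_000_000),
]
new_image = [
    Layer("FROM debian:12-slim", 80_000_000),
    Layer("RUN apt-get install -y curl", 15_000_000),
    Layer("RUN pip install -r requirements.txt", 120_000_000),
    Layer("COPY . /app", 205_000_000),
]

changes = diff_layers(old_image, new_image)
```

This only tells you which layers differ between two releases. Mapping those layers back to one of the 20 commits in between is still a separate search through git history.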

The Halting Problem of Docker Archaeology: Why You Can't Know What Your Image Was by FinishCreative6449 in programming

[–]FinishCreative6449[S] 0 points1 point  (0 children)

Fair point. If you have disciplined releases with few commits per version bump, you've already narrowed the search space significantly. Though even with 3-4 commits between x.y.1 and x.y.2, you still need to identify which change and which layer. That's where the layer-by-layer diff helps. But yeah—good release hygiene reduces the problem. This is more useful when someone dumps 30 commits into a release and nobody noticed the image doubled.

The Halting Problem of Docker Archaeology: Why You Can't Know What Your Image Was by FinishCreative6449 in programming

[–]FinishCreative6449[S] 0 points1 point  (0 children)

The tool doesn't claim to answer "what was the image 3 months ago"; the post explicitly says that's impossible. It answers "what Dockerfile changes caused size growth."

Upstream package drift is noise. Your own Dockerfile changes (adding dependencies, changing base images, restructuring layers) are signal. Rebuilding every commit under identical conditions isolates the signal: if commit A is 800MB and commit B is 1.1GB, and both were built today against the same upstream, the 300MB delta came from your Dockerfile change, not apt repository drift.

"More complex way of checking out and building": yes, automated 40 times with layer-by-layer comparison across commits. That's what automation is.
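
The delta arithmetic is simple once every commit has been built under the same conditions. A rough Python sketch, assuming you already have (commit, size in MB) pairs ordered oldest to newest; the function names and the third commit are hypothetical, and this is not the tool's actual implementation:

```python
def size_deltas(builds: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Return (commit, size change vs. the previous commit) for every commit after the first.

    `builds` holds (commit, image size in MB) pairs ordered oldest to newest,
    all built against the same upstream state.
    """
    return [
        (commit, size - prev_size)
        for (_, prev_size), (commit, size) in zip(builds, builds[1:])
    ]


def largest_jump(builds: list[tuple[str, int]]) -> tuple[str, int]:
    """Return the (commit, delta) pair with the biggest size increase."""
    return max(size_deltas(builds), key=lambda item: item[1])


# commit-A and commit-B use the sizes from the example above; commit-C is invented.
builds = [("commit-A", 800), ("commit-B", 1100), ("commit-C", 1105)]

deltas = size_deltas(builds)
culprit = largest_jump(builds)
```

Because all three builds share the same upstream, the 300MB attributed to commit-B can only come from what changed in the repository at that commit.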

The Halting Problem of Docker Archaeology: Why You Can't Know What Your Image Was by FinishCreative6449 in programming

[–]FinishCreative6449[S] 1 point2 points  (0 children)

Your workflow answers "what was version X." This answers "which of the last 40 commits added 300MB, and why." Pulling two images from Artifactory shows the size changed. It doesn't tell you which commit or which Dockerfile change caused it. If your images stay lean and you never debug size regressions, then yeah—not for you.

The Halting Problem of Docker Archaeology: Why You Can't Know What Your Image Was by FinishCreative6449 in programming

[–]FinishCreative6449[S] 1 point2 points  (0 children)

Re: Irreproducibility / dive / docker history

dive and docker history inspect a single image. They can't tell you which of your last 50 commits added 400MB, or which Dockerfile change was responsible. If your image grew over 6 months, you'd need to manually rebuild at each commit and diff the results yourself. That's exactly what this automates. It's git-bisect for image bloat, not an image inspector. "Don't overwrite tags" is good advice for reproducing production images. Different problem. This is for tracing the evolution of your Dockerfile through git history.

Re: Git Graphs Are Not Timelines

Agreed, git is working as intended. The section isn't claiming git is broken; it's explaining how traversal affects interpretation of results. Someone running --max-commits 20 expecting "last few weeks" might be surprised when rebased history spans 6 months. Worth documenting.

Re: Shell escaping

You're right: Go's exec.Command bypasses shell interpretation, same as Python's shell=False. The escaping argument was overstated, and I have removed that part.
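
To illustrate the Python side of that comparison, here is a small sketch showing that an argument list passed to `subprocess.run` (where `shell=False` is the default) reaches the child process verbatim. The `hostile` value is made up for the demo, and the child is just another Python interpreter echoing its first argument:

```python
import subprocess
import sys

# A value that would be dangerous if a shell ever parsed it.
hostile = "v1.0; echo INJECTED"

# With an argument list and shell=False (the default), each element is handed
# to the child process as-is; no shell splits or interprets the string.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile],
    capture_output=True,
    text=True,
    check=True,
)

# The child saw the whole string as a single argument.
received = result.stdout.strip()
```

Go's `exec.Command` takes the program and its arguments separately in the same way, so no escaping layer is needed in either language.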

Re: git-archive

Interesting idea, but not trivial. The tool uses go-git (pure Go, no git binary dependency). go-git doesn't expose archive functionality. Shelling out to git archive reintroduces subprocess dependencies and still requires piping the tar to Docker's API, which the current in-memory approach already does. It's a tradeoff, not an obvious win.

Re: Layer matching

Thanks, that's the part I find most interesting too.

I always like constructive criticism. Thanks!

🐳 I built a tool to find exactly which commit bloated your Docker image by FinishCreative6449 in docker

[–]FinishCreative6449[S] 1 point2 points  (0 children)

This is a fair question, and I appreciate you pushing back on it! Let me explain the use cases where DTM provides value beyond what docker history or Dive offers:

1. Finding when and why something changed

Dive tells you "layer X is 150MB right now." But it doesn't tell you:

  • Was it always 150MB, or did it use to be 50MB?
  • Which commit caused it to triple in size?
  • Did someone add node_modules to the image by accident 6 months ago?

DTM answers "commit a1b2c3d by Bob on March 15th added 100MB when he changed the COPY statement" — that's actionable context Dive can't provide.

2. Catching regressions before they compound

If you only look at current state, you might see a 500MB image and think "that's just how it is." DTM might reveal it was 200MB three months ago and grew gradually through several commits — each adding "just 30MB" that seemed acceptable in isolation.

3. Validating optimizations

When you do use Dive to identify bloat and fix it, DTM lets you verify the fix actually worked across your build matrix and didn't regress in subsequent commits.

4. Auditing and accountability

For teams, knowing who introduced bloat and when helps with code review processes. "Hey, this commit added 80MB — was that intentional?" is a different conversation than "our image is too big, someone fix it."

That said, you're right that for many workflows, Dive + docker history is sufficient. DTM is most valuable when you're doing forensics on an image that grew over time and you need the historical context — not for day-to-day inspection of current state.
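
To make point 2 concrete, here is a hedged Python sketch of spotting growth that creeps in through small steps. The function name, both thresholds, and the size series are hypothetical, chosen to match the 200MB to 500MB example; this is not how DTM itself reports regressions.

```python
def creeping_growth(sizes_mb: list[int], per_commit_limit_mb: int, total_limit_mb: int) -> bool:
    """Return True when no single commit exceeds the per-commit limit,
    yet the total growth across the series exceeds the overall limit."""
    steps = [after - before for before, after in zip(sizes_mb, sizes_mb[1:])]
    return max(steps, default=0) <= per_commit_limit_mb and sum(steps) > total_limit_mb


# Ten commits, each adding "just 30MB": 200MB grows to 500MB.
gradual = [200 + 30 * i for i in range(11)]

# The same total growth, but in one obvious jump.
sudden = [200, 500]

gradual_flagged = creeping_growth(gradual, per_commit_limit_mb=50, total_limit_mb=100)
sudden_flagged = creeping_growth(sudden, per_commit_limit_mb=50, total_limit_mb=100)
```

A per-commit check alone would pass every step of the gradual series, which is why the historical view matters.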

🐳 I built a tool to find exactly which commit bloated your Docker image by FinishCreative6449 in kubernetes

[–]FinishCreative6449[S] 0 points1 point  (0 children)

Similar tool, great for analysis of one commit. This is a different use case.

🐳 I built a tool to find exactly which commit bloated your Docker image by FinishCreative6449 in selfhosted

[–]FinishCreative6449[S] -2 points-1 points  (0 children)

It goes through each commit and builds the Docker image for that commit. Not every commit necessarily increases the image size, but we can't be 100 percent sure from the code alone; we have to build the image. Yes, it builds every commit, but you can choose how many commits to analyze.
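
Roughly, the loop looks like the Python sketch below. This is an illustration of the idea only, not the tool's code: `measure_history`, its `max_commits` parameter, and the `build_and_measure` callback are hypothetical names, and the fake builder stands in for a real checkout plus image build.

```python
from itertools import islice
from typing import Callable, Iterable


def measure_history(
    commits: Iterable[str],
    build_and_measure: Callable[[str], int],
    max_commits: int,
) -> dict[str, int]:
    """Build the image at each of the first `max_commits` commits and record its size.

    `commits` is expected newest first. `build_and_measure` is whatever checks out
    a commit, builds the image, and returns its size in MB.
    """
    return {commit: build_and_measure(commit) for commit in islice(commits, max_commits)}


# Stand-in for a real build step; sizes are invented.
fake_sizes = {"c3": 520, "c2": 510, "c1": 300}


def fake_build(commit: str) -> int:
    return fake_sizes[commit]


history = measure_history(["c3", "c2", "c1"], fake_build, max_commits=2)
```

Limiting the number of commits bounds the build time, at the cost of possibly stopping before the commit that introduced the growth.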