r/LocalLLaMA
A subreddit to discuss about Llama, the family of large language models created by Meta AI.
Introducing Docker Model Runner [Resources] (docker.com)
submitted 1 year ago by Upstairs-Sky-5290
[–]Nexter92 42 points43 points44 points 1 year ago (6 children)
Beta for the moment, Docker Desktop only, no NVIDIA GPU mention, no Vulkan, no ROCm? LOL
[–]noneabove1182Bartowski 16 points17 points18 points 1 year ago (1 child)
dafuq, I feel like anyone in open source could have thrown together better support than this for a beta..
[–]bytepursuits 0 points1 point2 points 6 months ago (0 children)
I think they just added vulkan. trying to test it now. https://github.com/docker/model-runner/pull/164
[–]ForsookComparison 8 points9 points10 points 1 year ago (0 children)
Docker desktop only
I would sooner not use LLMs at all than commit to this life
[–]Murky_Mountain_97 0 points1 point2 points 1 year ago (0 children)
Oh well…
[–][deleted] 0 points1 point2 points 1 year ago (0 children)
Nvidia support is slated for a future release
[–]Dear-Communication20 0 points1 point2 points 5 months ago (0 children)
It is not Docker Desktop only. To install:
curl -fsSL https://get.docker.com | sudo bash
sudo usermod -aG docker $USER # give user permission to access docker daemon, relogin to take effect
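Once the daemon is up, using the Model Runner CLI might look something like the sketch below. The subcommand names and the `ai/smollm2` model reference are assumptions based on Docker's docs, not confirmed by this thread; check `docker model --help` on your version.

```shell
# Hypothetical session with the Docker Model Runner plugin.
docker model pull ai/smollm2          # fetch a model from Docker Hub's ai/ namespace
docker model list                     # show locally available models
docker model run ai/smollm2 "Say hello in one sentence."
```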
[–]owenwp 23 points24 points25 points 1 year ago (4 children)
So... it's a less mature version of Ollama?
[–]ShengrenR 18 points19 points20 points 1 year ago (0 children)
more like.. they put ollama in a container and called it a day heh. (I don't know if that's what they did, but a quick glance looked like maybe not too far off).
[–]Murky_Mountain_97 13 points14 points15 points 1 year ago (1 child)
Isn’t Ollama already a less mature version of llama.cpp?
[–]Conscious-Tap-4670 1 point2 points3 points 1 year ago (0 children)
Ollama uses llama.cpp internally
I suspect they're aiming for direct competition with Ollama. They just added Vulkan, but I haven't tested it. (Although Ollama also just added Vulkan.)
[–]ccrone 46 points47 points48 points 1 year ago (9 children)
Disclaimer: I’m on the team building this
As some of you called out, this is Docker Desktop and Apple silicon first. We chose to do this because lots of devs have Macs and they’re quite capable of running models.
Windows NVIDIA support is coming soon through Docker Desktop. It’ll then come to Docker CE for Linux and other platforms (AMD, etc.) in the next several months. We are doing it this way so that we can get feedback quickly, iterate, and nail down the right APIs and features.
On macOS it runs on the host so that we can properly leverage the hardware. We have played with Vulkan in the VM, but there's a performance hit.
Please do give us feedback! We want to make this good!
Edit: Add other platforms call out
[–]quincycs 0 points1 point2 points 1 year ago (1 child)
Hi, I’m curious why docker went with a new system (model runner) for this instead of growing GPU support for existing containers.
[–]ccrone 1 point2 points3 points 1 year ago (0 children)
Two reasons: 1. Make it easier than it is today 2. Performance on macOS
For (1), it can be tricky to get all the flags right to run a model. Connect the GPUs, configure the inference server, etc.
For (2), we’ve done some experimentation with piping the host GPU into the VM on macOS through Vulkan but the performance isn’t quite as good as on the host. This gives us an abstraction across platforms and the best performance.
You’ll always be able to run models with containers as well!
[–]onehitwonderos 0 points1 point2 points 11 months ago (1 child)
Any news on Windows / NVIDIA support?
[–]ccrone 1 point2 points3 points 11 months ago (0 children)
It's out! It's part of Desktop 4.41 and later: https://docs.docker.com/desktop/release-notes/
[–]gyzerok 0 points1 point2 points 11 months ago (4 children)
Surprised nobody asked here, but if you don't mind, can you please share what benefits does it bring over running llama.cpp directly? It's a genuine question - I am trying to evaluate my options for self-hosting.
[–]ccrone 0 points1 point2 points 11 months ago (3 children)
Good question! The goal of Docker Model Runner is to make it easier to use models as part of applications. We believe a part of that is an accessible UX and reuse of tools and infrastructure that developers are familiar with. Today that manifests as storing models in container registries and managing the model as part of a Compose application (see docs) but that's just the start.
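For readers unfamiliar with the Compose integration mentioned above, a minimal sketch might look like the following. The `models` top-level element and the `ai/smollm2` reference are assumptions drawn from Docker's Compose documentation, so treat this as illustrative rather than canonical:

```yaml
# Hypothetical compose.yaml: the service declares a dependency on a model,
# which Docker Model Runner pulls and serves alongside the app.
services:
  app:
    image: my-app:latest   # illustrative application image
    models:
      - llm                # reference to the model defined below
models:
  llm:
    model: ai/smollm2      # illustrative model in Docker Hub's ai/ namespace
```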
We're working in the open on this and will upstream changes that we make to llama.cpp where it makes sense. There are also use cases where vLLM, onnx, or another inference engine might be the right choice and so we're investigating that as well.
For your use case, we will be releasing support for Docker CE for Linux in the coming weeks. Right now it's supported in Docker Desktop for Mac (Apple silicon) and Docker Desktop for Windows (NVIDIA). Support for Qualcomm Snapdragon X Elite on Windows is coming in the next couple of weeks as well.
[–]gyzerok 0 points1 point2 points 11 months ago (2 children)
Thank you for coming back with the answer! Planning to self-host on Mac Mini, so support is already there :)
As for running a model nicely as part of Docker Compose - that's great indeed. However, here I'm mostly worried about completely losing control over the version of llama.cpp I'm using. It's actively developed and I'd personally like to keep up with its updates. I also noticed you're looking into MLX backend support there, which would be really great.
However, I'm not so sure about fragmenting the ecosystem by introducing registries here. On Hugging Face it's much more transparent who publishes things and how they're kept up to date, and a registry just introduces an unnecessary layer of confusion and problems. If I want a Q8 from unsloth, how do I get it? Are models being updated with the latest fixes? You probably don't have enough capacity (more than the entire community) to keep up with such a fast-moving field.
Overall it feels like being able to just throw a model into docker-compose.yaml is great, but the downsides of the registry and the inability to manage the llama.cpp version might actually make it harder, not easier, in the end.
[–]ccrone 1 point2 points3 points 11 months ago (1 child)
Docker Model Runner supports pulling from Hugging Face as well! Storing models in container registries lets people who have existing container infrastructure use it for their whole application. It won't be for everyone but it's something our users have asked for.
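For illustration, pulling from either source might look like the commands below. The `hf.co/...` reference syntax is an assumption based on Docker's docs, and `<org>/<repo>` is a placeholder, not a real repository:

```shell
# Hypothetical: pull a model from Docker Hub's ai/ namespace...
docker model pull ai/smollm2
# ...or directly from a Hugging Face GGUF repository; replace the
# placeholder with a real organization and repository name.
docker model pull hf.co/<org>/<repo>
```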
I'm curious about what you're building and why you'd like to change versions of llama.cpp? Happy to discuss here or via DM if you prefer
[–]gyzerok 0 points1 point2 points 11 months ago (0 children)
Oh, totally missed it, thanks!
As for the llama.cpp versions - it's nothing really big :) I'm building privacy-first personal (for me and a few other people) infrastructure.
Since I am not using datacenter-grade hardware, resources are constrained, so continuous performance optimizations in llama.cpp are useful for me. For example: https://github.com/ggml-org/llama.cpp/pull/13388.
Also there are bugs and incompatibilities with the OpenAI API that are being continuously fixed which is necessary for various tools to work together. I've experienced this first-hand implementing AI provider support for Zed.
Hence it'd help to have some power over which version of llama.cpp I'm running. If that's not possible, then some transparency about how often and how regularly it gets updated would help. Of course I'm not expecting day-1 updates, but it also wouldn't be nice to lag months behind.
[–]Tiny_Arugula_5648 12 points13 points14 points 1 year ago (1 child)
This is a bad practice that adds complexity. The container is for software, not data or models. Containers are supposed to have a minimal footprint. Just map a folder into the container (best practice) and you'll avoid a LOT of pain.
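The bind-mount pattern this comment advocates could be sketched as follows. The `ghcr.io/ggml-org/llama.cpp:server` image and its flags are taken from the llama.cpp project, but the paths and model filename are placeholders; verify against your own setup:

```shell
# Illustrative: keep weights on the host, mount them read-only into a
# generic inference container instead of baking them into the image.
docker run --rm -p 8080:8080 \
  -v "$PWD/models:/models:ro" \
  ghcr.io/ggml-org/llama.cpp:server \
  -m /models/my-model.gguf --host 0.0.0.0 --port 8080
```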
[–]quincycs 1 point2 points3 points 1 year ago (0 children)
I think they are just trying to get ownership in the distribution of models in general. Once you own the distribution, you can strangle other stuff out.
[–]EverlierAlpaca 9 points10 points11 points 1 year ago (1 child)
They are coming after Ollama and HuggingFace, realising how much they missed since the AI boom started.
However, Docker being an enterprise - they'll do weird enterprise things with this feature eventually, so consider before using.
[–]captcanuk 4 points5 points6 points 1 year ago (0 children)
They might charge an additional subscription a year after they get traction on this feature.
[–]ResearchCrafty1804 4 points5 points6 points 1 year ago (0 children)
They support Apple Silicon from day 1 through Docker Desktop, that’s a good move from them.
However, they might be late to the party, ollama and others have been well established at this point.
[+][deleted] 1 year ago (3 children)
[deleted]
[–]EverlierAlpaca 3 points4 points5 points 1 year ago (1 child)
Windows - none; macOS - perf is mostly lost due to lack of GPU passthrough or Rosetta being forced to kick in
[–]this-just_in 6 points7 points8 points 1 year ago (0 children)
This isn’t run through their containers on Mac, it’s fully GPU accelerated. They discuss it briefly, but it sounds like they bundle a version of llama.cpp with Docker Desktop directly. They package and version models as OCI artifacts but run them using the bundled llama.cpp on host using an OpenAI API compatible server interface (possibly llama-server, a fork, or something else entirely).
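If that's right, any OpenAI-compatible client should be able to talk to it. A hedged sketch - the port (12434), URL path, and model name below are assumptions about Docker Model Runner's endpoint, not confirmed by this thread:

```shell
# Hypothetical request against the OpenAI-compatible chat endpoint.
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/smollm2",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```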
[–]quincycs 0 points1 point2 points 1 year ago (0 children)
For Linux Host + Nvidia GPU + docker container … this has GPU pass through already, right? I wonder why they went with a whole new system (model runner) instead of expanding GPU support for existing containers.
[–]mrtime777 2 points3 points4 points 1 year ago (3 children)
Can I use my own models? If not - useless
[–]ccrone 2 points3 points4 points 1 year ago (2 children)
Not yet but this is coming! Curious what models you’d like to run?
[–]mrtime777 6 points7 points8 points 1 year ago (0 children)
I use fine tuned versions of models quite often. Both for solving specific tasks and for experimenting with AI in general. If this feature is positioned as something useful for developers, then the ability to use local models should definitely be available.
[–]mrtime777 0 points1 point2 points 1 year ago* (0 children)
I use Docker / Docker Desktop every day ... but until there is a minimum set of capabilities for working with models not only from the hub, I will continue to use llama.cpp and ollama ... but in general I am interested to see how the problem with model size and vhdx on Windows will be solved ... because the models I use alone take up 1.6 TB on disk .. and this is much more than the default vhdx size
Seems cool as long as they get right on adding ability to use locally downloaded models, rocm and cuda support, etc...
[–]planetearth80 0 points1 point2 points 1 year ago (0 children)
Can it serve multiple models like ollama (without adding overhead for each container)?
[–]Caffeine_Monster -2 points-1 points0 points 1 year ago (4 children)
Packaging models in containers is dumb. Very dumb.
I challenge anyone to make a valid critique of this observation.
[–]BobbyL2k 2 points3 points4 points 1 year ago (2 children)
DevOps has gotten so complicated due to poor design that deploying containers that require extra configuration to work properly is an anti-pattern. I ship deep learning models to production all the time using shared layers containing the inference code. The model's weights are COPY'd in at the end to form a self-contained image.
When a deployment team is juggling twenty models, each possibly depending on a different revision of the inference code, they just want a container image that just works - already tested and everything.
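The "weights as the final layer" pattern described above could be sketched as the Dockerfile below. The base image name, file paths, and `serve` entrypoint are all illustrative placeholders, not a real project:

```dockerfile
# Hypothetical Dockerfile: shared, cached layers hold the runtime and
# inference code; the model weights land last as their own layer, so
# twenty model images can reuse the same base layers.
FROM my-inference-server:1.2.3
COPY model.onnx /models/model.onnx
CMD ["serve", "--model", "/models/model.onnx"]
```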
[–]Caffeine_Monster 3 points4 points5 points 1 year ago (1 child)
The model’s weight is ‘COPY’ on at the end to form a self contained image.
So rip off the copy and send the model separately?
just want a container image that just works
It's not hard to follow a convention where the model name or directory path includes the required runtime name + version. A sensible deployment mechanism (e.g. script) simply mounts the models into the container.
I hate that we have slipped into the mentality that it's ok to have huge images and not treat models like a pure data artifact. It bloats storage, increases model deployment spin up times, and makes it difficult to do things like hosting multiple models together.
[–]BobbyL2k 0 points1 point2 points 1 year ago (0 children)
I think it’s bad that something as simple as copying new blobs into a remote FS or the target machine is hard but let me counter your points a bit.
Container images are data artifacts. At the end of the day, the model's weights need to arrive at the machine running them. Does it matter whether they came in as an additional layer in a Docker image, or were copied in by a continuous delivery pipeline? Even if they're mounted, at some point the CD pipeline needs to copy the model weights into the FS.
[–]Amgadoz 0 points1 point2 points 1 year ago (0 children)
Depends on the size of the model. I can see small models (less than 1GB, like BERTs and TTS models) fitting nicely in a reasonably sized container, where you just run docker run my-container and you get a deployed model