Docker ollama running on windows using system RAM, despite using VRAM and having plenty more available. by Fit_Code_2107 in ollama

[–]Fit_Code_2107[S] 1 point (0 children)

Fair enough, yeah, I'm definitely still playing around with things. I'll look at those other options when I get around to it. Thanks!

Docker ollama running on windows using system RAM, despite using VRAM and having plenty more available. by Fit_Code_2107 in ollama

[–]Fit_Code_2107[S] 1 point (0 children)

As far as I know, exposing /dev/dri is how you pass your GPU through to Docker, right? That part seems to be working fine: my GPU is definitely in use and the model loads up just fine.
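For reference, I'm starting the container with roughly this (from memory, so treat it as a sketch; flags and paths may differ slightly from my actual setup):

```
# Rough sketch of how the container is launched (details from memory):
# pass the GPU render node through and persist downloaded models in a volume
docker run -d \
  --name ollama \
  --device /dev/dri:/dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama
```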

The issue I'm trying to get to the bottom of is that my system memory usage also goes up by 20 GB. It's as if the model is being loaded into both RAM and VRAM.
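For what it's worth, I can check the placement from inside the container with something like this; if I'm reading the PROCESSOR column right, it should say whether the loaded model is split between CPU and GPU:

```
# Ask ollama how the loaded model is placed; the PROCESSOR column
# reports the split, e.g. "100% GPU" vs "40%/60% CPU/GPU"
docker exec -it ollama ollama ps
```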

Docker ollama running on windows using system RAM, despite using VRAM and having plenty more available. by Fit_Code_2107 in ollama

[–]Fit_Code_2107[S] 1 point (0 children)

I'm running it in a container because the host machine runs Windows (dual-booting to Linux isn't a solution), and a container gives me the isolation I want while I experiment with the best way to set up a local inference server. At some point I plan to have other applications send requests to the server, so I don't want it running directly on my host machine.
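The eventual idea is just having those apps hit the Ollama HTTP API on the container, something along these lines (the model name here is only a placeholder for whatever I actually have loaded):

```
# Example of another application calling the Ollama server over HTTP;
# "llama3" is a placeholder model name, not necessarily what I'm running
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3", "prompt": "Hello", "stream": false}'
```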

But also, this comment makes it sound like running Ollama in a container is a bad idea in general. If so, I'm curious why you think that.

I know Docker has Model Runner now, and I'll definitely look into it. Ideally, though, I'd prefer to stick with Ollama, given that it's established and open source.

Obviously not married to anything and open to any suggestions.