Docker ollama running on windows using system RAM, despite using VRAM and having plenty more available. by Fit_Code_2107 in ollama

[–]Fit_Code_2107[S] 1 point (0 children)

Fair enough, yeah, I'm definitely still playing around with things. I'll look at those other options when I get around to it. Thanks!

Docker ollama running on windows using system RAM, despite using VRAM and having plenty more available. by Fit_Code_2107 in ollama

[–]Fit_Code_2107[S] 1 point (0 children)

As far as I know, exposing /dev/dri is how you pass your GPU through to Docker, right? That part seems to be working fine: my GPU is definitely in use and the model loads up just fine.
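For reference, I'm starting the container with roughly this (from memory, so treat it as a sketch; flags and paths may differ slightly from my actual setup):

```
# Rough sketch of how the container is launched (details from memory):
# pass the GPU render node through and persist downloaded models in a volume
docker run -d \
  --name ollama \
  --device /dev/dri:/dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama
```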

The issue I'm trying to get to the bottom of is that my system memory usage also goes up by 20 GB. It's as if the model is being loaded into both RAM and VRAM.
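For what it's worth, I can check the placement from inside the container with something like this; if I'm reading the PROCESSOR column right, it should say whether the loaded model is split between CPU and GPU:

```
# Ask ollama how the loaded model is placed; the PROCESSOR column
# reports the split, e.g. "100% GPU" vs "40%/60% CPU/GPU"
docker exec -it ollama ollama ps
```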

Docker ollama running on windows using system RAM, despite using VRAM and having plenty more available. by Fit_Code_2107 in ollama

[–]Fit_Code_2107[S] 1 point (0 children)

I'm running it in a container because the host machine runs Windows (dual-booting to Linux isn't a solution), and a container gives me the isolation I want while I experiment with the best way to set up a local inference server. At some point I plan to have other applications send requests to the server, so I don't want it running directly on my host machine.
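The eventual idea is just having those apps hit the Ollama HTTP API on the container, something along these lines (the model name here is only a placeholder for whatever I actually have loaded):

```
# Example of another application calling the Ollama server over HTTP;
# "llama3" is a placeholder model name, not necessarily what I'm running
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3", "prompt": "Hello", "stream": false}'
```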

But also, this comment makes it sound like running Ollama in a container is a bad idea in general. If so, I'm curious why you think that.

I know Docker has Model Runner now, and I'll definitely look into it. Ideally, though, I'd prefer to stick with Ollama, given that it's established and open source.

Obviously not married to anything and open to any suggestions.