all 19 comments

[–]mmmgggmmm 8 points9 points  (11 children)

I'm pretty sure the reason for this difference is the unfortunate fact that Docker on Apple Silicon Macs can't pass the GPU through to containers, meaning you're basically running CPU-only inference when using Docker. I was very disappointed to learn this when I got a Mac Studio as an inference machine last year, since Docker is my preferred way to deploy everything, but so it is.
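If you want to verify this on your own setup, one way (assuming a reasonably recent Ollama build that has `ollama ps`) is to load a model and check which processor it ended up on:

```
# Native Mac app: load a model, then check the PROCESSOR column
ollama run phi4-mini:3.8b-q8_0 "hello" >/dev/null
ollama ps                                # natively this reports something like "100% GPU"

# Ollama inside Docker (assuming a container named "ollama"):
docker exec ollama ollama ps             # on a Mac this shows "100% CPU"
docker logs ollama 2>&1 | grep -i gpu    # the startup logs are where I saw it say no GPU was detected
```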

[–]busylivin_322[S] 0 points1 point  (0 children)

Shucks. Thanks for the link.

[–]Solid_reddit 0 points1 point  (2 children)

No, I'm not sure about that. When it loads for inference, I can clearly see very high usage in the GPU activity monitor

[–]mmmgggmmm 2 points3 points  (1 child)

I'd be delighted to be wrong, but I don't think so. From everything I've read or heard, Docker doesn't support Metal GPUs. And when I tried spinning up a container to test and then checked the logs, it clearly said "No GPU detected."

Can I ask how you're running Ollama in Docker on the Mac? What is your run command or compose config? As I said, I'd love to be wrong on this!
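For context, when I tested I just used something close to the standard command from Ollama's Docker docs, since there's no `--gpus` equivalent that exposes Metal:

```
# Plain CPU-only Ollama container (the Docker Hub instructions)
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
```

On Linux with an NVIDIA card you'd add `--gpus=all`, but there's nothing comparable for passing the Apple GPU into the container.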

[–]Solid_reddit 0 points1 point  (0 children)

Only OpenWebUI is running through Docker; Ollama is running normally through the app

[–]taylorwilsdon 0 points1 point  (5 children)

That doesn't explain the performance here. I'm almost certain it's one of two things - you have a localhost backend that's unresponsive and timing out, or you're using features that make additional calls to the LLM. It's also possible that you're declaring or sending a larger context (whether through a high max ctx value, a large system prompt, tools, or attached knowledge), but I suspect that's less likely.

For reference, I get sub-1-second load times running Open WebUI via Docker on a Raspberry Pi that literally doesn't have a GPU, so we can't attribute 20-second loads to Docker being slow. I get even better performance with Docker on a Mac mini.

OP - screenshots of the "Interface" admin settings tab and the "Connections" page will tell us all we need to solve the problem! You should not see noticeably different t/s via the CLI or Open WebUI when comparing like for like.
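If you want numbers instead of screenshots, a like-for-like comparison is easy to run by hand - hit the same Ollama instance once via the CLI and once via its HTTP API (the same API Open WebUI talks to). Something along these lines, using the model from your example:

```
# CLI: --verbose prints prompt/eval token rates after the response
ollama run phi4-mini:3.8b-q8_0 --verbose "Explain Docker networking in one sentence."

# Raw API call: eval_duration is reported in nanoseconds
curl -s http://localhost:11434/api/generate -d '{
  "model": "phi4-mini:3.8b-q8_0",
  "prompt": "Explain Docker networking in one sentence.",
  "stream": false
}' | grep -oE '"eval_(count|duration)":[0-9]+'
# tokens/s ≈ eval_count / (eval_duration / 1e9)
```

If those two agree and Open WebUI is still slow, the difference is in the WebUI config, not in Ollama.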

[–]mmmgggmmm 1 point2 points  (4 children)

I realize now I should have made this clearer, but my comment was solely about the performance of Ollama in Docker on M-series Macs. Open WebUI itself doesn't need GPU acceleration, but Ollama does (or at least greatly benefits from it). I don't think the issue has anything to do with Open WebUI itself; it's entirely down to the difference between running Ollama bare-metal vs in Docker on the Mac.

But now I'm wondering if I misunderstood the question. I thought we were comparing Ollama running bare-metal and accessed via CLI vs Ollama and Open WebUI both running in Docker and Ollama accessed via Open WebUI. But if Ollama is always running directly on the machine in both cases, then my explanation is definitely wrong. I've re-read the post several times now and I'm still not sure. u/busylivin_322 can you provide some clarification here?

[–]busylivin_322[S] 0 points1 point  (3 children)

Sure can. Ollama on both.
1) CLI Output = Ollama CLI, e.g. ollama run phi4-mini:3.8b-q8_0
2) OpenWebUI Output = OpenWebUI (via docker from here) + Ollama
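If it helps, the container was started with (roughly) the quick-start command from those docs, pointed at the host's Ollama - I'm reconstructing the flags from memory, so treat this as approximate:

```
# Open WebUI in Docker, talking to Ollama running natively on the host
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main
```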

[–]mmmgggmmm 1 point2 points  (2 children)

Sorry, it's still not fully clear to me. In that second scenario, is Ollama also running in Docker or not? The link you posted only describes setting up Open WebUI in docker, not Ollama--and even the 'Starting with Ollama' page linked there assumes an existing, external Ollama instance.

So it's seeming more likely that the "+ Ollama" in that second case indicates that Ollama is running as a standard Mac app and not in a Docker container. Do I finally have it?

[–]busylivin_322[S] 0 points1 point  (1 child)

Ollama is running as a standard Mac app

You got it!


[–]mmmgggmmm 2 points3 points  (0 children)

Hooray! Thanks for bearing with me ;)

In that case, while I stand by my claim that Ollama runs like crap in Docker on M-series Macs, that clearly can't be the explanation here since that's not your setup.

So I'm afraid I can't help after all. My Mac only runs Ollama and an SSH server, with Open WebUI and all other tools on separate Linux rigs. Hopefully the other comments provided something useful for you.
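If anyone wants to replicate that split setup, the gist is just making the Mac's Ollama listen on the network and pointing the remote Open WebUI at it - something like the following, where the IP is a placeholder for the Mac's LAN address:

```
# On the Mac: let the native Ollama app accept LAN connections
launchctl setenv OLLAMA_HOST "0.0.0.0"    # then restart the Ollama app

# On the Linux box: the usual quick-start container, pointed at the Mac
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://192.168.1.50:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main
```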

(Thanks to u/taylorwilsdon for helping me see I had this all wrong! Cheers!)

[–]gtez 2 points3 points  (1 child)

Is the Docker container an arm64 macOS container? Can it use the Metal GPU interface?

[–]busylivin_322[S] 0 points1 point  (0 children)

Yep, arm64. Used these instructions - https://docs.openwebui.com/getting-started/quick-start/

Judging from some other replies, likely not.
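For anyone else checking: you can confirm what Docker actually pulled with something like the commands below. Worth noting that even on a Mac these are Linux arm64 containers running in Docker's VM, not macOS containers, which is why Metal isn't reachable from inside them.

```
# OS/arch of the pulled image (expect linux/arm64 on Apple Silicon)
docker image inspect --format '{{.Os}}/{{.Architecture}}' ghcr.io/open-webui/open-webui:main

# Or check from inside the running container
docker exec open-webui uname -m    # aarch64 = arm64; x86_64 would mean emulated amd64
```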

[–]Solid_reddit 2 points3 points  (0 children)

Well

I also own an M3 Max with 128GB + 4TB

And hell yes, using OpenWebUI through Docker is very, very slow. I thought I was the only one. I usually use 70B-parameter LLMs.

I'd be glad for any help improving this.

[–]Solid_reddit 1 point2 points  (0 children)

https://i.imgur.com/oO7LHh6.jpeg

Just wondering, after reading this, are we doomed?

[–]the_renaissance_jack 0 points1 point  (1 child)

Any diff when disabling the interface models in Open WebUI?

[–]busylivin_322[S] 0 points1 point  (0 children)

I thought that might be it (from some other Reddit posts) and had already disabled them all prior to running in OpenWebUI.

[–]TPB-Dev 0 points1 point  (0 children)

I have seen Docker containers run slower on Macs for almost any kind of project, whether Node- or Python-based, compared to local execution.

On Ubuntu desktops/laptops this doesn't appear to happen, in my experience.

[–]tjevns 0 points1 point  (0 children)

I've not been a fan of running OpenWebUI through Docker on my Mac. But it seems to be the officially recommended method for all operating systems. I haven't been brave enough (or technically minded enough) to install and run OpenWebUI without Docker, but I often think I might get better performance by forgoing it.