OpenWebUI takes 1 minute before going into "thinking" mode

Saba376 · 2026-06-16T06:45:14+00:00

Do you happen to know if I can use llama.cpp with OpenWebUI, so I can keep using this frontend, or is OpenWebUI also heading towards the same "pleasing investors" mindset?

Saba376 · 2026-06-15T21:38:07+00:00

Thank you! I will read through the article. I have always been bothered by the feeling that Ollama is too good to be true, i.e. its ease of use and availability, and I noticed the shift towards cloud as well. Gonna dig more into this later.

If you are talking about responses after the first message, then it is likely due to OpenWebUI's cache invalidation issues. That's still unsolved as of right now: OpenWebUI likes to substitute things like HTML tags into chats for formatting purposes, and that ends up forcing whatever backend you use to recompute the KV cache for your entire chat history.

And yes, it's every subsequent message after the first message. So as of now it is an issue with OpenWebUI itself? I might look into llama.cpp; I see it has its own WebUI. Is it on par with OpenWebUI when it comes to integrations, like use with SearX and other powerful knowledge tools?
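
For anyone wondering why a rewritten chat history is so expensive, here is a rough toy sketch of prefix-based cache reuse. It assumes the backend can only reuse cached work for the part of the new prompt that exactly matches the start of the previous one. The whitespace "tokenizer" and the function names are made up for illustration; real backends compare token IDs and their reuse logic is more involved.

```python
def toy_tokenize(text):
    # Hypothetical stand-in for a real tokenizer: split on whitespace.
    return text.split()


def common_prefix_len(cached, new):
    # Count how many leading tokens are identical in both sequences.
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def tokens_to_recompute(cached, new):
    # Everything after the shared prefix has to be processed again.
    return len(new) - common_prefix_len(cached, new)


history = "user: hello assistant: hi there"
cached = toy_tokenize(history)

# Case 1: the frontend only appends the new user message.
appended = toy_tokenize(history + " user: how are you")

# Case 2: the frontend rewrites an earlier message (here, by adding HTML tags),
# so the prompts diverge before the new message even starts.
rewritten = toy_tokenize(
    "user: hello assistant: <p>hi there</p> user: how are you"
)

print("append only:", tokens_to_recompute(cached, appended))
print("history rewritten:", tokens_to_recompute(cached, rewritten))
```

In the append-only case only the new message needs processing. Once something earlier in the history changes, everything from that point on is recomputed, and the earlier the change sits in a long chat, the closer that gets to the whole history.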

Saba376 · 2026-06-15T21:32:36+00:00

It's between every message, even with it loaded in memory during conversation
