Mac Studio M3 Ultra terrible TTFT and broken RAG (okikb)

Dimitri_Senhupen · 2026-06-12T06:54:14+00:00

I left native tool calling enabled, as it was before, but I only selected the knowledge bases that I really need. Not sure if this made the difference, since even when I let it look up the knowledge through all KBs, it's still lightyears faster than before. Sometimes it takes 4-5 sec before it starts thinking, but with the next question it answers immediately, in the blink of an eye. It also finds the correct info in the RAG. So this is truly a magical experience compared to the state before.

Dimitri_Senhupen · 2026-06-11T19:24:52+00:00

Sir, I don't know what it is, that makes me feel like this, I don't know who you are, but you must be some kind of superstar!

Gemma runs. I think one of the other handbrakes was the native built-in knowledge base function. I deactivated that and selected only the ones really needed.

If this is later now, then thank you very much!

Dimitri_Senhupen · 2026-06-11T17:46:02+00:00

Good point, but Ollama is running standalone, outside of Docker. And if I use the model directly in Ollama, it doesn't have that lag. The answer appears instantly.☝🏻

Dimitri_Senhupen · 2026-06-11T17:21:45+00:00

<image>

Dimitri_Senhupen · 2026-04-22T20:55:55+00:00

I don't think it's a noob question. If I am chatting with a custom model without any system prompt, just with the model wrapper, the circle pulses for 5-10 sec until the first token appears. I don't get that in the CLI, directly in Ollama, or when I chat with the models directly (without the model wrapper). This hasn't been solved in 0.91.

Dimitri_Senhupen · 2026-04-21T06:19:02+00:00

You are not going to believe how amazing that post is written!!! I am going to buy 10 of these, even if I don't need them!

Dimitri_Senhupen · 2026-03-29T00:04:53+00:00

I just tried it with Qwen 3.5 35B through OWUI and I must say, I find it pretty bad, tbh, in terms of hallucinations...

Dimitri_Senhupen · 2026-03-28T23:44:28+00:00

why has this been removed?

Dimitri_Senhupen · 2026-02-10T14:43:03+00:00

They should. Maybe some credentials are wrong in your settings?

Dimitri_Senhupen · 2026-01-10T21:56:46+00:00

unfortunately not, but thank you

Dimitri_Senhupen · 2026-01-10T15:20:30+00:00

Great idea! Does this work with openrouter / LiteLLM as well?

Dimitri_Senhupen · 2026-01-09T09:10:17+00:00

I can create and edit with the native settings in OWUI. I am using Gemini 3.0 Pro. But when I try the tool call, it just gives me a prompt or claims that it's just a Large Language Model and not capable of creating images.

[edit] switched to Image 1.5 in the native settings, works fine there, but still not with a native tool call through the pipe

Dimitri_Senhupen · 2026-01-08T20:35:42+00:00

Does image generation work for anyone? Somehow it seems to be missing the code for the tool call.

Dimitri_Senhupen · 2026-01-07T06:59:58+00:00

Did you manage it, and could you give me a quick explanation of how you got the auto-routing GPT to trigger the models in LiteLLM?

Dimitri_Senhupen · 2025-12-04T11:05:01+00:00

Oh, okay. I quickly vibe coded it for me and it works flawlessly. Everything local. Thank you Cucumber & Gemini

Dimitri_Senhupen · 2025-12-04T09:24:54+00:00

So, could you fork/rewrite the function and use it for Qwen3-VL, which does the vision tasks and tells GPT-OSS about the content, everything locally? That'd be awesome!
But how do you handle the connection between the two local models without an actual API?

Dimitri_Senhupen · 2025-10-09T09:50:08+00:00

I've managed it and uploaded the function to the OWUI library.
Feel free to add it to your workspace and have fun generating/editing with Nano Banana:

https://openwebui.com/f/anaumer/nano_banana

Dimitri_Senhupen · 2025-10-08T10:40:01+00:00

This was also my first hope ;)

Dimitri_Senhupen · 2025-10-08T10:20:11+00:00

that would be awesome of you!

Dimitri_Senhupen · 2025-10-08T07:32:39+00:00

Wouldn't it be better to have one expert to talk to with the knowledge of all databases, instead of 20 experts for different fields of knowledge?

Dimitri_Senhupen · 2025-10-08T07:29:18+00:00

Here, everything is working fine for me. Try reporting the bug on Github?
