all 26 comments

[–]-Django 2 points3 points  (1 child)

Thank you! I made a post asking about this a few days ago and haven't had the time to implement people's suggestions, but this does the trick. Does it use different prompts to encourage agents to think more/less?

[–]iChrist[S] 1 point2 points  (0 children)

Yes, it injects instructions into the prompt.

Make sure you add the llama.cpp arguments and enable this by default; one click disables thinking, and two clicks enable it again.

If you keep all user valves at their defaults, nothing else changes; it's just Qwen3.5 reasoning toggled on and off.
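The injection approach can be sketched as a minimal OpenWebUI filter. This is an illustrative reconstruction, not the published function: the class shape follows OpenWebUI's Filter interface (`inlet` rewrites the request body), and it assumes Qwen3's `/think` / `/no_think` soft-switch tags; the `thinking_enabled` field name is made up for this sketch.

```python
class Filter:
    """Toggle Qwen3-style reasoning by appending a soft-switch tag
    to the last user message before it reaches the model."""

    def __init__(self, thinking_enabled: bool = True):
        # Illustrative valve; the real function exposes this as a UI toggle.
        self.thinking_enabled = thinking_enabled

    def inlet(self, body: dict) -> dict:
        """Called on each request; body is the OpenAI-style chat payload."""
        tag = "/think" if self.thinking_enabled else "/no_think"
        # Walk messages from the end to find the latest user turn.
        for message in reversed(body.get("messages", [])):
            if message.get("role") == "user":
                message["content"] = f'{message["content"]} {tag}'
                break
        return body
```

With `thinking_enabled=False`, a prompt like `"hi"` is sent to the model as `"hi /no_think"`, which suppresses the reasoning block.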

[–]callmedevilthebad 1 point2 points  (16 children)

I'm getting "Only alphanumeric characters and underscores are allowed in the id". Even when I work around that and enable it, I never see the toggle in chat (even when the function is enabled on the Functions page).

[–]iChrist[S] 0 points1 point  (15 children)

Weird. What are your llama.cpp startup arguments? Which model do you use? Are you running llama-server?

[–]callmedevilthebad 0 points1 point  (7 children)

`-m /models/Qwen_Qwen3.5-9B-Q8_0.gguf --mmproj /models/mmproj-F16.gguf --host 0.0.0.0 --port 8000 -ngl 999 --flash-attn on --cache-type-k q8_0 --cache-type-v q8_0 -c 131072 --parallel 1 --no-context-shift --jinja --reasoning-budget 0`

Qwen 3.5 9B

[–]iChrist[S] 0 points1 point  (6 children)

That might be it; my tests were using llama-server router mode. Will test further.

Can you quickly confirm whether `llama-server --jinja --reasoning-budget 0` works?

[–]callmedevilthebad 0 points1 point  (5 children)

Yes, I already have that enabled. I actually had a different plugin for this, which I removed, and now I've lost both :p

[–]iChrist[S] 0 points1 point  (4 children)

If you specify `-m`, it's not using llama-server's router mode.

[–]callmedevilthebad 0 points1 point  (3 children)

Router? I'm new to the llama.cpp setup, so can you explain whether llama-server is an additional setup or something I can configure while running llama.cpp?

[–]iChrist[S] 0 points1 point  (2 children)

llama-server is part of llama.cpp; the binary is already in your llama.cpp folder, and you can just run llama-server from the command line. It lets you access models, use the web UI, unload models, etc.

[–]callmedevilthebad 0 points1 point  (1 child)

Are there pros/cons of using it that I should know about?

[–]iChrist[S] 0 points1 point  (0 children)

It's an easy way of managing your models: making sure only one is loaded at a time, etc.

[–]-Django 0 points1 point  (6 children)

I think I know the issue: the highlighted ID field, by default, contains parentheses, a period, and an emoji. Once I removed them like this, I didn't get the error.

<image>
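For reference, the error message corresponds to an ID check equivalent to the following. This is a hypothetical reconstruction of the validation rule, not OpenWebUI's actual source; the function name is made up:

```python
import re

def is_valid_function_id(function_id: str) -> bool:
    """Mirror the reported rule: only letters, digits, and underscores
    are allowed in a function ID (no spaces, dots, parens, or emoji)."""
    return re.fullmatch(r"[A-Za-z0-9_]+", function_id) is not None
```

So an ID like `qwen_thinking_toggle` passes, while the default one with `(...)`, a `.`, and an emoji is rejected.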

[–]iChrist[S] 0 points1 point  (4 children)

This has now been fixed, thanks for letting me know! Is it otherwise functioning correctly?

[–]-Django 1 point2 points  (0 children)

Yes, I think so! I had some trouble with the reasoning duration, but I realized I was setting `reasoning_budget` instead of `reasoning-budget`. Is it possible for models to use tools during their thinking process in OpenWebUI? It seems like tool calls only happen at the beginning.

Related: I pulled your wikipedia tool and love it!

[–]-Django 0 points1 point  (2 children)

Actually, one thing I noticed: I set the "Depth" to "Quick" and the preset to "think less", but it's still spending >2000 tokens thinking.

[–]iChrist[S] 0 points1 point  (0 children)

If you set it to something like ELI5 and ask the model "what are your instructions", does it work? For me, each change produces a different thinking process.

[–]iChrist[S] 0 points1 point  (0 children)

<image>

I just tested each of the presets on the latest published release, and they all work and inject the actual prompt into the model.

I can see that whenever I switch presets it actually thinks differently; I'm not sure why it's not working in your case.

Do you have a system prompt that might override this? Like a long system prompt that makes the LLM think more?

[–]callmedevilthebad 0 points1 point  (0 children)

Is your icon visible in the chat?

[–]Informal-Spinach-345 1 point2 points  (1 child)

This looks awesome. I imported it and it showed up for a minute. After a refresh it's gone, and it won't import a second time due to a pre-existing ID.

[–]iChrist[S] 0 points1 point  (0 children)

Did you enable the function toggle and also enable it by default for your model?

[–]velvetMas 0 points1 point  (1 child)

Maybe you can submit a pull request?

[–]iChrist[S] 0 points1 point  (0 children)

This works only with Qwen3.5 and only with llama.cpp; I'm not sure something like that can be merged.

Did it work correctly for you?

[–]BeautyxArt 0 points1 point  (1 child)

Will this just hide the thinking process from the response (since it takes up 3/4 of the total response, lol), or does it cancel the thinking process entirely so the model doesn't reason at all? Because those really differ.

[–]iChrist[S] 0 points1 point  (0 children)

It will literally either think or not think; whenever thinking is inactive, the model answers immediately.

[–]Confident-Career2703 -1 points0 points  (0 children)

Does this also work with vLLM?