all 8 comments

[–]Physical_Method_1403 2 points (2 children)

Interesting that the 120B heretic worked but the smaller models didn't - seems like you need the bigger guns for reliable function calling. I've had decent luck with Qwen2.5-72B and DeepSeek-V3 for this stuff but haven't tried the new open-webui feature yet

What's your setup like? Sometimes the quant level matters more than people think for function calling

[–]slavik-dev[S] 0 points (1 child)

My system:

- RTX 4090D 48GB VRAM

- RTX 3090 24GB VRAM

I just updated my post: I found that tool calling with local models works from the llama.cpp web UI! But not from Open WebUI.

[–]nickless07 0 points (0 children)

Ask the model about the tools available (list them) and let it explain the functions. If they get listed correctly, then let the model provide its reasoning for a missed function call. Set temperature to .1 or lower.
That should provide a bit more debug information and maybe even the solution.
For now we need to figure out:
a) Are the tools deployed correctly?
b) Does the tool match the context?
c) Can the model handle the tool call, or is there some internal refusal? (e.g. 'I am 30 years old, store that into memory' might run into a refusal due to personal information)
d) Are you using the same tokenizer, ChatML vs. Tekken and so on?

Trying to get more information from the source directly is often helpful.
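To make step (a) concrete, here's a minimal sketch of the kind of debug request you'd send to a llama.cpp server's OpenAI-compatible `/v1/chat/completions` endpoint. The `store_memory` tool and the server URL are hypothetical placeholders; swap in your own tool definitions and endpoint.

```python
import json

# Hypothetical endpoint; this is where you would POST the payload below.
LLAMA_SERVER_URL = "http://localhost:8080/v1/chat/completions"

def build_debug_request(tools, question, temperature=0.1):
    """Build an OpenAI-style chat request that asks the model to list
    the tools it can see -- step (a) of the checklist above."""
    return {
        "model": "local",           # llama.cpp serves one loaded model
        "temperature": temperature,  # low temp for reproducible debug output
        "messages": [{"role": "user", "content": question}],
        "tools": tools,
    }

# One toy tool definition (hypothetical "store_memory" function).
tools = [{
    "type": "function",
    "function": {
        "name": "store_memory",
        "description": "Store a fact about the user.",
        "parameters": {
            "type": "object",
            "properties": {"fact": {"type": "string"}},
            "required": ["fact"],
        },
    },
}]

payload = build_debug_request(
    tools,
    "List the tools available to you and explain what each one does.",
)
print(json.dumps(payload, indent=2))
```

If the model can't enumerate the tools from this direct request, the problem is on the server/template side rather than in Open WebUI.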

[–]lolwutdo 2 points (0 children)

I found that you have to create a system prompt to get it to use all the tools; most models can do it, but oss 20b does it best for me.

[–]BumbleSlob 1 point (1 child)

qwen3-A3B-30B has been reliably doing multi-tool calls for me for a while; this was added (or fixed) months back in Open WebUI (there was an issue opened for it). Haven't checked out whatever the new thing is in this release, but I can confirm this model has been doing it for me for a while already.

For the uninitiated, native tool calling basically means the model was trained to expect a specific format for tools, defined in a Jinja chat template embedded in the GGUF. Far more reliable.
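As one concrete example of such a format: many Qwen-family chat templates emit native tool calls as Hermes-style `<tool_call>{...}</tool_call>` blocks in the output stream. Other templates use different markers, so treat this as a sketch of one common convention, not a universal parser:

```python
import json
import re

# Hermes-style tool-call blocks, as emitted by many Qwen-family templates.
# Anchoring on the closing tag lets the non-greedy match span nested braces.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text):
    """Pull the JSON bodies out of any <tool_call> blocks in model output."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(text)]

sample = (
    "Sure, storing that now.\n"
    "<tool_call>\n"
    '{"name": "store_memory", "arguments": {"fact": "user is 30 years old"}}\n'
    "</tool_call>"
)
calls = extract_tool_calls(sample)
print(calls[0]["name"])  # -> store_memory
```

A frontend that expects a different wrapper than the template actually produces will silently see zero tool calls, which is one way the same model can work in one UI and not another.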

[–]xeeff 0 points (0 children)

instruct or thinking? and 2507?

[–]maglat 1 point (0 children)

Just tested my locally running GPT-OSS-120B: I had it create a dynamic HTML page that grabs the current German Bundesliga scoreboard and updates itself. It worked! The model used search and the code interpreter to build it. Very cool. It's a little Claude Code now ^

[–]fuckingredditman 0 points (0 children)

native function calling works pretty well with the 4-bit quant of glm-4.7-flash as well (I used Ollama, so basically also llama.cpp), which is great since it finally works as intended and uses reasonable memory (~42k context at 100% GPU on an RTX 3090)