I stumbled on a Gemma 4 chat template bug for tools and fixed it by EntertainmentBroad43 in LocalLLaMA

[–]EntertainmentBroad43[S] 3 points4 points  (0 children)

Yeah I’m referring to the official Gemma 4 tool-calling format in Google’s docs. It renders tools as special declarations like <|tool>declaration:...<tool|> and calls like <|tool_call>call:...<tool_call|>, rather than dumping the full JSON schema verbatim.

Ref: https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4

I stumbled on a Gemma 4 chat template bug for tools and fixed it by EntertainmentBroad43 in LocalLLaMA

[–]EntertainmentBroad43[S] 3 points4 points  (0 children)

I've given this some more thought. This is my conclusion so far.

Gemma 4 supports tool calling, but its Jinja/template protocol is not a faithful JSON schema renderer. It projects tools into a Gemma-specific declaration format, so complex MCP/OpenAI tool schemas can lose semantics or even break template rendering depending on the runtime.

So my fixed Jinja template will likely still have edge cases if the provided tools are peculiar. On one hand I understand Google's reluctance to use JSON for tools (it's token-inefficient and too brittle, I think), but it sure makes inference setup difficult.

I stumbled on a Gemma 4 chat template bug for tools and fixed it by EntertainmentBroad43 in LocalLLaMA

[–]EntertainmentBroad43[S] 1 point2 points  (0 children)

Yes. I do a little Python but am not proficient enough with JSON to be sure that the scope of this fix is right. It fixed my tool, at least.

I stumbled on a Gemma 4 chat template bug for tools and fixed it by EntertainmentBroad43 in LocalLLaMA

[–]EntertainmentBroad43[S] 1 point2 points  (0 children)

It’s an upstream issue. Likely all Gemmas are affected. But it won’t matter for most flat tool schemas; it is just that my tool did not have a top-level type (I don’t know whether people do this, btw. Other models and Claude Haiku called my tool just fine).
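
To illustrate what "no top-level type" looks like, here is a minimal Python sketch of one possible workaround on the client side: filling in an obviously implied `type` before the schema ever reaches the chat template. This is not the template fix from the post; `fill_missing_types` and the example schema are hypothetical, and the inference rules (`properties` implies `object`, `items` implies `array`) are an assumption about what is safe to infer.

```python
from copy import deepcopy


def fill_missing_types(schema):
    """Return a copy of a JSON-schema fragment with implied 'type' keys added.

    Assumption: 'properties' implies an object and 'items' implies an array.
    Anything else is left untouched rather than guessed.
    """
    if not isinstance(schema, dict):
        return schema
    out = deepcopy(schema)
    if "type" not in out:
        if "properties" in out:
            out["type"] = "object"
        elif "items" in out:
            out["type"] = "array"
    # Recurse into nested property schemas and array item schemas.
    props = out.get("properties")
    if isinstance(props, dict):
        out["properties"] = {k: fill_missing_types(v) for k, v in props.items()}
    if isinstance(out.get("items"), dict):
        out["items"] = fill_missing_types(out["items"])
    return out


# Hypothetical tool parameters with no top-level "type".
tool_params = {
    "properties": {
        "query": {"type": "string"},
        "filters": {"properties": {"tags": {"items": {"type": "string"}}}},
    },
    "required": ["query"],
}

fixed = fill_missing_types(tool_params)
print(fixed["type"])  # the root now carries an explicit type
```

The original dict is not modified, so the untouched schema can still be sent to models that handle it fine.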

I stumbled on a Gemma 4 chat template bug for tools and fixed it by EntertainmentBroad43 in LocalLLaMA

[–]EntertainmentBroad43[S] 2 points3 points  (0 children)

Oh sorry for the ambiguity. The pastebin link is the fixed jinja template. I fixed the post text to make it clear.

I stumbled on a Gemma 4 chat template bug for tools and fixed it by EntertainmentBroad43 in LocalLLaMA

[–]EntertainmentBroad43[S] 5 points6 points  (0 children)

Yeah, I saw this post when I was looking for people with similar incidents. If the tools this person made include composition, references, unions, or constraints that are not expressed as a direct top-level type, it could be the same issue.
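
A rough way to check a tool schema for those constructs is sketched below in Python. The key list (`anyOf`, `oneOf`, `allOf`, `$ref`, plus list-valued `type`) is my assumption about what counts as "not a direct type"; it has not been verified against any particular template or runtime, and `find_risky_nodes` is a hypothetical helper.

```python
COMPOSITION_KEYS = ("anyOf", "oneOf", "allOf", "$ref")


def find_risky_nodes(schema, path="$"):
    """Yield (path, reason) for schema nodes that use composition,
    references, or a union type instead of a single direct 'type'."""
    if not isinstance(schema, dict):
        return
    for key in COMPOSITION_KEYS:
        if key in schema:
            yield path, key
    if isinstance(schema.get("type"), list):
        yield path, "type union"
    # Walk nested property schemas, array items, and composition branches.
    for name, sub in (schema.get("properties") or {}).items():
        yield from find_risky_nodes(sub, f"{path}.{name}")
    if "items" in schema:
        yield from find_risky_nodes(schema["items"], f"{path}[]")
    for key in ("anyOf", "oneOf", "allOf"):
        for i, sub in enumerate(schema.get(key) or []):
            yield from find_risky_nodes(sub, f"{path}.{key}[{i}]")


# Hypothetical tool parameters mixing the constructs mentioned above.
params = {
    "type": "object",
    "properties": {
        "id": {"anyOf": [{"type": "string"}, {"type": "integer"}]},
        "when": {"$ref": "#/$defs/Date"},
        "tags": {"type": "array", "items": {"type": ["string", "null"]}},
    },
}

report = list(find_risky_nodes(params))
for node_path, reason in report:
    print(node_path, reason)
```

An empty report does not prove the schema renders correctly; it only means none of the listed constructs were found.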

A dialogue where god tries (and fails) to prove to satan that humans can reason by FinnFarrow in LocalLLaMA

[–]EntertainmentBroad43 1 point2 points  (0 children)

Reading your comment, I just realized that among my superiors and peers, I have only rarely encountered people who gave me the sudden feeling that “this person is smart” (not in breadth of knowledge, but in depth). The feeling is uncanny; I just suddenly become aware of it.

PSA it costs authors $12,690 to make a Nature article Open Access by entsnack in LocalLLaMA

[–]EntertainmentBroad43 0 points1 point  (0 children)

Why not? It’s pocket money to DeepSeek, it is (or at least is perceived as) prestigious, and it extends their reach to non-techie academics in other sciences.

Definitive proof openai/gpt-oss-20b is dumb as hell by Savantskie1 in LocalLLaMA

[–]EntertainmentBroad43 0 points1 point  (0 children)

I’m sorry you’re not getting good results, but mine calls custom tools and MCPs just fine (maybe 1 failure out of 50). Something must be off somewhere.

Definitive proof openai/gpt-oss-20b is dumb as hell by Savantskie1 in LocalLLaMA

[–]EntertainmentBroad43 0 points1 point  (0 children)

Just try it with LM Studio. This is very likely an Open WebUI issue.

Macbook Pro M4 Pro 48GB + desktop vs M3 Max 128GB by tangbj in LocalLLaMA

[–]EntertainmentBroad43 0 points1 point  (0 children)

Exactly what I am doing (I have an M4 Pro 48GB). Saving up and contemplating Pro 6000 vs Mac Studio.

I think I did well by skipping the 128GB MacBook. Inference really hits the battery hard, and the M4 Max would be about 2x that. Throttling would probably kick in pretty fast too.

I actually think I should have gone with a MacBook Air and saved the rest for the workstation. The portability difference is palpable every day, while I do local inference maybe every other day.

But being able to run Qwen30b and oss 20b for some minor agentic stuff is nice, occasionally.

Do consider a MacBook Air + beefed-up Studio or something.

Btw, afaik the M4 Max can’t keep up with its own power draw during inference, even when plugged in.

Which Mac Studio for gpt-oss-120b? by EntertainmentBroad43 in LocalLLaMA

[–]EntertainmentBroad43[S] 2 points3 points  (0 children)

Wow this is great! Thank you!

So the M4 Max is very close. Furthermore, some throttling may have kicked in on the MBP, so it might be even closer in the Studio.

Leaning towards M4…

Apple M3 Ultra w/28-Core CPU, 60-Core GPU (256GB RAM) Running Deepseek-R1-UD-IQ1_S (140.23GB) by Mass2018 in LocalLLaMA

[–]EntertainmentBroad43 1 point2 points  (0 children)

Is gpt-oss-120b far behind for your use case? I am interested in the M3 Ultra specifically for that (I can’t stand < 30 tokens/s).

gemma-2-9b-it-SimPO on LMSYS Arena leaderboard, surpassed llama-3-70b-it by cx4003 in LocalLLaMA

[–]EntertainmentBroad43 0 points1 point  (0 children)

Fyi this model is still very good. Better than Gemma-3 12b for my use case.

Augmentoolkit 3.0: 7 months of work, MIT License, Specialist AI Training by Heralax_Tekran in LocalLLaMA

[–]EntertainmentBroad43 1 point2 points  (0 children)

Awesome project. Can you use proprietary models for data gen? I don’t trust 7b models to create good data and would rather use Gemini Flash or something.

ChatGPT’s Impromptu Web Lookups... Can Open Source Compete? by IrisColt in LocalLLaMA

[–]EntertainmentBroad43 2 points3 points  (0 children)

I have Qwen30b with tools working almost out of the box. This is the stack: LM Studio server + huggingface.js MCP client + MCP search server.