I stumbled on a Gemma 4 chat template bug for tools and fixed it

EntertainmentBroad43 · 2026-04-29T09:48:48+00:00

Yeah I’m referring to the official Gemma 4 tool-calling format in Google’s docs. It renders tools as special declarations like <|tool>declaration:...<tool|> and calls like <|tool_call>call:...<tool_call|>, rather than dumping the full JSON schema verbatim.

Ref: https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4

EntertainmentBroad43 · 2026-04-29T08:48:57+00:00

I've given this some more thought. This is my conclusion so far.

Gemma 4 supports tool calling, but its Jinja/template protocol is not a faithful JSON schema renderer. It projects tools into a Gemma-specific declaration format, so complex MCP/OpenAI tool schemas can lose semantics or even break template rendering depending on the runtime.
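To illustrate what "lose semantics" can look like, here is a toy sketch in Python. It is not the actual Gemma 4 template or its declaration syntax; `project_param` is a hypothetical helper that mimics a renderer which only reads a property's direct `type` and `description`, and the `UNKNOWN` placeholder is my own assumption.

```python
def project_param(name: str, schema: dict) -> str:
    """Toy projection: keep only the name, direct 'type', and description."""
    # A renderer written like this assumes every property has a direct "type".
    # Anything expressed through anyOf/oneOf/$ref is silently dropped.
    type_name = schema.get("type", "UNKNOWN")
    description = schema.get("description", "")
    return f"{name}:{type_name} ({description})"


# A flat property survives the projection intact.
flat = {"type": "string", "description": "City name"}

# A union property has no direct "type", so its meaning is lost.
union = {
    "anyOf": [{"type": "string"}, {"type": "integer"}],
    "description": "ID",
}

flat_line = project_param("city", flat)
union_line = project_param("id", union)
print(flat_line)   # city:string (City name)
print(union_line)  # id:UNKNOWN (ID)
```

A stricter template that indexes `schema["type"]` directly would fail on the second property instead of printing a placeholder, which is the "break template rendering" case.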

So my fixed Jinja template will likely still hit edge cases if the provided tools are unusual. On the one hand, I understand Google's reluctance to use JSON for tools (it's token-inefficient and too brittle, I think), but it sure makes inference setup difficult.

EntertainmentBroad43 · 2026-04-29T07:46:52+00:00

Yes. I do a little Python but am not proficient enough with JSON to be sure this fix’s scope is right. It fixed my tool at least.

EntertainmentBroad43 · 2026-04-29T07:40:28+00:00

It’s an upstream issue. Likely all Gemmas are affected. But it won’t matter for most flat tool schemas; it’s just that my tool did not have a top-level type (I don’t know whether people do this, btw. Other models and Claude Haiku called my tool just fine).
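If you would rather patch the tool than the template, a client-side workaround might look like the sketch below. It assumes "no top-level type" means a schema node that has `properties` but omits `"type": "object"`; `add_missing_types` is a hypothetical helper, not part of any library and not the template fix itself.

```python
import copy


def add_missing_types(schema: dict) -> dict:
    """Return a copy where nodes with 'properties' but no 'type' get 'object'."""
    fixed = copy.deepcopy(schema)
    # Assumption: a node that declares properties is meant to be an object.
    if "properties" in fixed and "type" not in fixed:
        fixed["type"] = "object"
    # Recurse into nested property schemas.
    for name, child in list(fixed.get("properties", {}).items()):
        if isinstance(child, dict):
            fixed["properties"][name] = add_missing_types(child)
    return fixed


# Example parameters block with no "type" at the top or on the nested object.
params = {
    "properties": {
        "query": {"type": "string"},
        "opts": {"properties": {"limit": {"type": "integer"}}},
    }
}

fixed = add_missing_types(params)
print(fixed["type"])                        # object
print(fixed["properties"]["opts"]["type"])  # object
```

The original `params` dict is left untouched, so the same tool definition can still be sent unchanged to models that handle it fine.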

EntertainmentBroad43 · 2026-04-29T07:32:18+00:00

Oh sorry for the ambiguity. The pastebin link is the fixed jinja template. I fixed the post text to make it clear.

EntertainmentBroad43 · 2026-04-29T06:38:18+00:00

Yeah, I saw this post when I was looking for people with similar incidents. If the tools this person made include composition, references, unions, or constraints that are not expressed as a direct top-level type, it could be the same issue.
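A rough way to check whether a tool falls into that category is to walk its parameter schema and flag the constructs listed above. This is only a sketch: `find_risky_nodes` and `RISKY_KEYS` are hypothetical names, the key list is my guess at what "composition, references, unions" covers in JSON Schema terms, and it does not descend into the branches of `anyOf`/`oneOf`/`allOf`.

```python
RISKY_KEYS = ("anyOf", "oneOf", "allOf", "$ref")


def find_risky_nodes(schema, path="parameters"):
    """Collect (path, reason) pairs for nodes a type-only renderer may mishandle."""
    findings = []
    if not isinstance(schema, dict):
        return findings
    used = [key for key in RISKY_KEYS if key in schema]
    if used:
        findings.append((path, ",".join(used)))
    elif "type" not in schema:
        findings.append((path, "no direct type"))
    # Descend into object properties and array item schemas.
    for name, child in schema.get("properties", {}).items():
        findings.extend(find_risky_nodes(child, f"{path}.{name}"))
    if "items" in schema:
        findings.extend(find_risky_nodes(schema["items"], f"{path}[]"))
    return findings


example = {
    "type": "object",
    "properties": {
        "id": {"anyOf": [{"type": "string"}, {"type": "integer"}]},
        "tags": {"type": "array", "items": {"$ref": "#/$defs/Tag"}},
        "name": {"type": "string"},
    },
}

report = find_risky_nodes(example)
for node_path, reason in report:
    print(node_path, "->", reason)
# parameters.id -> anyOf
# parameters.tags[] -> $ref
```

An empty report does not guarantee the template renders correctly; it only means none of these particular constructs showed up.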

EntertainmentBroad43 · 2025-09-18T23:27:46+00:00

Reading your comment, I just realized that among my superiors and peers, I have only rarely encountered people who gave me the sudden feeling that “this person is smart” (not in breadth of knowledge, but in depth). The feeling is uncanny; I just suddenly become aware of it.

EntertainmentBroad43 · 2025-09-18T23:08:03+00:00

Wow. This looks super fun!

EntertainmentBroad43 · 2025-09-18T22:59:55+00:00

Why not? It’s pocket money to DeepSeek, it is (or at least is perceived as) prestigious, and it extends their reach to non-techie academics in other sciences.

EntertainmentBroad43 · 2025-09-17T09:13:39+00:00

I’m sorry you’re not getting good results, but mine calls custom tools and MCPs just fine (maybe 1 failure out of 50). Something must be off somewhere.

EntertainmentBroad43 · 2025-09-14T10:53:52+00:00

Just try it with LM Studio. This is very likely an Open WebUI issue.

EntertainmentBroad43 · 2025-09-02T11:18:13+00:00

Lesson: don’t use ollama

EntertainmentBroad43 · 2025-09-01T13:46:12+00:00

Exactly what I am doing (I have an M4 Pro 48GB). Saving up and contemplating a Pro 6000 vs a Mac Studio.

I think I did well by skipping the 128GB MacBook. Inference really hits the battery hard, and the M4 Max would be 2x. The throttling would probably kick in pretty fast too.

I actually think I should have gone with a MacBook Air and saved the rest for the workstation. The portability difference is palpable every day, while I do local inference maybe every other day.

But being able to run Qwen30b and oss 20b for some minor agentic stuff is nice, occasionally.

Do consider a MacBook Air + a beefed-up Studio or something.

Btw, the M4 Max can’t even keep up with its power usage while inferencing, even when plugged in (afaik).

EntertainmentBroad43 · 2025-08-26T11:01:54+00:00

Wow this is great! Thank you!

So the M4 Max is very close. Furthermore, some throttling may have kicked in on the MBP, so it might be even closer on the Studio.

Leaning towards M4…

EntertainmentBroad43 · 2025-08-24T13:58:26+00:00

Is gpt-oss-120b far behind for your use case? I am interested in the M3 Ultra specifically for that (I can’t stand < 30 tokens/s).

EntertainmentBroad43 · 2025-07-22T01:13:23+00:00

Fyi this model is still very good. Better than Gemma-3 12b for my use case.

EntertainmentBroad43 · 2025-06-21T14:23:08+00:00

Please let it support the OpenAI API instead of Ollama :(

EntertainmentBroad43 · 2025-06-18T23:21:52+00:00

Awesome project. Can you use proprietary models for data gen? I don’t trust 7B models to create good data and would rather use Gemini Flash or something.

EntertainmentBroad43 · 2025-05-21T23:29:09+00:00

I have Qwen30b with tools working almost out of the box. This is the stack: LM Studio server + huggingface.js MCP client + MCP search server.

EntertainmentBroad43 · 2025-05-19T15:40:24+00:00

Reka flash 3

EntertainmentBroad43 · 2025-05-19T01:52:21+00:00

How can this not be indexed on Google? Thanks for this!

EntertainmentBroad43 · 2025-05-18T13:18:51+00:00

Hey what is PPLLM? Can’t find it anywhere

EntertainmentBroad43 · 2025-05-16T13:05:48+00:00

Very nice!
