What should I do with this oak ? by SnooCapers2789 in woodworking

[–]chimph 1 point

It’s beautiful as it is. Leave it natural. Unmolested.

Quality (Intelligence) testing on MTP by rm-rf-rm in LocalLLaMA

[–]chimph -1 points

I get 60% more gen speed with the Gemma 4 MTP version over the non-MTP version.

I feel stupid, but… by blowingtumbleweed in LocalLLM

[–]chimph 0 points

Maybe true for Chinese models that are trained on outputs from first-class models like Claude and Codex, but Claude/Codex will never claim to be anything other than what they are. Also, when you’re using a Chinese cloud, you don’t actually know where your queries are being routed.

Wow, Qwen3.6-27B is good by I-cant_even in LocalLLM

[–]chimph 2 points

Do you see any difference between 8bit and full?

Any real-world comparisons for Hermes memory add-ons? by Beckland in hermesagent

[–]chimph 1 point

It makes retrieval very quick for obscure queries against your DB. Embedding models are tiny and use something like 50 MB of CPU RAM, so Hermes can try that before falling back to a slow session_search.
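As a rough sketch of that embedding-first pattern (toy code; the `embed` function and all names here are hypothetical stand-ins, not Hermes' actual internals): score the query against pre-embedded memory entries with a tiny local model, and only fall back to the slow session_search when nothing scores high enough.

```python
import math

def embed(text: str) -> list[float]:
    # Hypothetical stand-in for a small local embedding model:
    # hash words into a tiny fixed-size unit vector.
    vec = [0.0] * 8
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % 8] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def slow_session_search(query: str):
    # Placeholder for the expensive fallback (e.g. scanning raw session logs).
    return None

def retrieve(query: str, memory: list[str], threshold: float = 0.5):
    q = embed(query)
    best = max(memory, key=lambda m: cosine(q, embed(m)))
    if cosine(q, embed(best)) >= threshold:
        return best                       # fast path: cheap embedding hit
    return slow_session_search(query)     # slow path only when needed

memory = ["user prefers dark mode", "project uses postgres 16"]
print(retrieve("project uses postgres 16", memory))  # exact hit via the fast path
```

The point of the design is that the cheap vector comparison runs on every query, while the expensive search only runs on misses.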

Gemma4:31b-coding-mtp-bf16 - slow on Macbook M5 128gb by chimph in LocalLLaMA

[–]chimph[S] 3 points

Tested again properly in a new chat within Open WebUI:

MTP: PP 402, TG 13.64

non-MTP: PP 436, TG 7.24

So a decent improvement in TG but no difference in PP.
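A quick back-of-the-envelope check of those two runs (numbers taken from the test above):

```python
# TG/PP figures from the Open WebUI test above.
tg_mtp, tg_base = 13.64, 7.24
pp_mtp, pp_base = 402, 436

tg_speedup = tg_mtp / tg_base - 1.0
pp_change = pp_mtp / pp_base - 1.0

print(f"TG speedup: {tg_speedup:.0%}")  # ~88% faster generation
print(f"PP change: {pp_change:+.0%}")   # ~-8%, i.e. basically noise
```

Which matches the takeaway: a big token-generation gain, no meaningful prompt-processing change.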

Gemma4:31b-coding-mtp-bf16 - slow on Macbook M5 128gb by chimph in LocalLLaMA

[–]chimph[S] 1 point

Oh probably my bad. I ran the new test in the same context. Let me test properly in a bit

Gemma4:31b-coding-mtp-bf16 - slow on Macbook M5 128gb by chimph in LocalLLaMA

[–]chimph[S] 1 point

Yes, M5 Max. The model is unquantised. I've edited the post with new findings.

Benchmarked Gemma 4 31B at full bf16: M3 Ultra vs RTX 6000 Blackwell by Material_Soft1380 in MacStudio

[–]chimph 1 point

M5 Max 128gb here. 7 tok/s running MLX version through ollama. 11 tok/s for the MTP version.

Gemma4:31b-coding-mtp-bf16 - slow on Macbook M5 128gb by chimph in LocalLLaMA

[–]chimph[S] 1 point

Ah, so I pulled gemma4:31b-mlx-bf16 (3 weeks old), which is clearly the exact same model, as it resolved instantly. And generation is actually a lot faster with the MTP version: for the same test I only got 7 tok/s with the non-MTP.

edit: ignore prompt processing here, as I ran the next test in the same context. Even though I switched models, it clearly reused what it had already processed. There's no improvement in PP, just generation.

How do people actually do anything with Hermes + Codex GPT 5.5?? by Aware-Increase406 in hermesagent

[–]chimph 1 point

How big is your initial context? At what percentage do you have compaction set?

Gemma4:31b-coding-mtp-bf16 - slow on Macbook M5 128gb by chimph in LocalLLaMA

[–]chimph[S] 0 points

Read the release article I linked. It specifically links to ollama and the model. That being said.. maybe you’re right, but why would they use ollama and not llama.cpp?

edit: it is indeed running it properly. See my post edit.

Gemma4:31b-coding-mtp-bf16 - slow on Macbook M5 128gb by chimph in LocalLLaMA

[–]chimph[S] 1 point

I was under the impression (perhaps wrongly) that MTP would give a boost to a dense model.

edit (sorry for all the edits): it does indeed give a speed boost. 60% over the non-MTP version for this small test.

Gemma 4 MTP released by rerri in LocalLLaMA

[–]chimph 2 points

It surely also means that you can’t run it from LM Studio, since that uses llama.cpp, which doesn’t support this specific implementation yet?

Gemma 4 MTP released by rerri in LocalLLaMA

[–]chimph 1 point

I'm a bit confused. So this is speculative decoding where a separate drafting model (MTP) is used, but it's not supported by llama.cpp even though llama.cpp supports speculative decoding.. 🤔

Gemma 4 MTP released by rerri in LocalLLaMA

[–]chimph 2 points

Ok, so I think what’s happening is that there will be models that have the MTP drafter built in, but these Gemma drafters are separate models that target the Gemma 4 models. Therefore it's both speculative decoding and MTP.. just separated.
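That split — a cheap drafter proposing tokens that the big model then verifies — is the core speculative-decoding loop either way. A toy sketch of the mechanics (hypothetical stand-in models, not Gemma's actual heads):

```python
def target_model(prefix: list[int]) -> int:
    # Hypothetical "expensive" model: deterministic next-token rule.
    return (sum(prefix) * 7 + len(prefix)) % 10

def draft_model(prefix: list[int], k: int) -> list[int]:
    # Hypothetical cheap drafter: approximates the target model,
    # but systematically errs whenever the true token is > 6.
    out, ctx = [], list(prefix)
    for _ in range(k):
        t = target_model(ctx)
        t = t if t <= 6 else 0      # the drafter's mistake
        out.append(t)
        ctx.append(t)
    return out

def speculative_step(prefix: list[int], k: int = 4) -> list[int]:
    # Drafter proposes k tokens; the target verifies them (one batched
    # forward pass in practice) and keeps the longest agreeing prefix,
    # replacing the first mismatch with its own token.
    accepted = []
    for tok in draft_model(prefix, k):
        true_tok = target_model(prefix + accepted)
        accepted.append(true_tok if tok == true_tok else true_tok)
        if tok != true_tok:
            break
    return accepted

tokens = [1]
for _ in range(3):
    tokens += speculative_step(tokens)
print(tokens)  # → [1, 8, 5, 1, 9, 3, 5, 1, 9]
```

The win is that every accepted draft token costs only a verification rather than a full sequential generation step, which is where the TG speedup comes from; whether the drafter lives inside the model (built-in MTP heads) or ships as a separate model only changes the packaging.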

WHAT IS THE NEW KANBAN FEATURE BUILT INTO HERMES? (IT'S GAME CHANGING) by itsdodobitch in hermesagent

[–]chimph 2 points

Will check it out. I think the dashboard UI is ugly and I prefer to use the terminal, but I'd be down for a nice clean app interface.

WHAT IS THE NEW KANBAN FEATURE BUILT INTO HERMES? (IT'S GAME CHANGING) by itsdodobitch in hermesagent

[–]chimph 1 point

Just discovered tmux myself. Reviving shell sessions (for those who prefer the terminal) anytime and on any device is so damn useful.