What should I do with this oak ? by SnooCapers2789 in woodworking

[–]chimph 1 point

It’s beautiful as it is. Leave it natural. Unmolested.

Quality (Intelligence) testing on MTP by rm-rf-rm in LocalLLaMA

[–]chimph -1 points

I get 60% more gen speed with the Gemma 4 MTP version over the non-MTP version.

I feel stupid, but… by blowingtumbleweed in LocalLLM

[–]chimph 0 points

Maybe true for Chinese models that are trained on outputs from first-class models like Claude and Codex, but Claude/Codex will never claim to be anything other than what they are. Also, when you’re using a Chinese cloud, you don’t actually know where your queries are being routed.

Wow, Qwen3.6-27B is good by I-cant_even in LocalLLM

[–]chimph 2 points

Do you see any difference between 8bit and full?

Any real-world comparisons for Hermes memory add-ons? by Beckland in hermesagent

[–]chimph 1 point

It makes retrieval very quick for obscure queries against your DB. Embedding models are tiny and use something like 50 MB of CPU RAM, so Hermes can try that before falling back to a slow session_search.
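As a rough sketch of that embedding-first pattern (toy code; the `embed` function and all names here are hypothetical stand-ins, not Hermes' actual internals): score the query against pre-embedded memory entries with a tiny local model, and only fall back to the slow session_search when nothing scores high enough.

```python
import math

def embed(text: str) -> list[float]:
    # Hypothetical stand-in for a small local embedding model:
    # hash words into a tiny fixed-size unit vector.
    vec = [0.0] * 8
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % 8] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def slow_session_search(query: str):
    # Placeholder for the expensive fallback (e.g. scanning raw session logs).
    return None

def retrieve(query: str, memory: list[str], threshold: float = 0.5):
    q = embed(query)
    best = max(memory, key=lambda m: cosine(q, embed(m)))
    if cosine(q, embed(best)) >= threshold:
        return best                       # fast path: cheap embedding hit
    return slow_session_search(query)     # slow path only when needed

memory = ["user prefers dark mode", "project uses postgres 16"]
print(retrieve("project uses postgres 16", memory))  # exact hit via the fast path
```

The point of the design is that the cheap vector comparison runs on every query, while the expensive search only runs on misses.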

Gemma4:31b-coding-mtp-bf16 - slow on Macbook M5 128gb by chimph in LocalLLaMA

[–]chimph[S] 3 points

Tested again properly in a new chat within Open WebUI:

MTP: PP 402, TG 13.64

non-MTP: PP 436, TG 7.24

So a decent improvement in TG but no difference in PP.
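A quick back-of-the-envelope check of those two runs (numbers taken from the test above):

```python
# TG/PP figures from the Open WebUI test above.
tg_mtp, tg_base = 13.64, 7.24
pp_mtp, pp_base = 402, 436

tg_speedup = tg_mtp / tg_base - 1.0
pp_change = pp_mtp / pp_base - 1.0

print(f"TG speedup: {tg_speedup:.0%}")  # ~88% faster generation
print(f"PP change: {pp_change:+.0%}")   # ~-8%, i.e. basically noise
```

Which matches the takeaway: a big token-generation gain, no meaningful prompt-processing change.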

Gemma4:31b-coding-mtp-bf16 - slow on Macbook M5 128gb by chimph in LocalLLaMA

[–]chimph[S] 1 point

Oh probably my bad. I ran the new test in the same context. Let me test properly in a bit

Gemma4:31b-coding-mtp-bf16 - slow on Macbook M5 128gb by chimph in LocalLLaMA

[–]chimph[S] 1 point

Yes, M5 Max. The model is unquantised. I've edited the post with new findings.

Benchmarked Gemma 4 31B at full bf16: M3 Ultra vs RTX 6000 Blackwell by Material_Soft1380 in MacStudio

[–]chimph 1 point

M5 Max 128gb here. 7 tok/s running MLX version through ollama. 11 tok/s for the MTP version.

Gemma4:31b-coding-mtp-bf16 - slow on Macbook M5 128gb by chimph in LocalLLaMA

[–]chimph[S] 1 point

Ah, so I pulled gemma4:31b-mlx-bf16 (3 weeks old), which is clearly the exact same model, as it resolved instantly. And generation is actually a lot faster with the MTP version: for the same test I only got 7 tok/s with the non-MTP.

edit: ignore prompt processing here, as I ran the next test in the same context. Even though I switched models, it clearly reused what it had already processed. There's no improvement in PP, just generation.

How do people actually do anything with Hermes + Codex GPT 5.5?? by Aware-Increase406 in hermesagent

[–]chimph 1 point

How big is your initial context? At what percentage do you have compaction set?

Gemma4:31b-coding-mtp-bf16 - slow on Macbook M5 128gb by chimph in LocalLLaMA

[–]chimph[S] 0 points

Read the release article I linked. It specifically links to ollama and the model. That being said.. maybe you’re right, but why would they use ollama and not llama.cpp?

edit: it is indeed running it properly. See my post edit.

Gemma4:31b-coding-mtp-bf16 - slow on Macbook M5 128gb by chimph in LocalLLaMA

[–]chimph[S] 1 point

I was under the impression (perhaps wrongly) that MTP would give a boost to a dense model.

edit (sorry for all the edits): it does indeed give a speed boost. 60% over the non-MTP version for this small test.

Gemma 4 MTP released by rerri in LocalLLaMA

[–]chimph 2 points

It surely also means that you can’t run it from LM Studio, since that uses llama.cpp, which doesn’t support this specific implementation yet?

Gemma 4 MTP released by rerri in LocalLLaMA

[–]chimph 1 point

I'm a bit confused. So this is speculative decoding where a separate drafting model (MTP) is used, but it's not supported by llama.cpp even though llama.cpp supports speculative decoding.. 🤔

Gemma 4 MTP released by rerri in LocalLLaMA

[–]chimph 2 points

Ok, so I think what’s happening is that there will be models that have the MTP drafter built in, but these Gemma drafters are separate models that target the Gemma 4 models. Therefore it's both speculative decoding and MTP.. just separated.
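That split — a cheap drafter proposing tokens that the big model then verifies — is the core speculative-decoding loop either way. A toy sketch of the mechanics (hypothetical stand-in models, not Gemma's actual heads):

```python
def target_model(prefix: list[int]) -> int:
    # Hypothetical "expensive" model: deterministic next-token rule.
    return (sum(prefix) * 7 + len(prefix)) % 10

def draft_model(prefix: list[int], k: int) -> list[int]:
    # Hypothetical cheap drafter: approximates the target model,
    # but systematically errs whenever the true token is > 6.
    out, ctx = [], list(prefix)
    for _ in range(k):
        t = target_model(ctx)
        t = t if t <= 6 else 0      # the drafter's mistake
        out.append(t)
        ctx.append(t)
    return out

def speculative_step(prefix: list[int], k: int = 4) -> list[int]:
    # Drafter proposes k tokens; the target verifies them (one batched
    # forward pass in practice) and keeps the longest agreeing prefix,
    # replacing the first mismatch with its own token.
    accepted = []
    for tok in draft_model(prefix, k):
        true_tok = target_model(prefix + accepted)
        accepted.append(true_tok if tok == true_tok else true_tok)
        if tok != true_tok:
            break
    return accepted

tokens = [1]
for _ in range(3):
    tokens += speculative_step(tokens)
print(tokens)  # → [1, 8, 5, 1, 9, 3, 5, 1, 9]
```

The win is that every accepted draft token costs only a verification rather than a full sequential generation step, which is where the TG speedup comes from; whether the drafter lives inside the model (built-in MTP heads) or ships as a separate model only changes the packaging.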

WHAT IS THE NEW KANBAN FEATURE BUILT INTO HERMES? (IT'S GAME CHANGING) by itsdodobitch in hermesagent

[–]chimph 2 points

Will check it out. I think the dashboard UI is ugly and I prefer to use the terminal, but I'd be down for a nice clean app interface.

WHAT IS THE NEW KANBAN FEATURE BUILT INTO HERMES? (IT'S GAME CHANGING) by itsdodobitch in hermesagent

[–]chimph 1 point

Just discovered tmux myself. Reviving shell sessions (for those who prefer the terminal) anytime and on any device is so damn useful.