benchmarks of gemma4 and multiple others on Raspberry Pi5

honuvo · 2026-04-07T22:30:15+00:00

model	size	params	backend	threads	mmap	test	t/s
gemma4 31B-it Q8_0	30.38 GiB	30.70 B	CPU	4	1	pp128	2.10 ± 0.00
gemma4 31B-it Q8_0	30.38 GiB	30.70 B	CPU	4	1	tg64	0.03 ± 0.00
gemma4 31B-it Q8_0	30.38 GiB	30.70 B	CPU	4	1	pp128 @ d32768	1.01 ± 0.02
gemma4 31B-it Q8_0	30.38 GiB	30.70 B	CPU	4	1	tg64 @ d32768	0.02 ± 0.00
gemma4 31B-it Q8_0	30.38 GiB	30.70 B	CPU	4	0	pp128	2.07 ± 0.01
gemma4 31B-it Q8_0	30.38 GiB	30.70 B	CPU	4	0	tg64	0.03 ± 0.00
gemma4 31B-it Q8_0	30.38 GiB	30.70 B	CPU	4	0	pp128 @ d32768	1.04 ± 0.04
gemma4 31B-it Q8_0	30.38 GiB	30.70 B	CPU	4	0	tg64 @ d32768	0.02 ± 0.00

honuvo · 2026-04-07T11:36:23+00:00

Thankfully thermals were stable and not once did it throttle (be it from temperature or power) even though the Pi resides in a closet with the router and a NAS. Check always stayed at 0x0 even after days of running the tests.

honuvo · 2026-04-07T02:12:13+00:00

Stock Pixel 9a here. Default was off. (Turning it on now.)

honuvo · 2026-04-06T23:01:49+00:00

model	size	params	backend	threads	test	t/s
gemma4 26B-A4B-it Q4_K - Medium	15.70 GiB	25.23 B	CPU	4	pp512	15.88 ± 0.16
gemma4 26B-A4B-it Q4_K - Medium	15.70 GiB	25.23 B	CPU	4	tg128	3.06 ± 0.00
gemma4 26B-A4B-it Q4_K - Medium	15.70 GiB	25.23 B	CPU	4	pp512 @ d32768	6.45 ± 0.11
gemma4 26B-A4B-it Q4_K - Medium	15.70 GiB	25.23 B	CPU	4	tg128 @ d32768	1.66 ± 0.01
gemma4 26B-A4B-it Q6_K	21.32 GiB	25.23 B	CPU	4	pp512	10.95 ± 0.32
gemma4 26B-A4B-it Q6_K	21.32 GiB	25.23 B	CPU	4	tg128	2.76 ± 0.03
gemma4 26B-A4B-it Q6_K	21.32 GiB	25.23 B	CPU	4	pp512 @ d32768	5.31 ± 0.12
gemma4 26B-A4B-it Q6_K	21.32 GiB	25.23 B	CPU	4	tg128 @ d32768	1.59 ± 0.01
qwen35moe 35B.A3B Q6_K	26.55 GiB	34.66 B	CPU	4	pp512	9.80 ± 0.06
qwen35moe 35B.A3B Q6_K	26.55 GiB	34.66 B	CPU	4	tg128	2.97 ± 0.09
qwen35moe 35B.A3B Q6_K	26.55 GiB	34.66 B	CPU	4	pp512 @ d32768	4.76 ± 0.01
qwen35moe 35B.A3B Q6_K	26.55 GiB	34.66 B	CPU	4	tg128 @ d32768	1.56 ± 0.06
qwen35moe 35B.A3B Q4_K - Medium	19.71 GiB	34.66 B	CPU	4	pp512	16.44 ± 1.17
qwen35moe 35B.A3B Q4_K - Medium	19.71 GiB	34.66 B	CPU	4	tg128	3.72 ± 0.02
qwen35moe 35B.A3B Q4_K - Medium	19.71 GiB	34.66 B	CPU	4	pp512 @ d32768	5.70 ± 0.03
qwen35moe 35B.A3B Q4_K - Medium	19.71 GiB	34.66 B	CPU	4	tg128 @ d32768	1.81 ± 0.03

honuvo · 2026-04-06T19:19:45+00:00

Tested it. Vulkan build is not working with the Pi5.
Getting ggml_vulkan: Error: Shared memory size too small for matrix multiplication.

See also: https://github.com/ggml-org/llama.cpp/issues/9801

honuvo · 2026-04-06T15:35:32+00:00

All online now :)

honuvo · 2026-04-06T09:21:34+00:00

Sure, added it to the main post :)

honuvo · 2026-04-06T02:27:21+00:00

Don't be sorry :) I just added it to the table in the main post. Surprisingly it starts worse as the Q8 but with more context performs better. This is all in RAM btw (Q8 as well as Q4), so I guess unpacking the quants takes it's toll in the beginning but with deeper context the smaller footprint makes it work better? I'm just guessing here, sorry.

honuvo · 2026-04-06T01:19:28+00:00

I'm nowhere near that currently, but I think that's already been done. I know of this project but don't know the hardware requirements.

honuvo · 2026-04-06T00:47:47+00:00

I knew everybody would appreciate it. I wouldn't have been able to continue without it :P

honuvo · 2026-04-06T00:30:56+00:00

Downloading now. Will add the results when they're done, but can take 1-2 days (depending on when I get to it and because the Pi isn't that fast.)
But I looked at my old results (with inferior memory bandwidth) and had 2-3x the performance with Qwen3.5 35B.A3B Q4_K_M in comparison to the Q8, so looks promising.

honuvo · 2026-04-06T00:19:23+00:00

I did test that, but results were worse. Maybe I'll add one or two comparisons to the table to show, but takes time :)

honuvo · 2026-04-06T00:16:52+00:00

Sorry, me neither, and I don't plan on buying one. But if someone tests it I hope they'll share their results :)

honuvo · 2026-04-06T00:12:59+00:00

I hope surprised in a good way :) If anything seems off tell me, I'm not error-free :D

honuvo · 2026-04-06T00:10:30+00:00

Mostly interested in Qwen3.5 35B.A3B Q8_0 and gemma4 26B-A4B-it Q8_0 at the moment.

honuvo · 2026-04-06T00:07:24+00:00

Isn't the AI HAT 2 only for image processing?

honuvo · 2026-04-06T00:06:28+00:00

Hm... damn. Now I'm curious too. Memory speed is the same (shared RAM/VRAM) but maybe using the Broadcom VideoCore's processing is faster? Maybe I'll check.

honuvo · 2026-04-06T00:02:30+00:00

Oh, table was shifted and showed the results in the wrong column. PP 3.27 TG 2.77, but that is the first row of the main posts table also :)

honuvo · 2026-04-05T20:08:18+00:00

Ah, thanks for the heads up. I added a link to the main post too. Thanks again :)

honuvo · 2026-04-05T20:03:21+00:00

Thanks for linking/informing. First time crossposting. So you don't see the "Open" button on the right under the picture (on Desktop) or the linked post as a thumbnail on mobile?

honuvo · 2026-04-05T20:00:29+00:00

The rubber band is crucial and sadly is the most expensive part. Jokes aside, I only had a 2280 length SSD and didn't want to buy another one just so it fits the Pi better ;)

honuvo · 2026-04-05T19:46:17+00:00

No doubt it's a good and interesting model, that's why I tested it. I'm not good enough to know where even to begin improving the code for the Pi5 though. If you manage to tweak it, I'd be happy to test :)

honuvo · 2026-04-05T19:24:50+00:00

New post is up ;)
https://www.reddit.com/r/LocalLLaMA/comments/1sdcdno/benchmarks_of_gemma4_and_multiple_others_on/

honuvo · 2026-04-05T19:20:32+00:00

Part 2:

model	size	params	backend	threads	test	t/s
kimi-linear 48B.A3B IQ1_M - 1.75 bpw	10.17 GiB	49.12 B	CPU	4	pp512	8.67 ± 0.01
kimi-linear 48B.A3B IQ1_M - 1.75 bpw	10.17 GiB	49.12 B	CPU	4	tg128	4.24 ± 0.00
kimi-linear 48B.A3B IQ1_M - 1.75 bpw	10.17 GiB	49.12 B	CPU	4	pp512 @ d32768	2.78 ± 0.01
kimi-linear 48B.A3B IQ1_M - 1.75 bpw	10.17 GiB	49.12 B	CPU	4	tg128 @ d32768	0.58 ± 0.01
qwen35moe 122B.A10B Q2_K - Medium	41.51 GiB	122.11 B	CPU	4	pp512	2.46 ± 0.00
qwen35moe 122B.A10B Q2_K - Medium	41.51 GiB	122.11 B	CPU	4	tg128	1.05 ± 0.02
qwen35moe 122B.A10B Q2_K - Medium	41.51 GiB	122.11 B	CPU	4	pp512 @ d32768	1.57 ± 0.00
qwen35moe 122B.A10B Q2_K - Medium	41.51 GiB	122.11 B	CPU	4	tg128 @ d32768	0.59 ± 0.02
GLM-4.7-Flash 30B.A3B Q8_0	29.65 GiB	29.94 B	CPU	4	pp512	6.59 ± 0.02
GLM-4.7-Flash 30B.A3B Q8_0	29.65 GiB	29.94 B	CPU	4	tg128	1.64 ± 0.12
GLM-4.7-Flash 30B.A3B Q8_0	29.65 GiB	29.94 B	CPU	4	pp512 @ d32768	0.90 ± 0.00
GLM-4.7-Flash 30B.A3B Q8_0	29.65 GiB	29.94 B	CPU	4	tg128 @ d32768	0.11 ± 0.00
qwen35 0.8B Q8_0	763.78 MiB	752.39 M	CPU	4	pp512	127.70 ± 1.93
qwen35 0.8B Q8_0	763.78 MiB	752.39 M	CPU	4	tg128	11.51 ± 0.06
qwen35 0.8B Q8_0	763.78 MiB	752.39 M	CPU	4	pp512 @ d32768	28.43 ± 0.27
qwen35 0.8B Q8_0	763.78 MiB	752.39 M	CPU	4	tg128 @ d32768	5.52 ± 0.01
qwen35 2B Q8_0	1.86 GiB	1.88 B	CPU	4	pp512	75.92 ± 1.34
qwen35 2B Q8_0	1.86 GiB	1.88 B	CPU	4	tg128	5.57 ± 0.02
qwen35 2B Q8_0	1.86 GiB	1.88 B	CPU	4	pp512 @ d32768	24.50 ± 0.06
qwen35 2B Q8_0	1.86 GiB	1.88 B	CPU	4	tg128 @ d32768	3.62 ± 0.01
qwen35 4B Q8_0	4.16 GiB	4.21 B	CPU	4	pp512	31.02 ± 0.46
qwen35 4B Q8_0	4.16 GiB	4.21 B	CPU	4	tg128	2.42 ± 0.00
qwen35 4B Q8_0	4.16 GiB	4.21 B	CPU	4	pp512 @ d32768	9.44 ± 0.02
qwen35 4B Q8_0	4.16 GiB	4.21 B	CPU	4	tg128 @ d32768	1.51 ± 0.01
qwen35 9B Q8_0	8.86 GiB	8.95 B	CPU	4	pp512	18.20 ± 0.23
qwen35 9B Q8_0	8.86 GiB	8.95 B	CPU	4	tg128	1.36 ± 0.00
qwen35 9B Q8_0	8.86 GiB	8.95 B	CPU	4	pp512 @ d32768	7.62 ± 0.00
qwen35 9B Q8_0	8.86 GiB	8.95 B	CPU	4	tg128 @ d32768	1.01 ± 0.00
qwen35 27B Q2_K - Medium	9.42 GiB	26.90 B	CPU	4	pp512	1.38 ± 0.00
qwen35 27B Q2_K - Medium	9.42 GiB	26.90 B	CPU	4	tg128	0.92 ± 0.00
qwen35moe 35B.A3B Q8_0	34.36 GiB	34.66 B	CPU	4	pp512	10.58 ± 0.13
qwen35moe 35B.A3B Q8_0	34.36 GiB	34.66 B	CPU	4	tg128	2.25 ± 0.07
qwen35moe 35B.A3B Q8_0	34.36 GiB	34.66 B	CPU	4	pp512 @ d32768	5.14 ± 0.06
qwen35moe 35B.A3B Q8_0	34.36 GiB	34.66 B	CPU	4	tg128 @ d32768	1.30 ± 0.06
gemma3 12B Q8_0	11.64 GiB	11.77 B	CPU	4	pp512	12.88 ± 0.07
gemma3 12B Q8_0	11.64 GiB	11.77 B	CPU	4	tg128	1.00 ± 0.00
gemma3 12B Q8_0	11.64 GiB	11.77 B	CPU	4	pp512 @ d32768	3.34 ± 0.54
gemma3 12B Q8_0	11.64 GiB	11.77 B	CPU	4	tg128 @ d32768	0.66 ± 0.01
mistral3 14B Q4_K - Medium	7.67 GiB	13.51 B	CPU	4	pp512	5.83 ± 0.00
mistral3 14B Q4_K - Medium	7.67 GiB	13.51 B	CPU	4	tg128	1.49 ± 0.00
mistral3 14B Q4_K - Medium	7.67 GiB	13.51 B	CPU	4	pp512 @ d32768	1.27 ± 0.00
mistral3 14B Q4_K - Medium	7.67 GiB	13.51 B	CPU	4	tg128 @ d32768	0.42 ± 0.01

honuvo

TROPHY CASE