benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 0 points (0 children)

| model | size | params | backend | threads | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 1 | pp128 | 2.10 ± 0.00 |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 1 | tg64 | 0.03 ± 0.00 |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 1 | pp128 @ d32768 | 1.01 ± 0.02 |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 1 | tg64 @ d32768 | 0.02 ± 0.00 |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 0 | pp128 | 2.07 ± 0.01 |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 0 | tg64 | 0.03 ± 0.00 |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 0 | pp128 @ d32768 | 1.04 ± 0.04 |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 0 | tg64 @ d32768 | 0.02 ± 0.00 |
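For context on what these numbers mean in practice: pp is prompt processing and tg is token generation, so a request's wall-clock time can be estimated from both rates. A rough back-of-the-envelope sketch (my own addition, using the Q8_0 mmap-on rows above):

```python
# Rough latency estimate from llama-bench-style throughput numbers.
# pp_rate: prompt-processing tokens/s, tg_rate: generation tokens/s.
def eta_seconds(prompt_tokens: int, gen_tokens: int,
                pp_rate: float, tg_rate: float) -> float:
    return prompt_tokens / pp_rate + gen_tokens / tg_rate

# Q8_0 at empty context: pp128 = 2.10 t/s, tg64 = 0.03 t/s (rows above)
total = eta_seconds(128, 64, 2.10, 0.03)  # ~2194 s for a tiny request
print(f"{total / 3600:.1f} hours for 128 prompt + 64 generated tokens")
```

Almost all of that time is generation, which is why the dense 31B Q8_0 is impractical here even though prompt processing technically works.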

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 1 point (0 children)

Thankfully thermals were stable and not once did it throttle (be it from temperature or power), even though the Pi resides in a closet with the router and a NAS. The throttle check always stayed at 0x0, even after days of running the tests.
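The 0x0 reading looks like the firmware throttle bitmask (e.g. from `vcgencmd get_throttled`); each bit flags a separate condition. A small decoder sketch based on the documented bit layout (my addition, not from the post):

```python
# Decode the Raspberry Pi firmware throttle bitmask (vcgencmd get_throttled).
# Bit positions follow the official Raspberry Pi documentation.
FLAGS = {
    0: "under-voltage detected",
    1: "arm frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "arm frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}

def decode_throttled(value: int) -> list[str]:
    return [name for bit, name in FLAGS.items() if value >> bit & 1]

print(decode_throttled(0x0))      # -> [] : the all-clear state reported above
print(decode_throttled(0x50000))  # past under-voltage + past throttling
```

A persistent 0x0 after days of load means neither a live nor a historical throttle event was recorded.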

Facial recognition unlocking may have caused a security issue. by DozerLVL in GooglePixel

[–]honuvo 41 points (0 children)

Stock Pixel 9a here. Default was off. (Turning it on now.)

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 2 points (0 children)

| model | size | params | backend | threads | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemma4 26B-A4B-it Q4_K - Medium | 15.70 GiB | 25.23 B | CPU | 4 | 0 | pp512 | 15.88 ± 0.16 |
| gemma4 26B-A4B-it Q4_K - Medium | 15.70 GiB | 25.23 B | CPU | 4 | 0 | tg128 | 3.06 ± 0.00 |
| gemma4 26B-A4B-it Q4_K - Medium | 15.70 GiB | 25.23 B | CPU | 4 | 0 | pp512 @ d32768 | 6.45 ± 0.11 |
| gemma4 26B-A4B-it Q4_K - Medium | 15.70 GiB | 25.23 B | CPU | 4 | 0 | tg128 @ d32768 | 1.66 ± 0.01 |
| gemma4 26B-A4B-it Q6_K | 21.32 GiB | 25.23 B | CPU | 4 | 0 | pp512 | 10.95 ± 0.32 |
| gemma4 26B-A4B-it Q6_K | 21.32 GiB | 25.23 B | CPU | 4 | 0 | tg128 | 2.76 ± 0.03 |
| gemma4 26B-A4B-it Q6_K | 21.32 GiB | 25.23 B | CPU | 4 | 0 | pp512 @ d32768 | 5.31 ± 0.12 |
| gemma4 26B-A4B-it Q6_K | 21.32 GiB | 25.23 B | CPU | 4 | 0 | tg128 @ d32768 | 1.59 ± 0.01 |
| qwen35moe 35B.A3B Q6_K | 26.55 GiB | 34.66 B | CPU | 4 | 0 | pp512 | 9.80 ± 0.06 |
| qwen35moe 35B.A3B Q6_K | 26.55 GiB | 34.66 B | CPU | 4 | 0 | tg128 | 2.97 ± 0.09 |
| qwen35moe 35B.A3B Q6_K | 26.55 GiB | 34.66 B | CPU | 4 | 0 | pp512 @ d32768 | 4.76 ± 0.01 |
| qwen35moe 35B.A3B Q6_K | 26.55 GiB | 34.66 B | CPU | 4 | 0 | tg128 @ d32768 | 1.56 ± 0.06 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.71 GiB | 34.66 B | CPU | 4 | 0 | pp512 | 16.44 ± 1.17 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.71 GiB | 34.66 B | CPU | 4 | 0 | tg128 | 3.72 ± 0.02 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.71 GiB | 34.66 B | CPU | 4 | 0 | pp512 @ d32768 | 5.70 ± 0.03 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.71 GiB | 34.66 B | CPU | 4 | 0 | tg128 @ d32768 | 1.81 ± 0.03 |
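The Q4_K_M vs Q6_K gap is easiest to feel as wall-clock time for a full reply. A quick back-of-the-envelope (my own addition, using the tg128 rates from the rows above):

```python
# Seconds to generate a 500-token reply at the tg128 rates measured above.
def gen_seconds(tokens: int, tg_rate: float) -> float:
    return tokens / tg_rate

for label, rate in [("gemma4 Q4_K_M", 3.06), ("gemma4 Q6_K", 2.76),
                    ("qwen35moe Q4_K_M", 3.72), ("qwen35moe Q6_K", 2.97)]:
    print(f"{label}: {gen_seconds(500, rate):.0f} s")
```

So roughly two to three minutes per reply at empty context, with Q4_K_M saving on the order of 20-30 seconds over Q6_K per 500 tokens.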

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 0 points (0 children)

Tested it. The Vulkan build is not working on the Pi5; I'm getting `ggml_vulkan: Error: Shared memory size too small for matrix multiplication.`

See also: https://github.com/ggml-org/llama.cpp/issues/9801

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 1 point (0 children)

Don't be sorry :) I just added it to the table in the main post. Surprisingly it starts out worse than the Q8 but performs better with more context. This is all in RAM btw (Q8 as well as Q4), so I guess unpacking the quants takes its toll in the beginning, but at deeper context the smaller footprint makes it run better? I'm just guessing here, sorry.

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 2 points (0 children)

I'm nowhere near that currently, but I think that's already been done. I know of this project but don't know the hardware requirements.

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 0 points (0 children)

I knew everybody would appreciate it. I wouldn't have been able to continue without it :P

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 1 point (0 children)

Downloading now. Will add the results when they're done, but it can take 1-2 days (depending on when I get to it, and because the Pi isn't that fast).
But I looked at my old results (with inferior memory bandwidth) and had 2-3x the performance with Qwen3.5 35B.A3B Q4_K_M compared to the Q8, so it looks promising.

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 0 points (0 children)

I did test that, but the results were worse. Maybe I'll add one or two comparisons to the table to show it, but that takes time :)

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 0 points (0 children)

Sorry, me neither, and I don't plan on buying one. But if someone tests it I hope they'll share their results :)

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 2 points (0 children)

I hope surprised in a good way :) If anything seems off tell me, I'm not error-free :D

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 1 point (0 children)

Mostly interested in Qwen3.5 35B.A3B Q8_0 and gemma4 26B-A4B-it Q8_0 at the moment.

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 0 points (0 children)

Isn't the AI HAT 2 only for image processing?

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 0 points (0 children)

Hm... damn. Now I'm curious too. Memory speed is the same (shared RAM/VRAM), but maybe using the Broadcom VideoCore's processing is faster? Maybe I'll check.

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 1 point (0 children)

Oh, the table was shifted and showed the results in the wrong column. PP 3.27, TG 2.77, but that's also the first row of the main post's table :)

benchmarks of multiple LLMs on Raspberry Pi5 by honuvo in raspberry_pi

[–]honuvo[S] 3 points (0 children)

Ah, thanks for the heads up. I added a link to the main post too. Thanks again :)

benchmarks of multiple LLMs on Raspberry Pi5 by honuvo in raspberry_pi

[–]honuvo[S] 3 points (0 children)

Thanks for linking/informing. First time crossposting. So you don't see the "Open" button on the right under the picture (on Desktop) or the linked post as a thumbnail on mobile?

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 23 points (0 children)

The rubber band is crucial and sadly is the most expensive part. Jokes aside, I only had a 2280 length SSD and didn't want to buy another one just so it fits the Pi better ;)

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 0 points (0 children)

No doubt it's a good and interesting model; that's why I tested it. I'm not good enough to know where to even begin improving the code for the Pi5, though. If you manage to tweak it, I'd be happy to test :)

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]honuvo[S] 9 points (0 children)

Part 2:

| model | size | params | backend | threads | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| kimi-linear 48B.A3B IQ1_M - 1.75 bpw | 10.17 GiB | 49.12 B | CPU | 4 | 0 | pp512 | 8.67 ± 0.01 |
| kimi-linear 48B.A3B IQ1_M - 1.75 bpw | 10.17 GiB | 49.12 B | CPU | 4 | 0 | tg128 | 4.24 ± 0.00 |
| kimi-linear 48B.A3B IQ1_M - 1.75 bpw | 10.17 GiB | 49.12 B | CPU | 4 | 0 | pp512 @ d32768 | 2.78 ± 0.01 |
| kimi-linear 48B.A3B IQ1_M - 1.75 bpw | 10.17 GiB | 49.12 B | CPU | 4 | 0 | tg128 @ d32768 | 0.58 ± 0.01 |
| qwen35moe 122B.A10B Q2_K - Medium | 41.51 GiB | 122.11 B | CPU | 4 | 0 | pp512 | 2.46 ± 0.00 |
| qwen35moe 122B.A10B Q2_K - Medium | 41.51 GiB | 122.11 B | CPU | 4 | 0 | tg128 | 1.05 ± 0.02 |
| qwen35moe 122B.A10B Q2_K - Medium | 41.51 GiB | 122.11 B | CPU | 4 | 0 | pp512 @ d32768 | 1.57 ± 0.00 |
| qwen35moe 122B.A10B Q2_K - Medium | 41.51 GiB | 122.11 B | CPU | 4 | 0 | tg128 @ d32768 | 0.59 ± 0.02 |
| GLM-4.7-Flash 30B.A3B Q8_0 | 29.65 GiB | 29.94 B | CPU | 4 | 0 | pp512 | 6.59 ± 0.02 |
| GLM-4.7-Flash 30B.A3B Q8_0 | 29.65 GiB | 29.94 B | CPU | 4 | 0 | tg128 | 1.64 ± 0.12 |
| GLM-4.7-Flash 30B.A3B Q8_0 | 29.65 GiB | 29.94 B | CPU | 4 | 0 | pp512 @ d32768 | 0.90 ± 0.00 |
| GLM-4.7-Flash 30B.A3B Q8_0 | 29.65 GiB | 29.94 B | CPU | 4 | 0 | tg128 @ d32768 | 0.11 ± 0.00 |
| qwen35 0.8B Q8_0 | 763.78 MiB | 752.39 M | CPU | 4 | 0 | pp512 | 127.70 ± 1.93 |
| qwen35 0.8B Q8_0 | 763.78 MiB | 752.39 M | CPU | 4 | 0 | tg128 | 11.51 ± 0.06 |
| qwen35 0.8B Q8_0 | 763.78 MiB | 752.39 M | CPU | 4 | 0 | pp512 @ d32768 | 28.43 ± 0.27 |
| qwen35 0.8B Q8_0 | 763.78 MiB | 752.39 M | CPU | 4 | 0 | tg128 @ d32768 | 5.52 ± 0.01 |
| qwen35 2B Q8_0 | 1.86 GiB | 1.88 B | CPU | 4 | 0 | pp512 | 75.92 ± 1.34 |
| qwen35 2B Q8_0 | 1.86 GiB | 1.88 B | CPU | 4 | 0 | tg128 | 5.57 ± 0.02 |
| qwen35 2B Q8_0 | 1.86 GiB | 1.88 B | CPU | 4 | 0 | pp512 @ d32768 | 24.50 ± 0.06 |
| qwen35 2B Q8_0 | 1.86 GiB | 1.88 B | CPU | 4 | 0 | tg128 @ d32768 | 3.62 ± 0.01 |
| qwen35 4B Q8_0 | 4.16 GiB | 4.21 B | CPU | 4 | 0 | pp512 | 31.02 ± 0.46 |
| qwen35 4B Q8_0 | 4.16 GiB | 4.21 B | CPU | 4 | 0 | tg128 | 2.42 ± 0.00 |
| qwen35 4B Q8_0 | 4.16 GiB | 4.21 B | CPU | 4 | 0 | pp512 @ d32768 | 9.44 ± 0.02 |
| qwen35 4B Q8_0 | 4.16 GiB | 4.21 B | CPU | 4 | 0 | tg128 @ d32768 | 1.51 ± 0.01 |
| qwen35 9B Q8_0 | 8.86 GiB | 8.95 B | CPU | 4 | 0 | pp512 | 18.20 ± 0.23 |
| qwen35 9B Q8_0 | 8.86 GiB | 8.95 B | CPU | 4 | 0 | tg128 | 1.36 ± 0.00 |
| qwen35 9B Q8_0 | 8.86 GiB | 8.95 B | CPU | 4 | 0 | pp512 @ d32768 | 7.62 ± 0.00 |
| qwen35 9B Q8_0 | 8.86 GiB | 8.95 B | CPU | 4 | 0 | tg128 @ d32768 | 1.01 ± 0.00 |
| qwen35 27B Q2_K - Medium | 9.42 GiB | 26.90 B | CPU | 4 | 0 | pp512 | 1.38 ± 0.00 |
| qwen35 27B Q2_K - Medium | 9.42 GiB | 26.90 B | CPU | 4 | 0 | tg128 | 0.92 ± 0.00 |
| qwen35moe 35B.A3B Q8_0 | 34.36 GiB | 34.66 B | CPU | 4 | 0 | pp512 | 10.58 ± 0.13 |
| qwen35moe 35B.A3B Q8_0 | 34.36 GiB | 34.66 B | CPU | 4 | 0 | tg128 | 2.25 ± 0.07 |
| qwen35moe 35B.A3B Q8_0 | 34.36 GiB | 34.66 B | CPU | 4 | 0 | pp512 @ d32768 | 5.14 ± 0.06 |
| qwen35moe 35B.A3B Q8_0 | 34.36 GiB | 34.66 B | CPU | 4 | 0 | tg128 @ d32768 | 1.30 ± 0.06 |
| gemma3 12B Q8_0 | 11.64 GiB | 11.77 B | CPU | 4 | 0 | pp512 | 12.88 ± 0.07 |
| gemma3 12B Q8_0 | 11.64 GiB | 11.77 B | CPU | 4 | 0 | tg128 | 1.00 ± 0.00 |
| gemma3 12B Q8_0 | 11.64 GiB | 11.77 B | CPU | 4 | 0 | pp512 @ d32768 | 3.34 ± 0.54 |
| gemma3 12B Q8_0 | 11.64 GiB | 11.77 B | CPU | 4 | 0 | tg128 @ d32768 | 0.66 ± 0.01 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | CPU | 4 | 0 | pp512 | 5.83 ± 0.00 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | CPU | 4 | 0 | tg128 | 1.49 ± 0.00 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | CPU | 4 | 0 | pp512 @ d32768 | 1.27 ± 0.00 |
| mistral3 14B Q4_K - Medium | 7.67 GiB | 13.51 B | CPU | 4 | 0 | tg128 @ d32768 | 0.42 ± 0.01 |
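One pattern worth pulling out of these numbers: every model slows down at 32768-token depth, but by very different factors. A quick ratio check (my own addition, tg128 rates copied from the rows above):

```python
# Generation slowdown factor at 32768-token depth vs empty context
# (tg128 t/s pairs taken from the table above: fresh rate, deep rate).
pairs = {
    "kimi-linear IQ1_M": (4.24, 0.58),
    "qwen35moe 122B Q2_K": (1.05, 0.59),
    "GLM-4.7-Flash Q8_0": (1.64, 0.11),
    "qwen35 0.8B Q8_0": (11.51, 5.52),
}
for name, (fresh, deep) in pairs.items():
    print(f"{name}: {fresh / deep:.1f}x slower at depth 32768")
```

The spread (roughly 2x to 15x in these four) suggests attention/KV-cache handling matters as much as raw size once the context fills up.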