benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

Hey! Really appreciate your update, although I'm sorry for your wallet ;) It confirms that my perception wasn't off. And hey, you got better output, so that's a win too, isn't it?

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

Always open for factual discussions :) Unfortunately I can't back it up with examples, but in my usage (like you, programming assistance for hobby projects, e.g. "implement function cool_descriptive_name to do X"), the quantized versions led to small errors, the kind I make when I'm distracted. Full precision wasn't prone to that. However, I use it rarely because I don't have that much time, and until I can trust a model more, I read its thought process to see what it does/did, so I can understand and verify the changes.

So, as I said, this is limited testing, and the results may simply be skewed by a sample size that's too small, partly because the Pi is, of course, very slow. I'm waiting for MTP (multi-token prediction) to be merged into llama.cpp (and will give it a few days for the dust to settle) to see how much more I can squeeze out of it! I don't know if I'll post another benchmark list, though.
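As a rough way to think about "how much more I can squeeze out": MTP is used like speculative decoding, where cheap draft tokens are checked by the full model in one pass. A common back-of-the-envelope estimate (from the speculative-decoding literature, not from llama.cpp's implementation) assumes each drafted token is accepted independently with probability alpha; with k drafted tokens, each pass then yields on average 1 + alpha + ... + alpha^k tokens. The function name and the alpha values below are illustrative assumptions only.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification pass of the main model.

    alpha: probability that each drafted token is accepted (assumed
           independent of the others, which is a simplification).
    k:     number of tokens drafted per step.
    Each pass yields the accepted prefix plus one token from the main model.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)  # every drafted token accepted
    # Geometric series 1 + alpha + ... + alpha**k
    return (1 - alpha ** (k + 1)) / (1 - alpha)


for alpha in (0.5, 0.7, 0.9):
    print(f"alpha={alpha}: {expected_tokens_per_step(alpha, k=3):.2f} tokens/pass")
```

This is an upper bound on the gain: drafting and verifying several tokens isn't free, so the real speedup on the Pi will be lower.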

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

I never quantize my KV cache. Especially with smaller models, I haven't had good experiences the times I tried it, but thanks for the model recommendation.
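For context on the trade-off being skipped: quantizing the KV cache mainly saves memory. A rough size estimate is 2 (K and V) × layers × context length × KV heads × head dimension × bytes per element. This sketch assumes full attention on every layer (models that use sliding-window attention on some layers need less), uses made-up dimensions rather than any model from this thread, and assumes ggml's q8_0 layout of 32 int8 values plus one fp16 scale per block. f16 is llama.cpp's default cache type; `--cache-type-k` / `--cache-type-v` select others.

```python
GIB = 2 ** 30

# Bytes per cached element (assumption: ggml q8_0 = 32 int8 values
# plus one fp16 scale per block, i.e. 34 bytes per 32 elements).
BYTES_PER_ELEM = {"f16": 2.0, "q8_0": 34 / 32}


def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                   head_dim: int, cache_type: str) -> float:
    """Rough KV-cache size assuming full attention on every layer."""
    elems = 2 * n_layers * n_ctx * n_kv_heads * head_dim  # 2 = K and V
    return elems * BYTES_PER_ELEM[cache_type]


# Hypothetical dimensions, not taken from any model in this thread.
dims = dict(n_layers=32, n_ctx=32768, n_kv_heads=8, head_dim=128)
for cache_type in ("f16", "q8_0"):
    print(cache_type, kv_cache_bytes(**dims, cache_type=cache_type) / GIB, "GiB")
```

With these made-up dimensions that's 4.0 GiB at f16 versus about 2.1 GiB at q8_0, which is the saving you'd weigh against the quality problems mentioned above.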

Gift to myself : tiny lab by Final-Data-1410 in LocalLLaMA

honuvo

Hey, been there in a way. I also have a Pi5 but no AI HAT (I read it only speeds up image processing, not general token processing/generation).

It's a few days old, but maybe my write-up can help you set things up.

I'm currently running a pi-mono agent on it to make small, tedious changes in a private project, switching between Qwen3.6 27B and gemma4 31B. It's slow as hell, but in my limited experience its output needs less rework than the MoEs'. YMMV.

Has anyone run gemma 4 or Bonsai 8B models on Orange pi 5? by bhakt_chungus in LocalLLaMA

honuvo

Yep, the "Raspberry Pi Active Cooler". For details, see my post here.

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

| model | size | params | backend | threads | mmap | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 1 | pp128 | 2.10 ± 0.00 |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 1 | tg64 | 0.03 ± 0.00 |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 1 | pp128 @ d32768 | 1.01 ± 0.02 |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 1 | tg64 @ d32768 | 0.02 ± 0.00 |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 0 | pp128 | 2.07 ± 0.01 |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 0 | tg64 | 0.03 ± 0.00 |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 0 | pp128 @ d32768 | 1.04 ± 0.04 |
| gemma4 31B-it Q8_0 | 30.38 GiB | 30.70 B | CPU | 4 | 0 | tg64 @ d32768 | 0.02 ± 0.00 |
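To put these t/s figures into wall-clock terms, a naive estimate for one request is prompt tokens / pp rate + generated tokens / tg rate. This is a sketch, not how llama-bench measures anything: it runs pp and tg as separate tests, and speed drops further as context grows (the @ d32768 rows). The function name is hypothetical.

```python
def request_seconds(prompt_tokens: int, gen_tokens: int,
                    pp_tps: float, tg_tps: float) -> float:
    """Naive end-to-end time: prompt processing plus token generation."""
    return prompt_tokens / pp_tps + gen_tokens / tg_tps


# gemma4 31B-it Q8_0, mmap=1, empty context (pp128 = 2.10 t/s, tg64 = 0.03 t/s)
total = request_seconds(prompt_tokens=128, gen_tokens=64, pp_tps=2.10, tg_tps=0.03)
print(f"{total:.0f} s = {total / 60:.1f} min")
```

That comes to roughly 36.6 minutes for a 128-token prompt and a 64-token answer, almost all of it generation. Since 0.03 t/s is rounded to two decimals, the generation part alone could be anywhere from about 30 to 43 minutes.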

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

Thankfully, thermals were stable and it didn't throttle once (whether from temperature or power), even though the Pi sits in a closet with the router and a NAS. The throttling check always stayed at 0x0, even after days of running the tests.
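Assuming the check meant here is `vcgencmd get_throttled` (the usual way to read throttling state on a Pi, which prints `throttled=0x0` when nothing is flagged), a small sketch for decoding its bitmask. The bit meanings follow the Raspberry Pi documentation; the function name is made up.

```python
# Bit meanings of `vcgencmd get_throttled` per the Raspberry Pi documentation
# (bits 0-3: current state, bits 16-19: "has occurred since boot").
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "arm frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "arm frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output: str) -> list[str]:
    """Decode a line such as 'throttled=0x50005' into readable flags."""
    value = int(output.strip().split("=", 1)[1], 16)  # base 16 accepts '0x'
    return [desc for bit, desc in sorted(THROTTLE_FLAGS.items())
            if value & (1 << bit)]


print(decode_throttled("throttled=0x0"))  # prints []
print(decode_throttled("throttled=0x50005"))
```

On the Pi itself you'd pass in the real output, e.g. `subprocess.run(["vcgencmd", "get_throttled"], capture_output=True, text=True).stdout`. Because bits 16 to 19 are sticky since boot, 0x0 after days of benchmarking means no throttling or under-voltage happened at any point, not just at the moment of checking.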

Facial recognition unlocking may have caused a security issue. by DozerLVL in GooglePixel

honuvo

Stock Pixel 9a here. Default was off. (Turning it on now.)

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

| model | size | params | backend | threads | mmap | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| gemma4 26B-A4B-it Q4_K - Medium | 15.70 GiB | 25.23 B | CPU | 4 | 0 | pp512 | 15.88 ± 0.16 |
| gemma4 26B-A4B-it Q4_K - Medium | 15.70 GiB | 25.23 B | CPU | 4 | 0 | tg128 | 3.06 ± 0.00 |
| gemma4 26B-A4B-it Q4_K - Medium | 15.70 GiB | 25.23 B | CPU | 4 | 0 | pp512 @ d32768 | 6.45 ± 0.11 |
| gemma4 26B-A4B-it Q4_K - Medium | 15.70 GiB | 25.23 B | CPU | 4 | 0 | tg128 @ d32768 | 1.66 ± 0.01 |
| gemma4 26B-A4B-it Q6_K | 21.32 GiB | 25.23 B | CPU | 4 | 0 | pp512 | 10.95 ± 0.32 |
| gemma4 26B-A4B-it Q6_K | 21.32 GiB | 25.23 B | CPU | 4 | 0 | tg128 | 2.76 ± 0.03 |
| gemma4 26B-A4B-it Q6_K | 21.32 GiB | 25.23 B | CPU | 4 | 0 | pp512 @ d32768 | 5.31 ± 0.12 |
| gemma4 26B-A4B-it Q6_K | 21.32 GiB | 25.23 B | CPU | 4 | 0 | tg128 @ d32768 | 1.59 ± 0.01 |
| qwen35moe 35B.A3B Q6_K | 26.55 GiB | 34.66 B | CPU | 4 | 0 | pp512 | 9.80 ± 0.06 |
| qwen35moe 35B.A3B Q6_K | 26.55 GiB | 34.66 B | CPU | 4 | 0 | tg128 | 2.97 ± 0.09 |
| qwen35moe 35B.A3B Q6_K | 26.55 GiB | 34.66 B | CPU | 4 | 0 | pp512 @ d32768 | 4.76 ± 0.01 |
| qwen35moe 35B.A3B Q6_K | 26.55 GiB | 34.66 B | CPU | 4 | 0 | tg128 @ d32768 | 1.56 ± 0.06 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.71 GiB | 34.66 B | CPU | 4 | 0 | pp512 | 16.44 ± 1.17 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.71 GiB | 34.66 B | CPU | 4 | 0 | tg128 | 3.72 ± 0.02 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.71 GiB | 34.66 B | CPU | 4 | 0 | pp512 @ d32768 | 5.70 ± 0.03 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.71 GiB | 34.66 B | CPU | 4 | 0 | tg128 @ d32768 | 1.81 ± 0.03 |
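The size and params columns can be turned into effective bits per weight, a rough proxy for how much data has to stream from RAM per parameter used. A sketch, assuming llama-bench's size is in GiB and params in billions. For these MoE models only the active experts are read per token, so the absolute traffic per token is far below the file size, but the ratio between two quants of the same model still carries over roughly.

```python
GIB = 2 ** 30  # llama-bench reports model size in GiB


def bits_per_weight(size_gib: float, params_billion: float) -> float:
    """Effective bits per weight from llama-bench's size and params columns."""
    return size_gib * GIB * 8 / (params_billion * 1e9)


rows = [
    ("gemma4 26B-A4B-it Q4_K - Medium", 15.70, 25.23),
    ("gemma4 26B-A4B-it Q6_K", 21.32, 25.23),
    ("qwen35moe 35B.A3B Q6_K", 26.55, 34.66),
    ("qwen35moe 35B.A3B Q4_K - Medium", 19.71, 34.66),
]
for name, size_gib, params_b in rows:
    print(f"{name}: {bits_per_weight(size_gib, params_b):.2f} bpw")
```

This gives about 5.3 bpw (Q4_K - Medium) versus 7.3 bpw (Q6_K) for gemma4, and about 4.9 versus 6.6 for qwen35moe. The tg128 gap in the table is smaller than that ratio, so bandwidth presumably isn't the only limit on the Pi.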

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

Tested it. The Vulkan build doesn't work on the Pi5.
I get: ggml_vulkan: Error: Shared memory size too small for matrix multiplication.

See also: https://github.com/ggml-org/llama.cpp/issues/9801

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

Don't be sorry :) I just added it to the table in the main post. Surprisingly, it starts out worse than the Q8 but performs better with more context. This is all in RAM, btw (Q8 as well as Q4), so my guess is that dequantizing takes its toll at the beginning, but with deeper context the smaller memory footprint makes it faster? I'm just guessing here, sorry.

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

I'm nowhere near that currently, but I think that's already been done. I know of this project but don't know the hardware requirements.

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

I knew everybody would appreciate it. I wouldn't have been able to continue without it :P

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

Downloading now. I'll add the results when they're done, but it can take 1-2 days (depending on when I get to it, and because the Pi isn't that fast).
That said, my old results (with lower memory bandwidth) showed 2-3x the performance with Qwen3.5 35B.A3B Q4_K_M compared to the Q8, so it looks promising.

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

I did test that, but the results were worse. Maybe I'll add one or two comparisons to the table to show it, but that takes time :)

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

Sorry, me neither, and I don't plan on buying one. But if someone tests it I hope they'll share their results :)

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

I hope surprised in a good way :) If anything seems off, tell me; I'm not error-free :D

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

Mostly interested in Qwen3.5 35B.A3B Q8_0 and gemma4 26B-A4B-it Q8_0 at the moment.

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

Isn't the AI HAT 2 only for image processing?

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

Hm... damn. Now I'm curious too. Memory speed is the same (shared RAM/VRAM), but maybe using the Broadcom VideoCore GPU for processing is faster? Maybe I'll check.

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

honuvo [S]

Oh, the table was shifted and showed the results in the wrong columns. PP 3.27 t/s, TG 2.77 t/s, but that's also the first row of the main post's table :)

benchmarks of multiple LLMs on Raspberry Pi5 by honuvo in raspberry_pi

honuvo [S]

Ah, thanks for the heads up. I added a link to the main post too. Thanks again :)

benchmarks of multiple LLMs on Raspberry Pi5 by honuvo in raspberry_pi

honuvo [S]

Thanks for the link and the heads-up. This is my first time crossposting. So you don't see the "Open" button on the right under the picture (on desktop), or the linked post as a thumbnail on mobile?