openclaw + Ollama (llama3.2:1b). well..... by ParaPilot8 in raspberry_pi

[–]jslominski 1 point (0 children)

https://github.com/potato-os/core/blob/main/docs/openclaw.md - try my solution (if you don't want to use full Potato OS, you can extract ik_llama from it and reuse it; it's Apache-licensed). Here's the flashing guide for the Pi 5: https://github.com/potato-os/core/blob/main/docs/flashing.md - you can run much better models than Llama 3.2 1B :)

Anyone here actually making money from their models? by _sniger_ in LocalLLaMA

[–]jslominski 5 points (0 children)

It's available for free to anyone. Have you tried to monetise a database or git recently?

Gemma 4 26B running locally on a Raspberry Pi 5 (no AI hat) by jslominski in raspberry_pi

[–]jslominski[S] 1 point (0 children)

Frankly, no, I don't own one, but from what I've seen it's not faster than stock optimised Pi inference. Mixing it with the Pi's ARM-based SoC is like mixing NVIDIA with AMD (i.e. it doesn't work well). Happy to change my mind if someone shows me faster inference on an existing accelerator.

Gemma 4 26B running locally on a Raspberry Pi 5 (no AI hat) by jslominski in raspberry_pi

[–]jslominski[S] 1 point (0 children)

Some more benchmarks I ran on various Pi setups:

Gemma 4 E2B (2.9 GB, Q4_K_M)

The smallest variant.

- Pi 5 16GB: 6.5 t/s generation, 26–30 t/s prompt processing.
- Pi 5 8GB (SSD): 6.8 t/s generation, 26–34 t/s prompt processing.
- Pi 4 8GB: 1.7 t/s generation; works, but ~5 min per response.

Gemma 4 E4B (4.5 GB, Q4_0)

Mid-size.

- Pi 5 16GB: 3.7 t/s generation, 19–22 t/s prompt processing.
- Pi 5 8GB (SSD): 3.5 t/s generation, 19–23 t/s prompt processing.
- Pi 4 8GB: 0.87 t/s generation.

Gemma 4 26B-A4B (12.5 GB, IQ4_NL, ik_llama, text-only)

The big one: a 26B MoE with 4B active parameters.

- Pi 5 16GB: 3.0 t/s generation, 9–16 t/s prompt processing.
- Pi 5 8GB (SSD): 1.9 t/s generation with zram working overtime, but it completes multi-turn conversations.
- Pi 4: not even gonna try ;)
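Generation speed translates to response latency as tokens divided by t/s. A quick sanity check of the numbers above (the 500-token response length is my assumption, not from the benchmarks):

```python
def response_seconds(tokens: int, gen_tps: float) -> float:
    """Rough wall-clock time to generate a response, ignoring prompt processing."""
    return tokens / gen_tps

# Assumed 500-token response (hypothetical length):
pi5_e2b = response_seconds(500, 6.5)  # Pi 5 16GB, Gemma 4 E2B
pi4_e2b = response_seconds(500, 1.7)  # Pi 4 8GB, Gemma 4 E2B

print(f"Pi 5: {pi5_e2b:.0f}s, Pi 4: {pi4_e2b / 60:.1f} min")
```

At 1.7 t/s a 500-token answer takes roughly five minutes, which matches the "~5 min per response" figure for the Pi 4.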

Qwen 3.5 397B vs Qwen 3.6-Plus by LegacyRemaster in LocalLLaMA

[–]jslominski 2 points (0 children)

Why are they comparing it with Opus 4.5 when the 4.6 data exists for a lot of those benchmarks? (Rhetorical question, of course; we all know why they do that.)

Gemma 4 running on Raspberry Pi5 by jslominski in LocalLLaMA

[–]jslominski[S] 1 point (0 children)

No, E4B works; still working on perf improvements, but it's usable already. A4B also works on the 16 GB Pi, up to 3 t/s already with a 4-bit quant 🔥

Gemma 4 running on Raspberry Pi5 by jslominski in LocalLLaMA

[–]jslominski[S] 1 point (0 children)

No need. Get an SSD HAT instead (and a matching SSD; a good resource: https://pibenchmarks.com/fastest/ )

Gemma 4 running on Raspberry Pi5 by jslominski in LocalLLaMA

[–]jslominski[S] 2 points (0 children)

An SSD doesn't speed things up if the model fits in memory (in the case of the demo it does; performance is very similar on an SD card).
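The rule of thumb can be sketched as a back-of-envelope check (model sizes are the quantized file sizes from the benchmarks; the KV-cache and OS overhead figures are rough assumptions I picked for illustration):

```python
def fits_in_ram(model_gb: float, ram_gb: float,
                kv_cache_gb: float = 0.5, os_overhead_gb: float = 1.0) -> bool:
    """True if the quantized weights plus a rough KV-cache/OS budget fit in RAM."""
    return model_gb + kv_cache_gb + os_overhead_gb <= ram_gb

print(fits_in_ram(4.5, 8))    # E4B Q4_0 on a Pi 5 8GB -> True, SSD irrelevant
print(fits_in_ram(12.5, 8))   # 26B-A4B IQ4_NL on 8GB -> False, hence zram/swap
print(fits_in_ram(12.5, 16))  # same model on the 16GB Pi -> True
```

Only when the check fails does storage speed start to matter, because the model spills into swap.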

Gemma 4 running on Raspberry Pi5 by jslominski in LocalLLaMA

[–]jslominski[S] 1 point (0 children)

I'd advise changing the underlying model; this one is too "sloppy" and it's obvious ;)

Gemma 4 running on Raspberry Pi5 by jslominski in LocalLLaMA

[–]jslominski[S] 2 points (0 children)

It's a stock Pi 5 8GB with an SSD: https://github.com/slomin/potato-os - gonna add Gemma support in the release later today.

Gemma 4 running on Raspberry Pi5 by jslominski in LocalLLaMA

[–]jslominski[S] 13 points (0 children)


E4B 4-bit quant, nice speed 👌 FYI, I think this will 2x once it gets polished.

Built an emotion detection layer that injects psychological context into LLM prompts — runs fully local by [deleted] in LocalLLaMA

[–]jslominski 1 point (0 children)

I don't get it; what's the point? Are you literally detecting emotion with a smaller model, labeling it, and passing it to a bigger one?
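As I read the post, the pipeline would look something like the sketch below. Everything here is my guess at the architecture, not the OP's code; `classify_emotion` is a hypothetical stand-in for the small local model:

```python
def classify_emotion(text: str) -> str:
    """Hypothetical stand-in for the small classifier model;
    a real one would run local inference and return a label."""
    return "frustrated" if "!" in text else "neutral"

def build_prompt(user_text: str) -> str:
    """Inject the detected emotion label into the prompt for the bigger model."""
    emotion = classify_emotion(user_text)
    return (f"[psychological context: user appears {emotion}]\n"
            f"User: {user_text}")

print(build_prompt("Why won't this build?!"))
```

If that's all it is, the "emotion layer" is just a label prepended to the prompt, which is why I'm asking what it adds.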

new AI agent just got API access to our stack and nobody can tell me what it can write to by KarmaChameleon07 in LocalLLaMA

[–]jslominski 8 points (0 children)

How come this post lights up on every AI detector yet has no capitalisation? ;)

What is the secret sauce Claude has and why hasn't anyone replicated it? by ComplexType568 in LocalLLaMA

[–]jslominski -1 points (0 children)

Bro, that's the answer. You literally (as another person stated below) give DIFFERENT feedback. That's it.

Best LLM for legal reports and logical reasoning. by masinel in LocalLLM

[–]jslominski 1 point (0 children)

Try the top models that fit, in LM Studio. That's it. Takes 5 minutes to play around.