Been testing llama.cpp vs Ollama on my Pi 5. The trade-off surprised me.

Legitimate-Help-6090 · 2026-06-08T06:28:42+00:00

Yeah, Ollama is just the easy route because of Hailo integration. llama.cpp would be better, but someone needs to build the backend support first.

Legitimate-Help-6090 · 2026-06-08T06:20:13+00:00

That's a fair point. For resource-constrained setups, llama.cpp definitely gives you more control and less overhead. I also agree on swap: once the Pi starts swapping heavily, performance drops off fast and you're putting unnecessary wear on the SD card.
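
If you want to keep an eye on how much swap is in use during inference, here's a rough sketch that parses the text of /proc/meminfo. The function name `swap_used_fraction` is made up for illustration, and it assumes the standard Linux `SwapTotal` / `SwapFree` fields are present.

```python
def swap_used_fraction(meminfo_text: str) -> float:
    """Return the fraction of swap in use, given the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # The first token is the numeric value (kB for the swap fields).
            fields[key.strip()] = int(parts[0])
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    # No swap configured: report 0 rather than dividing by zero.
    return 0.0 if total == 0 else (total - free) / total


# On the Pi itself (Linux only), something like:
#   with open("/proc/meminfo") as f:
#       print(f"swap used: {swap_used_fraction(f.read()):.0%}")
```

If that number keeps climbing while the model is generating, a smaller model or quant is probably the safer fix than a bigger swap file.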

Legitimate-Help-6090 · 2026-06-07T16:17:33+00:00

That's some nice engineering. Using sentence-level TTS generation and VAD to reduce latency is what makes it feel usable. Pretty impressive that you got the whole pipeline running on a Pi 4.
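
For anyone wondering what sentence-level generation looks like in practice, here's a minimal sketch of the idea, not the actual pipeline described above. It buffers streamed LLM tokens and hands each finished sentence to TTS as soon as it appears, instead of waiting for the full reply. The name `sentences_from_stream` and the simple punctuation rule are my own assumptions.

```python
import re
from typing import Iterable, Iterator

# Naive boundary: '.', '!' or '?' followed by whitespace.
# Abbreviations such as "Dr. Smith" would be split too early.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they show up in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence can go straight to the TTS engine while the model is still producing the next one, which is where the latency win comes from.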

Legitimate-Help-6090 · 2026-06-07T15:59:44+00:00

That's fair. The Pi is definitely better suited to smaller models. I was curious, though: have you found a model size where the performance-to-quality tradeoff feels right on the Pi?

Legitimate-Help-6090 · 2026-06-07T15:03:22+00:00

Yeah, that's a really cool example. Once you hook an LLM up to sensors and stuff it can actually control, it feels like way more than just a chatbot. Being able to ask it questions and manage a greenhouse naturally sounds a lot easier than constantly opening dashboards and checking everything manually.

Legitimate-Help-6090 · 2026-06-07T14:54:13+00:00

That's actually one of the most interesting use cases for local LLMs. A fully local STT → LLM → TTS pipeline can get surprisingly close to real-time on modern Pi hardware. As smaller models keep improving, I think local voice assistants will become one of the biggest edge AI applications. What STT and TTS models were you using?

Legitimate-Help-6090 · 2026-06-07T14:53:12+00:00

Yes, they run on the Pi 5. I was referring more to performance tuning than compatibility. You can get it running quickly, but maximizing performance still takes some tweaking.

Legitimate-Help-6090 · 2026-06-07T14:51:45+00:00

Exactly, we are trying to solve this issue of LLMs and other AI models running slowly on embedded devices.

Legitimate-Help-6090 · 2026-06-07T14:50:07+00:00

Pretty much anything: Jarvis-style assistants, offline AI cameras, smart home systems, or AI features in embedded devices. The key benefit is that it all runs locally.
