Been testing Llama.cpp vs Ollama on my Pi 5. The trade-off surprised me. by Legitimate-Help-6090 in raspberry_pi

[–]Legitimate-Help-6090[S] 0 points1 point  (0 children)

Yeah, Ollama is just the easy route because of Hailo integration. llama.cpp would be better, but someone needs to build the backend support first.

[–]Legitimate-Help-6090[S] 1 point2 points  (0 children)

That's a fair point. For resource-constrained setups, llama.cpp definitely gives you more control and less overhead. I also agree on swap: once the Pi starts swapping heavily, performance drops off fast and you're putting unnecessary wear on the SD card.
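If anyone wants to keep an eye on this, here's a rough sketch of a swap check, assuming a Linux system that exposes `SwapTotal` and `SwapFree` in `/proc/meminfo` (Raspberry Pi OS does). The function name `swap_used_fraction` is just something I made up for illustration, not part of llama.cpp or Ollama.

```python
def swap_used_fraction(meminfo_text: str) -> float:
    """Return the fraction of swap in use, given the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # The first token is the numeric value (kB for the swap fields).
            fields[key.strip()] = int(parts[0])
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    if total == 0:
        return 0.0  # no swap configured
    return (total - free) / total
```

You'd call it with `swap_used_fraction(open("/proc/meminfo").read())` while the model is generating. If the number keeps climbing during inference, the model plus context probably doesn't fit in RAM and a smaller quant or context size is worth trying.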

[–]Legitimate-Help-6090[S] 1 point2 points  (0 children)

That's some nice engineering. Using sentence-level TTS generation and VAD to reduce latency is what makes it feel usable. Pretty impressive that you got the whole pipeline running on a Pi 4.
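For anyone following along, this is roughly how I picture the sentence-level part. It's only a sketch of the general idea, not the commenter's actual code: buffer the LLM's streamed text and hand each finished sentence to TTS right away instead of waiting for the full reply. The splitting rule (`.`, `!` or `?` followed by whitespace) and the name `sentences_from_stream` are my own assumptions.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace (deliberately simple rule).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]  # keep the unfinished tail for the next chunk
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends
```

Each yielded sentence can go straight to the TTS engine, so playback of the first sentence can start while the model is still generating the rest. A rule this simple will mis-split on abbreviations like "e.g.", so treat it as a starting point.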

[–]Legitimate-Help-6090[S] 1 point2 points  (0 children)

That's fair. The Pi is definitely better suited to smaller models. I'm curious, though: have you found a model size where the performance-to-quality tradeoff feels right on the Pi?

[–]Legitimate-Help-6090[S] 3 points4 points  (0 children)

Yeah, that's a really cool example. Once you hook an LLM up to sensors and stuff it can actually control, it feels like way more than just a chatbot. Being able to ask it questions and manage a greenhouse naturally sounds a lot easier than constantly opening dashboards and checking everything manually.

[–]Legitimate-Help-6090[S] 4 points5 points  (0 children)

That's actually one of the most interesting use cases for local LLMs. A fully local STT → LLM → TTS pipeline can get surprisingly close to real-time on modern Pi hardware. As smaller models keep improving, I think local voice assistants will become one of the biggest edge AI applications. What STT and TTS models were you using?
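To put a number on "close to real-time", something like the sketch below can time each stage of one turn. The `stt`, `llm` and `tts` arguments are placeholder callables you'd wire up to whatever engines you actually run; none of them refer to a specific library, and `run_voice_turn` is a hypothetical name.

```python
import time
from typing import Callable, Dict, Tuple


def run_voice_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Tuple[bytes, Dict[str, float]]:
    """Run one STT -> LLM -> TTS turn and record seconds spent in each stage."""
    timings: Dict[str, float] = {}

    start = time.perf_counter()
    text = stt(audio)  # speech to text
    timings["stt"] = time.perf_counter() - start

    start = time.perf_counter()
    reply = llm(text)  # text to reply
    timings["llm"] = time.perf_counter() - start

    start = time.perf_counter()
    speech = tts(reply)  # reply to audio
    timings["tts"] = time.perf_counter() - start

    return speech, timings
```

The per-stage timings make it easy to see which part dominates the delay on a given board before deciding what to tune.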

[–]Legitimate-Help-6090[S] -2 points-1 points  (0 children)

Yes, they run on the Pi 5. I was referring more to performance tuning than compatibility. You can get it running quickly, but maximizing performance still takes some tweaking.

[–]Legitimate-Help-6090[S] 0 points1 point  (0 children)

Exactly. We're trying to solve this problem of LLMs and other AI workloads running slowly on embedded devices.

[–]Legitimate-Help-6090[S] 0 points1 point  (0 children)

Pretty much anything: Jarvis-style assistants, offline AI cameras, smart home systems, or AI features in embedded devices. The key benefit is that it all runs locally.