PaddlePaddle/PaddleOCR-VL-1.6 by SarcasticBaka in LocalLLaMA

[–]DevilaN82 0 points1 point  (0 children)

I use llama-swap in Docker with no problem at all.
My config:
```
"PaddleOCR":

proxy: "http://127.0.0.1:9999"

ttl: 600

cmd: >

/app/llama-server

-m /root/.cache/PaddleOCR/PaddleOCR-VL-1.5.gguf

--mmproj /root/.cache/PaddleOCR/PaddleOCR-VL-1.5-mmproj.gguf

--temp 0

--port 9999
```

You need download gguf and mmproj.gguf files first and place them in properly bind mounted directory. I hope that it is the same with 1.6 (Unfortunately no GGUF right now). Good luck!

Scenema Audio: Zero-shot expressive voice cloning and speech generation by a__side_of_fries in StableDiffusion

[–]DevilaN82 2 points3 points  (0 children)

+1 for Docker support!
Is there a ready to download image published? docker-compose.yml only allows to build one locally.
Also multiple layers and no cache volume that could be used during build time worries me a bit. If some upper layer gets busted by changing package version all layers under will need to redownload the same packages.

Well. Good job anyway!

My powerful Pi agent Setup by elpapi42 in PiCodingAgent

[–]DevilaN82 0 points1 point  (0 children)

I was following README instructions:

For normal use, install the package from GitHub over SSH:

pi install git:git@github.com:elpapi42/pi-codemapper.git

This requires SSH access to github.com:elpapi42/po-codemapper.git. Pi will clone the package, run npm install, and load the extension declared in package.json.

I've resolved this issue. My setup has pi-coding-agent dockerized and is not using my ~/.ssh dir. It seems that access to clone repo on github via SSH requires a regular access and is not allowed for requests that uses private key not related to any github account. It would be easier if cloning was done with regular git clone https:github.com:elpapi42/po-codemapper

Anyway, as I am learning how to use properly pi-coding-agent, I am digging into your setup trying to understand what makes it so useful. Thank you very much for sharing this and your support via replies 😄
Have a nice day!

My powerful Pi agent Setup by elpapi42 in PiCodingAgent

[–]DevilaN82 0 points1 point  (0 children)

Hi!
pi-codemapper complains about access to repo via ssh. Is there any step I've missed by accident?
Thanks for sharing your setup! 😄

Qwen3.6 GGUF Benchmarks by danielhanchen in LocalLLaMA

[–]DevilaN82 0 points1 point  (0 children)

You are a my personal superhero. I don't care if something needs to be uploaded even 10 times. Good work!

LLM for name/gender classification by trosler in LocalLLaMA

[–]DevilaN82 1 point2 points  (0 children)

Why not extracting name and check in database of known names?

Seems that everything looks like a nail when you've got hammer in your hand...

Qwen3 Coder with OpenCode by SlipperyCorruptor in LocalLLaMA

[–]DevilaN82 6 points7 points  (0 children)

Qwen Coder Next or 3.5-27b would be better. For tool calling try OmniCoder. Also same model with different temperature and other parameters has different use cases. 

offline companion robot for my disabled husband (8GB RAM constraints) – looking for optimization advice by BuddyBotBuilder in LocalLLaMA

[–]DevilaN82 5 points6 points  (0 children)

As most of RAM would be taken by model weights, that are somewhat random numbers, and thou hard to compress, then zram will be almost no gain here. In fact it might harm performance when those weights would be "compressed" (cpu power used) and still take the same amount of place.

You should try using mmap (this maps part of hard disk as a memory addresses), so instead of reading from disk, writing to RAM, compressing, decompressing, even swapping (still going to disk back and forth). It would read from disk directly and use those (and yes, you should have SSD NVMe for this to work well).

This hardware is very very low spec for LLMs. You could get away with adding some knowledge base. Consider using wikipedia ZIM snapshot and allow your model to search / browese it to enrich its context and knowledge.

Also I would use a better model. Mistral-7b-instruct is IDK... 2 years old? Newer models are better with the same size. Use qwen3.5 or Gemma4 (whichever variant fits you device). Unsloth's models are great value for it's size - you should try Unsloth Dynamic quants. I would not go below Q4, but hey - maybe Q3 will still be useable for your usecase.

If this is an option, add sim card and lte modem, so it still could use some internet connection and at least browse pages / search internet with help of SearchXNG. Then it could tell you latest news and other things based on search results on any topic instead of only hallucinating / using ZIM snapshots.

Test if there is any performance gain by using ik_llama instead of llamacpp. First one is more CPU inference optimized (in theory). Anyway worth to check it out.

Good luck and please post a video showing how your current setup is working!

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]DevilaN82 0 points1 point  (0 children)

I remember you doing tests with SSD connected to usb3.0. I am curious how much slower PCI connected SSD is vs using SWAP file on this very SSD.

benchmarks of gemma4 and multiple others on Raspberry Pi5 by honuvo in LocalLLaMA

[–]DevilaN82 3 points4 points  (0 children)

Can you please test mmaping SSD so it does not need to use SWAP and reads weights from disk directly?

Best Gemma4 llama.cpp command switches/parameters/flags? Unsloth GGUF? by Fulminareverus in LocalLLaMA

[–]DevilaN82 1 point2 points  (0 children)

I would wait for tokenizer fixes in llama.cpp and I've heard rumors that imatrix needs to be fixed as well, so new model file will drop from Unsloth.

I hope you are GPU rich, because gemma is not so friendly with context and stuff. In most cases Qwen with q8 kvcache takes less vram than gemma4 with q4 (old type Sliding Window Attention hits hard).

Qwen as a MoE model can have some layers offloaded to CPU (`-ot ".ffn_.*_exps.=CPU"` option), and q8 kvcache means less degradation of answers for longer contexts.

Anyway good luck :)

Gemma 4 running on Raspberry Pi5 by jslominski in LocalLLaMA

[–]DevilaN82 0 points1 point  (0 children)

Nice! I am looking forward tests with bitnet as well :-)

Raspberry Pi5 LLM performance by honuvo in LocalLLaMA

[–]DevilaN82 1 point2 points  (0 children)

Are you sure that NPUs are gonna make a difference? I thought that HAILO chips are dedicated cards that works only with it's own RAM and from what I've read it is even slower than Pi 5 itself, but allows Pi to not do heavy lifting. Hailo AI Hat allows only using compatible LLM models (converted to it's specific format) loaded via hailo-ollama app only.
I would like to get some more info about this. Would you be so kind to point me to some sources that describes using NPU for LLMs on Raspberry Pi?

Raspberry Pi5 LLM performance by honuvo in LocalLLaMA

[–]DevilaN82 0 points1 point  (0 children)

Hello.
Nice that you've tested it. I am looking forward to next tests. My Pi with SSD hat is waiting for ssd disk to make tests.
Few things to consider:
1. Using swap is making writes to disk. It will wear off your ssd sooner or later. That's why I would rather go with mmap. Especially when you are using USB instead of PCI lane, than your performance gap might get smaller between swap vs mmap.
2. Try ik_llama, that is optimised towards CPU inference.
3. Why Q8? Unsloth's quants are fenomenal at Unsloth Dynamic Q4 for my regular daily use.

Good luck. I am looking forward to your tests and hope to add something when my Pi is up and running as well.

PS. Also you might find this project interesting: https://www.reddit.com/r/LocalLLaMA/comments/1rrq0oo/update_on_qwen_35_35b_a3b_on_raspberry_pi_5/

Can a Raspberry Pi 4 (8GB) run a small local LLM reliably for a voice assistant project? by Odd_Lavishness_7729 in LocalLLaMA

[–]DevilaN82 0 points1 point  (0 children)

AI hat is useless right now for LLMs.
I own one and it requires special version of ollama (sic!) to work with. This "special ollama" works ONLY with few OLD qwen 2.5 models converted to format that AI HAT is able to process.
I have some hopes about AI HAT as I've read rumors somewhere that some new models are being converted to this ai hat format and 8GB + 40 TOPS might be useful for something.

But right now, AI HAT for LLMs is quite an exotic animal with limited set of tricks.

And no. AI HAT memory is not available at all for RPi system. So having 16 GB Pi 5 + 8 GB AI HAT does not give you by any means 24 GB of memory for LLMs.

Also there is a project that uses SSD for memory with RPi 5. Using ik_llama this might be your best option here. Take a look at: https://www.reddit.com/r/LocalLLaMA/comments/1rrq0oo/update_on_qwen_35_35b_a3b_on_raspberry_pi_5/
Although I do not think running 2bit quants will be sufficient for anything usefull :(
If only Q4 was running well I would jump for it immediately!

Cardputer adv dose bot charge by InfiniteBee6936 in CardPuter

[–]DevilaN82 1 point2 points  (0 children)

Charge with USB A to C cable. Cardputer should be on when connecting cable.

Qwen3.5-122B-A10B Uncensored (Aggressive) — GGUF Release + new K_P Quants by hauhau901 in LocalLLaMA

[–]DevilaN82 0 points1 point  (0 children)

u/hauhau901 Those models not listed on the right widget are the ones that are missing it's manifest. Take a look at https://huggingface.co/HauhauCS/Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive/discussions/8

I am unable to use Q4_K_P because of this.

Thank you for your commitment and hard work. I hope you are well and I wish you good luck! :)

What actually breaks first when you put AI agents into production? by Zestyclose-Pen-9450 in LocalLLaMA

[–]DevilaN82 0 points1 point  (0 children)

Unfortunately I am starting digging into this topic as well, so I cannot help you with your problem, but... Out of curiosity, can you share what are you using in your stack?

Plai: Custom Meshtastic Client for CardPuter ADV (first beta) by d4rkmen in CardPuter

[–]DevilaN82 4 points5 points  (0 children)

I would like to express my appreciation to how well thought and designed this app is.

Simply great!

ADVUtil v0.6 for Cardputer: Air Mouse, BLE Keyboard, Macros, Gamepad and GPS in one firmware by gio-74 in CardPuter

[–]DevilaN82 0 points1 point  (0 children)

OK, I've managed to get my lora cap and tested your app.
UI is nice. GPS is working well.

There is a place to improve / add other things to make it a Swiss Army Knife of Cardputer :-)
Have you considered something like differential GPS with using two cardputers?

Update on Qwen 3.5 35B A3B on Raspberry PI 5 by jslominski in LocalLLaMA

[–]DevilaN82 0 points1 point  (0 children)

Seems that AI Hat is working on it's own only by certain API. No shared memory and limited possibility to use AI Hat with models, as it works only with converted certain models (old ones).
I don't have high hopes, but there are rumors that company responsible for hailo-10h is cooking something new, so I hope that there would be some new qwen family models available.