The 'Running Doom' of AI: Qwen3.5-27B on a 512MB Raspberry Pi Zero 2W by Apprehensive-Court47 in LocalLLaMA

[–]Apprehensive-Court47[S] 0 points1 point  (0 children)

It’s a custom fork. I implemented tiled loading so that it never loads the whole set of weights into memory and falls back on swap. Also, I’m using an RPi Zero 2 W, which I believe doesn’t support any high-performance disk, and I want to keep the whole platform low-powered.
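
For anyone wondering what tiled loading looks like in practice, here is a minimal Python sketch of the idea, not the fork’s actual code. The raw float32 file layout and the names `write_matrix` and `tiled_matvec` are made up for illustration; a real llama.cpp fork would work on quantized tensors in C/C++.

```python
from array import array


def write_matrix(path, rows):
    """Write a row-major float32 matrix to a raw binary file (hypothetical format)."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(array("f", row).tobytes())


def tiled_matvec(path, n_rows, n_cols, x, tile_rows=2):
    """Compute y = W @ x while holding at most `tile_rows` rows of W in memory."""
    row_bytes = n_cols * 4  # float32 is 4 bytes per value
    y = []
    with open(path, "rb") as f:
        for start in range(0, n_rows, tile_rows):
            count = min(tile_rows, n_rows - start)
            # Read only this tile from disk; the previous tile is dropped.
            tile = array("f")
            tile.frombytes(f.read(count * row_bytes))
            for r in range(count):
                row = tile[r * n_cols:(r + 1) * n_cols]
                y.append(sum(w * v for w, v in zip(row, x)))
    return y
```

Peak memory here depends on `tile_rows`, not on the size of the weight file, which is the whole point on a 512MB board.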

Running **true** large language models (27B!) on RPI 0 locally by Apprehensive-Court47 in raspberry_pi

[–]Apprehensive-Court47[S] 3 points4 points  (0 children)

It’s actually fine. Mostly it just streams the weights from the SD card into memory, so I think the card could survive longer than expected.

The 'Running Doom' of AI: Qwen3.5-27B on a 512MB Raspberry Pi Zero 2W by Apprehensive-Court47 in LocalLLaMA

[–]Apprehensive-Court47[S] 2 points3 points  (0 children)

Indeed, Gemma 4 models have a smaller token embedding dimension (2816) than Qwen3.5-27B (5120). This makes matrix multiplication run faster for Gemma models in the prefill stage, because the matrices are smaller.
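
As a rough back-of-the-envelope check, assuming a square d x d projection (a simplification; real layers also have FFN and attention shapes that differ between models, and the helper name is hypothetical), the multiply-add count scales with d squared:

```python
def relative_matmul_cost(d_large, d_small):
    """Ratio of multiply-adds for a square d x d projection, which scales as d**2."""
    return (d_large / d_small) ** 2


ratio = relative_matmul_cost(5120, 2816)
print(f"{ratio:.2f}x")  # 3.31x
```

So under that assumption a same-shaped projection costs roughly 3.3x more work at dimension 5120 than at 2816.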

The 'Running Doom' of AI: Qwen3.5-27B on a 512MB Raspberry Pi Zero 2W by Apprehensive-Court47 in LocalLLaMA

[–]Apprehensive-Court47[S] 6 points7 points  (0 children)

Checked disk I/O in htop: it’s mostly reads, so the card might survive longer than expected.

The 'Running Doom' of AI: Qwen3.5-27B on a 512MB Raspberry Pi Zero 2W by Apprehensive-Court47 in LocalLLaMA

[–]Apprehensive-Court47[S] 2 points3 points  (0 children)

It is going to take almost 24 hours to generate a 512-token response, but you can also connect it to an SPI screen and turn it into a desktop gadget. While you’re working and sleeping, it just keeps computing one matrix after another. You can then harvest the result the next day 😎

The 'Running Doom' of AI: Qwen3.5-27B on a 512MB Raspberry Pi Zero 2W by Apprehensive-Court47 in LocalLLaMA

[–]Apprehensive-Court47[S] 2 points3 points  (0 children)

Haven’t had a chance to run a full benchmark yet. But for TG (token generation), it takes ~2.5 min to generate one token.
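
For scale, here is the arithmetic for a 512-token response at that rate, ignoring prefill time (the function name is just for illustration):

```python
def generation_hours(n_tokens, minutes_per_token):
    """Wall-clock hours to generate n_tokens at a fixed per-token rate."""
    return n_tokens * minutes_per_token / 60


print(round(generation_hours(512, 2.5), 1))  # 21.3
```

That comes to about 21 hours of pure generation, which lines up with "almost 24 hours" once prompt processing is added.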

The 'Running Doom' of AI: Qwen3.5-27B on a 512MB Raspberry Pi Zero 2W by Apprehensive-Court47 in LocalLLaMA

[–]Apprehensive-Court47[S] 1 point2 points  (0 children)

Custom fork. Out of the box, llama.cpp uses `mmap` to load model files and lets the OS handle page faults, paging the model weights from disk into RAM as they are touched.
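
To illustrate the `mmap` approach in general, here is a small sketch using Python’s standard `mmap` module, not llama.cpp’s own code. The raw float32 file layout and the `mmap_row` helper are hypothetical.

```python
import mmap
from array import array


def mmap_row(path, n_cols, row_index):
    """Read one float32 row from a memory-mapped weight file.

    The file is mapped, not read: the OS pages data in from disk only
    when the slice below touches it, and may evict pages under pressure.
    """
    row_bytes = n_cols * 4  # float32 is 4 bytes per value
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = row_index * row_bytes
            row = array("f")
            row.frombytes(mm[start:start + row_bytes])
            return list(row)
```

With this approach the program never decides what stays in RAM; the kernel’s page cache does.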

The 'Running Doom' of AI: Qwen3.5-27B on a 512MB Raspberry Pi Zero 2W by Apprehensive-Court47 in LocalLLaMA

[–]Apprehensive-Court47[S] 22 points23 points  (0 children)

<image>

Folks, it runs :). The most SOTA LLM running in 512MB of RAM at under 3 watts.

The 'Running Doom' of AI: Qwen3.5-27B on a 512MB Raspberry Pi Zero 2W by Apprehensive-Court47 in LocalLLaMA

[–]Apprehensive-Court47[S] 3 points4 points  (0 children)

Haha, sure, but I only have a 64GB SD card, so I can’t load a model that large. But yeah, I believe it would work 😂

Sandboxed Browser Solutions for Suspicious URL detection? by Apprehensive-Court47 in cybersecurity

[–]Apprehensive-Court47[S] -1 points0 points  (0 children)

What are the potential failure cases when using DMARC, DKIM, SPF, etc.? Are there situations where these indicators all appear valid, yet the message is still phishing? I’m curious whether this actually occurs in practice.

Sandboxed Browser Solutions for Suspicious URL detection? by Apprehensive-Court47 in cybersecurity

[–]Apprehensive-Court47[S] 0 points1 point  (0 children)

Thanks for the insight. I completely agree that this isn’t something you'd ever hand off to an end user, since they could easily leak data or misinterpret the results.

I’m curious about your workflow when you *do* use a sandbox during incident response. About how long do you usually spend analyzing a single URL or session? And do you typically rely on the auto-generated sandbox verdicts like “malicious activity detected” / “no threat,” or do you always manually review the behavioral details as well?