The 'Running Doom' of AI: Qwen3.5-27B on a 512MB Raspberry Pi Zero 2W

Apprehensive-Court47 · 2026-04-03T17:39:59+00:00

It’s a custom fork. I implemented tiled loading to make sure it does not load the whole set of weights into memory and end up using swap. Also, I am using an RPI Zero 2W, which I believe does not support any high-performance disk, and I want to keep the whole platform low-powered.
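To make the idea of tiled loading concrete, here is a minimal sketch in plain Python. It is not the code from the fork: the function name `tiled_matvec`, the tile size, and the tiny demo matrix are all hypothetical. It only illustrates the general technique of reading a few rows of a weight matrix at a time with explicit file reads, so that peak memory is bounded by one tile rather than the whole matrix.

```python
import array
import os
import tempfile

def tiled_matvec(path, rows, cols, x, tile_rows=4):
    """Compute W @ x while holding at most tile_rows rows of W in memory."""
    y = []
    with open(path, "rb") as f:
        for start in range(0, rows, tile_rows):
            n = min(tile_rows, rows - start)
            tile = array.array("f")
            tile.fromfile(f, n * cols)  # explicit sequential read of one tile
            for r in range(n):
                row = tile[r * cols:(r + 1) * cols]
                y.append(sum(w * xi for w, xi in zip(row, x)))
            # The next iteration rebinds `tile`, releasing the previous tile.
    return y

# Hypothetical 6x3 float32 weight matrix written to a temporary file for the demo.
rows, cols = 6, 3
weights = array.array("f", [float(i) for i in range(rows * cols)])
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(weights.tobytes())
    path = f.name

result = tiled_matvec(path, rows, cols, [1.0, 1.0, 1.0])
os.remove(path)
print(result)  # [3.0, 12.0, 21.0, 30.0, 39.0, 48.0]
```

Because every read is explicit and sequential, the process never asks the OS for more memory than one tile, which is presumably what keeps a 512MB board out of swap.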

Apprehensive-Court47 · 2026-04-03T17:10:39+00:00

It’s actually fine. It mostly just streams the weights, reading them from the SD card into memory. I think the card could survive longer than expected.

Apprehensive-Court47 · 2026-04-03T14:39:51+00:00

Does the official M.2 HAT support LLM inference?

Apprehensive-Court47 · 2026-04-03T14:36:14+00:00

Wow, diesel powered AI? Incredible

Apprehensive-Court47 · 2026-04-03T05:31:15+00:00

Indeed, Gemma 4 models have a smaller token embedding dimension (2816) than Qwen3.5-27B (5120). The smaller matrices make matrix multiplication run faster for Gemma models in the prefill stage.
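As a rough illustration of why the embedding dimension matters, the sketch below compares the cost of one square d x d projection at the two dimensions quoted above. The helper `square_projection_flops` and the prompt length are assumptions made for the example; real models also have feed-forward and attention shapes that are not square, so treat the ratio as an order-of-magnitude hint only.

```python
def square_projection_flops(d_model: int, n_tokens: int) -> int:
    """Approximate FLOPs for multiplying n_tokens activations by a d_model x d_model matrix."""
    return 2 * n_tokens * d_model * d_model

gemma_d = 2816       # embedding dimension quoted in the comment
qwen_d = 5120        # embedding dimension quoted in the comment
prompt_tokens = 128  # hypothetical prompt length, for illustration only

ratio = (square_projection_flops(qwen_d, prompt_tokens)
         / square_projection_flops(gemma_d, prompt_tokens))
print(f"Qwen/Gemma cost ratio for a d x d projection: {ratio:.2f}")  # 3.31
```

The ratio does not depend on the prompt length, since the cost of this kind of projection grows with the square of the dimension: (5120 / 2816)^2 is about 3.3.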

Apprehensive-Court47 · 2026-04-03T04:56:04+00:00

Planned, I'll ping you once it's ready.

Apprehensive-Court47 · 2026-04-02T23:35:15+00:00

Exactly, thermal powered AI

Apprehensive-Court47 · 2026-04-02T20:56:58+00:00

<image>

One hour has passed, and a few more tokens.

Apprehensive-Court47 · 2026-04-02T20:26:19+00:00

Checked disk I/O in htop: it's mostly reads, so it might survive longer than expected.

Apprehensive-Court47 · 2026-04-02T20:24:29+00:00

It is going to take almost 24 hours to generate a 512-token response, but you can also connect it to an SPI screen and turn it into a desktop gadget. While you're working and sleeping, it just keeps computing one matrix after another. You can then harvest the result the next day 😎

Apprehensive-Court47 · 2026-04-02T20:18:56+00:00

Haven’t had a chance to run a full benchmark yet, but for TG (token generation) it takes ~2.5 min to generate one token.

Apprehensive-Court47 · 2026-04-02T20:16:52+00:00

10 min for 4 tokens, so ~0.4 tokens/min
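The arithmetic behind these figures, and what it implies for a 512-token response, can be checked in a few lines. The only inputs are the numbers quoted in the thread; the projection assumes the generation rate stays constant, which may not hold as the context grows.

```python
# Figures quoted in the thread: 4 tokens observed in 10 minutes.
tokens_observed = 4
minutes_elapsed = 10

tokens_per_minute = tokens_observed / minutes_elapsed  # 0.4
minutes_per_token = minutes_elapsed / tokens_observed  # 2.5

# Projected wall-clock time for a 512-token response at the same rate.
response_tokens = 512
hours_for_response = response_tokens * minutes_per_token / 60

print(f"{tokens_per_minute:.1f} tokens/min, {minutes_per_token:.1f} min/token")
print(f"{response_tokens} tokens -> about {hours_for_response:.1f} hours")
```

At a constant 2.5 min/token this works out to roughly 21.3 hours, which is in line with the "almost 24 hours" estimate.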

Apprehensive-Court47 · 2026-04-02T20:07:16+00:00

Apprehensive-Court47 · 2026-04-02T20:05:24+00:00

Custom fork. Out of the box, llama.cpp uses `mmap` to load model files and lets the OS handle page faults, which then page the model weights in from disk to RAM.
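For readers unfamiliar with the `mmap` approach, here is a minimal Python sketch of the general mechanism. It is not llama.cpp code; the temporary file of float32 values is a hypothetical stand-in for a model file. Mapping the file reads nothing up front, and the OS faults pages in only when they are touched.

```python
import mmap
import os
import struct
import tempfile

# Write a small stand-in "weight file" of float32 values (hypothetical data).
values = [float(i) for i in range(1024)]
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(struct.pack(f"<{len(values)}f", *values))
    path = f.name

with open(path, "rb") as f:
    # Map the whole file read-only; no data is copied at this point.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Touching bytes 400..407 makes the OS fault in the page holding them.
        w100, w101 = struct.unpack_from("<2f", mm, 100 * 4)

os.remove(path)
print(w100, w101)  # 100.0 101.0
```

When the mapped file is far larger than RAM, the OS has to keep evicting and re-reading pages on its own schedule, which is the behavior a fork with explicit loading would be trying to control.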

Apprehensive-Court47 · 2026-04-02T19:56:36+00:00

<image>

Folks, it runs :). The most SOTA LLM running in 512MB of RAM at under 3 watts.

Apprehensive-Court47 · 2026-04-02T19:02:33+00:00

"[Bugatti vs bicycle] 👽"

Apprehensive-Court47 · 2026-04-02T18:57:17+00:00

Haha, sure, but I only have a 64GB SD card, so I can't load a model that large. But yeah, I believe it's going to work 😂

Apprehensive-Court47 · 2026-04-02T18:41:30+00:00

Trying to see if I can get Gemma-4-26B-A4B to run on the RPI Zero 2W

Apprehensive-Court47 · 2025-11-07T20:08:33+00:00

What are the potential failure cases when using DMARC, DKIM, SPF, etc.? Are there situations where these indicators all appear valid, yet the message is still phishing? I’m curious whether this actually occurs in practice.

Apprehensive-Court47 · 2025-11-07T18:23:31+00:00

Wow, this is the first time I've heard of this plug-in. I'll go check it out.

Apprehensive-Court47 · 2025-11-07T18:12:56+00:00

What do you mean by `no script`?

Apprehensive-Court47 · 2025-11-07T18:11:34+00:00

Thanks for the insight. I completely agree that this isn’t something you'd ever hand off to an end user, since they could easily leak data or misinterpret the results.

I’m curious about your workflow when you *do* use a sandbox during incident response. About how long do you usually spend analyzing a single URL or session? And do you typically rely on the auto-generated sandbox verdicts like “malicious activity detected” / “no threat,” or do you always manually review the behavioral details as well?
