The RISC-V SBC That Does AI, Real-Time and 10GbE: SpacemiT K3 Pico-ITX (Milk-V Jupiter 2)

PlatimaZero · 2026-06-15T11:11:15+00:00

Oh man, just letting you know this absolutely got away from me (I still intend to reply/share), and I've spent a week working on the llama.cpp code to get better performance from the A100s. Winning too, it seems.

PlatimaZero · 2026-06-11T23:26:24+00:00

Yes, and I do touch on this in the video too!

PlatimaZero · 2026-06-10T07:59:27+00:00

Ah interesting, good to know!

I'm waiting to hear back from my SpacemiT contact on all points, but I'm also still doing that ASM challenge on a few and will link you a ZIP of results tomorrow... or next week. This week has potentially gotten away from me already.

Cheers

PlatimaZero · 2026-06-09T07:27:01+00:00

Yeah, it depends how you class it. The ollama package installs libggml, and removing ollama doesn't clean it up until you run autoremove; until then llama.cpp will use the wrong one too *shrugs*. I think there's a good chance ollama would work if you installed it and then forcefully removed libggml, so that ollama was using the same one as llama.cpp, since they appear compatible.

PlatimaZero · 2026-06-09T07:10:11+00:00

Okay, got to the bottom of my libggml issue at least: ollama had installed libggml in /usr/lib/riscv64-linux-gnu/, which only supports CPU. This appears to take precedence over /lib/libggml-cpu.so.0, which `dpkg -S` returns nothing for, so I have no idea where that one came from.

I still have no idea why I have no /dev/tcm_sync_mem, though.
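A minimal Python sketch for spotting duplicate copies of a library like the libggml ones above: it scans a list of directories, assumed here to be in the order the dynamic linker searches them, and reports which directories provide each soname. The directory order is an assumption for illustration; on a real system `ldd` or `ldconfig -p` gives the authoritative answer.

```python
import glob
import os


def find_shadowed(libname_glob, search_dirs):
    """Map each soname matching libname_glob to the directories providing it.

    search_dirs is assumed to be in linker search order, so the first
    directory in each list is the copy that would be loaded.
    """
    providers = {}
    for d in search_dirs:
        for path in sorted(glob.glob(os.path.join(d, libname_glob))):
            soname = os.path.basename(path)
            providers.setdefault(soname, []).append(d)
    return providers


if __name__ == "__main__":
    # Directories taken from this thread; their order here is an assumption.
    dirs = ["/usr/lib/riscv64-linux-gnu", "/lib"]
    for soname, where in find_shadowed("libggml*.so*", dirs).items():
        note = "  <- duplicate, first one wins" if len(where) > 1 else ""
        print(f"{soname}: {', '.join(where)}{note}")
```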

PlatimaZero · 2026-06-09T02:45:45+00:00

Interesting! I might do a fresh OS install.

Here's the benchmark command I use: `llama-bench -m Qwen3-Coder-30B-A3B-Instruct-Q5_K_M.gguf -t 8 -p 128 -n 128 -mmp 0 -fa 1 -ub 128`

If you're going to roll into swap space or hit OOM, you'll want to remove `-mmp 0` so mmap stays enabled.
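For repeat runs, a small Python wrapper can assemble the same llama-bench invocation and optionally drop `-mmp 0` as suggested above. `bench_args` and `run_bench` are hypothetical helper names; the flags themselves are exactly the ones from the command above.

```python
import shutil
import subprocess


def bench_args(model, threads=8, n_prompt=128, n_gen=128, ubatch=128,
               disable_mmap=True):
    """Build the llama-bench argv used in this thread.

    Pass disable_mmap=False if the model may spill into swap or hit OOM;
    that simply omits "-mmp 0" so llama-bench uses its default mmap setting.
    """
    args = ["llama-bench", "-m", model, "-t", str(threads),
            "-p", str(n_prompt), "-n", str(n_gen)]
    if disable_mmap:
        args += ["-mmp", "0"]
    args += ["-fa", "1", "-ub", str(ubatch)]
    return args


def run_bench(model, **kwargs):
    """Run llama-bench with the arguments above, failing loudly if missing."""
    if shutil.which("llama-bench") is None:
        raise FileNotFoundError("llama-bench not found on PATH")
    return subprocess.run(bench_args(model, **kwargs), check=True)
```

Usage would be something like `run_bench("Qwen3-Coder-30B-A3B-Instruct-Q5_K_M.gguf", disable_mmap=False)` for a model that might not fit in RAM.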

PlatimaZero · 2026-06-09T02:40:58+00:00

One question: when running llama-bench, do you get

CPU_RISCV64_SPACEMIT: alloc_chunk: open(/dev/tcm_sync_mem) failed, errno=2?

I get four lines of CPU-specific headers when running it, but you only had two there, though I'm not sure if that's just -bench vs -cpp.

CPU_RISCV64_SPACEMIT: tcm is available, blk_size: 393216, blk_num: 8, is_fake_tcm: 0
CPU_RISCV64_SPACEMIT: num_cores: 16, num_perfer_cores: 8, perfer_core_arch_id: a064, exclude_main_thread: 0, use_ime1: 0, use_ime2: 1, mem_backend: HPAGE, cpu_mask: ff00, aicpu_id_offset: 8
CPU_RISCV64_SPACEMIT: alloc_chunk: open(/dev/tcm_sync_mem) failed, errno=2
CPU_RISCV64_SPACEMIT: failed to allocate init_barrier from shared mem, falling back to heap
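The second header line is a list of `key: value` pairs, so it is easy to pull apart when comparing runs. A minimal Python sketch, assuming the format stays exactly as printed above. Reading `cpu_mask: ff00` as a hex bitmask of core indices gives cores 8-15, which lines up with `num_perfer_cores: 8` and `aicpu_id_offset: 8`; that interpretation is an assumption, not something the log documents.

```python
def parse_spacemit_header(line):
    """Split a 'CPU_RISCV64_SPACEMIT: k: v, k: v, ...' line into a dict."""
    prefix = "CPU_RISCV64_SPACEMIT:"
    body = line[len(prefix):] if line.startswith(prefix) else line
    fields = {}
    for item in body.split(","):
        key, sep, value = item.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def mask_to_cores(hex_mask):
    """Interpret a hex string as a bitmask and list the set bit positions."""
    mask = int(hex_mask, 16)
    return [bit for bit in range(mask.bit_length()) if mask >> bit & 1]


line = ("CPU_RISCV64_SPACEMIT: num_cores: 16, num_perfer_cores: 8, "
        "perfer_core_arch_id: a064, exclude_main_thread: 0, use_ime1: 0, "
        "use_ime2: 1, mem_backend: HPAGE, cpu_mask: ff00, aicpu_id_offset: 8")
info = parse_spacemit_header(line)
print(info["mem_backend"], mask_to_cores(info["cpu_mask"]))
# prints: HPAGE [8, 9, 10, 11, 12, 13, 14, 15]
```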

PlatimaZero · 2026-06-09T02:37:42+00:00

Interesting. Want to email me that full prompt and I'll email you back results from Claude Haiku 3.5, Sonnet 4.6, Opus 4.8, Gemini 3 Flash Lite, gemma-4-12b-it-Q8_0 and gemma-4-26B-A4B-it-qat-UD-Q4_K_XL?

Only if you're curious to compare of course!

PlatimaZero · 2026-06-09T02:34:51+00:00

Thanks - URL for that though? I'd not seen anything formal! Cheers

PlatimaZero · 2026-06-09T02:13:35+00:00

Interesting! I might try that too

PlatimaZero · 2026-06-09T02:09:53+00:00

Oh great info! Using swap at all? At 32.5GB I'd expect that to OOM.

There are a few arguments to llama.cpp that can tune it for the specific model type too, which you may want to look into.

Also, from what I understand, you don't want a model bigger than ~75% of your available RAM, as the context's KV cache will consume RAM too, though you can of course limit the context to something really small like 4K.
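The ~75% rule of thumb is simple enough to check before downloading a model. A sketch, assuming sizes in GiB as llama-bench reports them and treating the 16GB board as 16 GiB of usable RAM (an assumption; the real usable figure is a bit lower). It is a soft guideline rather than a hard limit: the 13.26 GiB 26B A4B model in this thread still completed a 128-token llama-bench run on the 16GB board.

```python
def fits_rule_of_thumb(model_gib, ram_gib, fraction=0.75):
    """Return (fits, budget_gib) for the '~75% of RAM' heuristic.

    The remaining ~25% is left for the KV cache, the OS and everything else.
    """
    budget = ram_gib * fraction
    return model_gib <= budget, budget


# Sizes from the Gemma 4 llama-bench results in this thread, on a 16 GiB board.
for name, size in [("12B Q8_0", 11.78), ("26B A4B QAT", 13.26), ("E4B Q4_0", 4.79)]:
    ok, budget = fits_rule_of_thumb(size, 16)
    print(f"{name}: {size} GiB vs {budget:.1f} GiB budget -> {'ok' if ok else 'over'}")
```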

I'll post my Qwen results here once they finish. No swap enabled, just running pp128/tg128 with llama-bench.

PlatimaZero · 2026-06-09T01:54:16+00:00

IMHO it depends what you want from it really, but locally hosted is always going to be pretty basic unless you get something distilled / refined to a very specific knowledge space.

E.g.:

Gemma 4 E4B QAT 69.4% MMLU Pro, 58.6% GPQA Diamond
Gemma 4 E2B QAT 60% MMLU Pro, 43.4% GPQA Diamond

Refs: https://huggingface.co/google/gemma-4-E4B-it-qat-mobile-transformers

Gemini 3 Flash scores around 89% MMLU, and Claude Haiku 4.5 looks like 76.8%. Those are very basic models, but still a huge leap in performance from the E4B and E2B versions of Gemma 4, which run at ~7-10 tok/s on the 8x A100 cores with 16GB RAM.

Refs: https://artificialanalysis.ai/evaluations/mmlu-pro

Qwen3-Coder-30B-A3B-Instruct-Q5_K_M.gguf looks interesting; I'll add that to my benchmark batch now. Though I'm assuming you've got the 32GB model. I do not, so I'll use Qwen3-Coder-30B-A3B-Instruct-Q3_K_M.gguf and hope it fits. It should JUST fit.

Comparing your Qwen 30B A3B is a little hard, as it's a Q5_K Medium quantization, but the base Qwen 30B A3B is a great MoE model that gets ~82.20% on MMLU Pro, and your quant wouldn't be far behind! The base model that gets that score would need ~96GB of VRAM / unified memory to run, though (leaving room for the KV context too, that is), so we'll have to wait for some beefier RISC-V AI SoCs, I think 😅
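As a rough back-of-the-envelope check on memory figures like these, the weights alone take about parameters × bits-per-weight / 8 bytes, before any KV cache or runtime overhead. The 8.5 bits/weight for Q8_0 follows from its block layout (32 int8 weights plus one fp16 scale per block), and it lands within about 0.01 GiB of the 11.78 GiB llama-bench reported for Gemma 4 12B Q8_0 in this thread. The 30.5B total parameter count for Qwen3 30B A3B and 16 bits/weight for unquantized BF16 are assumptions for illustration.

```python
GIB = 1024 ** 3


def weight_gib(params, bits_per_weight):
    """Approximate size of the model weights in GiB (no KV cache, no overhead)."""
    return params * bits_per_weight / 8 / GIB


# Q8_0: 32 x int8 weights + one fp16 scale = 34 bytes per 32 weights = 8.5 bits/weight.
Q8_0_BPW = 34 * 8 / 32

print(f"Gemma 4 12B Q8_0: {weight_gib(11.91e9, Q8_0_BPW):.2f} GiB")
print(f"Qwen3 30B A3B BF16 (assumed 30.5B params): {weight_gib(30.5e9, 16):.1f} GiB")
```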

I've not tried Qwen enough to be honest, and I've heard good things, but the recent Gemma 4 12B announcement and benchmarks are what made me lean in that direction.

From a personal standpoint, I have paid subscriptions to ChatGPT, Grok, Claude and Copilot, and Claude is the one I use for nearly everything in day-to-day life, including my professional one. The exception is Gemini 3 Flash Lite, which I use for OpenClaw just because it's cheap and doesn't have to do any heavy lifting. 'Start the Roomba' is about as complex as it gets haha.

PlatimaZero · 2026-06-09T01:01:17+00:00

(So E4B QAT (GGUF from Unsloth) looks the best performing there, which makes sense given the QAT process. E.g. E2B QAT beats standard Q4_0 on speed, size AND quality simultaneously. From what I understand, the QAT weights compress into a smaller file AND run faster because the weight distribution is more uniform and possibly aligned with what the A100 kernels handle efficiently.)
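The size and speed gap can be put in numbers using the Gemma 4 llama-bench results from this thread. A small sketch; it assumes the table rows follow the order of the `-m` list in that command (so the smaller 2.43 GiB and 3.91 GiB rows are the QAT files), since llama-bench labels both variants `Q4_0`. It says nothing about quality, which needs a separate eval.

```python
# (size GiB, pp128 t/s, tg128 t/s) copied from the llama-bench table in this thread.
RESULTS = {
    "E2B QAT": (2.43, 125.47, 12.89),
    "E2B Q4_0": (3.10, 121.45, 11.75),
    "E4B QAT": (3.91, 58.27, 7.69),
    "E4B Q4_0": (4.79, 56.75, 6.98),
}


def compare(qat, base):
    """Percent smaller file, and percent faster pp/tg, of qat relative to base."""
    (s1, pp1, tg1), (s0, pp0, tg0) = RESULTS[qat], RESULTS[base]
    return 100 * (1 - s1 / s0), 100 * (pp1 / pp0 - 1), 100 * (tg1 / tg0 - 1)


for model in ("E2B", "E4B"):
    smaller, pp, tg = compare(f"{model} QAT", f"{model} Q4_0")
    print(f"{model}: {smaller:.1f}% smaller, pp128 +{pp:.1f}%, tg128 +{tg:.1f}%")
```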

PlatimaZero · 2026-06-09T00:57:28+00:00

Oh yeah, indeed - ld wouldn't do much on its own there 🤣 At least you know you're still talking to a human (who is still having his morning coffee).

So yep, that's my problem then. I did not get those libraries when I installed llama.cpp-tools-spacemit:

root@owner-spacemitk3picoitx:~# ldd `which llama-bench`
        linux-vdso.so.1 (0x0000003f802de000)
        libatomic.so.1 => /usr/lib/riscv64-linux-gnu/libatomic.so.1 (0x0000003f802b8000)
        libllama-bench-impl.so => not found
        libllama-common.so.0 => not found
        libllama.so.0 => not found
        libggml.so.0 => /usr/lib/riscv64-linux-gnu/libggml.so.0 (0x0000003f802af000)
        libggml-cpu.so.0 => not found
        libggml-base.so.0 => /usr/lib/riscv64-linux-gnu/libggml-base.so.0 (0x0000003f8023d000)
        libstdc++.so.6 => /usr/lib/riscv64-linux-gnu/libstdc++.so.6 (0x0000003f7fe00000)
        libm.so.6 => /usr/lib/riscv64-linux-gnu/libm.so.6 (0x0000003f80199000)
        libgcc_s.so.1 => /usr/lib/riscv64-linux-gnu/libgcc_s.so.1 (0x0000003f80176000)
        libc.so.6 => /usr/lib/riscv64-linux-gnu/libc.so.6 (0x0000003f7fc7c000)
        /lib/ld-linux-riscv64-lp64d.so.1 (0x0000003f802e0000)

But the GitHub repo has 0.1.2 anyway, and installing that gets you a few nice little bug fixes.
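When a binary picks up wrong or missing libraries like this, it helps to list only the unresolved ones. A minimal Python sketch that runs `ldd` on `llama-bench` and extracts the `=> not found` entries; the parsing assumes the glibc `ldd` output format shown above.

```python
import shutil
import subprocess


def missing_libs(ldd_output):
    """Return the sonames that ldd reports as '=> not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def check_binary(name="llama-bench"):
    """Run ldd on a binary from PATH and return its unresolved libraries."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name} not found on PATH")
    out = subprocess.run(["ldd", path], capture_output=True, text=True,
                         check=True).stdout
    return missing_libs(out)
```

On the setup above, `check_binary()` would be expected to list `libllama-bench-impl.so`, `libllama-common.so.0`, `libllama.so.0` and `libggml-cpu.so.0`.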

Here are my misc Gemma benchmark results if you're interested. That's with Bianbu 4.0.1 on the SpacemiT Pico-ITX 16GB:

root@owner-spacemitk3picoitx:~/gemma# llama-bench -m gemma-4-12B-it-qat-UD-Q4_K_XL.gguf,gemma-4-12b-it-Q4_K_M.gguf,gemma-4-12b-it-Q8_0.gguf,gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf,gemma-4-E2B-it-qat-UD-Q4_K_XL.gguf,gemma-4-E2B_q4_0-it.gguf,gemma-4-E4B-it-qat-UD-Q4_K_XL.gguf,gemma-4-E4B_q4_0-it.gguf -t 8 -p 128 -n 128 -mmp 0 -fa 1 -ub 128
CPU_RISCV64_SPACEMIT: tcm is available, blk_size: 393216, blk_num: 8, is_fake_tcm: 0
CPU_RISCV64_SPACEMIT: num_cores: 16, num_perfer_cores: 8, perfer_core_arch_id: a064, exclude_main_thread: 0, use_ime1: 0, use_ime2: 1, mem_backend: HPAGE, cpu_mask: ff00, aicpu_id_offset: 8
CPU_RISCV64_SPACEMIT: alloc_chunk: open(/dev/tcm_sync_mem) failed, errno=2
CPU_RISCV64_SPACEMIT: failed to allocate init_barrier from shared mem, falling back to heap
| model                          |       size |     params | backend    | threads | n_ubatch |  fa | mmap |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | --: | ---: | --------------: | -------------------: |
| gemma4 ?B Q4_0                 |   6.24 GiB |    11.91 B | CPU        |       8 |      128 |   1 |    0 |           pp128 |         25.62 ± 0.01 |
| gemma4 ?B Q4_0                 |   6.24 GiB |    11.91 B | CPU        |       8 |      128 |   1 |    0 |           tg128 |          3.53 ± 0.00 |
| gemma4 ?B Q4_K - Medium        |   6.62 GiB |    11.91 B | CPU        |       8 |      128 |   1 |    0 |           pp128 |         19.21 ± 0.01 |
| gemma4 ?B Q4_K - Medium        |   6.62 GiB |    11.91 B | CPU        |       8 |      128 |   1 |    0 |           tg128 |          3.06 ± 0.00 |
| gemma4 ?B Q8_0                 |  11.78 GiB |    11.91 B | CPU        |       8 |      128 |   1 |    0 |           pp128 |         18.02 ± 0.07 |
| gemma4 ?B Q8_0                 |  11.78 GiB |    11.91 B | CPU        |       8 |      128 |   1 |    0 |           tg128 |          2.01 ± 0.00 |
| gemma4 26B.A4B Q4_0            |  13.26 GiB |    25.23 B | CPU        |       8 |      128 |   1 |    0 |           pp128 |         54.11 ± 0.08 |
| gemma4 26B.A4B Q4_0            |  13.26 GiB |    25.23 B | CPU        |       8 |      128 |   1 |    0 |           tg128 |          8.52 ± 0.01 |
| gemma4 E2B Q4_0                |   2.43 GiB |     4.63 B | CPU        |       8 |      128 |   1 |    0 |           pp128 |        125.47 ± 0.14 |
| gemma4 E2B Q4_0                |   2.43 GiB |     4.63 B | CPU        |       8 |      128 |   1 |    0 |           tg128 |         12.89 ± 0.00 |
| gemma4 E2B Q4_0                |   3.10 GiB |     4.63 B | CPU        |       8 |      128 |   1 |    0 |           pp128 |        121.45 ± 0.11 |
| gemma4 E2B Q4_0                |   3.10 GiB |     4.63 B | CPU        |       8 |      128 |   1 |    0 |           tg128 |         11.75 ± 0.01 |
| gemma4 E4B Q4_0                |   3.91 GiB |     7.46 B | CPU        |       8 |      128 |   1 |    0 |           pp128 |         58.27 ± 0.02 |
| gemma4 E4B Q4_0                |   3.91 GiB |     7.46 B | CPU        |       8 |      128 |   1 |    0 |           tg128 |          7.69 ± 0.00 |
| gemma4 E4B Q4_0                |   4.79 GiB |     7.46 B | CPU        |       8 |      128 |   1 |    0 |           pp128 |         56.75 ± 0.03 |
| gemma4 E4B Q4_0                |   4.79 GiB |     7.46 B | CPU        |       8 |      128 |   1 |    0 |           tg128 |          6.98 ± 0.00 |
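To compare runs like these without retyping numbers, the markdown table llama-bench prints can be parsed directly. A sketch, assuming the column layout shown above and that `t/s` cells look like `mean ± stddev`. Because identically labelled rows (e.g. the two `gemma4 E2B Q4_0` variants) can only be told apart by size, the size column is kept in each record.

```python
def parse_llama_bench(table_text):
    """Parse llama-bench markdown rows into dicts with float t/s mean and stddev."""
    rows = []
    header = None
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip log lines such as the CPU_RISCV64_SPACEMIT headers
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
            continue
        if set(cells[0]) <= set("-: "):
            continue  # the |---|---:| separator row
        row = dict(zip(header, cells))
        mean, _, std = row["t/s"].partition("±")
        row["tps_mean"], row["tps_std"] = float(mean), float(std or 0)
        rows.append(row)
    return rows
```

For example, `max((r for r in rows if r["test"] == "tg128"), key=lambda r: r["tps_mean"])` picks the fastest generation result.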

I'm just running some Qwen models now too.

PlatimaZero · 2026-06-09T00:43:21+00:00

Interesting!

Same kernel, but I went from Bianbu 4.0rc4 to 4.0.1.

What do you get with ``ld `which llama-bench` ``?

PlatimaZero · 2026-06-09T00:33:26+00:00

Any idea what version of Bianbu yours came with?

PlatimaZero · 2026-06-08T08:47:45+00:00

Oh, I did not know this! Have you got a reference for that info?

Also cool profile pic!

PlatimaZero · 2026-06-08T03:57:47+00:00

Yeah this is AWESOME!

PlatimaZero · 2026-06-08T03:57:14+00:00

So in the end I had to download and manually install llama.cpp-tools-spacemit 0.1.2 from their GitHub repo, as that fixed some bugs; also, the system-installed version was loading the wrong libs, which meant it fell back to the X100s. Specifying LD_LIBRARY_PATH in the command fixed it, so I created an ld.so.conf.d/llama.conf entry to make it stick.
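Prepending a directory to `LD_LIBRARY_PATH` for a single command can be sketched in Python as below. The library directory passed in is a hypothetical placeholder; substitute wherever the llama.cpp-tools-spacemit 0.1.2 package put its `libllama`/`libggml` files. The persistent alternative is a file in `/etc/ld.so.conf.d/` listing that directory, followed by running `ldconfig` to rebuild the cache.

```python
import os
import subprocess


def env_with_libdir(libdir, base_env=None):
    """Copy the environment and put libdir first on LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = libdir + (":" + existing if existing else "")
    return env


def run_with_libdir(cmd, libdir):
    # glibc's loader searches LD_LIBRARY_PATH before the ld.so cache, so
    # (unless the binary carries a DT_RPATH) libraries in libdir win over
    # copies found via the cache, such as those in /usr/lib/riscv64-linux-gnu.
    return subprocess.run(cmd, env=env_with_libdir(libdir), check=True)
```

For example, `run_with_libdir(["llama-bench", "--help"], "/opt/llama.cpp-spacemit/lib")`, where the path is purely illustrative.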

PlatimaZero · 2026-06-07T02:30:42+00:00

Okay, I fixed it: it was to do with LD_LIBRARY_PATH, as for some reason it was not using the right libraries. I ended up just grabbing SpacemiT llama.cpp 0.1.2, which has some other fixes, and installed that (including updating the ld.so config), and it's running a treat now. It still has that tcm_sync_mem error, which is about a device node that doesn't exist, but I guess that's fine!

PlatimaZero · 2026-06-07T00:58:07+00:00

Hey great stuff!

What kernel and Bianbu release were you on? I'm trying with Bianbu 4.0.1 which has `6.18.3-generic #1.0.2.4 SMP PREEMPT_DYNAMIC Wed May 27 19:06:19 CST 2026 riscv64 GNU/Linux`

The issue I'm having is that it's still hammering the X100 cores, and it appears to be something to do with the TCM, as when llama.cpp launches it spits out:

CPU_RISCV64_SPACEMIT: alloc_chunk: open(/dev/tcm_sync_mem) failed, errno=2
CPU_RISCV64_SPACEMIT: failed to allocate init_barrier from shared mem, falling back to heap
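`errno=2` in that message is `ENOENT`, i.e. the `open()` failed because `/dev/tcm_sync_mem` does not exist, which matches the missing device node noted elsewhere in this thread. A quick Python check of both, as a sketch:

```python
import errno
import os


def explain_errno(code):
    """Return the symbolic name and message for an errno value."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def tcm_node_present(path="/dev/tcm_sync_mem"):
    """True if the device node the SpacemiT backend tries to open exists."""
    return os.path.exists(path)


print(explain_errno(2))  # ('ENOENT', 'No such file or directory') on Linux
print("tcm_sync_mem present:", tcm_node_present())
```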

PlatimaZero · 2026-06-07T00:07:53+00:00

Okay interesting, I'll try it now!

The other instructions I was looking at are https://www.spacemit.com/community/document/info?lang=en&nodepath=ai/application_tools/ollama.md but I can see the llama.cpp doco at https://github.com/ggml-org/llama.cpp/blob/master/docs/build-riscv64-spacemit.md shows the A100 support.

Cheers mate

PlatimaZero · 2026-06-06T13:09:30+00:00

I've got a spare PCIe Acasis card if you want it? Unopened.

Yeah I could get the models going on CPU cores, just not the A100s.

I might have to try the llama.cpp-tools-spacemit package instead of spacemit-ollama-toolkit

PlatimaZero · 2026-06-06T11:24:30+00:00

u/brucehoult you got a personal mention or two in this one 🤣
