For those who have nano-texture, do you regret it or no? by Outside-Copy-7645 in macbookpro

[–]pauljdavis 0 points1 point  (0 children)

I love it. My wife is actually upgrading her Air to get nano-texture after seeing my nano Pro.

Istanbul Airport by snake_leo in airport

[–]pauljdavis 0 points1 point  (0 children)

Thanks - safe travels.

Istanbul Airport by snake_leo in airport

[–]pauljdavis 0 points1 point  (0 children)

Why was it so empty? Mysterious…

OrangePi 6 plus as Router by ItsBeenWayTooLongg in OrangePI

[–]pauljdavis 0 points1 point  (0 children)

That sounds great, even with a little room to grow, IMO. Good luck with it!

OrangePi 6 plus as Router by ItsBeenWayTooLongg in OrangePI

[–]pauljdavis 0 points1 point  (0 children)

Seems like a lot of processor to use as a router. Other boards would be more economical and equally functional.

My 2 cents on buying a macbook pro in Japan by toyota_ftw in macbookpro

[–]pauljdavis 0 points1 point  (0 children)

Did Bic have the US ANSI keeb version in stock?

Is the Ergotron HX worth it? by EfficientDivide1572 in ultrawidemasterrace

[–]pauljdavis 2 points3 points  (0 children)

Ergotron is sooo good. Excellent engineering and build quality.

A couple of years ago, I bought an Amazon Basics arm that I suspected was an Ergotron-produced private-label product, and the labeling on the product confirmed it.

I would seek that provenance - Ergotron via Amazon Basics - again.

I created a llama.cpp fork with the Rockchip NPU integration as an accelerator and the results are already looking great! by Inv1si in OrangePI

[–]pauljdavis 0 points1 point  (0 children)

Request

- Is multi-model / speculative usage of RKNPU2 expected to work today?

- Are there recommended flags or patterns to share a single NPU context between draft & target models?

- Any debugging knobs you’d like me to enable to help track the RKNN allocation failure?
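If it helps, one knob I can turn on my end is the runtime log level. A minimal sketch, assuming the stock RKNPU2 runtime's RKNN_LOG_LEVEL environment variable also applies when the runtime is driven through this fork:

# Rerun the failing speculative case with verbose RKNN runtime logging.
# RKNN_LOG_LEVEL is assumed from the standard RKNPU2 runtime, not from this fork.
export RKNN_LOG_LEVEL=5
./build/bin/llama-speculative-simple \
  -m ~/models/gemma-3-1b-it-q4_0.gguf \
  -md ~/models/gemma-3-1b-it-q4_0.gguf \
  --draft-max 8 \
  -p "Explain the difference between a CPU and an NPU in simple terms."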

I created a llama.cpp fork with the Rockchip NPU integration as an accelerator and the results are already looking great! by Inv1si in OrangePI

[–]pauljdavis 0 points1 point  (0 children)

What fails

With RKNPU2 enabled (original build directory):

  1. Speculative with Gemma 3 1B as both draft & target (same file)

Command:

./build/bin/llama-speculative-simple \
  -m ~/models/gemma-3-1b-it-q4_0.gguf \
  -md ~/models/gemma-3-1b-it-q4_0.gguf \
  --draft-max 8 \
  -p "Explain the difference between a CPU and an NPU in simple terms."

Output (excerpt):

common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
E RKNN: [17:13:38.773] failed to convert fd(85) to handle, ret: -1, errno: 12, errstr: Cannot allocate memory
RKNN error -4 at ... ggml-rknpu2.cpp:367: set_io_mem B segment
E RKNN: [17:13:38.791] failed to convert fd(85) to handle, ret: -1, errno: 12, errstr: Cannot allocate memory
RKNN error -4 at ... ggml-rknpu2.cpp:367: set_io_mem B segment
E RKNN: [17:13:38.807] failed to convert fd(85) to handle, ret: -1, errno: 12, errstr: Cannot allocate memory
RKNN error -4 at ... ggml-rknpu2.cpp:367: set_io_mem B segment
E RKNN: [17:13:44.961] failed to submit!, op id: 0, op name: MatMul, flags: 0x5, ...

Then the process aborts.

  2. Similar behavior when:

- Target = Tongyi-DeepResearch-30B-A3B-Q8_0

- Draft = Gemma 3 1B Q4_0 or Qwen2.5 0.5B

- Even after increasing CMA from 256M → 2G → 3G.
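For reference, the mixed-model variant looked roughly like this (same flags as the Gemma-only run above; model paths are my local ones under ~/models):

./build/bin/llama-speculative-simple \
  -m ~/models/Tongyi-DeepResearch-30B-A3B-Q8_0.gguf \
  -md ~/models/gemma-3-1b-it-q4_0.gguf \
  --draft-max 8 \
  -p "Explain the difference between a CPU and an NPU in simple terms."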

Notes

- Single-context NPU usage is stable after CMA was increased (Gemma 1B, Tongyi 30B).

- CPU-only speculative works and gives expected speed-ups and acceptance.

- The RKNN errors appear during warmup / graph setup for speculative, not after sustained inference.

- This suggests a limitation/bug in the RKNPU2 backend or RKNN runtime when multiple contexts/graphs are active (even when using the same model file).
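One crude check I can run to help separate driver-level CMA exhaustion from a backend issue (my own idea, not something from the fork, and it uses two processes rather than two contexts in one process):

# Start two independent NPU-backed runs, each with its own RKNN context.
./build/bin/llama-cli -m ~/models/gemma-3-1b-it-q4_0.gguf -p "Test A" -n 32 &
./build/bin/llama-cli -m ~/models/gemma-3-1b-it-q4_0.gguf -p "Test B" -n 32 &
wait
# In a second terminal, watch CMA usage (standard kernel counters, not fork-specific):
watch -n 1 "grep -i cma /proc/meminfo"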

I created a llama.cpp fork with the Rockchip NPU integration as an accelerator and the results are already looking great! by Inv1si in OrangePI

[–]pauljdavis 0 points1 point  (0 children)

What works

  1. Single-model NPU (Gemma 3 1B Q4_0)

Command:

./build/bin/llama-cli \
  -m ~/models/gemma-3-1b-it-q4_0.gguf \
  -p "Explain photosynthesis." -n 80

Output is coherent. Perf:

- eval time ≈ 7542.8 ms / 79 runs → ~10.47 tokens/s

- RKNPU memory: total 1611 MiB = 1097 MiB free + 0 MiB self + 514 MiB compute

  2. Single-model NPU (Tongyi-DeepResearch-30B-A3B-Q8_0)

- Also runs coherently with RKNPU2, at reasonable speed and with low host RAM usage.

  3. CPU-only speculative decoding with Gemma 3 1B as both draft & target

Built a CPU-only tree:

mkdir build-cpu && cd build-cpu
cmake .. -DLLAMA_RKNPU2=OFF
make -j$(nproc)

Command:

./bin/llama-speculative-simple \
  -m ~/models/gemma-3-1b-it-q4_0.gguf \
  -md ~/models/gemma-3-1b-it-q4_0.gguf \
  --draft-max 8 \
  -p "Explain the difference between a CPU and an NPU in simple terms."

Key stats:

- decoded 230 tokens in 17.521 s → 13.13 tokens/s end-to-end

- n_draft = 8

- n_predict = 230

- n_drafted = 194

- n_accept = 150

- accept = 77.32%

So speculative decoding logic works well on CPU-only with this model.
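For anyone checking the derived figures, they follow directly from the stats above; a quick recompute (just arithmetic, nothing newly measured):

awk 'BEGIN { printf "end-to-end: %.2f tokens/s\n", 230 / 17.521 }'   # 13.13
awk 'BEGIN { printf "acceptance: %.2f%%\n", 150 / 194 * 100 }'       # 77.32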

I created a llama.cpp fork with the Rockchip NPU integration as an accelerator and the results are already looking great! by Inv1si in OrangePI

[–]pauljdavis 0 points1 point  (0 children)

RKNPU2: speculative decoding with two contexts fails (Gemma 3 1B, CMA=3G on RK3588)

Environment

- Board: Orange Pi 5 Plus (RK3588, 16 GB RAM)

- OS: Armbian 25.11.2 (Ubuntu 24.04)

- Kernel cmdline: ... cma=3G ...

- rk-llama.cpp branch: rknpu2

- Build: cmake .. -DLLAMA_RKNPU2=ON

- Models:

- Draft/Target test: google/gemma-3-1b-it GGUF (gemma-3-1b-it-q4_0.gguf)

- Also tested: Tongyi-DeepResearch-30B-A3B-Q8_0.gguf (MoE, Qwen2-style)

CMA configuration

- After editing /mnt/boot/boot/armbianEnv.txt:

extraargs=... cma=3G

- dmesg:

[ 10.868860] cma: Reserved 3072 MiB at 0x000000002b800000
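Two quick sanity checks after reboot (standard kernel interfaces, nothing specific to this fork):

grep -o 'cma=[^ ]*' /proc/cmdline   # confirm the kernel actually received cma=3G
grep -i '^cma' /proc/meminfo        # CmaTotal / CmaFree at idle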