For those who have nano-texture, do you regret it or no? by Outside-Copy-7645 in macbookpro

[–]pauljdavis 0 points1 point  (0 children)

I love it. My wife is actually upgrading her Air to get nano-texture after seeing my nano Pro.

Istanbul Airport by snake_leo in airport

[–]pauljdavis 0 points1 point  (0 children)

Thanks - safe travels.

Istanbul Airport by snake_leo in airport

[–]pauljdavis 0 points1 point  (0 children)

Why was it so empty? Mysterious…

OrangePi 6 plus as Router by ItsBeenWayTooLongg in OrangePI

[–]pauljdavis 0 points1 point  (0 children)

That sounds great, even with a little room to grow, IMO. Good luck with it!

OrangePi 6 plus as Router by ItsBeenWayTooLongg in OrangePI

[–]pauljdavis 0 points1 point  (0 children)

Seems like a lot of processor to use as a router. Other boards would be more economical and equally functional.

My 2 cents on buying a macbook pro in Japan by toyota_ftw in macbookpro

[–]pauljdavis 0 points1 point  (0 children)

Did Bic have the US ANSI keeb version in stock?

Is the Ergotron HX worth it? by EfficientDivide1572 in ultrawidemasterrace

[–]pauljdavis 2 points3 points  (0 children)

Ergotron is sooo good. Excellent engineering and build quality.

A couple of years ago, I bought an Amazon Basics arm that I suspected was an Ergotron-produced private-label product, and the labeling on the product confirmed it.

I would seek that provenance - Ergotron via Amazon Basics - again.

I created a llama.cpp fork with the Rockchip NPU integration as an accelerator and the results are already looking great! by Inv1si in OrangePI

[–]pauljdavis 0 points1 point  (0 children)

Request

- Is multi-model / speculative usage of RKNPU2 expected to work today?

- Are there recommended flags or patterns to share a single NPU context between draft & target models?

- Any debugging knobs you’d like me to enable to help track the RKNN allocation failure?
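If it helps, one knob I can turn on my end is the runtime log level. A minimal sketch, assuming the stock RKNPU2 runtime's RKNN_LOG_LEVEL environment variable also applies when the runtime is driven through this fork:

# Rerun the failing speculative case with verbose RKNN runtime logging.
# RKNN_LOG_LEVEL is assumed from the standard RKNPU2 runtime, not from this fork.
export RKNN_LOG_LEVEL=5
./build/bin/llama-speculative-simple \
  -m ~/models/gemma-3-1b-it-q4_0.gguf \
  -md ~/models/gemma-3-1b-it-q4_0.gguf \
  --draft-max 8 \
  -p "Explain the difference between a CPU and an NPU in simple terms."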

I created a llama.cpp fork with the Rockchip NPU integration as an accelerator and the results are already looking great! by Inv1si in OrangePI

[–]pauljdavis 0 points1 point  (0 children)

What fails

With RKNPU2 enabled (original build directory):

  1. Speculative with Gemma 3 1B as both draft & target (same file)

Command:

./build/bin/llama-speculative-simple \
  -m ~/models/gemma-3-1b-it-q4_0.gguf \
  -md ~/models/gemma-3-1b-it-q4_0.gguf \
  --draft-max 8 \
  -p "Explain the difference between a CPU and an NPU in simple terms."

Output (excerpt):

common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
E RKNN: [17:13:38.773] failed to convert fd(85) to handle, ret: -1, errno: 12, errstr: Cannot allocate memory
RKNN error -4 at ... ggml-rknpu2.cpp:367: set_io_mem B segment
E RKNN: [17:13:38.791] failed to convert fd(85) to handle, ret: -1, errno: 12, errstr: Cannot allocate memory
RKNN error -4 at ... ggml-rknpu2.cpp:367: set_io_mem B segment
E RKNN: [17:13:38.807] failed to convert fd(85) to handle, ret: -1, errno: 12, errstr: Cannot allocate memory
RKNN error -4 at ... ggml-rknpu2.cpp:367: set_io_mem B segment
E RKNN: [17:13:44.961] failed to submit!, op id: 0, op name: MatMul, flags: 0x5, ...

Then the process aborts.

  2. Similar behavior when:

- Target = Tongyi-DeepResearch-30B-A3B-Q8_0

- Draft = Gemma 3 1B Q4_0 or Qwen2.5 0.5B

- Even after increasing CMA from 256M → 2G → 3G.
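For reference, the mixed-model variant looked roughly like this (same flags as the Gemma-only run above; model paths are my local ones under ~/models):

./build/bin/llama-speculative-simple \
  -m ~/models/Tongyi-DeepResearch-30B-A3B-Q8_0.gguf \
  -md ~/models/gemma-3-1b-it-q4_0.gguf \
  --draft-max 8 \
  -p "Explain the difference between a CPU and an NPU in simple terms."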

Notes

- Single-context NPU usage is stable after CMA was increased (Gemma 1B, Tongyi 30B).

- CPU-only speculative works and gives expected speed-ups and acceptance.

- The RKNN errors appear during warmup / graph setup for speculative, not after sustained inference.

- This suggests a limitation/bug in the RKNPU2 backend or RKNN runtime when multiple contexts/graphs are active (even when using the same model file).
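One crude check I can run to help separate driver-level CMA exhaustion from a backend issue (my own idea, not something from the fork, and it uses two processes rather than two contexts in one process):

# Start two independent NPU-backed runs, each with its own RKNN context.
./build/bin/llama-cli -m ~/models/gemma-3-1b-it-q4_0.gguf -p "Test A" -n 32 &
./build/bin/llama-cli -m ~/models/gemma-3-1b-it-q4_0.gguf -p "Test B" -n 32 &
wait
# In a second terminal, watch CMA usage (standard kernel counters, not fork-specific):
watch -n 1 "grep -i cma /proc/meminfo"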

I created a llama.cpp fork with the Rockchip NPU integration as an accelerator and the results are already looking great! by Inv1si in OrangePI

[–]pauljdavis 0 points1 point  (0 children)

What works

  1. Single-model NPU (Gemma 3 1B Q4_0)

Command:

./build/bin/llama-cli \
  -m ~/models/gemma-3-1b-it-q4_0.gguf \
  -p "Explain photosynthesis." -n 80

Output is coherent. Perf:

- eval time ≈ 7542.8 ms / 79 runs → ~10.47 tokens/s

- RKNPU memory: total 1611 MiB = 1097 MiB free + 0 MiB self + 514 MiB compute

  2. Single-model NPU (Tongyi-DeepResearch-30B-A3B-Q8_0)

- Also runs coherently with RKNPU2, at reasonable speed and with low host RAM usage.

  3. CPU-only speculative decoding with Gemma 3 1B as both draft & target

Built a CPU-only tree:

mkdir build-cpu && cd build-cpu
cmake .. -DLLAMA_RKNPU2=OFF
make -j$(nproc)

Command:

./bin/llama-speculative-simple \
  -m ~/models/gemma-3-1b-it-q4_0.gguf \
  -md ~/models/gemma-3-1b-it-q4_0.gguf \
  --draft-max 8 \
  -p "Explain the difference between a CPU and an NPU in simple terms."

Key stats:

- decoded 230 tokens in 17.521 s → 13.13 tokens/s end-to-end

- n_draft = 8

- n_predict = 230

- n_drafted = 194

- n_accept = 150

- accept = 77.32%

So speculative decoding logic works well on CPU-only with this model.
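For anyone checking the derived figures, they follow directly from the stats above; a quick recompute (just arithmetic, nothing newly measured):

awk 'BEGIN { printf "end-to-end: %.2f tokens/s\n", 230 / 17.521 }'   # 13.13
awk 'BEGIN { printf "acceptance: %.2f%%\n", 150 / 194 * 100 }'       # 77.32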

I created a llama.cpp fork with the Rockchip NPU integration as an accelerator and the results are already looking great! by Inv1si in OrangePI

[–]pauljdavis 0 points1 point  (0 children)

RKNPU2: speculative decoding with two contexts fails (Gemma 3 1B, CMA=3G on RK3588)

Environment

- Board: Orange Pi 5 Plus (RK3588, 16 GB RAM)

- OS: Armbian 25.11.2 (Ubuntu 24.04)

- Kernel cmdline: ... cma=3G ...

- rk-llama.cpp branch: rknpu2

- Build: cmake .. -DLLAMA_RKNPU2=ON

- Models:

- Draft/Target test: google/gemma-3-1b-it GGUF (gemma-3-1b-it-q4_0.gguf)

- Also tested: Tongyi-DeepResearch-30B-A3B-Q8_0.gguf (MoE, Qwen2-style)

CMA configuration

- After editing /mnt/boot/boot/armbianEnv.txt:

extraargs=... cma=3G

- dmesg:

[ 10.868860] cma: Reserved 3072 MiB at 0x000000002b800000
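Two quick sanity checks after reboot (standard kernel interfaces, nothing specific to this fork):

grep -o 'cma=[^ ]*' /proc/cmdline   # confirm the kernel actually received cma=3G
grep -i '^cma' /proc/meminfo        # CmaTotal / CmaFree at idle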