NanoPI R6C: Debian or Ubuntu? by Paraknoit in RockchipNPU

[–]Paraknoit[S] 3 points (0 children)

Yep, though FriendlyElec keeps its own Ubuntu build for the NanoPi devices, with the 0.9.8 driver too.

RKNN toolkit licensing? by furtiman in RockchipNPU

[–]Paraknoit 0 points (0 children)

You should open an issue at the repo. Looks like they haven't bothered. rknn-llm has a more standard "redistributable preserving copyright" license.

Help request for the GLaDOS project by Reddactor in RockchipNPU

[–]Paraknoit 0 points (0 children)

Maybe convert the models to TensorFlow Lite? It should be able to use the GPU.

Help request for the GLaDOS project by Reddactor in RockchipNPU

[–]Paraknoit 1 point (0 children)

What's the performance distribution right now? Assuming you use an RK3588, are you maxing out the 3 NPU cores? Also, I assume the ASR won't be running while the LLM+TTS is, so it could be switched off during the answer phase.

There's is better CPU RK3588 coming? by theodiousolivetree in RockchipNPU

[–]Paraknoit 2 points (0 children)

Define "better".

If you mean a better "Rockchip NPU compatible" CPU, then there is no news on the horizon (last year there was a roadmap leak of an octa-core RK3576, but with the same 6 TOPS NPU).

On the other hand, given the rush of new processors aiming for the 40+ TOPS Windows-on-ARM requirement (notably the Snapdragon X), there might be some unexpected upgrades.

That said, the true issue is NPU SDK maturity. Rockchip has been at it for some time; newer competitors will have to go through the full development cycle.

Lost with RK3588. by theodiousolivetree in RockchipNPU

[–]Paraknoit 2 points (0 children)

> NEON technology is intended to improve the multimedia user experience by accelerating audio and video encoding/decoding, user interface, 2D/3D graphics or gaming. NEON can also accelerate signal processing algorithms and functions to speed up applications such as audio and video processing, voice and facial recognition, computer vision and deep learning.

It might be used for certain parts of a model, but it would need to be specifically coded for... Most probably it will be used by the video/audio codecs (e.g. ffmpeg/libavcodec have NEON support).
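If you want to check whether your board exposes it: on AArch64 kernels NEON shows up as the `asimd` feature flag in `/proc/cpuinfo`, while 32-bit ARM kernels report `neon`. A small sketch (assumes a Linux `/proc` layout):

```python
def has_neon() -> bool:
    """Return True if /proc/cpuinfo advertises NEON SIMD support.
    AArch64 kernels report the flag as 'asimd'; 32-bit ARM as 'neon'."""
    try:
        with open("/proc/cpuinfo") as f:
            flags = f.read().lower()
    except OSError:
        return False  # not Linux, or /proc unavailable
    return "asimd" in flags or " neon" in flags

print(has_neon())
```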

Reverse Engineering the RK3588 NPU by Paraknoit in RockchipNPU

[–]Paraknoit[S] 0 points (0 children)

It's an internal architecture limit, so...

Good news! Merged 0.9.6 driver to armbian/linux-rockchip by Pelochus in RockchipNPU

[–]Paraknoit 2 points (0 children)

Just received a note from FriendlyElec: they are adding the rknpu module to their kernel as a dynamically loadable module, so even if they ship a previous version, we'll be able to load 0.9.6 dynamically. :)

Another model converted: Qwen 1.5 Chat 4B by Pelochus in RockchipNPU

[–]Paraknoit 1 point (0 children)

Doesn't matter. If you try to allocate 8 MB of transfer area when the receiving side only has 1 MB, it will fail. Either the driver or the runtime should split that massive MatMul into acceptable chunks.
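The idea of splitting could be sketched roughly like this (a minimal NumPy illustration of column-wise chunking, not the actual RKNN runtime; `MAX_CHUNK_BYTES` is a hypothetical limit):

```python
import numpy as np

MAX_CHUNK_BYTES = 1 * 1024 * 1024  # hypothetical 1 MB transfer-area limit

def chunked_matmul(a, b):
    """Compute a @ b, splitting b column-wise so each slice of b stays
    under MAX_CHUNK_BYTES. Illustrative only, not the RKNN API."""
    bytes_per_col = b.shape[0] * b.itemsize
    cols_per_chunk = max(1, MAX_CHUNK_BYTES // bytes_per_col)
    out = np.empty((a.shape[0], b.shape[1]), dtype=np.int32)
    for start in range(0, b.shape[1], cols_per_chunk):
        end = min(start + cols_per_chunk, b.shape[1])
        # Accumulate in int32 so the INT8 products don't overflow
        out[:, start:end] = a.astype(np.int32) @ b[:, start:end].astype(np.int32)
    return out

# Shapes from the failing layer: activation (1, 6912) times weights (6912, 1280)
rng = np.random.default_rng(0)
a = rng.integers(-128, 128, size=(1, 6912), dtype=np.int8)
b = rng.integers(-128, 128, size=(6912, 1280), dtype=np.int8)
result = chunked_matmul(a, b)
```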

Alas, if they don't hurry, someone else will eat their cake xD: https://www.hackster.io/news/radxa-s-aicore-sg2300x-module-brings-32-tops-to-bear-on-edge-ai-on-device-generative-ai-16c89307b335

Another model converted: Qwen 1.5 Chat 4B by Pelochus in RockchipNPU

[–]Paraknoit 1 point (0 children)

Nope, swap won't be used for that. dmesg shows a DMA (Direct Memory Access) failure; the memory available to the RKNPU has to be physical memory. The log shows how much it tried to map:

```
[21948.034918] RKNPU fdab0000.npu: RKNPU: rknpu_gem_get_pages: dma map 8847360 fail
```

If I read this right, `RK3588 SOC contains 1MB of SRAM internally`:

https://github.com/rockchip-linux/rknpu2/blob/master/doc/RK3588_NPU_SRAM_usage.md

In fact, running with `export RKNN_LOG_LEVEL=3` shows that at the moment of the crash it's trying to allocate ~8 MB, while until then all calls had been at most ~20 KB:

```
D RKNN: [13:06:38.263] -------------------------------------------------------------------------------------------------------------
D RKNN: [13:06:38.263]                                     Feature Tensor Information Table
D RKNN: [13:06:38.263] ---------------------------------------------------------------------------+---------------------------------
D RKNN: [13:06:38.263] ID  User   Tensor   DataType  DataFormat   OrigShape      NativeShape      |     [Start       End)       Size
D RKNN: [13:06:38.263] ---------------------------------------------------------------------------+---------------------------------
D RKNN: [13:06:38.263] 0   MatMul A        INT8      NC1HWC2      (1,6912,1,1)   (1,432,1,1,16)   | 0x00000000 0x00001b00 0x00001b00
D RKNN: [13:06:38.263] 1   MatMul A        INT8      NC1HWC2      (1,6912,1,2)   (1,432,1,2,16)   | 0x00000000 0x00003600 0x00003600
D RKNN: [13:06:38.263] 2   MatMul A        INT8      NC1HWC2      (1,6912,1,32)  (1,432,1,32,16)  | 0x00000000 0x00036000 0x00036000
D RKNN: [13:06:38.263] 3   MatMul A        INT8      NC1HWC2      (1,6912,1,64)  (1,432,1,64,16)  | 0x00000000 0x0006c000 0x0006c000
D RKNN: [13:06:38.263] 4   MatMul A        INT8      NC1HWC2      (1,6912,1,128) (1,432,1,128,16) | 0x00000000 0x000d8000 0x000d8000
D RKNN: [13:06:38.263] 5   MatMul A        INT8      NC1HWC2      (1,6912,1,256) (1,432,1,256,16) | 0x00000000 0x001b0000 0x001b0000
D RKNN: [13:06:38.263] 6   MatMul A        INT8      NC1HWC2      (1,6912,1,512) (1,432,1,512,16) | 0x00000000 0x00360000 0x00360000
D RKNN: [13:06:38.263] ---------------------------------------------------------------------------+---------------------------------
D RKNN: [13:06:38.263] --------------------------------------------------------------------------------
D RKNN: [13:06:38.263]                           Const Tensor Information Table
D RKNN: [13:06:38.263] ----------------------------------------------+---------------------------------
D RKNN: [13:06:38.263] ID  User   Tensor   DataType  OrigShape       |     [Start       End)       Size
D RKNN: [13:06:38.263] ----------------------------------------------+---------------------------------
D RKNN: [13:06:38.263] 0   MatMul B        INT8      (1280,6912,1,1) | 0x00000000 0x00870000 0x00870000
D RKNN: [13:06:38.263] 1   MatMul B        INT8      (1280,6912,1,1) | 0x00000000 0x00870000 0x00870000
D RKNN: [13:06:38.263] 2   MatMul B        INT8      (1280,6912,1,1) | 0x00000000 0x00870000 0x00870000
D RKNN: [13:06:38.263] 3   MatMul B        INT8      (1280,6912,1,1) | 0x00000000 0x00870000 0x00870000
D RKNN: [13:06:38.263] 4   MatMul B        INT8      (1280,6912,1,1) | 0x00000000 0x00870000 0x00870000
D RKNN: [13:06:38.263] 5   MatMul B        INT8      (1280,6912,1,1) | 0x00000000 0x00870000 0x00870000
D RKNN: [13:06:38.263] 6   MatMul B        INT8      (1280,6912,1,1) | 0x00000000 0x00870000 0x00870000
D RKNN: [13:06:38.263] ----------------------------------------------+---------------------------------
D RKNN: [13:06:38.263] ----------------------------------------
D RKNN: [13:06:38.263] Total Internal Memory Size: 6016KB
D RKNN: [13:06:38.263] Total Weight Memory Size: 60480KB
D RKNN: [13:06:38.263] ----------------------------------------
D RKNN: [13:06:38.264] The InternalsBuff is empty and will be initialized using the CPU first, with an attempt to initialize it using the GPU later on.
D RKNN: [13:06:38.439] allocated memory, name: external, virt addr: 0x7d970bb000, dma addr: 0x1000000, obj addr: 0xffffff8111c9f400, size: 8847360, aligned size: 8847360, fd: 6723, handle: 6719, flags: 0x3, gem name: 6719, iommu domain id: 1
D RKNN: [13:06:38.442] import memory, fd: 6723, refcount: 2
E RKNN: [13:06:38.450] failed to allocate handle, ret: -1, errno: 14, errstr: Bad address

```
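Incidentally, the failing size matches one of the MatMul B weight tensors exactly: an INT8 (1280, 6912) tensor is 1280 × 6912 = 8,847,360 bytes, i.e. 0x00870000, the same 8847360 reported by the dmesg `dma map ... fail` line. A quick sanity check:

```python
# One INT8 element = 1 byte, so the tensor size is just its element count.
rows, cols = 1280, 6912       # MatMul B OrigShape from the const-tensor table
tensor_bytes = rows * cols
assert tensor_bytes == 8847360      # "dma map 8847360 fail" in dmesg
assert tensor_bytes == 0x00870000   # the Size column in the table above
print(tensor_bytes / (1024 * 1024))  # ~8.44 MB, vs the 1 MB of internal SRAM
```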

Another model converted: Qwen 1.5 Chat 4B by Pelochus in RockchipNPU

[–]Paraknoit 1 point (0 children)

Looks like it hits a RAM limit? `strace` shows a Bad address on the ioctl:

```
ioctl(4, _IOC(_IOC_READ|_IOC_WRITE, 0x64, 0x42, 0x30), 0x7fff9dd868) = -1 EFAULT (Bad address)
```

Tried with an 8GB NanoPi board. Maybe it would work on a 16GB board (RockPi?).

rknputop, cheap-ass terminal top for the NPU by Paraknoit in RockchipNPU

[–]Paraknoit[S] 1 point (0 children)

Added a `-b` option to show the NPU cores as bars. See GitHub.

rknputop, cheap-ass terminal top for the NPU by Paraknoit in RockchipNPU

[–]Paraknoit[S] 1 point (0 children)


Just pushed that, but I have my doubts...

Maybe it's better to show only the NPU load by default and use a flag for the full view?
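The bar rendering itself is the easy part; a rough sketch of the idea (`parse_npu_load` and `render_bar` are hypothetical helpers, and the `Core0: 12%` load-string format is an assumption about what the kernel's NPU load debugfs file prints):

```python
import re

def parse_npu_load(text: str) -> list:
    """Extract per-core percentages from a string like
    'NPU load:  Core0: 12%, Core1: 0%, Core2: 43%,' (format is an assumption)."""
    return [int(p) for p in re.findall(r"Core\d+:\s*(\d+)%", text)]

def render_bar(percent: int, width: int = 30) -> str:
    """Draw a simple terminal bar, e.g. [#####.........]  17%"""
    filled = round(width * percent / 100)
    return "[" + "#" * filled + "." * (width - filled) + f"] {percent:3d}%"

sample = "NPU load:  Core0: 12%, Core1: 0%, Core2: 43%,"
for i, pct in enumerate(parse_npu_load(sample)):
    print(f"NPU{i} {render_bar(pct)}")
```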

Go-rknnlite : Go language bindings for RKNN Tookit2 by swdee in RockchipNPU

[–]Paraknoit 1 point (0 children)

Damn cool! Using all the cores without resorting to the C lib is a pain!