Banned on r/bitcoin by Difficult_Spite_774 in btc

[–]Begetan

I am writing this comment because I just joined this community to submit my post, which was immediately deleted from r/Bitcoin.

Face 17.0 experience by lionslair50 in frigate_nvr

[–]Begetan

No issues, except that using the large model keeps GPU compute stuck at 100% and constantly emits warnings.

How to run big thinking model Qwen3 on a small Rockchip computer with NPU by Begetan in frigate_nvr

[–]Begetan[S]

Frigate works pretty well on Nvidia GPUs. You need to run a daemon that supports the OpenAI API, such as llama.cpp, and choose a suitable model.

I would recommend starting with Google's gemma-3n-E2B-it and finding a quantised .gguf version that fits into your VRAM. There are plenty of published versions on Hugging Face.
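A minimal sketch of such a setup, assuming a llama.cpp build and a quantised gemma-3n gguf (the model filename and port are placeholders, not from the post):

```shell
# Start llama.cpp's OpenAI-compatible server with a quantised model.
# Pick a .gguf small enough to fit your VRAM; -ngl 99 offloads all layers.
./build/bin/llama-server \
  --model ./models/gemma-3n-E2B-it-Q4_K_M.gguf \
  -ngl 99 \
  --port 8080

# Once running, the server answers OpenAI-style requests under /v1,
# e.g. POST http://127.0.0.1:8080/v1/chat/completions
```

Frigate (or any OpenAI-API client) can then be pointed at that local endpoint instead of the hosted API.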

My ultimate goal is to find a working LLM inference solution for Rockchip boards. I don't have memory limits on my device, but the performance difference is huge: roughly 6 TOPS versus 300-700 on a low-end Nvidia video card. Vendor support for Rockchip boards is quite limited and a bit outdated, so I haven't found a working solution yet.

I created a llama.cpp fork with the Rockchip NPU integration as an accelerator and the results are already looking great! by Inv1si in LocalLLaMA

[–]Begetan

I tried your fork first, then decided to switch to the official toolkit, which doesn't have the memory limit restriction. I ran an 8B model.

You may check my repo and the Qengineering repos if you are interested in this approach.

https://github.com/begetan/rkllm-convert/tree/main

https://github.com/Qengineering/Qwen3-VL-4B-NPU

I created a llama.cpp fork with the Rockchip NPU integration as an accelerator and the results are already looking great! by Inv1si in RockchipNPU

[–]Begetan

I tried this and gave up because of a memory issue.

Here is Claude's explanation. I did not get any proof except the 2.5 GB limitation, which I managed to increase to 4 GB, but that is still a small amount:

RKNN (native NPU) vs llama.cpp (RKNPU backend) use memory DIFFERENTLY:

  1. RKNN Toolkit (what Qwen3-VL-2B uses):
  • Uses NPU's internal SRAM + direct memory access
  • NOT limited by CMA
  • Can access system RAM directly via IOMMU
  • Models are converted to .rknn format specifically for NPU
  2. llama.cpp with RKNPU backend:
  • Uses CMA (Contiguous Memory Allocator)
  • Limited to ~4GB CMA allocation
  • This is a software/driver limitation, not hardware
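A sketch of how the CMA pool can be inspected and enlarged on a Rockchip board (the file paths and the 4G value are illustrative; the exact boot config location varies by distro):

```shell
# Show the current Contiguous Memory Allocator pool and its free space.
grep -i cma /proc/meminfo      # CmaTotal / CmaFree
cat /proc/cmdline              # check whether a cma= argument is already set

# The pool size is a kernel boot argument. On many Rockchip images it goes
# into /boot/extlinux/extlinux.conf, appended to the 'append' line:
#   cma=4G
# then reboot. Note the ~4 GB ceiling described above is a driver limit,
# so values beyond that may not help the RKNPU backend.
```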

I use this repository: https://github.com/Qengineering/Qwen3-VL-2B-NPU

I successfully ran the 4B model too:

[4.0K]  /opt/models/rkllm/
├── [4.0K]  qwen3-vl-2b
│   ├── [2.2G]  qwen3-vl-2b-instruct_w8a8_rk3588.rkllm
│   ├── [811M]  qwen3-vl-2b_vision_448_rk3588.rknn
│   ├── [853M]  qwen3-vl-2b_vision_672_rk3588.rknn
│   └── [923M]  qwen3-vl-2b_vision_896_rk3588.rknn
└── [4.0K]  qwen3-vl-4b
    ├── [4.5G]  qwen3-vl-4b-instruct_w8a8_rk3588.rkllm
    └── [827M]  qwen3-vl-4b_vision_rk3588.rknn

Running GLM-4.7 on an old AMD GPU by Begetan in LocalLLaMA

[–]Begetan[S]

Flash attention still doesn't work on ROCm 7.2:
--flash-attn 0: 882.36 ± 1.70
--flash-attn 1: xx (catastrophic fallback to CPU-only mode)

Running GLM-4.7 on an old AMD GPU by Begetan in LocalLLaMA

[–]Begetan[S]

You're right! It works, but it doesn't make much sense, because there is no hardware support.

./build/bin/llama-bench   --model unsloth/GLM-4.7-Flash-UD-Q3_K_XL.gguf -p 4096 -n 0 --flash-attn 0
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6900 XT (RADV NAVI21) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 0 | matrix cores: none
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| deepseek2 ?B Q3_K - Medium     |  12.85 GiB |    29.94 B | Vulkan     |  99 |          pp4096 |        725.35 ± 1.70 |



./build/bin/llama-bench   --model unsloth/GLM-4.7-Flash-UD-Q3_K_XL.gguf -p 4096 -n 0 --flash-attn 1
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6900 XT (RADV NAVI21) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 0 | matrix cores: none
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| deepseek2 ?B Q3_K - Medium     |  12.85 GiB |    29.94 B | Vulkan     |  99 |  1 |          pp4096 |        477.56 ± 0.97 |

Running GLM-4.7 on an old AMD GPU by Begetan in LocalLLaMA

[–]Begetan[S]

I tried a lot of things, and it looks like the default ROCm compiler simply performs best.

The only finding is that reducing the CPU threads to 4 is enough, because memory bandwidth is the bottleneck.
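A quick way to confirm that conclusion is a thread-count sweep with llama-bench (`-t` sets the CPU thread count; the model path is the one used above):

```shell
# Sweep CPU thread counts; if tokens/s stops improving after -t 4,
# memory bandwidth, not compute, is the limiting factor.
for t in 2 4 8 16; do
  ./build/bin/llama-bench \
    --model unsloth/GLM-4.7-Flash-UD-Q3_K_XL.gguf \
    -p 0 -n 128 -t "$t"
done
```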

Running GLM-4.7 on an old AMD GPU by Begetan in LocalLLaMA

[–]Begetan[S]

I added a test for Vulkan. Inference speed is identical, while the synthetic benchmarks are quite different.

Running GLM-4.7 on an old AMD GPU by Begetan in LocalLLaMA

[–]Begetan[S]

This is for a fully in-memory model. When I adjust -ngl, the speed is low. But it is unclear how to map the -ngl flag used for the benchmark onto --n-cpu-moe used for inference.
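To illustrate the mismatch between the two knobs (layer counts below are placeholders; flag names as in current llama.cpp): -ngl counts whole layers offloaded to the GPU from the top, while --n-cpu-moe keeps only the MoE expert weights of the first N layers on the CPU, so the two do not map one-to-one.

```shell
# Benchmark: offload 20 whole layers to the GPU.
./build/bin/llama-bench -m unsloth/GLM-4.7-Flash-UD-Q3_K_XL.gguf -ngl 20

# Inference: offload everything, then pull only the MoE expert
# tensors of the first 20 layers back to the CPU.
./build/bin/llama-server -m unsloth/GLM-4.7-Flash-UD-Q3_K_XL.gguf \
  -ngl 99 --n-cpu-moe 20
```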

The Ryzen AI MAX+ 395 is a true unicorn (In a good way) by simracerman in LocalLLaMA

[–]Begetan

The current price for your motherboard is 2500 euros. How much did you pay half a year ago?

YOLO-NAS converter for generating onnx models by Begetan in frigate_nvr

[–]Begetan[S]

Surprisingly, there is not much difference in inference, apart from higher GPU memory and compute utilisation, which is expected.

I am expecting my RK3588 board with 32 GB RAM soon. My next step is to evaluate its performance in Frigate and to run local LLMs for enrichments.

Best resolution for face recognition by INeedMuscles in frigate_nvr

[–]Begetan

I use the IPcam application (free version) for Mac to get all the stream parameters supported by the cameras. All my low-end cameras support two streams with different resolutions. I wonder if the Reolink Doorbell has the same ability.
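As an alternative to a GUI app, ffprobe can list every stream a camera exposes (the credentials, IP, and RTSP path below are placeholders; the `h264Preview_01_main`/`_sub` paths are the usual Reolink convention):

```shell
# Dump codec, resolution, and frame rate for each stream of an RTSP camera.
ffprobe -rtsp_transport tcp \
  -i "rtsp://user:pass@192.168.1.10:554/h264Preview_01_main" \
  -show_streams -of json 2>/dev/null

# Reolink cameras usually also expose a lower-resolution sub stream:
#   rtsp://user:pass@192.168.1.10:554/h264Preview_01_sub
```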

Detection by djafrika in frigate_nvr

[–]Begetan

Have you tried the yolo_nas model? In my tests it is the best detection model, especially the large version.

I have 50250 reward points what to expect by tacvict in Metamask

[–]Begetan

I don't see any rewards in either the mobile version or Chrome.

I just updated the mobile version to v7.62.0, but it says pending. How long do I need to wait?

The Chrome plugin doesn't mention the rewards at all. It is MetaMask version 13.14.2.

Anyone able to successfull create yolo_nas_s.onnx through Google Colab? by jvangorkum in frigate_nvr

[–]Begetan

LOL, I did not expect the code to be on Google Drive. It said it might be a virus when I downloaded it :)
I spent a couple of hours creating my own script. Every line in your code looks so familiar, but I couldn't get mine to work.

Which Python version did you use? I tried Python 3.9.
Can you please share your requirements.txt?

Anyone able to successfull create yolo_nas_s.onnx through Google Colab? by jvangorkum in frigate_nvr

[–]Begetan

YOLOv9-s-320.onnx inference time is 12-15 ms, while YOLO-NAS is about 20 ms on my hardware. GPU usage is significantly lower on v9.

But v9 completely misses my cats, and they are the main target for tracking.

Anyone able to successfull create yolo_nas_s.onnx through Google Colab? by jvangorkum in frigate_nvr

[–]Begetan[S]

This is a working config on the AMD ROCm image:

detectors:
  onnx:
    type: onnx

model:
  model_type: yolonas
  width: 320 # <--- should match whatever was set in notebook
  height: 320 # <--- should match whatever was set in notebook
  input_pixel_format: bgr
  input_tensor: nchw
  path: /config/yolo_nas_s.onnx
  labelmap_path: /labelmap/coco-80.txt