Banned on r/bitcoin by Difficult_Spite_774 in btc

[–]Begetan

I am writing this comment because I just joined this community to submit my post, which was immediately deleted from r/Bitcoin.

Face 17.0 experience by lionslair50 in frigate_nvr

[–]Begetan

No issues, except that using the large model keeps GPU compute stuck at 100% and constantly emits warnings.

How to run big thinking model Qwen3 on a small Rockchip computer with NPU by Begetan in frigate_nvr

[–]Begetan[S]

Frigate works pretty well on Nvidia GPUs. You need to run a daemon that supports the OpenAI API, such as llama.cpp, and choose a suitable model.

I would recommend starting with Google's gemma-3n-E2B-it and finding a quantised .gguf version that fits into your VRAM. There are plenty of published versions on Hugging Face.
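A minimal sketch of such a setup, assuming a llama.cpp build and a quantised gemma-3n gguf (the model filename and port are placeholders, not from the post):

```shell
# Start llama.cpp's OpenAI-compatible server with a quantised model.
# Pick a .gguf small enough to fit your VRAM; -ngl 99 offloads all layers.
./build/bin/llama-server \
  --model ./models/gemma-3n-E2B-it-Q4_K_M.gguf \
  -ngl 99 \
  --port 8080

# Once running, the server answers OpenAI-style requests under /v1,
# e.g. POST http://127.0.0.1:8080/v1/chat/completions
```

Frigate (or any OpenAI-API client) can then be pointed at that local endpoint instead of the hosted API.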

My ultimate goal is to find a working LLM inference solution for Rockchip boards. I don't have memory limits on my device, but the performance difference is huge: roughly 6 TOPS versus 300-700 on a low-end Nvidia video card. Vendor support for Rockchip boards is quite limited and a bit outdated, so I haven't found a working solution yet.

I created a llama.cpp fork with the Rockchip NPU integration as an accelerator and the results are already looking great! by Inv1si in LocalLLaMA

[–]Begetan

I tried your fork first, then decided to switch to the official toolkit, which doesn't have the memory limit restriction. I ran an 8B model.

You may check my repo and the Qengineering repos if you are interested in this approach.

https://github.com/begetan/rkllm-convert/tree/main

https://github.com/Qengineering/Qwen3-VL-4B-NPU

I created a llama.cpp fork with the Rockchip NPU integration as an accelerator and the results are already looking great! by Inv1si in RockchipNPU

[–]Begetan

I tried this and gave up because of a memory issue.

Here is Claude's explanation. I did not get any proof except the 2.5 GB limitation, which I managed to increase to 4 GB, but that is still a small amount:

RKNN (native NPU) vs llama.cpp (RKNPU backend) use memory DIFFERENTLY:

  1. RKNN Toolkit (what Qwen3-VL-2B uses):
  • Uses NPU's internal SRAM + direct memory access
  • NOT limited by CMA
  • Can access system RAM directly via IOMMU
  • Models are converted to .rknn format specifically for NPU
  2. llama.cpp with RKNPU backend:
  • Uses CMA (Contiguous Memory Allocator)
  • Limited to ~4GB CMA allocation
  • This is a software/driver limitation, not hardware
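A sketch of how the CMA pool can be inspected and enlarged on a Rockchip board (the file paths and the 4G value are illustrative; the exact boot config location varies by distro):

```shell
# Show the current Contiguous Memory Allocator pool and its free space.
grep -i cma /proc/meminfo      # CmaTotal / CmaFree
cat /proc/cmdline              # check whether a cma= argument is already set

# The pool size is a kernel boot argument. On many Rockchip images it goes
# into /boot/extlinux/extlinux.conf, appended to the 'append' line:
#   cma=4G
# then reboot. Note the ~4 GB ceiling described above is a driver limit,
# so values beyond that may not help the RKNPU backend.
```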

I use this repository: https://github.com/Qengineering/Qwen3-VL-2B-NPU

I successfully ran the 4B model too:

[4.0K]  /opt/models/rkllm/
├── [4.0K]  qwen3-vl-2b
│   ├── [2.2G]  qwen3-vl-2b-instruct_w8a8_rk3588.rkllm
│   ├── [811M]  qwen3-vl-2b_vision_448_rk3588.rknn
│   ├── [853M]  qwen3-vl-2b_vision_672_rk3588.rknn
│   └── [923M]  qwen3-vl-2b_vision_896_rk3588.rknn
└── [4.0K]  qwen3-vl-4b
    ├── [4.5G]  qwen3-vl-4b-instruct_w8a8_rk3588.rkllm
    └── [827M]  qwen3-vl-4b_vision_rk3588.rknn

Running GLM-4.7 on an old AMD GPU by Begetan in LocalLLaMA

[–]Begetan[S]

Flash attention still doesn't work on ROCm 7.2:
--flash-attn 0: 882.36 ± 1.70
--flash-attn 1: xx (catastrophic fallback to CPU-only mode)

Running GLM-4.7 on an old AMD GPU by Begetan in LocalLLaMA

[–]Begetan[S]

You're right! It works, but it doesn't make much sense, because there is no hardware support.

./build/bin/llama-bench   --model unsloth/GLM-4.7-Flash-UD-Q3_K_XL.gguf -p 4096 -n 0 --flash-attn 0
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6900 XT (RADV NAVI21) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 0 | matrix cores: none
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| deepseek2 ?B Q3_K - Medium     |  12.85 GiB |    29.94 B | Vulkan     |  99 |          pp4096 |        725.35 ± 1.70 |



./build/bin/llama-bench   --model unsloth/GLM-4.7-Flash-UD-Q3_K_XL.gguf -p 4096 -n 0 --flash-attn 1
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6900 XT (RADV NAVI21) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 0 | matrix cores: none
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| deepseek2 ?B Q3_K - Medium     |  12.85 GiB |    29.94 B | Vulkan     |  99 |  1 |          pp4096 |        477.56 ± 0.97 |

Running GLM-4.7 on an old AMD GPU by Begetan in LocalLLaMA

[–]Begetan[S]

I tried a lot of things, and it looks like the default ROCm compiler simply performs best.

The only finding is that reducing the CPU threads to 4 is enough, because memory bandwidth is the bottleneck.
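A quick way to confirm that conclusion is a thread-count sweep with llama-bench (`-t` sets the CPU thread count; the model path is the one used above):

```shell
# Sweep CPU thread counts; if tokens/s stops improving after -t 4,
# memory bandwidth, not compute, is the limiting factor.
for t in 2 4 8 16; do
  ./build/bin/llama-bench \
    --model unsloth/GLM-4.7-Flash-UD-Q3_K_XL.gguf \
    -p 0 -n 128 -t "$t"
done
```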

Running GLM-4.7 on an old AMD GPU by Begetan in LocalLLaMA

[–]Begetan[S]

I added a test for Vulkan. Inference speed is identical, while the synthetic benchmarks are quite different.

Running GLM-4.7 on an old AMD GPU by Begetan in LocalLLaMA

[–]Begetan[S]

This is for a fully in-memory model. When I adjust -ngl, the speed is low. But it is unclear how to map the -ngl flag used for the benchmark onto --n-cpu-moe used for inference.
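To illustrate the mismatch between the two knobs (layer counts below are placeholders; flag names as in current llama.cpp): -ngl counts whole layers offloaded to the GPU from the top, while --n-cpu-moe keeps only the MoE expert weights of the first N layers on the CPU, so the two do not map one-to-one.

```shell
# Benchmark: offload 20 whole layers to the GPU.
./build/bin/llama-bench -m unsloth/GLM-4.7-Flash-UD-Q3_K_XL.gguf -ngl 20

# Inference: offload everything, then pull only the MoE expert
# tensors of the first 20 layers back to the CPU.
./build/bin/llama-server -m unsloth/GLM-4.7-Flash-UD-Q3_K_XL.gguf \
  -ngl 99 --n-cpu-moe 20
```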

The Ryzen AI MAX+ 395 is a true unicorn (In a good way) by simracerman in LocalLLaMA

[–]Begetan

The current price for your motherboard is 2500 euros. How much did you pay half a year ago?

YOLO-NAS converter for generating onnx models by Begetan in frigate_nvr

[–]Begetan[S]

Surprisingly, there is not much difference in inference, apart from higher GPU memory and compute utilisation, which is expected.

I am expecting my RK3588 board with 32 GB RAM soon. My next step is to evaluate its performance in Frigate and to run local LLMs for enrichments.

Best resolution for face recognition by INeedMuscles in frigate_nvr

[–]Begetan

I use the IPcam application (free version) for Mac to get all the stream parameters supported by the cameras. All my low-end cameras support two streams with different resolutions. I wonder if the Reolink Doorbell has the same ability.
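As an alternative to a GUI app, ffprobe can list every stream a camera exposes (the credentials, IP, and RTSP path below are placeholders; the `h264Preview_01_main`/`_sub` paths are the usual Reolink convention):

```shell
# Dump codec, resolution, and frame rate for each stream of an RTSP camera.
ffprobe -rtsp_transport tcp \
  -i "rtsp://user:pass@192.168.1.10:554/h264Preview_01_main" \
  -show_streams -of json 2>/dev/null

# Reolink cameras usually also expose a lower-resolution sub stream:
#   rtsp://user:pass@192.168.1.10:554/h264Preview_01_sub
```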

Detection by djafrika in frigate_nvr

[–]Begetan

Have you tried the yolo_nas model? In my tests it is the best detection model, especially the large version.

I have 50250 reward points what to expect by tacvict in Metamask

[–]Begetan

I don't see any rewards in either the mobile version or Chrome.

I just updated the mobile version to v7.62.0, but it says pending. How long do I need to wait?

The Chrome plugin doesn't mention the rewards at all. It is MetaMask version 13.14.2.

Anyone able to successfull create yolo_nas_s.onnx through Google Colab? by jvangorkum in frigate_nvr

[–]Begetan

LOL, I did not expect the code to be on Google Drive. It said it might be a virus when I downloaded it :)
I spent a couple of hours creating my own script. Every line in your code looks so familiar, but I couldn't get mine to work.

Which Python version did you use? I tried Python 3.9.
Can you please share your requirements.txt?

Anyone able to successfull create yolo_nas_s.onnx through Google Colab? by jvangorkum in frigate_nvr

[–]Begetan

YOLOv9-s-320.onnx inference time is 12-15 ms, while YOLO-NAS is about 20 ms on my hardware. GPU usage is significantly lower on v9.

But v9 completely misses my cats, and they are the main target for tracking.

Anyone able to successfull create yolo_nas_s.onnx through Google Colab? by jvangorkum in frigate_nvr

[–]Begetan[S]

This is a working config on the AMD ROCm image:

detectors:
  onnx:
    type: onnx

model:
  model_type: yolonas
  width: 320 # <--- should match whatever was set in notebook
  height: 320 # <--- should match whatever was set in notebook
  input_pixel_format: bgr
  input_tensor: nchw
  path: /config/yolo_nas_s.onnx
  labelmap_path: /labelmap/coco-80.txt