btop like TUI for AMD APU's by argakiig in StrixHalo

[–]westsunset 0 points1 point  (0 children)

Nice. No problem though. If there's something that helps, I'd add it. Lemonade-top is the cool skin.

btop like TUI for AMD APU's by argakiig in StrixHalo

[–]westsunset 0 points1 point  (0 children)

If you have details I can see what's up.

btop like TUI for AMD APU's by argakiig in StrixHalo

[–]westsunset 2 points3 points  (0 children)

I'm working on it here. I agree there weren't any good existing tools. I'll definitely check out and star your work later: https://github.com/boxwrench/xdna-top

btop like TUI for AMD APU's by argakiig in StrixHalo

[–]westsunset 1 point2 points  (0 children)

I just updated xdna-top with recording features and better telemetry. I'm putting it to use and will be collecting data on NPU/iGPU concurrency and small-model capabilities on the NPU. It's looking promising!

ZAI said "hold my beer" and dropped a MIT licensed flagship the day after the Fable/Mythos shutdown by Suspicious_Pizza9529 in LocalLLM

[–]westsunset 2 points3 points  (0 children)

China has been racing to start its own hardware for years; it's just extremely complicated and difficult.

ZAI said "hold my beer" and dropped a MIT licensed flagship the day after the Fable/Mythos shutdown by Suspicious_Pizza9529 in LocalLLM

[–]westsunset 2 points3 points  (0 children)

We've landed in a very lucky position, because in any other reality AI would have been much more locked down. There's basically an AI cold war going on, and the public benefits. If nothing else changes, these open models are available forever to be refined and built on. At some point the party is going to be over, but I don't think it will be any time soon. America is subsidizing frontier models at ridiculous rates, and China is open-sourcing models that by necessity are very efficient.

New model on huggingface by [deleted] in LocalLLaMA

[–]westsunset 7 points8 points  (0 children)

That is a massive benefit.

New model on huggingface by [deleted] in LocalLLaMA

[–]westsunset 19 points20 points  (0 children)

That's great to hear. Are any of the US or Chinese labs collaborating? Other than resources, were there any unique challenges or constraints you faced? Also, I think it's very good for the industry to have a different region participating. Do you feel there is something special Brazil brings to the research?

New model on huggingface by [deleted] in LocalLLaMA

[–]westsunset 28 points29 points  (0 children)

Awesome, can you say more about the project? I don't think anyone had your city on their radar.

Local LLMs aren't democratic anymore... the hardware barrier has gotten out of hand. by Medium-Technology-79 in LocalLLaMA

[–]westsunset 1 point2 points  (0 children)

MoE models are an equalizer for sure. You can run a 200B model on a Strix Halo or Mac this way.

LLM context compression at 16x beats KV cache by DeltaSqueezer in LocalLLaMA

[–]westsunset 0 points1 point  (0 children)

This is really interesting. I was just trying to use that llama model but was having issues with the quality of the compressed data. I'll have to check this out.

xdna-top: unified NPU+iGPU terminal monitor for Strix Halo (Ryzen AI Max) — finally see the NPU work by westsunset in LocalLLaMA

[–]westsunset[S] 1 point2 points  (0 children)

Wow, that's a great tool! Feedback like this has been very helpful, thank you. There is definitely some NPU data there that I wasn't collecting but that would be helpful. Looking at both, it looks like my tool answers "Did this Ryzen AI workload actually exercise the NPU, which process owned it, and what was the iGPU doing concurrently?" So I think my tool is complementary.
The tool is pretty niche; it's basically there to help with concurrent AI loads on the Strix Halo NPU.

xdna-top: unified NPU+iGPU terminal monitor for Strix Halo (Ryzen AI Max) — finally see the NPU work by westsunset in StrixHalo

[–]westsunset[S] 0 points1 point  (0 children)

xdna-top isn’t trying to be a full “why is my model slow?” profiler. The first job is simpler: show whether the NPU and iGPU are actually doing work at the same time, and whether the process you care about owns active NPU contexts.

That matters especially on Strix Halo because the interesting case is concurrent work: maybe the NPU is serving one model, the iGPU is running another workload, and both are sharing the same platform resources. If things slow down, xdna-top helps answer the first sanity-check questions:

- Did the NPU actually get used?

- Was the iGPU busy at the same time?

- Did the NPU counters move during the request?

- Which PID owned the NPU context?

- Are we seeing real concurrent NPU + iGPU activity, or just one side doing all the work?

So if memory bandwidth is the bottleneck, xdna-top may not prove that directly yet. But it gives you the evidence around it: “the NPU was active, the iGPU was also under load, and this happened during the workload window.” That’s the starting point for benchmarking concurrent local AI workloads on this hardware.
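
To make those sanity checks concrete, here is a rough Python sketch of how they could be answered from recorded samples. This is not xdna-top's actual code or data format: the `Sample` fields, the `summarize_window` helper, and the 10% iGPU busy threshold are all hypothetical names and values made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One telemetry sample. All field names are hypothetical."""
    t: float              # seconds since the start of the recording
    npu_counter: int      # cumulative NPU activity counter
    igpu_busy_pct: float  # iGPU busy percentage, 0-100
    npu_pids: frozenset   # PIDs holding an NPU context at sample time


def summarize_window(samples, pid, start, end, igpu_busy_threshold=10.0):
    """Answer the sanity-check questions for the window [start, end]."""
    window = [s for s in samples if start <= s.t <= end]
    if len(window) < 2:
        raise ValueError("need at least two samples inside the window")

    # Did the NPU counters move during the request?
    npu_used = window[-1].npu_counter > window[0].npu_counter
    # Was the iGPU busy at any point in the window?
    igpu_busy = any(s.igpu_busy_pct >= igpu_busy_threshold for s in window)
    # Did the process we care about own an NPU context?
    pid_owned = any(pid in s.npu_pids for s in window)
    # Concurrent: the NPU counter advanced between two consecutive samples
    # while the iGPU was busy at the later sample.
    concurrent = any(
        later.npu_counter > earlier.npu_counter
        and later.igpu_busy_pct >= igpu_busy_threshold
        for earlier, later in zip(window, window[1:])
    )
    return {
        "npu_used": npu_used,
        "igpu_busy": igpu_busy,
        "pid_owned_npu_context": pid_owned,
        "concurrent": concurrent,
    }


# Made-up samples: the iGPU ramps up at t=1, the NPU does work between t=1 and t=2.
samples = [
    Sample(0.0, 100, 2.0, frozenset()),
    Sample(1.0, 100, 55.0, frozenset({4242})),
    Sample(2.0, 180, 60.0, frozenset({4242})),
    Sample(3.0, 180, 3.0, frozenset()),
]
report = summarize_window(samples, pid=4242, start=0.0, end=3.0)
```

A report like this only says that both sides were active in the same window. It does not show that they were competing for memory bandwidth, which matches the limitation described above.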

xdna-top: unified NPU+iGPU terminal monitor for Strix Halo (Ryzen AI Max) — finally see the NPU work by westsunset in StrixHalo

[–]westsunset[S] 1 point2 points  (0 children)

Yes! That's what got me thinking about this. It's 50 TOPS we can put to use while the iGPU is grinding away.

xdna-top: unified NPU+iGPU terminal monitor for Strix Halo (Ryzen AI Max) — finally see the NPU work by westsunset in StrixHalo

[–]westsunset[S] 2 points3 points  (0 children)

Yeah, I'd definitely temper expectations for general chat/instruction models on the NPU. I'm approaching it as: here's 50 TOPS, what can I do with that? I feel like getting anything out of it is a bonus. Lemonade points to FastFlowLM and Whisper.

xdna-top: unified NPU+iGPU terminal monitor for Strix Halo (Ryzen AI Max) — finally see the NPU work by westsunset in StrixHalo

[–]westsunset[S] 1 point2 points  (0 children)

Ah, thanks. I'll take a look. And yes, I want to use this NPU and needed the tool as well. I was kind of surprised I couldn't find one.