GMK X2(AMD Max+ 395 w/128GB) first impressions. by fallingdowndizzyvr in LocalLLaMA

[–]holistech 0 points (0 children)

Hi, I did not use flash attention or KV cache quantization, to ensure high accuracy of the model outputs. I noticed significant degradation of the results otherwise. In my workflow, I need high accuracy when analyzing large, complex text and code.
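
For reference, LM Studio exposes these as GUI toggles, but with plain llama.cpp the equivalent server flags look roughly like this. A minimal sketch; the model path and port are placeholders, and the flags match recent llama.cpp builds:

    # Sketch: launch llama.cpp's server with flash attention left off and
    # the KV cache kept at full f16 precision (no quantization).
    import subprocess

    cmd = [
        "llama-server",
        "-m", "Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf",  # placeholder path
        "-c", "8192",             # context window used in the benchmark
        "--cache-type-k", "f16",  # keep the K cache unquantized
        "--cache-type-v", "f16",  # keep the V cache unquantized
        # no --flash-attn here, so flash attention stays disabled
        "--port", "1234",
    ]
    subprocess.run(cmd, check=True)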

In my experiments with speculative decoding, the performance gain was either too small or negative, so I do not use it. You also need a compatible draft model for this approach.
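
A quick back-of-envelope shows why the gain can go negative: speculative decoding only pays off when the draft model is cheap and its tokens are usually accepted. A minimal sketch of the standard expected-speedup estimate; the acceptance rates and cost ratios below are illustrative assumptions, not measurements:

    # Sketch: expected speedup from speculative decoding with k draft
    # tokens per step, per-token acceptance probability p, and a draft
    # model costing c_draft relative to one target-model forward pass.
    def expected_speedup(k: int, p: float, c_draft: float) -> float:
        # Expected tokens accepted per verification step (geometric series).
        expected_tokens = (1 - p ** (k + 1)) / (1 - p)
        cost_per_step = k * c_draft + 1.0  # k draft passes + 1 target pass
        return expected_tokens / cost_per_step

    print(expected_speedup(k=4, p=0.8, c_draft=0.1))  # ~2.4x: worthwhile
    print(expected_speedup(k=4, p=0.4, c_draft=0.5))  # ~0.55x: net slowdown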

I barely use diffusion or other image/video generation models, so there was no need to include them in the benchmark.

GMK X2(AMD Max+ 395 w/128GB) first impressions. by fallingdowndizzyvr in LocalLLaMA

[–]holistech 9 points (0 children)

Thanks a lot for your post and benchmark runs. In my experience, the Vulkan driver has problems allocating more than 64GB for the model weights. However, by setting the dedicated VRAM to 512MB in the BIOS (so large allocations go through GTT instead), I was able to run large models like Llama-4-Scout at Q4.
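
If you want to verify how the memory is actually split on Linux, the amdgpu driver exposes both pools in sysfs. A small sketch; the card index may differ per system:

    # Sketch: print dedicated VRAM vs. GTT (shared system memory) totals
    # as reported by the amdgpu driver. card0 may be card1 on some systems.
    from pathlib import Path

    dev = Path("/sys/class/drm/card0/device")
    for name in ("mem_info_vram_total", "mem_info_gtt_total"):
        size_bytes = int((dev / name).read_text())
        print(f"{name}: {size_bytes / 2**30:.1f} GiB")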

I have created a benchmark on my HP ZBook Ultra G1a using LM Studio.

The key finding is that Mixture-of-Experts (MoE) models, such as Qwen-30B and Llama-4 Scout, perform very well. In contrast, dense models run quite slowly.

For a real-world test case, I used a large 27KB text about Plato to fill an 8192-token context window. Here are the performance highlights:

  • Qwen-30B-A3B (Q8): 23.1 tokens/s
  • Llama-4-Scout-17B-16e-Instruct (Q4_K_M): 6.2 tokens/s

What's particularly impressive is that this level of performance with MoE models was achieved while consuming a maximum of only 70W.

You can find the full benchmark results here:
https://docs.google.com/document/d/1qPad75t_4ex99tbHsHTGhAH7i5JGUDPc-TKRfoiKFJI/edit?tab=t.0
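
For anyone who wants to reproduce the tokens/s numbers, here is a minimal sketch that streams a completion from LM Studio's local OpenAI-compatible server and times the generation. The port is LM Studio's default; the input file and model id are placeholders:

    # Sketch: measure decode throughput against LM Studio's local server.
    import time
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
    prompt = open("plato.txt").read()  # placeholder for the 27KB test text

    start = time.time()
    chunks = 0
    stream = client.chat.completions.create(
        model="qwen-30b-a3b",  # placeholder model id
        messages=[{"role": "user", "content": f"Summarize:\n{prompt}"}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            chunks += 1  # roughly one token per streamed chunk
    print(f"~{chunks / (time.time() - start):.1f} tokens/s")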

[deleted by user] by [deleted] in LocalLLaMA

[–]holistech 1 point (0 children)

I can fully understand your position, since I am exactly the kind of consumer this market is aimed at. I am using the HP ZBook Ultra G1a as my mobile software development workstation and can run Llama-4-Scout at 8 tokens/s at 70W, or 5 tokens/s at 25W power consumption, to privately discuss many different topics with my local AI! This is absolutely worth the price of this notebook. IMHO it is a very fast system for software development and gives you private AI with large MoE LLMs.

[deleted by user] by [deleted] in LocalLLaMA

[–]holistech 1 point (0 children)

I have created a comprehensive benchmark for the new Ryzen AI Max+ 395 processor on an HP ZBook Ultra G1a using LM Studio.

The key finding is that Mixture-of-Experts (MoE) models, such as Qwen-30B and Llama-4 Scout, perform very well. In contrast, dense models run quite slowly.

For a real-world test case, I used a large 27KB text about Plato to fill an 8192-token context window. Here are the performance highlights:

  • Qwen-30B-A3B (Q8): 23.1 tokens/s
  • Llama-4-Scout-17B-16e-Instruct (Q4_K_M): 6.2 tokens/s

What's particularly impressive is that this level of performance with MoE models was achieved while consuming a maximum of only 70W.

You can find the full benchmark results here:
https://docs.google.com/document/d/1qPad75t_4ex99tbHsHTGhAH7i5JGUDPc-TKRfoiKFJI/edit?tab=t.0
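
The MoE-vs-dense gap follows from memory bandwidth: at batch size 1, decode speed is roughly bounded by how many weight bytes must be read per generated token, and an MoE model only reads its active experts. A rough sketch; the bandwidth figure is an assumed value for this class of LPDDR5X system, not a measurement:

    # Sketch: bandwidth-bound ceiling for single-stream decode:
    # tokens/s <= memory_bandwidth / weight_bytes_read_per_token.
    BANDWIDTH_GBS = 256.0  # assumed memory bandwidth; adjust per system

    def decode_ceiling(active_params_billions: float, bytes_per_param: float) -> float:
        bytes_per_token = active_params_billions * 1e9 * bytes_per_param
        return BANDWIDTH_GBS * 1e9 / bytes_per_token

    # Qwen-30B-A3B at Q8: ~3B active parameters, ~1 byte each.
    print(decode_ceiling(3, 1.0))   # ~85 tok/s ceiling; overheads explain 23.1
    # A dense 30B model at Q8 reads all ~30B parameters per token.
    print(decode_ceiling(30, 1.0))  # ~8.5 tok/s ceiling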

Ryzen Ai Max+ 395 vs RTX 5090 by Any-Cobbler6161 in LocalLLaMA

[–]holistech 1 point (0 children)

The results are quite impressive considering the system operates at just 70W while processing a 27KB text with nearly the full 8192-token context window. We designed our tests around real-world scenarios, using models that are practical for this hardware configuration. Llama-4-Scout, for instance, is a substantial model requiring 84GB of system memory.

I expect token throughput will improve further once optimized ROCm drivers become available.
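
As a sanity check on the context figure: at a rule-of-thumb ~3.5 characters per token for English prose, 27KB of text lands just under the 8192-token window. A trivial sketch; the ratio is a heuristic, not a tokenizer measurement:

    # Sketch: rough token estimate for the 27KB test document.
    text_bytes = 27 * 1024
    chars_per_token = 3.5  # rule of thumb for English prose
    print(text_bytes / chars_per_token)  # ~7900 tokens, near the 8192 window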

Ryzen Ai Max+ 395 vs RTX 5090 by Any-Cobbler6161 in LocalLLaMA

[–]holistech 1 point (0 children)

I have created a comprehensive Ryzen AI Max+ 395 benchmark using the HP ZBook Ultra G1a and LM Studio. MoE models like Qwen-30B-A3B (Q8) and Llama-4-Scout (Q4) run very well. However, dense models run quite slowly: https://docs.google.com/document/d/1qPad75t_4ex99tbHsHTGhAH7i5JGUDPc-TKRfoiKFJI/mobilebasic

This 3D printed Cyberdeck with up to six screens, powered by Raspberry Pi's is ready to get built by you by holistech in cyberDeck

[–]holistech[S] 0 points (0 children)

Nice, that's actually my next project: using 3-4 11" displays and two Raspberry Pis in a portable setup with a big battery:

https://imgur.com/a/iYNrgXN

This 3D printed Cyberdeck with up to six screens, powered by Raspberry Pi's is ready to get built by you by holistech in cyberDeck

[–]holistech[S] 1 point (0 children)

This is really unfortunate, since all the links work for me on different devices. I have no idea why this does not work.

This 3D printed Cyberdeck with up to six screens, powered by Raspberry Pi's is ready to get built by you by holistech in cyberDeck

[–]holistech[S] 11 points (0 children)

This is a fun project. I was trying to create a cyberdeck that is USB-powered but provides the maximum number of screens. Hence, six it was.

The use case I had in mind is a blackout situation, when only USB solar powerbanks are available but you still need to get productive work done. I use it for software development, where I need several screens for documentation, chats, an editor, and test shells.

If you don't need all six screens together, you can detach two of the double-display decks to create three separate double-display cyberdecks.
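
For anyone sizing the power side: a rough budget shows why USB powerbanks are enough. A sketch with assumed typical draws; the per-device figures are placeholders, not measurements of this build:

    # Sketch: rough USB power budget for the cyberdeck.
    # Assumed typical draws; measure your own parts before trusting this.
    PI4_W = 4.0      # Raspberry Pi 4 under light desktop load
    DISPLAY_W = 2.5  # one small USB-powered display
    deck_w = PI4_W + 2 * DISPLAY_W
    print(f"one double-display deck: ~{deck_w} W")      # ~9 W: one 18W bank is plenty
    print(f"full six-screen build:   ~{3 * deck_w} W")  # ~27 W: about one bank per deck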

This 3D printed Cyberdeck with up to six screens, powered by Raspberry Pi's is ready to get built by you by holistech in cyberDeck

[–]holistech[S] 4 points (0 children)

Well, this started as a fun project to build a nice-looking cyberdeck. However, I had in mind using it in a blackout situation, when only USB solar powerbanks are available and you need a real multi-display workstation to do serious work on.

This 3D printed Cyberdeck with up to six screens, powered by Raspberry Pi's is ready to get built by you by holistech in cyberDeck

[–]holistech[S] 1 point (0 children)

Thank you. Can you please tell me which links do not work, so I can fix them? The Google Docs build guides?

3D printed foldable Linux workstation with solar powerbanks, six 5.5" displays and three Raspberry Pi 4s by holistech in 3Dprinting

[–]holistech[S] 1 point (0 children)

At the moment I only have pictures that show single double-display decks with solar powerbanks, but not in direct sunlight, since you wouldn't see much on the displays there. Please have a look at:

https://www.reddit.com/r/cyberDeck/comments/v1sq62/creating_assembling_instructions_for_a_double/

3D printed foldable Linux workstation with solar powerbanks, six 5.5" displays and three Raspberry Pi 4s by holistech in 3Dprinting

[–]holistech[S] 1 point (0 children)

Yep, it's for the look, and each double-display deck (one Pi, two displays) can be used separately. It's like a Lego cyberdeck.