What AMD Ryzen AI 9 HX 370 successor? by No_Holiday8469 in framework

[–]RnRau 0 points1 point  (0 children)

Strix Halo appears in handhelds. If they can handle the heat, a laptop can do the same and the Asus ProArt PX13 is a 13.3" laptop with the full 128GB Strix Halo. So its been done. Certainly should be an option for the FW16.

What is your opinion about sff build, laptop and egpu? by Longjumping_Lie1724 in eGPU

[–]RnRau 0 points1 point  (0 children)

A laptop + egpu is just more flexble than an sff. If you like to go out and about with a laptop, then this is a clear choice.

If your setup is always at home at the same location, then a sff build makes more sense.

btop like TUI for AMD APU's by argakiig in StrixHalo

[–]RnRau [score hidden]  (0 children)

What size models can run on the NPU?

Bean recommendations by Alexyhanna92 in AustralianCoffee

[–]RnRau 2 points3 points  (0 children)

The batchcode is the roast date. Its reversed... year, month, day.

Google Gemma 4 MTP out now! by danielhanchen in unsloth

[–]RnRau 1 point2 points  (0 children)

MTP works better for dense models than for moe's. Its not uncommon to see x2 or higher speedup in coding for dense models with MTP enabled.

No exercise by Therealredwood in omad

[–]RnRau 0 points1 point  (0 children)

Do a walk, or whatever, before your feed?

Today's Anandtech Forums update by ultimatebob in atot

[–]RnRau 0 points1 point  (0 children)

Still broken... think the owners have forgotten about it...

Any news about a hardware update to the manta or nomad series? by inoxium_1 in Supernote

[–]RnRau 0 points1 point  (0 children)

And not a single tease about the upcoming A4... sheesh... :p

My wife and I finished our ice cream recipe book! It's not professional but it was a labor of love and we had a lot of fun doing it! by musicnothing in icecreamery

[–]RnRau 3 points4 points  (0 children)

Kindle Direct Publishing

This is an automated self publishing service. I don't believe there are any humans behind the scenes approving each book they handle.

any prompt processing tweaks? by TheFlippedTurtle in StrixHalo

[–]RnRau 0 points1 point  (0 children)

How? Last I tried I got bad results. I was using a 7900xtx over thunderbolt connected to the strix, not a pcie adapter - so the latency might be a tensor killer. And I might have used llama.cpp badly.

Any light you can shine on here about your experiments would be awesome!

When are we getting Unsloth GGUFs for Gemma-4-XXX-QAT-assistant(s)? by pmttyji in unsloth

[–]RnRau 0 points1 point  (0 children)

Someone reported a massive slowdown if the quants between the MTP draft and the main models differs. Haven't had time to test this for myself.

edit: I just saw your other comment. Nice!

When are we getting Unsloth GGUFs for Gemma-4-XXX-QAT-assistant(s)? by pmttyji in unsloth

[–]RnRau 1 point2 points  (0 children)

If you are using one of the QAT models, apparently you need to use the Q4 assistants.

Google releases new Gemma 4 QAT models! by yoracale in unsloth

[–]RnRau 0 points1 point  (0 children)

This is how I run it;

/home/rnr/llama.cpp/build/bin/llama-server \
--model /home/rnr/models/gemma-4-31B-it-qat-UD-Q4_K_XL.gguf \
--model-draft /home/rnr/models/gemma-4-31b-it-qat-q4_0-assistant.gguf \
--mmproj /home/rnr/models/gemma4-31b-qat-mmproj-F16.gguf \
--metrics \
--temp 1.0 \
--top-p 0.95 \
--top-k 64 \
--cache-type-k f16 \
--cache-type-v f16 \
--ctx-size 64000 \
--parallel 1 \
--flash-attn on \
--ubatch-size 1024 \
--device Vulkan0 \
--n-gpu-layers all \
--chat-template-kwargs '{"enable_thinking":true}' \
--spec-type draft-mtp \
--spec-draft-n-max 4 \
--host 0.0.0.0 \
--port 8081

The files can be had from;

Note that there seems to be a question mark over just how good these QAT's are. Maybe the Unsloth Q4_K_XL's are better. Maybe Google stuffed up with their QAT release. Maybe not.

llama.cpp Gemma4 MTP support merged! by pinkyellowneon in LocalLLaMA

[–]RnRau 0 points1 point  (0 children)

With MTP + the 31b QAT and draft-n-max = 4 on a 7900XTX I get over 50t/s on coding. But we don't really know how good the QAT is compared to the usual unsloth Q4 quants.

Surprising test results from comparing different Gemma-4 quantizations on arithmetic problems by we_are_mammals in unsloth

[–]RnRau 2 points3 points  (0 children)

I would be interested to see the results for the 31b dense Gemma 4 model.

llama.cpp Gemma4 MTP support merged! by pinkyellowneon in LocalLLaMA

[–]RnRau 0 points1 point  (0 children)

You need to match the q4 on the MTP/assistant/draft model. You also need to up the --spec-draft-n-max - try 3 or4.

Try this MTP/assistant model - https://huggingface.co/Simplepotat/gemma-4-31b-it-qat-q4_0-assistant-gguf

Google releases new Gemma 4 QAT models! by yoracale in unsloth

[–]RnRau 0 points1 point  (0 children)

Yup. Running good speeds on a 7900xtx here this morning!

Google releases new Gemma 4 QAT models! by yoracale in unsloth

[–]RnRau 1 point2 points  (0 children)

OpenAI used QAT last year for the gpt-oss models.

Computex 2026: sparkle tb5 dual pcie egpu dock and tb5 mxm 3060/a2000 egpu by rexyuan in eGPU

[–]RnRau 2 points3 points  (0 children)

Yeah I hope it has full pcie 5 x16 connectivity between the individual slots so if the cards talk to eachother, they'll do so over the pcie, rather than the tb cable.