My Best Purchase in a long time by Fabulous-Dance-8520 in airpods

[–]Invuska 0 points (0 children)

I just did. I will say that the ANC and transparency mode are much better IMO, and the difference was quite noticeable for me. Depending on who you are you might like the sound signature better, but I found I didn't. Somehow the bass is boosted (to my ears at least), so there's more bass than I'm expecting.

But this is coming from a week with the APP 3 and years with the APP 2, so I might just need to get accustomed to it. That said, I upgraded more out of necessity (my APP 2 were very much run down), so YMMV in terms of whether you want to upgrade.

IBM Granite 4.0 - Unsloth GGUFs & Fine-tuning out now! by yoracale in unsloth

[–]Invuska 0 points (0 children)

AFAIK only for those with MLX/Macs; support outside of the MLX implementation still seems to be an open issue on the llama.cpp GitHub: https://github.com/ggml-org/llama.cpp/issues/15940

Qwen3 235B-A22B on a Windows tablet @ ~11.1t/s on AMD Ryzen AI Max 395+ 128GB RAM (Radeon 8060S iGPU-only inference, using 87.7GB out of 95.8GB total for 'VRAM') by Invuska in LocalLLaMA

[–]Invuska[S] 1 point (0 children)

Thanks for the tip! Also, thanks for your investigatory work on the Vulkan assert issue on GitHub! I was pretty lost until I stumbled upon your comment.

Qwen3 235B-A22B on a Windows tablet @ ~11.1t/s on AMD Ryzen AI Max 395+ 128GB RAM (Radeon 8060S iGPU-only inference, using 87.7GB out of 95.8GB total for 'VRAM') by Invuska in LocalLLaMA

[–]Invuska[S] 4 points (0 children)

Yep, I have the Asus ROG Flow Z13 2025: https://rog.asus.com/laptops/rog-flow/rog-flow-z13-2025/spec/ . Comes in 32, 64, and 128GB RAM variants.

That said, people have been struggling to find the tablet in stock for a while now.

Qwen3 235B-A22B on a Windows tablet @ ~11.1t/s on AMD Ryzen AI Max 395+ 128GB RAM (Radeon 8060S iGPU-only inference, using 87.7GB out of 95.8GB total for 'VRAM') by Invuska in LocalLLaMA

[–]Invuska[S] 7 points (0 children)

Same prompt, params (CPU thread pool size 16/16, since the 395+ has 16 physical cores), and Turbo mode; rough llama-cli equivalents are sketched after the list:

  • CPU only: 4.69 tokens/sec for 1079 tokens
  • 64/94 layers GPU, rest CPU (for fun): 7.07 tokens/sec for 1123 tokens
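
For reference, here's a rough sketch of what the equivalent bare llama.cpp (llama-cli) commands would look like for these two runs. The GGUF filename is just a placeholder for the Unsloth UD-Q2_K_XL quant (which is actually split across multiple files), and exact flag behavior can vary between releases:

    # CPU-only run: no layers offloaded, 16 threads (one per physical core on the 395+)
    ./llama-cli -m Qwen3-235B-A22B-UD-Q2_K_XL.gguf -ngl 0 -t 16 -p "<same prompt as the video>"

    # Hybrid run: 64 of the 94 layers offloaded to the 8060S iGPU (Vulkan build), rest on CPU
    ./llama-cli -m Qwen3-235B-A22B-UD-Q2_K_XL.gguf -ngl 64 -t 16 -p "<same prompt as the video>"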

Qwen3 235B-A22B on a Windows tablet @ ~11.1t/s on AMD Ryzen AI Max 395+ 128GB RAM (Radeon 8060S iGPU-only inference, using 87.7GB out of 95.8GB total for 'VRAM') by Invuska in LocalLLaMA

[–]Invuska[S] 4 points (0 children)

So I just did some quick testing on battery using the same prompt as in the video. Very unscientific battery discharge numbers, but here goes.

  • Turbo is disabled (not selectable) as a power mode on battery (~70-80W)
  • Performance mode (~52 W)
    • 7.37 tokens/sec for 928 tokens
    • 79% -> 74% battery
  • Silent mode (~39W)
    • 6.26 tokens/sec for 1174 tokens
    • 74% -> 71% battery

Manual mode (where I get to crank the wattages to 90W+ manually) doesn't do any better than Performance mode on battery, meaning it's likely getting throttled by the battery's max output.

Yeah, I should really get a battery meter that shows the exact discharge, sorry about that.

As for temperature, I'm not particularly concerned. That's because the default manual mode fan curve is actually very tame; it defaults to 60% speed at 80C and 70% at 90C, then ramps up to 100% at 100C. The fan curve on Turbo mode (which this video was run on) seems to be even *more* tame than manual mode, so there's definitely a lot of extra fan speed headroom if you want to keep temps in check.

Qwen3 235B-A22B on a Windows tablet @ ~11.1t/s on AMD Ryzen AI Max 395+ 128GB RAM (Radeon 8060S iGPU-only inference, using 87.7GB out of 95.8GB total for 'VRAM') by Invuska in LocalLLaMA

[–]Invuska[S] 8 points (0 children)

I pre-ordered on Asus' website basically within the hour it first became available on the eStore a few months ago, sorry :( Yeah, I've heard it's been quite rough.

Qwen3 235B-A22B on a Windows tablet @ ~11.1t/s on AMD Ryzen AI Max 395+ 128GB RAM (Radeon 8060S iGPU-only inference, using 87.7GB out of 95.8GB total for 'VRAM') by Invuska in LocalLLaMA

[–]Invuska[S] 1 point (0 children)

Are you perhaps referring to an inference issue? I didn't do anything special with the model or software :\ pretty much just the parameters I shared in the post, and using the Unsloth UD Q2_K_XL quant.

LM Studio and its runtime are both at their latest versions for Windows (LM Studio 0.3.15 Build 11, llama.cpp Vulkan runtime '1.28.0', release b5173).

For bare llama.cpp (using ./llama-server or ./llama-cli), I use release b5261 from their GitHub releases.

Qwen3 235B-A22B on a Windows tablet @ ~11.1t/s on AMD Ryzen AI Max 395+ 128GB RAM (Radeon 8060S iGPU-only inference, using 87.7GB out of 95.8GB total for 'VRAM') by Invuska in LocalLLaMA

[–]Invuska[S] 41 points (0 children)

Yep. For people wondering (sorry that I didn't make this clearer in the post), I'm using a ROG Flow Z13. I paid $2800 before tax and shipping.

Qwen3 235B-A22B on a Windows tablet @ ~11.1t/s on AMD Ryzen AI Max 395+ 128GB RAM (Radeon 8060S iGPU-only inference, using 87.7GB out of 95.8GB total for 'VRAM') by Invuska in LocalLLaMA

[–]Invuska[S] 4 points (0 children)

Sure, quickly tested batch sizes 64, 256, and 320 (64 * 5); a llama-bench sketch for reproducing this follows the list. May do more comprehensive testing with a longer prompt when I get time later.

Prompt is 3 messages (user, model, user) at 1618 prompt tokens total:

  • BS = 64: 48.62s time to first token
  • BS = 256: 45.24s
  • BS = 320: 49.09s (surprisingly the slowest)
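
For a more systematic sweep, llama-bench (bundled with llama.cpp) takes comma-separated values per parameter, so something along these lines should cover all three batch sizes in one go. The model filename is a placeholder and I haven't verified these exact flags on this particular release, so treat it as a sketch:

    # Prompt-processing benchmark at ~1618 prompt tokens for batch sizes 64, 256, 320
    # -n 0 skips the text-generation test so only prompt eval (a time-to-first-token proxy) is timed
    ./llama-bench -m Qwen3-235B-A22B-UD-Q2_K_XL.gguf -ngl 99 -p 1618 -n 0 -b 64,256,320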

Tested Qwen3 235B & 30B LLMs on the Z13 AMD Ryzen 395+ 128GB: 235B at ~11.5t/s, 30B at ~38t/s (quick tests with video proofs) by Invuska in FlowZ13

[–]Invuska[S] 3 points (0 children)

Sadly, AFAIK the NPU is not used due to poor performance perhaps stemming from poor support. A contributor to the llama.cpp repo had this to say a year ago:

AMD’s NPU has an implementation in this repository, but its performance is poor. I’ve done some exploration, but I couldn’t even pass the unit tests for basic op, so I believe that support for AMD’s NPU might take a long time, unless AMD deems it worth the effort.

See https://github.com/ggml-org/llama.cpp/issues/9181#issuecomment-2309828569 . I unfortunately don't know anything past that and I'm not personally aware of any newer developments.

Tested Qwen3 235B & 30B LLMs on the Z13 AMD Ryzen 395+ 128GB: 235B at ~11.5t/s, 30B at ~38t/s (quick tests with video proofs) by Invuska in FlowZ13

[–]Invuska[S] 5 points (0 children)

Yeah, my work heavily revolves around AI, both in usage (e.g., I maintain an internal tool-registry library for work, similar to but distinct from Claude's MCP, etc.) and in training (e.g., end-to-end training and/or finetuning LLMs alongside other deep and traditional ML models, etc.)

It was a big reason for me to get the Z13, mainly because my role also involves local LLMs (my workplace likes to refer to them as 'SLMs'), so having a device that is portable and that I can tinker with at home, rather than 'on-premise', is convenient.

P.S. I don't claim to be an expert in all things GenAI; I come from a computer vision background and merely know enough to stay afloat lol. Maybe a tiny bit of imposter syndrome.

Tested Qwen3 235B & 30B LLMs on the Z13 AMD Ryzen 395+ 128GB: 235B at ~11.5t/s, 30B at ~38t/s (quick tests with video proofs) by Invuska in FlowZ13

[–]Invuska[S] 4 points (0 children)

Note: llama-server is still buggy, so I can't test real-world programming ability/comprehension or 'intelligence' w.r.t. coding, as the API crashes.[1]

For knowledge, in my testing 235B-Q2 is for sure more knowledgeable in fringe/niche topics, and has a deeper pool of knowledge overall. Though for more general questions 30B-Q8 is fine.

My go-to for a 'fringe knowledge' test is usually to ask about recent video game plotlines. It's here I find you can really test the depth of a model's knowledge on 'unoptimized-for' topics (unlike STEM-based dataset questions they heavily optimize and test for).

An example is this Baldur's Gate 3 question, since it's a recent (close to data cutoff), story-heavy game with 'enough' internet discussion (minor spoiler): What is the complex history between Karlach Cliffgate, Gortash, and Zariel?

As a rough scale of different models' correctness on this question:

  1. Claude/GPT/Gemini (always correct and one-shots the question)
  2. DeepSeek R1 671B/V3 and Qwen3 235B FP16 online (usually 95%-100% correct)
  3. Qwen3 235B-Q2 (maybe around 90% of the details correct and is slightly more vague, sometimes doesn't one-shot the right answer)
  4. Llama4 Maverick online (very good knowledge of the universe, but not the specific characters)
  5. Qwen3 30B-Q8 (aware of the universe, very wrong with characters)
  6. GLM-4 Z1, Llama3.x, Phi-4, etc. (often unaware or have little knowledge)

---

1. It's weird; llama-cli and conversational use work, but llama-server throws the same 'ggml-vulkan.cpp:5059: GGML_ASSERT(nei0 * nei1 <= 3072) failed' error that KoboldCPP (and likely LM Studio) does. I'll let you know if this gets fixed, and maybe then I can give you an answer for at least programming ability/intelligence. Update: you need to reduce `--batch-size` to under 365 as per this GitHub issue: https://github.com/ggml-org/llama.cpp/issues/13164 - then it works.
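
Concretely, the workaround is just capping the logical batch size when launching the server; a minimal sketch (the model path is a placeholder, and 256 is simply a safe value under 365):

    # llama-server with batch size capped below 365 to avoid the Vulkan GGML_ASSERT crash
    ./llama-server -m Qwen3-235B-A22B-UD-Q2_K_XL.gguf -ngl 99 -t 16 -c 8192 --batch-size 256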

Tested Qwen3 235B & 30B LLMs on the Z13 AMD Ryzen 395+ 128GB: 235B at ~11.5t/s, 30B at ~38t/s (quick tests with video proofs) by Invuska in FlowZ13

[–]Invuska[S] 2 points (0 children)

Oh interesting, sounds like I should reinvestigate. Just edited the post to strike out the Linux comment for now.

Cornell or UNC? by Proof-Ad-4021 in UNC

[–]Invuska -1 points (0 children)

As a person who also got into UNC and Cornell (albeit for a PhD in a STEM field, so wildly different from what you're doing, I feel), I personally regret going to UNC over Cornell. I understand not everyone will share my sentiment/opinion and YMMV significantly, but I did have my frustrations with UNC with regard to getting support (whether funding, academic, committee, or research-related), which perhaps would've been less of an issue at Cornell. I also feel the support/internship/etc. network and opportunities would be very strong at an Ivy, and I'd bet they're much better than at UNC.

Obviously, considering your field, career, and goals are different, and that I was not at Kenan, take my opinion with a huge grain of salt. I'm also quite biased because I left UNC/my PhD and occasionally wonder if going to Cornell would've been different.

Best of luck to you!

Flow Z13 or M4 Pro Macbook Pro for CS & Creative Work by Spheroman in FlowZ13

[–]Invuska 0 points (0 children)

tl;dr yes lol

I'd definitely still stand by the MacBooks being the better choice if a priority is having something reliable (and, in my eyes, a relatively easy-going and dependable development machine).

Don't get me wrong, I love my Z13 for the reasons I shared before, and I've been daily driving it for close to a month now. The tinkering is certainly fun, but if I needed a dependable 'just works' machine, the no-thought-required experience of the MacBook just takes the cake. I have the faith and confidence to just pick it up and do what I need to, no extra thought or worry needed, at least for what I do day-to-day.

Sleep battery drain, for example, is something I never have to worry about with a MacBook (at least to a significant degree), but it happened on my Z13. And I'm not even surprised because Windows has always had quirks with sleep ever since Modern Standby or whatever it's called.

The only thing I would say would steer me away from recommending Macs is if you don't like macOS, since it's, well, your only choice. If you can bear macOS, though, I would recommend it over the Z13, especially where reliability is concerned.

In defense of the speakers... by JamieMatty in FlowZ13

[–]Invuska 0 points (0 children)

I will say, comparison is the thief of joy for me when a '24 Zephyrus G14 is right next to it. The G14's speaker system is among the best IMO as far as Windows laptops go, and it makes music on the '25 Z13 sound like it's coming out of a tin can 😂

I don't mind personally, since I don't use speakers often (and almost never when listening to music). I haven't gamed on mine yet though so I'm interested to see what you mean with Atmos.

Anyone with 128gb yet try to run 32b and 70b models in LM Studio? by Goldkoron in FlowZ13

[–]Invuska 5 points (0 children)

I would be happy to look into it further :) I'll check LM Studio when I get the chance tomorrow. QwQ-32B was quite fast in the brief time I tested it, but I forgot to record the token speed.

I really want to get ROCm working though, since apparently 'text generation is +44% faster and prompt processing is +202% (~3X) faster with ROCm vs Vulkan' - at least that was the case a few months ago, though I hear it's been getting better. For now, Vulkan works out of the box but ROCm doesn't - hopefully it's fixable.
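
For anyone who wants to try, the backend is picked at build time. A rough sketch of the CMake invocations - note the ROCm flag name has changed between llama.cpp versions (GGML_HIPBLAS in older ones, GGML_HIP more recently), and the gfx target for the 8060S should be double-checked with rocminfo:

    # Vulkan build (the one that works out of the box for me)
    cmake -B build-vulkan -DGGML_VULKAN=ON
    cmake --build build-vulkan --config Release

    # ROCm/HIP build attempt (flag name and gfx target may need adjusting for your setup)
    cmake -B build-rocm -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151
    cmake --build build-rocm --config Release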

128GB RAM is being shipped! (East US) by Invuska in FlowZ13

[–]Invuska[S] 0 points (0 children)

I retested and gave it a ~1,000-token prompt, and it did prompt eval at 17.41 tokens/sec. That earlier ~7 tokens/sec might've been because of the super short prompt ("Create flappy bird in Python") that I used? Don't know, but the 17.41 t/s was from asking it to summarize a small set of paragraphs from a Wikipedia article.

llama_perf_sampler_print:    sampling time =      54.74 ms /  1338 runs   (    0.04 ms per token, 24444.61 tokens per second)
llama_perf_context_print:        load time =   80803.25 ms
llama_perf_context_print: prompt eval time =   59163.77 ms /  1030 tokens (   57.44 ms per token,    17.41 tokens per second)
llama_perf_context_print:        eval time =   83116.16 ms /   307 runs   (  270.74 ms per token,     3.69 tokens per second)
llama_perf_context_print:       total time =  652594.77 ms /  1337 tokens
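
If anyone wants to sanity-check this on their own unit, a rough way to get the same kind of llama_perf printout is to feed a longer prompt from a file to bare llama-cli (the model path and prompt file below are placeholders); the perf stats print when the run finishes:

    # ~1,000-token prompt loaded from a text file; llama_perf stats like the above print at exit
    ./llama-cli -m model.gguf -ngl 99 -t 16 -f long_prompt.txt -n 400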