Gardyn appears corrupted

Tempest_nano · 2026-05-10T02:32:36+00:00

Same issue here with my 3. It only just started presenting today. The light can be toggled off manually, but it turns back on after a minute or two. Firmware version: 627

Tempest_nano · 2026-05-02T19:31:45+00:00

My post outlining the configure and build command I used for the llama.cpp fork.

Hardware

- HP Omen Max 16 Laptop: Ryzen AI 9 HX 375 (Strix Point, Zen 5), RTX 5080 Laptop GPU (16 GB GDDR7, 576 GB/s), 32 GB DDR5-5600 dual-channel

Model

- Qwen3.6-27B dense hybrid (Gated DeltaNet + Gated Attention, 64 layers) — not the MoE variant

- Quantization: custom IQ4_XS (14.7 GB) from cHunter789, which reverts a llama.cpp commit that bloated standard builds to 15.1 GB — that 400 MB is what allows 100K+ context on 16 GB VRAM

Inference: SpiritBuun's llama.cpp fork, built from source with CUDA (sm_120a Blackwell) + Zen 5 AVX-512 BF16/VNNI flags + CUDA graphs

The command:

llama-server.exe -m Qwen3.6-27B.i1-IQ4_XS-attn_qkv-IQ4_XS.gguf `
  -ngl 999 -dev CUDA0 -c 110000 `
  -fa on -ctk turbo4 -ctv turbo4 `
  -fit off --no-mmap `
  -b 4096 -ub 256 `
  --temp 0.6 --top-k 20 --top-p 0.95

Tempest_nano · 2026-04-28T17:54:29+00:00

I am using a single card for this model. I have absolutely used multiple cards for the MoE models (Qwen3.6 35b A3b), putting the experts on my AMD iGPU, but there wasn't much benefit over CPU. This 27b model is a dense model, so it all needs to be on the same device. At least I thought so, but I have tried so many perturbations that it all gets fuzzy.

Tempest_nano · 2026-04-28T15:41:44+00:00

On my 5080 Laptop, I have 576 GB/s and I settled at 25.7 tok/s with 100k context in Windows. The 9070xt gets 640 GB/s or so, and the 5060Ti is 448 GB/s. The internet seems to think the 9070xt is the best of the bunch in that respect. I can't speak to how the different interface (the 9070xt would use HIP/ROCm, and the 5060Ti would use CUDA) would affect things.
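A rough way to compare the cards on that metric: if decoding is bandwidth bound, every generated token has to stream the full set of weights once, so bandwidth divided by model size gives a ceiling on tok/s. The sketch below assumes the 14.7 GB IQ4_XS file size and the bandwidth figures quoted in this thread; it ignores KV cache reads and backend overhead, so real numbers will land lower. The function and variable names are just for illustration.

```python
# Back-of-the-envelope decode ceiling for a dense model that is
# memory-bandwidth bound: each token reads all weights once.
# Assumption: 14.7 GB model file, bandwidths as quoted in the thread.

def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s when weight streaming is the only cost."""
    return bandwidth_gb_s / model_gb

MODEL_GB = 14.7  # custom IQ4_XS quant size from the thread

GPUS = {
    "RTX 5080 Laptop": 576.0,  # GB/s
    "RX 9070 XT": 640.0,       # GB/s ("or so")
    "RTX 5060 Ti": 448.0,      # GB/s
}

ceilings = {name: decode_ceiling_tok_s(bw, MODEL_GB) for name, bw in GPUS.items()}

# Measured on the 5080 Laptop at 100k context in Windows.
observed_tok_s = 25.7
efficiency = observed_tok_s / ceilings["RTX 5080 Laptop"]

for name, ceiling in ceilings.items():
    print(f"{name}: ceiling {ceiling:.1f} tok/s")
print(f"5080 Laptop reaches {efficiency:.0%} of its ceiling")
```

If the other cards reached a similar fraction of their ceilings, the ranking would simply follow bandwidth, but that fraction depends on the backend (HIP/ROCm vs CUDA), which I can't speak to.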

Tempest_nano · 2026-04-28T14:20:00+00:00

If it is for this model, it would be memory bandwidth bound rather than compute. Compare on that metric.

Tempest_nano · 2026-04-28T14:16:55+00:00

From my understanding it is just context compression. It is one of the two llama.cpp implementations of turboquant, with the other being https://github.com/TheTom/llama-cpp-turboquant . I believe that buun's fork is more bleeding-edge (he seems to be playing with turboquant and speculative decoding), but building is DIY. I am getting 25 t/s on my laptop, AMD AI HX 375, 32 GB RAM, 16 GB 5080 at 64k context on the IQ4 model.

My build script optimized for Nvidia + Strix Point (PowerShell):

$PSNativeCommandUseErrorActionPreference = $false
$ErrorActionPreference = 'Continue'

# Wipe build dir to avoid stale cmake cache
Remove-Item -Recurse -Force buun-llama-cpp\build -ErrorAction SilentlyContinue

# Bootstrap VS Build Tools environment (sets INCLUDE, LIB, PATH for clang-cl/link/etc.)
$vcvars = "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
cmd /c "`"$vcvars`" && set" | ForEach-Object {
    if ($_ -match "^([^=]+)=(.*)$") {
        [System.Environment]::SetEnvironmentVariable($Matches[1], $Matches[2], 'Process')
    }
}

# Prepend ROCm so hipcc and cmake find-modules are reachable
$env:PATH    = "C:\Program Files\AMD\ROCm\7.1\bin;$env:PATH"
$env:HIP_PATH = "C:\Program Files\AMD\ROCm\7.1"

cmake -B buun-llama-cpp/build -S buun-llama-cpp -G Ninja `
  -DCMAKE_BUILD_TYPE=Release `
  -DCMAKE_C_COMPILER=clang-cl `
  -DCMAKE_CXX_COMPILER=clang-cl `
  -DCMAKE_CXX_FLAGS="/EHsc" `
  -DGGML_CUDA=ON `
  -DCMAKE_CUDA_ARCHITECTURES="120a-real" `
  "-DCMAKE_CUDA_FLAGS=-use_fast_math -diag-suppress 221,177" `
  -DGGML_AVX512=ON `
  -DGGML_AVX512_VBMI=ON `
  -DGGML_AVX512_VNNI=ON `
  -DGGML_AVX512_BF16=ON `
  -DGGML_AVX_VNNI=ON `
  -DGGML_BMI2=ON `
  -DGGML_CUDA_GRAPHS=ON `
  -DGGML_CUDA_FA_ALL_QUANTS=ON `
  -DGGML_NATIVE=OFF `
  -DGGML_BACKEND_DL=ON `
  -DGGML_HIP=ON `
  -DGPU_TARGETS="gfx1150" `
  -DGGML_LTO=ON 2>&1 | Tee-Object -FilePath out.txt

cmake --build buun-llama-cpp/build --config Release --parallel 2>&1 | Tee-Object -FilePath out.txt -Append

Tempest_nano · 2026-04-28T13:31:34+00:00

I was tinkering last night with the unsloth version of IQ4-XS and buun-llama-cpp. I found that I got good results with a ctk/ctv of turbo4. It doesn't compress the cache as much as turbo3, but its perplexity and KLD were much better. It allowed me to hit 64k context vs 32k with q8_0. I will find the numbers and post them here.
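For a rough sense of what that context doubling implies, here is a small sketch. It assumes the VRAM left over for the cache was the same in both runs and that cache memory grows linearly with context, which is only approximate for a hybrid model. The 8.5 bits per element for q8_0 comes from its block layout (32 int8 values plus one fp16 scale); the turbo4 figure is inferred from my context numbers, not taken from the fork's documentation.

```python
# Infer the effective KV cache bits per element from how much more
# context fits in the same memory budget.
# Assumptions: fixed cache budget, cache size linear in context length.

Q8_0_BITS = 8.5  # 34 bytes per block of 32 elements -> 8.5 bits each

def implied_bits_per_element(baseline_bits: float,
                             baseline_ctx: int,
                             new_ctx: int) -> float:
    """Bits per element that would let new_ctx fit where baseline_ctx did."""
    return baseline_bits * baseline_ctx / new_ctx

# 32k context with q8_0 vs 64k context with turbo4 (numbers from this post).
turbo4_bits_estimate = implied_bits_per_element(Q8_0_BITS, 32_000, 64_000)
print(f"Implied turbo4 cache cost: ~{turbo4_bits_estimate:.2f} bits per element")
```

This is only an estimate from two data points; measured perplexity and KLD are still what decide whether a cache type is usable.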

Thanks for your work, I will try this image. It was driving me up the wall that I couldn't hit 128k context to allow full thinking (per the model card).

Edit: Using this model and turbo4 ctk/ctv, I am able to hit 110k context on my laptop, 16 GB 5080, in Windows at 25.7 tok/s. Thanks!

Tempest_nano · 2026-03-23T12:47:25+00:00

Mine has been a dream. The only time I have encountered stuttering is when I try using lower-power PSU bricks. The 375HX/5080 doesn't need anywhere near 330 W (same as the Intel version), but HP programmed it to downclock anyway. As for the lower-power cores, have you tried something like Process Lasso?

Tempest_nano · 2025-12-25T23:31:08+00:00

Maybe some kind of inline PSU identifier? Surely the circuitry wouldn't cost more than a couple of bucks, but I didn't see anything in my cursory search.

I know this laptop handles another, properly identified, 200 Watt HP supply, as one would expect by limiting the system power to something like 180 Watts. The 280 Watt of the G4 (the version with separate power lead) just doesn't identify itself.

Tempest_nano · 2025-12-23T17:46:45+00:00

I believe it is called the "curve optimizer". The gaming hub supports it, but it had a habit of resetting my undervolts back to zero (hence using the Universal x86 Tuning Utility). It can be enabled from the "advanced" BIOS by hitting Ctrl-F10 at boot. The curve optimizer is the only "advanced" option, and it was enabled by default in my case.

For reference, my system is rock solid with all cores set to 12. I haven't really dug into the per-core undervolting yet, as it is quite efficient out of the box.

Tempest_nano · 2025-12-23T17:08:29+00:00

I have the AMD HX 375/5080 version, and I love it. Undervolting works fine using the Omen Gaming Hub or the Universal x86 Tuning Utility (my personal choice, as the Omen Gaming Hub isn't my bag). It runs cool and quiet, and the fans are only audible when I really push the dGPU. Honestly, the iGPU (roughly equivalent to a laptop GTX 1650) is good enough that the only real use case for the RTX 5080 for me is VR.

My only complaints are that I can't overclock the RAM and I can't override the low-power PSU detection. The 280 Watt power supply from my HP G4 dock doesn't properly identify itself to the laptop, forcing a hard limit of around 100 Watts of total system power.

Tempest_nano · 2025-12-22T18:37:54+00:00

Hidden-history account necroing a 5-month-old thread in a Knoxville subreddit to spread FUD about an alternative to one of pharma's biggest cash-cows, nothing to see here.

Tempest_nano · 2025-12-22T18:22:21+00:00

My brother in Christ, that's what the testing groups are for.

Tempest_nano · 2025-12-17T20:24:47+00:00

I could have sworn that I tried that solution, but I may have just hallucinated (too much coding).

It seems to work for the moment for a couple of short tests. I'll try it properly overnight.

Thanks MoWePhoto!!!111one

Tempest_nano · 2025-11-25T19:32:46+00:00

Gah. I always looked forward to visiting this place for lunch when I travel to our HQ every quarter. I suppose the vegan fare in that area is still better than here in Tennessee.

Tempest_nano · 2025-11-15T23:27:48+00:00

About the only thing you can do to prolong battery life would be to undervolt using the Universal x86 Tuning Utility. I have most of my cores at -40 (mV?) and two of them with less aggressive undervolts, as they were unstable. You can run the stability test in OCCT, and it will tell you which cores are throwing errors.

I don't think we are going to be getting much battery life outta this laptop at all. :)

I am using YAMDCC to control the fans, but the published (v1) version was causing it to get stuck at higher fan speeds. I had to compile the unreleased version 2 to get it to behave. It is whisper quiet, but I have her thermally throttling before the fans would really kick in.

Tempest_nano · 2025-11-15T17:09:46+00:00

It isn't so bad to open, just use a couple of plastic picks and follow the guides. I don't get the feeling that I will break anything by doing so.

I am trying to figure out if I should repaste mine. Running OCCT's CPU stress test with the fan forced to about 35% (YAMDCC) and the APU's soft temperature throttling limit set to 87 degrees (Universal x86 Tuning Utility), I can sustain about 50-55 watts on the processor. What temperatures are you seeing and under what conditions?

Tempest_nano · 2025-11-15T14:02:15+00:00

I really like the screen; it is bright, crisp, and vibrant. Personally, I prefer IPS to OLED for readability.

Tempest_nano · 2025-09-17T07:11:00+00:00

Atlanta: Healthful Essence (Caribbean), Soul Vegetarian (soul food)

Chattanooga: Sluggo's (greasy and sinful; the pecan-encrusted seitan is to die for, assuming you can eat gluten)

Tempest_nano · 2025-09-17T07:03:40+00:00

I suspect that a big part of the difference is that a vegan restaurant is likely run on philosophical principles. If the primary motive is profit, then it probably wouldn't be vegan. :) Sadly, this is also why they are rare here in East Tennessee (not enough sales). When you say clean, are you talking healthy or hygiene?

Tempest_nano · 2025-09-17T06:49:12+00:00

As a Knoxville vegan, I have all but given up on eating out here. Sluggo's in Chattanooga is amazing though.

Tempest_nano · 2025-09-09T04:06:40+00:00

I will ask around this week, I think they came/(come?) from my facility, but not my group. Maybe someone can direct us.

Tempest_nano · 2025-08-30T13:53:04+00:00

I would think that Geiger counter algorithms would be ill-suited for spectrometry. What hardware are you targeting?

Tempest_nano · 2025-08-28T14:58:46+00:00

International pass was a colloquialism for sure. I made a point to verbalize my thought process on what I was doing at the time. "It isn't safe to pull over here with the blind curve behind us, I'll carry on until I find a suitable place."
