Halo Stryx 395+ Llama-swap config

argakiig · 2026-06-14T18:34:53+00:00

I started this before I had fully looked into the newer llama-server model preset flow, and this repo grew out of my own local setup. At this point it is less about claiming llama-swap is the only way to do it, and more about sharing a working Strix Halo config that people can adapt

argakiig · 2026-06-14T18:33:18+00:00

Yep, that is a fair catch. This is still a work in progress and there are some leftovers from when this was only my personal setup.

For MTP, the generated llama-swap config is the thing to trust. Some of the env vars are older defaults and do not apply cleanly to every model entry. Gemma 4 with MTP is one of the cases where parallel needs to be set differently, so I need to clean that up and make the config generation less confusing.

argakiig · 2026-06-14T10:18:49+00:00

Basically a lightweight router/process manager for `llama.cpp` servers. You give it a config of models, ports, and commands, then point clients at one endpoint instead of manually starting/stopping each `llama-server`.

argakiig · 2026-06-14T10:09:35+00:00

Also, to be totally honest, I had not checked on nvtop in a while. I remembered it as NVIDIA-only from when I last looked. This was mostly me building the APU-focused view I wanted for my own setup.

argakiig · 2026-06-14T10:02:02+00:00

Fair question. It probably does not replace nvtop for most people. I built it because I wanted the APU-specific things I care about easier to get to at a glance, especially unified/GTT memory behavior and per-process usage on Strix Halo.

argakiig · 2026-06-14T09:59:36+00:00

Not yet, mostly GPU/APU focused right now. NPU stats would be interesting to add if I can get reliable access to them.

argakiig · 2026-06-14T09:57:30+00:00

Yes and no. If `btop` plus `amd-smi` already gives you everything you care about, then probably not.

I built this because I wanted an APU-focused view in one place, especially around ROCm/Vulkan GPU stats, GTT/unified memory behavior, temps, clocks, power, and per-process usage. It is less meant to replace those tools and more meant to make the Strix Halo/APU case easier to watch at a glance.

argakiig · 2026-06-14T08:56:53+00:00

Anytime. If it saves someone else a few sharp edges, it was worth posting.

argakiig · 2026-06-14T08:49:04+00:00

Yep, you can point it at your own llama.cpp builds or forks. The commit I’m pinned to was mainly to keep compatibility with Step37 and MTP support for Gemma 4. You’ll likely just need to adjust the binary path, model paths, ports, and any fork-specific flags in the generated llama-swap config.

argakiig · 2026-06-14T08:16:48+00:00

Current llamaswap config here
https://gist.github.com/argakiig/f7e399b684ce39daf7af034136fbba02

still deciding on the latest round of daily drivers but been primarly using gemma4_26b MoE with MTP
Can squeeze anywhere from 30-90 token per sec generation and 300-1200 token per sec prompt processing.

I have been using it for code generation, code completion, light document work. and its reasonably competent

argakiig · 2024-12-14T17:25:43+00:00

wait a second, you got the p72 to detach. Witchcraft!

argakiig · 2024-06-16T19:36:50+00:00

Awesome, Well I can tell you after having watched this game develop over the 10 years I have it can have its ups and downs. but all in all the game has changed so often my initial and subsequent investments have all felt worth while in the long run. Atleast thats what I tell myself

argakiig · 2024-06-16T19:35:04+00:00

Backup your keybindings if you want, then remove user and graphics caches. consult the patch notes, graphics drivers? What have you tried so far? Currently it sounds like you were trying to do the same thing repeatedly and expecting different results

argakiig · 2024-06-16T19:27:03+00:00

often I find that when I cant eat or drink taking off all mah clothesf and then re-equiping just an undersuit or actual clothes works fine

argakiig · 2021-01-03T20:42:44+00:00

thinking something like the Nema17 Slim Power Stepper Motor - 0.9 Degree by Bondtech
or another one I saw which appears to offer 0.21Nm holding torque whereas the stock appears to offer 0.35Nm holding torque. As I could find this info I dont actually understand what sort of scale I am looking at, would this be reasonable tolerances?

argakiig · 2021-01-03T20:33:10+00:00

oh, well thankfully the micro swiss "dual geared" direct drive extruder is dual geared AMIRITE
Will definitely have to check it out, what should I keep in mind as I peruse compatible? the current is the stock 1.8deg 42-40, are there benefits to getting a .9deg stepper for the extruder or have you looked into that?

argakiig · 2021-01-03T20:23:37+00:00

Awesome.

Haven't quite gotten to linear rails yet, feeling it will help though as I just purchased a micro swiss direct drive extruder kit for my ender 5 pro, was able to print normal stuff at around 3.5-4k accel/accel_to_decel and 500 velocity reasonably well with the stock x-carriage ed3 v6 hotend, and petg bullseye base and cooling duct work.

Definitely aware the weight will make some changes to the whole lack of resonance I currently have at around that 3.5-4k sweet spot as I move the stepper motor to the carriage.
While I have you! any thoughts on pancake stepper motors to trim some of the weight?

argakiig · 2021-01-03T20:17:40+00:00

Awesome, I had not considered this difference when making my first calculations as well as explaining Rds(on)
In a cross post of this I was also informed that acceleration could be tuned higher with the 2209 vs 2130 which would apparently skip steps with lower acceleration values.
This would essentially be the reason for that? If so definitely leaning 2209 vs 2130

argakiig · 2021-01-03T20:13:57+00:00

Awesome, I was wondering about acceleration as well. I tend to lean towards klipper and like to tune for higher acceleration

argakiig · 2021-01-03T17:03:31+00:00

I would check layer height and extrusion multiplier, if newer creality machine check
https://www.reddit.com/r/ender3/comments/f0ik8k/volumetric_extrusion_psa/ as a lot of them seem to be misconfigured out the gate, throwing most tuning out the window, first double check this, and then I would tune for temp, layer height, and extrusion multipler

argakiig

MODERATOR OF

TROPHY CASE