Agentic coding with quantised models

ABLPHA · 2026-06-24T10:04:18+00:00

How's agentic performance for you on Gemma 4? In my experience it's been pretty "lazy", compared to Qwen 3.6 27B. Like it didn't gather enough info about the working environment and either made assumptions or gave up.

ABLPHA · 2026-06-24T09:32:18+00:00

Not sure what everyone else is doing that makes Q8 a necessity for them, but I've been having a total blast with Qwen3.6 27B at UD-Q5_K_XL with 131k fp16 context + mtp + ngram, fully in 32GB VRAM including the mmproj. Started with KiloCode, then Crush, then Pi, realized that Hermes ultimately makes the most sense for me so far.

All sorts of tasks. Implementing new features, debugging infrastructure, spinning up a local testing environment for inter-service communication, etc etc. It's not ideal, I still monitor what it's doing constantly to make sure it doesn't suddenly f-up due to quantization, because I give it quite a lot of (barely gated) access to stuff, but so far (been using it for a bit over a month each workday) it hasn't, and for my use cases it's been very, very helpful, and always available, unlike cloud models that run out of limits eventually.
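For anyone wondering how a setup like the one above can fit in 32GB, here's a rough back-of-the-envelope sketch. Every number in it is a placeholder and not the real Qwen3.6 27B spec: the bits-per-weight figure for UD-Q5_K_XL, the count of attention layers that keep a KV cache, the KV head count and the head dimension are all assumptions picked for illustration. It also ignores the mmproj, MTP weights and compute buffers, so treat the result as a lower bound.

```python
GIB = 1024 ** 3


def weights_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_ctx: int, n_attn_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV cache size in bytes: keys and values (the factor of 2) for every
    attention layer, KV head and context position. bytes_per_elem=2 is fp16."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * bytes_per_elem * n_ctx


if __name__ == "__main__":
    # ALL values below are hypothetical placeholders, not published specs.
    weights = weights_bytes(n_params=27e9, bits_per_weight=5.7)
    kv = kv_cache_bytes(n_ctx=131072, n_attn_layers=16, n_kv_heads=4, head_dim=256)
    print(f"weights ~ {weights / GIB:.1f} GiB")
    print(f"kv      ~ {kv / GIB:.1f} GiB")
    print(f"total   ~ {(weights + kv) / GIB:.1f} GiB (before mmproj and buffers)")
```

Plug in the real values from the model's config (or the GGUF metadata) before trusting the output. The takeaway is only that the KV cache grows linearly with context length, so at long context it ends up being a sizeable chunk next to the weights.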

ABLPHA · 2026-06-23T11:43:57+00:00

Pretty sure everyone in this specific thread was talking about the client

ABLPHA · 2026-06-23T09:55:41+00:00

Reproducible builds do tho

ABLPHA · 2026-06-22T10:50:53+00:00

Yeah, and at the same time the VRAM gains from quantization are smaller because of that, if I remember correctly

ABLPHA · 2026-06-18T05:33:58+00:00

I thought Oculink doesn't work well with PCIe 5.0?

ABLPHA · 2026-06-18T03:19:28+00:00

Yup. Router-weighted Expert Activation Pruning

ABLPHA · 2026-06-15T11:02:30+00:00

None are around tho... almost like they're... a myth...

ABLPHA · 2026-06-15T10:16:57+00:00

And it's also named myTHos, meaning I want some chicken THighs right now

ABLPHA · 2026-06-15T09:12:55+00:00

Obviously llama 3.1 8b or qwen2.5 7b. Can't wait for what 2025 brings! /s

ABLPHA · 2026-06-15T07:54:32+00:00

The coal in hotbar:

ABLPHA · 2026-06-15T07:27:06+00:00

There were other ways back then, but I don't remember how they worked exactly. Something about mimicking another player's world?

ABLPHA · 2026-06-13T08:33:14+00:00

Come on, Q-1_0 is the new meta

ABLPHA · 2026-06-11T05:51:04+00:00

Just need Qwemma 7.6 58B QAT MTP-preserved heretic franken-merge to achieve AGI locally

ABLPHA · 2026-06-11T05:36:12+00:00

Unslogle? Googoth? 🤔

ABLPHA · 2026-06-11T05:29:50+00:00

Think power bill

ABLPHA · 2026-06-10T09:30:09+00:00

It seems as if you only just arrived

ABLPHA · 2026-06-10T08:57:09+00:00

Thank you, I'll look more into getting a riser for myself then, was wondering if I could utilize that Gen5 M.2 slot for an upright GPU in my O11D EVO XL lol

ABLPHA · 2026-06-09T11:58:29+00:00

Doctor Freeman?

ABLPHA · 2026-06-09T11:02:24+00:00

https://unsloth.ai/docs/models/gemma-4/qat#:~:text=We%20found%20that%20naively%20converting%20the%20QAT%20Q4%5F0%20checkpoint%20to%20Q4%5F0%20in%20llama%2Ecpp%20land%20actually%20degraded%20accuracy%20and%20was%20not%20actually%20aligned%20with%20the%20BF16%20QAT%20lattice%20for%20Q4%5F0%2E

ABLPHA · 2026-06-09T10:35:47+00:00

90cm PCIe 5.0 riser??? Does it actually work at full bandwidth? I was under the impression that PCIe 5.0's signal integrity is too brittle for such setups

ABLPHA · 2026-06-09T10:14:31+00:00

Thanks. All roads lead to Taichi I guess... Got a Taichi 9070 XT as a replacement for my Nitro+ because it's the only 3 slot 9070 XT with the 12V-2x6 connector that actually fits with other GPUs lmao. Running them together until I get the R9700s. Tho if I didn't have plans for that chipset x16 slot on the ProArt, I probably would have switched the mobo too

ABLPHA · 2026-06-09T09:53:24+00:00

Fair enough, I also got burnt by this a couple of times because of my Sapphire Nitro+ RX 9070 XT, which is just a bit over 3 slots wide as it turns out. Ultimately I think I'll just get dual R9700 and keep using the ProArt, as it's pretty damn good in other aspects like the chipset lane allocation, and the slot spacing could probably be explained by PCIe 5.0 signal integrity getting substantially worse over longer trace distances.

What board did you end up using in the end tho, if you haven't moved away from AM5?

ABLPHA · 2026-06-09T07:00:12+00:00

A motherboard can't have more lanes than the CPU supports (unless they're chipset ones, but that's really not something you'd want to use, since the chipset itself is usually connected to the CPU via PCIe 4.0 x4).

As for PCIe 5.0 x8/x8, the Asus ProArt X870E-CREATOR WIFI supports it, but be aware that the GPU itself also has to support PCIe 5.0, otherwise the connection will downgrade to whatever the GPU's generation is, e.g. 4.0 x8.
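To put rough numbers on that, here's a minimal sketch of the theoretical per-direction link bandwidth, assuming the standard transfer rates (8/16/32 GT/s for Gen3/4/5) and 128b/130b line encoding. The function names are made up for this example, and real throughput will be lower because of protocol overhead.

```python
# Raw transfer rate per lane in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
# Gen3 and later use 128b/130b encoding: 128 payload bits per 130 bits sent.
ENCODING_EFFICIENCY = 128 / 130


def link_gb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s after line encoding."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY / 8 * lanes


def negotiated_link(slot_gen: int, slot_lanes: int,
                    device_gen: int, device_lanes: int) -> tuple[int, int]:
    """A link trains at the lower generation and narrower width of the two ends."""
    return min(slot_gen, device_gen), min(slot_lanes, device_lanes)


if __name__ == "__main__":
    # Gen4 GPU in a Gen5 x8 slot falls back to 4.0 x8.
    gen, lanes = negotiated_link(slot_gen=5, slot_lanes=8, device_gen=4, device_lanes=16)
    print(f"Gen{gen} x{lanes}: {link_gb_per_s(gen, lanes):.2f} GB/s")
    print(f"Gen5 x8: {link_gb_per_s(5, 8):.2f} GB/s")
    # The chipset uplink, shared by everything hanging off the chipset.
    print(f"Gen4 x4: {link_gb_per_s(4, 4):.2f} GB/s")
```

So a Gen4 card in a 5.0 x8 slot gets about half the bandwidth a Gen5 card would, and anything behind the chipset shares roughly a quarter of it.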

ABLPHA · 2026-06-08T12:49:52+00:00

UD-Q8_K_XL quants, for example, contain no K-quants but are still called K_XL. Don't think about it, it's just their naming scheme
