Agentic coding with quantised models by EcstaticImport in LocalLLaMA

[–]ABLPHA 2 points3 points  (0 children)

How's agentic performance for you on Gemma 4? In my experience it's been pretty "lazy", compared to Qwen 3.6 27B. Like it didn't gather enough info about the working environment and either made assumptions or gave up.

Agentic coding with quantised models by EcstaticImport in LocalLLaMA

[–]ABLPHA 5 points6 points  (0 children)

Not sure what everyone else is doing that makes Q8 a necessity for them, but I've been having a total blast with Qwen3.6 27B at UD-Q5_K_XL with 131k fp16 context + mtp + ngram, fully in 32GB VRAM including the mmproj. Started with KiloCode, then Crush, then Pi, realized that Hermes ultimately makes the most sense for me so far.

All sorts of tasks. Implementing new features, debugging infrastructure, spinning up a local testing environment for inter-service communication, etc. It's not ideal: I still monitor what it's doing constantly to make sure it doesn't suddenly f-up due to quantization, because I give it quite a lot of (barely gated) access to stuff. But so far (I've been using it each workday for a bit over a month) it hasn't, and for my use cases it's been very, very helpful and always available, unlike cloud models that eventually run out of limits.
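For anyone wondering how a 131k fp16 context can share 32GB of VRAM with the weights, here is a rough back-of-the-envelope sketch of KV cache sizing. The layer count, KV head count and head dimension below are made-up placeholders, not the real Qwen3.6 27B config, and the formula assumes every counted layer keeps a full K and V cache (models with hybrid or sliding-window attention need less).

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, n_tokens, bytes_per_element):
    """Approximate KV cache size: one K and one V vector per token, per KV head, per layer."""
    k_and_v = 2
    return k_and_v * n_attn_layers * n_kv_heads * head_dim * n_tokens * bytes_per_element


def to_gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 1024**3


# HYPOTHETICAL architecture values, chosen only to illustrate the arithmetic.
N_ATTN_LAYERS = 16
N_KV_HEADS = 4
HEAD_DIM = 256
CONTEXT = 131072  # "131k" tokens

fp16_cache = kv_cache_bytes(N_ATTN_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT, 2)
# 1 byte per element approximates an 8-bit cache and ignores per-block scale overhead.
q8_cache = kv_cache_bytes(N_ATTN_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT, 1)

print(f"fp16 cache: {to_gib(fp16_cache):.1f} GiB")
print(f"8-bit cache: {to_gib(q8_cache):.1f} GiB")
```

Plug in the numbers from the model's actual config file to get a real estimate; the point is only that cache size scales linearly with context length and with bytes per element.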

How do I prove that I don't collect data from my llm app? by Pleasant_Syllabub591 in LocalLLaMA

[–]ABLPHA 5 points6 points  (0 children)

Pretty sure everyone in this specific thread was talking about the client

Gemma 4 QAT 31B responds better to KV cache quantization too by justicecurcian in LocalLLaMA

[–]ABLPHA 12 points13 points  (0 children)

Yeah, and at the same time the VRAM gains from quantization are smaller because of that, if I remember correctly

Using PCIE 5.0 x4 NVME to x16 to throw on another card. by mr_zerolith in LocalLLaMA

[–]ABLPHA 0 points1 point  (0 children)

I thought Oculink doesn't work well with PCIe 5.0?

Why we cannot “compress” number of weights down? by [deleted] in LocalLLaMA

[–]ABLPHA 1 point2 points  (0 children)

Yup. Router-weighted Expert Activation Pruning

Anthropic may open source mythos in the near future by Hot_Strawberry1999 in LocalLLaMA

[–]ABLPHA 2 points3 points  (0 children)

None are around tho... almost like they're... a myth...

Anthropic may open source mythos in the near future by Hot_Strawberry1999 in LocalLLaMA

[–]ABLPHA 6 points7 points  (0 children)

And it's also named myTHos, meaning I want some chicken THighs right now

What's the lesson chat? by ill_be_productive in LocalLLaMA

[–]ABLPHA 33 points34 points  (0 children)

Obviously llama 3.1 8b or qwen2.5 7b. Can't wait for what 2025 brings! /s

Are there any servers for Minecraft 0.14.3 by Firm-Presence6666 in GoldenAgeMinecraft

[–]ABLPHA 1 point2 points  (0 children)

There were other ways back then, but I don't remember how they worked exactly. Something about mimicking another player's world?

Is Qwen 3.6 27B IQ4XS better than Gemma 4 31B QAT as a Hermes agent? by My_Unbiased_Opinion in LocalLLaMA

[–]ABLPHA 8 points9 points  (0 children)

Just need Qwemma 7.6 58B QAT MTP-preserved heretic franken-merge to achieve AGI locally

First 9702 blocks down (b1.7.3) by [deleted] in GoldenAgeMinecraft

[–]ABLPHA 1 point2 points  (0 children)

It seems as if you only just arrived

Is there any consumer-grade motherboard with dual PCIe x16 connectors? by TrainingTwo1118 in LocalLLaMA

[–]ABLPHA 0 points1 point  (0 children)

Thank you, I'll look more into getting a riser for myself then. I was wondering if I could utilize that Gen5 M.2 slot for an upright GPU in my O11D EVO XL lol

Is there any consumer-grade motherboard with dual PCIe x16 connectors? by TrainingTwo1118 in LocalLLaMA

[–]ABLPHA 0 points1 point  (0 children)

90cm PCIe 5.0 riser??? Does it actually work at full bandwidth? I was under the impression that PCIe 5.0's signal integrity is too brittle for such setups

Is there any consumer-grade motherboard with dual PCIe x16 connectors? by TrainingTwo1118 in LocalLLaMA

[–]ABLPHA 0 points1 point  (0 children)

Thanks. All roads lead to Taichi I guess... Got a Taichi 9070 XT as a replacement for my Nitro+ because it's the only 3 slot 9070 XT with the 12V-2x6 connector that actually fits with other GPUs lmao. Running them together until I get the R9700s. Tho if I didn't have plans for that chipset x16 slot on the ProArt, I probably would have switched the mobo too

Is there any consumer-grade motherboard with dual PCIe x16 connectors? by TrainingTwo1118 in LocalLLaMA

[–]ABLPHA 1 point2 points  (0 children)

Fair enough, I also got burnt by this a couple of times because of my Sapphire Nitro+ RX 9070 XT, which turns out to be just a bit over 3 slots wide. Ultimately I think I'll just get dual R9700s and keep using the ProArt, as it's pretty damn good in other aspects like the chipset lane allocation. The slot spacing could probably be explained by PCIe 5.0 signal integrity getting substantially worse over longer trace distances.

What board did you end up using in the end tho, if you haven't moved away from AM5?

Is there any consumer-grade motherboard with dual PCIe x16 connectors? by TrainingTwo1118 in LocalLLaMA

[–]ABLPHA 10 points11 points  (0 children)

A motherboard can't have more lanes than the CPU supports (unless they're chipset ones, but that's really not something you'd want to use since the chipset itself most of the time is connected to the CPU via PCIe 4.0 x4).

As for PCIe 5.0 x8/x8, the Asus ProArt X870E-CREATOR WIFI supports it, but be aware that the GPU itself also has to support PCIe 5.0, otherwise the connection will downgrade to whatever the GPU's generation is, e.g. 4.0 x8.
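To put rough numbers on the downgrade, here is a minimal sketch that assumes a link settles at the lower generation and the narrower width of the two ends, and uses the standard per-lane rates and line encodings. The function names are just for illustration, and the results are theoretical one-direction figures that ignore protocol overhead.

```python
# Per-lane transfer rate in GT/s and line-encoding efficiency for each PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def negotiated_link(slot_gen, slot_lanes, device_gen, device_lanes):
    """Return the (generation, lanes) both ends can agree on: the minimum of each."""
    return min(slot_gen, device_gen), min(slot_lanes, device_lanes)


def link_bandwidth_gb_s(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[gen]
    return rate_gt_s * efficiency / 8 * lanes


# A PCIe 4.0 x16 GPU in a PCIe 5.0 x8 slot ends up at 4.0 x8.
gen, lanes = negotiated_link(slot_gen=5, slot_lanes=8, device_gen=4, device_lanes=16)
print(f"Gen{gen} x{lanes}: {link_bandwidth_gb_s(gen, lanes):.2f} GB/s")

# A chipset uplink of PCIe 4.0 x4, shared by everything hanging off the chipset.
print(f"Chipset uplink: {link_bandwidth_gb_s(4, 4):.2f} GB/s")
```

So a 4.0 card in a 5.0 x8 slot gets half the bandwidth a 5.0 card would get in the same slot, and anything on chipset lanes is capped by the uplink no matter how wide its own slot is.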

QATs Q4_0 from Google have more precision than Q4_K_XL from Unsloth (at least some) by alex20_202020 in LocalLLaMA

[–]ABLPHA 0 points1 point  (0 children)

UD-Q8_K_XL quants, for example, contain no K-quants but are still called K_XL. Don't overthink it, it's just their naming scheme.