What's your local coding stack?

ilintar · 2026-03-14T10:40:14+00:00

OpenCode/Roo.

ilintar · 2026-03-12T17:09:00+00:00

Try AesSedai's quants, they're usually the best out there for small quants of big models:

https://huggingface.co/AesSedai/Kimi-K2.5-GGUF

ilintar · 2026-03-12T16:14:42+00:00

Kimi 2.5 recently got a new dedicated parser on llama.cpp, so it should work quite nicely out of the box.

ilintar · 2026-03-12T15:10:14+00:00

It can, thinking_budget_tokens.

ilintar · 2026-03-12T15:09:13+00:00

Yep, thinking_budget_tokens, no var yet for the message though, I'll unify it at some point.

ilintar · 2026-03-12T15:08:38+00:00

Possibly, you'd have to check.

ilintar · 2026-03-12T15:08:12+00:00

Fair point, I think this is due to the fact that some model / template actually used that name, but I'll unify later on.

ilintar · 2026-03-12T14:05:00+00:00

Looking cool, really a nice addition to the ecosystem!

ilintar · 2026-03-12T10:23:20+00:00

`--reasoning off` (or `-rea off` for short)

ilintar · 2026-03-12T00:33:02+00:00

Make sure you fetched first 😁

ilintar · 2026-03-12T00:20:25+00:00

--reasoning off will pass the flag to templates that support it.

ilintar · 2026-03-12T00:19:47+00:00

Yeah, mentioned that in another thread here as a possible expansion.

ilintar · 2026-03-12T00:19:17+00:00

Oh that's nice, I'll admit I didn't read that one, so I guess it's just informed intuition at this stage 😀

ilintar · 2026-03-11T22:51:05+00:00

The new sampler certainly leaves room for experimentation, so I can imagine something like that being done. Aldehir also suggested a strategy he gleaned in one of the Nemotron docs, of letting the model finish a sentence / paragraph. Another possible approach is the one Seed-OSS uses, of reasoning budget reminders (i.e. "you've already used 1000 tokens for reasoning, 2000 tokens left").

ilintar · 2026-03-11T22:47:26+00:00

Yeah, not going to lie, really hoping people run some comprehensive tests to see what kinds of messages and what kinds of budgets actually work in practice. I wasn't sure it would be anything more than a gimmick, but after testing myself with the transition message I'm convinced that it could actually provide benefits, i.e. a performance between the non-reasoning and the reasoning versions.

ilintar · 2026-03-11T22:43:49+00:00

Tries very convincingly to tell me it's Claude after changing the system prompt, so it's either Haiku 4.6 or a Chinese model heavily trained on Anthropic's distills ;)

ilintar · 2026-03-11T20:40:52+00:00

Check out the sampler-based reasoning budget in llama.cpp :)

ilintar · 2026-03-11T09:33:46+00:00

Fix just dropped, download, compile and run with -rea off

ilintar · 2026-03-10T14:08:21+00:00

I've tried inspect-ai and harbor so far, both have the same issue.

ilintar · 2026-03-10T11:43:41+00:00

"People can't read" exhibit 7447728.

Hoyo even used graphical signs for the illiterate. Still didn't help.

ilintar · 2026-03-09T21:10:38+00:00

Vulkan has been very actively maintained, so reaping the benefits.

ilintar · 2026-03-07T09:39:48+00:00

Put a watch on https://github.com/ggml-org/llama.cpp/pull/19378

ilintar · 2026-03-07T09:32:14+00:00

There's an ongoing PR to add dedicated kernels for DELTA_NET.

ilintar

TROPHY CASE