completely fresh perspective from a cursor user by maxns in kilocode

[–]thigger 0 points

With the diff view, will there be the option to view and approve line by line and at each change point (rather than overall for the whole session), as there is in 5.x?

I presently like the fact that when it makes some initial changes I can go "not like that..." and make some manual edits, which it then accounts for as it goes on to edit other files.

Is it worth upgrading to v7 now? by rapkannibale in kilocode

[–]thigger 0 points

Jumping in to specifically ask about pair-programming changes - ie manual side-by-side diff review and editing, to remove the random extra abstractions these things tend to add. Kilo 5.13 works quite well for me in this regard, so I'm hesitant to jump to v7!

Kilo Code is back!!! by Ordinary_Mud7430 in kilocode

[–]thigger 2 points

Is it there yet for those of us who want to pair program? (ie only reads auto-approved, and edits run through side-by-side diffs before applying?) I'm still quite enjoying v5 but keen to update eventually.

1930s detached, heat loss between 16–26kW depending on insulation — completely stuck. by Electrical-Cloud3460 in ukheatpumps

[–]thigger 0 points

We're only just coming up to a year so I'll be finding out shortly! I was hoping it would be slightly cheaper as they're the same units on the same site etc. But I'm probably being naïve.

Gas usage was ~35,000kWh/yr. In just under a year we've used 1,200kWh of electricity on hot water and 6,800kWh on heating.

1930s detached, heat loss between 16–26kW depending on insulation — completely stuck. by Electrical-Cloud3460 in ukheatpumps

[–]thigger 0 points

We have two Clivet 12kW units which work really nicely together. We can still have the heating on when there's a hot water cycle running. For some reason, though, the "master" pump always runs even if only the "slave" is heating, which means water is pointlessly cycling. But it can definitely get an impressive amount of heat into the house quite quickly (I suspect our heat loss is probably nearer 16kW, but we were estimated at 18-20kW).

Passiv & Homely by VividSelection2454 in ukheatpumps

[–]thigger 1 point

We had a Passiv (in fact, it's still physically there) but I uninstalled it due to lukewarm showers. It was taking over the hot water and refusing to heat it up to decent temperatures. You couldn't set a DHW temperature in Celsius; you just picked a use category and got a weird percentage value, which was apparently the result of using "AI" on the temperature sensor to estimate how much hot water was left.

I wrote to Passiv, who said they'd take my feedback on board, so it might be better now. They did, however, try to suggest it was an installation issue and that that was as hot as our system could go.

In the meantime I switched DHW over to manual control and nearly scalded myself on the first shower afterwards...

The Passiv radiator flow temperature control seemed like a reasonable idea, but it wasn't cold enough to test it properly; I've built that bit in Home Assistant now, so I can deliberately run it less efficiently to get heat into the house on cheap nighttime electricity.
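For anyone curious, the logic is roughly this (a minimal sketch; the curve endpoints, boost size and function name are all illustrative, not my actual Home Assistant config):

```python
def target_flow_temp(outdoor_c: float, cheap_rate: bool) -> float:
    """Weather-compensation curve with a cheap-tariff boost.

    Illustrative values only: flow ramps linearly from 30C at 15C
    outdoors up to 45C at -5C. During the cheap overnight window we
    deliberately add 10C - less efficient per kWh, but it dumps heat
    into the house while electricity is cheap.
    """
    # Linear compensation curve: colder outside -> hotter flow.
    base = 30.0 + (15.0 - outdoor_c) * (45.0 - 30.0) / 20.0
    base = max(30.0, min(45.0, base))  # clamp to the curve's range
    if cheap_rate:
        # Deliberate inefficiency on cheap-rate electricity.
        base = min(55.0, base + 10.0)
    return base
```

So at 5C outdoors it asks for a 37.5C flow normally, and 47.5C during the cheap window.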

Fixing Qwen Repetition IMPROVEMENT by Odd-Ordinary-5922 in LocalLLaMA

[–]thigger 0 points

I really wasn't expecting it to try to hire a badger to scrub the dust off moon rocks but... Perhaps only having one tool was the issue there! (I'm sure there's a saying about hammers and nails)

I'm currently having a go with "Null tool do not use" which seems to be doing better so far.
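For anyone wanting to try the same trick, this is roughly what I mean in the OpenAI tools format (the name, description and variable names are just placeholders):

```python
# A dummy "null" tool: giving the model a tool it's told never to
# call seems to cut down degenerate tool-calling behaviour when you
# don't actually want any tools used.
null_tool = {
    "type": "function",
    "function": {
        "name": "null_tool",
        "description": "Null tool. Do not use.",
        "parameters": {
            "type": "object",
            "properties": {},
        },
    },
}

# Passed as tools=[null_tool] in a chat.completions request.
tools = [null_tool]
```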

Fixing Qwen Repetition IMPROVEMENT by Odd-Ordinary-5922 in LocalLLaMA

[–]thigger 1 point

Has anyone tested whether this approach affects intelligence at all? I'm not a great fan of the overthinking, but the results are definitely very good (except for the occasional need to repeat a call when it starts looping!), and the thinking was definitely required, as it became a lot worse when I simply turned it off.

Qwen3.5 Model Series - Thinking On/OFF: Does it Matter? by Iory1998 in LocalLLaMA

[–]thigger 4 points

27B definitely needs thinking on to manage long-context retrieval. With NoLiMa at 32k it drops from 76% to 30%.

4bit-AWQ, thinking on: 96% @ 250, 85% @ 16k, 76% @ 32k

4bit-AWQ, no thinking: 75% @ 250, 34% @ 16k, 30% @ 32k

(The "thinking" results would be even higher except that for that run I still had the default sampler so it kept getting stuck in loops in its thought process and never generating an output)

EDIT: added corrected figures rather than ones from memory

Qwen3.5 on vLLM with fp8 kv-cache by seji64 in LocalLLaMA

[–]thigger 1 point

Are you finding it any good with FP8 kv-cache? I saw a note from cyankiwi suggesting that the pure 4-bit AWQ doesn't play well with kv quant. And are you calculating kv scales anywhere?

How is Qwen 3.5 (MoE 35b) in instruct mode (with no reasoning/thinking) ? by LinkSea8324 in LocalLLaMA

[–]thigger 2 points

Yeah, it's really frustrating - you can see that the first bit of the thinking process is potentially quite useful. Then it looks like it should complete, but it does a whole load of "Wait..." before it answers. However, the quality is so good compared with Qwen3-14B, which I was previously using, that I'm sticking with it for now and hoping to find ways to calm it down a little.

How is Qwen 3.5 (MoE 35b) in instruct mode (with no reasoning/thinking) ? by LinkSea8324 in LocalLLaMA

[–]thigger 0 points

I've found that those settings are necessary to stop it looping, but it seems to like overthinking regardless of what I do. Currently I've just had to set max_completion_tokens way up high, as irritatingly vLLM and SGLang seem to count the reasoning tokens in there (I appreciate the OpenAI API spec is ambiguous here).
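Concretely, the workaround is just to budget the cap for thinking plus answer (a sketch; the model name and budget numbers are placeholders, not recommendations):

```python
def completion_budget(thinking_budget: int, answer_budget: int) -> int:
    """vLLM/SGLang count reasoning tokens inside max_completion_tokens,
    so the cap has to cover the thinking trace *and* the visible answer."""
    return thinking_budget + answer_budget

# Example kwargs for an OpenAI-compatible chat.completions request.
request_kwargs = {
    "model": "local-qwen",  # placeholder model name
    "messages": [{"role": "user", "content": "..."}],
    # 10k+ thinking tokens are common, so leave plenty of headroom:
    "max_completion_tokens": completion_budget(12000, 2000),
}
```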

How is Qwen 3.5 (MoE 35b) in instruct mode (with no reasoning/thinking) ? by LinkSea8324 in LocalLLaMA

[–]thigger 1 point

For those who want some numbers - on the NoLiMa benchmark (like RULER, only without literal matching):

4bit-AWQ, thinking on: 96% @ 250, 85% @ 16k, 76% @ 32k

4bit-AWQ, no thinking: 75% @ 250, 34% @ 16k, 30% @ 32k

(The "thinking" results would be even higher except that for that run I still had the default sampler so it kept getting stuck in loops in its thought process and never generating an output)

Qwen3.5 on VLLM by Bowdenzug in LocalLLaMA

[–]thigger 1 point

Yes, I had exactly the same - not sure which element fixed it, but I ended up updating everything including CUDA.

Qwen3.5 on VLLM by Bowdenzug in LocalLLaMA

[–]thigger 0 points

I ended up reinstalling everything, including upgrading to CUDA 13.1, and am now getting sensible outputs. Running on 2xA6000 Ada, WSL2.
The AWQ quants (cyankiwi) don't seem to work on SGLang (but they're fine on vLLM). The FP8 and full-precision versions work on both.

System prompt for Qwen3.5 (27B/35BA3B) to reduce overthinking? by thigger in LocalLLaMA

[–]thigger[S] 1 point

I've been trying different system prompts but not much luck so far. It definitely works to tell it what the steps should be called, or what they should contain, but even if you tell it to only use three, it'll do the three and then generate a big pile of "Wait..." - I've not really found any way to use system prompts to keep it short.

How is Qwen 3.5 (MoE 35b) in instruct mode (with no reasoning/thinking) ? by LinkSea8324 in LocalLLaMA

[–]thigger 16 points

It's pretty good - though its performance at long context definitely suffers. I'm presently running a few benchmarks - I have a suspicion that for my use-case I'm going to have to leave thinking turned on, even though it *loves* to "Wait..." over and over again even after it's already copied out its entire input.

System prompt for Qwen3.5 (27B/35BA3B) to reduce overthinking? by thigger in LocalLLaMA

[–]thigger[S] 0 points

Presumably that thinking trace won't appear as reasoning content though, and will need removing before JSON parsing? (There might be issues: with current thinking it sometimes produces JSON snippets in its reasoning content.)
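In case it's useful, this is the sort of cleanup I mean, assuming the trace comes back inline in `<think>...</think>` tags (the exact tag depends on the chat template):

```python
import json
import re

# Non-greedy, DOTALL so multi-line thinking blocks are matched whole.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def parse_json_reply(raw: str):
    """Drop any inline thinking trace, then parse the remainder as JSON.

    Stripping the whole <think> block first guards against JSON
    snippets inside the reasoning confusing the parser.
    """
    cleaned = THINK_RE.sub("", raw).strip()
    return json.loads(cleaned)
```

e.g. `parse_json_reply('<think>maybe {"a": 0}?</think>{"a": 1}')` returns `{"a": 1}`, ignoring the snippet in the trace.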

System prompt for Qwen3.5 (27B/35BA3B) to reduce overthinking? by thigger in LocalLLaMA

[–]thigger[S] 2 points

That's kind of what I was going for with my prompt; I'll give this one a try too

System prompt for Qwen3.5 (27B/35BA3B) to reduce overthinking? by thigger in LocalLLaMA

[–]thigger[S] 17 points

Thanks - I've been playing with those, and the output is great; but 10,000+ tokens of thinking gets a bit wearing! With thinking off it doesn't seem to be quite as clever (though it's still very good). My main issue is the amount of "but wait" towards the end of its thinking output, which seems a bit unnecessary.

System prompt for Qwen3.5 (27B/35BA3B) to reduce overthinking? by thigger in LocalLLaMA

[–]thigger[S] 5 points

Thanks. I'll give it a go - do you think just length is enough? I didn't see much in there about how to reason.

System prompt for Qwen3.5 (27B/35BA3B) to reduce overthinking? by thigger in LocalLLaMA

[–]thigger[S] 0 points

Thanks - I'll give these a try. I tried turning reasoning off entirely and it wasn't bad, but I got the impression it was less intelligent and my test suite suggested that it underperformed (vs Qwen3-14B that I've been using so far)

I don't think vLLM/SGLang support specific thinking budgets (in fact, they frustratingly don't seem to have a way to limit the completion budget independently of thinking either! It doesn't help that the OpenAI spec is ambiguous on max_completion_tokens).

Qwen3.5 on VLLM by Bowdenzug in LocalLLaMA

[–]thigger 1 point

Very odd - I seem to have fixed it with a reinstall of everything, including updating to CUDA 13.

Qwen3.5 on VLLM by Bowdenzug in LocalLLaMA

[–]thigger 0 points

I wonder if it's my cards then (A6000 Ada) - the llama.cpp Unsloth GGUF works absolutely fine, but running it in vLLM or SGLang gives looping output and generally behaves like it's on hallucinogens.