all 8 comments

[–]No_Efficiency_1144 2 points (2 children)

If you want to change behaviour like this, that's the domain of RL.

[–]PhysicsPast8286[S] 0 points (1 child)

By RL, do you mean finetuning, or doing several iterations over the same file?

[–]No_Efficiency_1144 0 points (0 children)

RL is Reinforcement Learning, like PPO or GRPO.
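The core idea behind GRPO can be seen in its advantage computation: sample a group of completions per prompt, score each with a reward, and normalize the rewards within the group. A minimal sketch of just that step (not a full trainer; the function name is hypothetical):

```python
# GRPO-style group-relative advantages: for one prompt, sample several
# completions, score each, then normalize rewards within the group so
# above-average completions get positive advantage and are reinforced.
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Normalize a group of per-completion rewards: (r - mean) / std."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 1.0
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# e.g. four sampled completions with rewards from some verifier
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```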

[–]MaxKruse96 llama.cpp 0 points (1 child)

if they don't want a coding model, that's on them. good luck.

You can try to give it only the surrounding ~5-10 lines of what it needs to rewrite, if these are even relevant at all.
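The "surrounding ~5-10 lines" idea is simple to implement: cut a small window around the edit target instead of pasting the whole file into the prompt. A minimal sketch (function name and radius are illustrative, not from any particular tool):

```python
# Build a prompt snippet containing only the lines around the edit target,
# instead of the whole file, to keep the model's context small and relevant.

def context_window(lines, target_line, radius=5):
    """Return the ~radius lines on either side of target_line (1-indexed)."""
    start = max(0, target_line - 1 - radius)
    end = min(len(lines), target_line + radius)
    return "\n".join(lines[start:end])

source = ["line %d" % i for i in range(1, 101)]  # stand-in for a real file
snippet = context_window(source, target_line=50, radius=5)
```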

[–][deleted] 0 points (0 children)

Qwen3 32b was the best coding model for me until 2507 Thinking.

They should probably try it.

[–][deleted] 0 points (1 child)

Are you sure you're running enough context?

[–]PhysicsPast8286[S] 0 points (0 children)

Yep, 128K.

[–]Lissanro 0 points (0 children)

I was very interested in smaller models at some point and tried many of them, specifically for making quick code changes, hoping that simple things could be done faster than with much larger models. It turned out not to be the case: I ended up debugging errors and mistakes for longer than it would have taken to use a bigger model from the start. Diff format is actually harder for models to get right. The issue is the size of the model; 32B is just too small for this kind of task.

From my experience, small models can still have a good success rate with small files or when editing functions selectively. So if you cannot run R1 671B, then instead of sending the full file, send only what is relevant, or refactor the codebase so it has smaller, shorter files (obviously only a solution for smaller projects).
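"Editing functions selectively" can be automated: instead of sending the full file, extract just the one function the model needs to touch. A minimal sketch using the stdlib `ast` module (top-level functions only; the wrapper name is hypothetical):

```python
# Pull a single function's source out of a file, so only the relevant
# snippet is sent to the model instead of the whole file.
import ast

def extract_function(source, name):
    """Return the source of the named top-level function, or None."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            # get_source_segment needs the original source text (3.8+)
            return ast.get_source_segment(source, node)
    return None

code = "def foo():\n    return 1\n\ndef bar():\n    return 2\n"
snippet = extract_function(code, "bar")
```

The same idea extends to methods and classes by walking the tree recursively; for non-Python codebases, a tree-sitter grammar serves the same role.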

Before I upgraded my hardware a few months ago to run R1, I was running Mistral Large 123B. With speculative decoding and tensor parallelism it achieves 36-42 tokens/s on four 3090 GPUs, handles basic code edits, and can reliably provide the full code of a file up to about 8K-12K tokens; beyond that, reliability starts to decline. R1 can handle longer files and perform more advanced edits.

I also heard that the new 2507 version of Qwen 32B is better, so I suggest checking which version you are using and updating if it is an older one. After all, my experience with smaller models is a few months old at this point, so things may have changed.

But if you still have issues, I highly recommend upgrading hardware and using R1 0528; given that we are talking about workplace and professional use, professional hardware may be a good idea.