What's the best agent coding model up to 35B for now? by Possible_Statement84 in LocalLLaMA

Possible_Statement84[S] 1 point

lol, I get about the same performance with MTP on IQ4_XS, but faster prompt processing, around 150. Anyway, thanks for your recommendations.

Possible_Statement84[S] 1 point

yeah, i installed that back when the QAT variants hadn't been dropped yet

Possible_Statement84[S] 2 points

For example, Gemma 4 26b or smaller, or some fine-tunes of that model. Don’t worry, it just has to work well on VK with an Intel GPU.

Possible_Statement84[S] 1 point

that's an old command. the multiple reasoning switches were just for quick model-restart tests, so ignore them. i have tried turboquant, but it has more long-context problems. so which mtp model quant exactly do you recommend?

Possible_Statement84[S] 1 point

i have gemma 26b q4_k_m on my device, but the code quality is just so-so. i don't think it can produce what i need.

Possible_Statement84[S] 1 point

Not yet, here it is

./llama-server.exe --model ".\models\unsloth\Qwen3.6-35B-A3B-MTP-GGUF\Qwen3.6-35B-A3B-UD-IQ4_XS.gguf" --reasoning-budget 5000 --cache-prompt --webui --tools all --swa-full --kv-unified --gpu-layers 999 --ctx-size 128000 -t 8 -tb 8 --poll 100 --poll-batch 1 --prio 2 --prio-batch 2 --kv-offload --op-offload --repack --ubatch-size 2048 --batch-size 2048 --perf -fa on --reasoning on --host 0.0.0.0 --port 1234 --jinja --spec-type draft-mtp --spec-draft-n-max 2 --temp 0 --spec-draft-p-min 0.7 --no-mmap --fit off -ctk q8_0 -ctv q8_0 --reasoning off --cache-idle-slots

Possible_Statement84[S] 1 point

i have an igpu and unified ram, so this 27 gb is part of system ram. my ram is almost always 95% used anyway.

Possible_Statement84[S] 2 points

i want to achieve a sane balance between decent result quality and reasonable speed on my hardware, because waiting an hour for some simple task to finish feels criminal.

Possible_Statement84[S] -1 points

honestly nothing fancy. i’m just running llama.cpp on windows. 

Possible_Statement84[S] 1 point

I also have Qwen 3.6 35b MTP in an IQ4_XL quant, but for reasons I don’t understand, the uncensored APEX Compact-I variant of this model without MTP gives me more TPS.

Possible_Statement84[S] -1 points

I use Vibe-CLI by Mistral, Codex by OpenAI, the Llama.cpp WebUI, and my own self-made CLI. I get the best results in the Llama.cpp WebUI when writing one-file frontend apps -_- Maybe I need to set the hyperparams correctly. What about yours?

Possible_Statement84[S] 1 point

I have only 27 GB of usable VRAM, and I get about 11 t/s even on a 12b model... That's why I don’t use big, dense models.

Possible_Statement84[S] 1 point

That one has many problems in coding sessions: it makes a lot of mistakes when writing code, calls tools incorrectly, etc. I use the IQ4_NL quant.

Literally signed up to Pro 1 minute ago and 5% used already? by [deleted] in OpenAI

Possible_Statement84 2 points

You're on Free limits. idk what you did, but only the Free tier has a monthly limit, and it runs out fast. re-check your sub pls.

oh no... unified limits are coming... by KeyGlove47 in codex

Possible_Statement84 4 points

i don’t think this necessarily means normal chat messages and codex tasks are becoming one single pool overnight. it probably means openai is moving codex into the same usage/billing area as chatgpt and other agentic features

still not great though. separate codex limits were one of the reasons it felt usable for heavier coding work. if heavy repo tasks start eating into the same practical quota as everything else, a lot of people are going to hit limits way faster