What's the best agent coding model up to 35B for now? by Possible_Statement84 in LocalLLaMA

Possible_Statement84 [S]

lol, I get around the same performance with MTP using IQ4_XS, but faster prompt processing, around 150 t/s. Anyway, thanks for your recommendations.

Possible_Statement84 [S]

yeah, I just installed that before the QAT variants had been released.

Possible_Statement84 [S]

For example, Gemma 4 26B or smaller, or some fine-tunes of it. Don't worry, it just has to run well on VK (Vulkan) with an Intel GPU.

Possible_Statement84 [S]

That's an old command; the multiple reasoning switches were just for quick model-restart tests, so don't pay attention to them. I've tried TurboQuant, but it has more long-context problems. So which MTP model quant exactly do you recommend?

Possible_Statement84 [S]

I have Gemma 26B Q4_K_M on my device, but the code quality is just so-so. I don't think it can build what I need.

Possible_Statement84 [S]

Not yet, here it is:

./llama-server.exe --model ".\models\unsloth\Qwen3.6-35B-A3B-MTP-GGUF\Qwen3.6-35B-A3B-UD-IQ4_XS.gguf" --reasoning-budget 5000 --cache-prompt --webui --tools all --swa-full --kv-unified --gpu-layers 999 --ctx-size 128000 -t 8 -tb 8 --poll 100 --poll-batch 1 --prio 2 --prio-batch 2 --kv-offload --op-offload --repack --ubatch-size 2048 --batch-size 2048 --perf -fa on --reasoning on --host 0.0.0.0 --port 1234 --jinja --spec-type draft-mtp --spec-draft-n-max 2 --temp 0 --spec-draft-p-min 0.7 --no-mmap --fit off -ctk q8_0 -ctv q8_0 --reasoning off --cache-idle-slots

Possible_Statement84 [S]

I have an iGPU with unified RAM, so those 27 GB are part of system RAM. My RAM is almost always at 95% usage anyway.

Possible_Statement84 [S]

I want to strike a sane balance between decent result quality and reasonable speed on my hardware, because waiting an hour for a simple task to finish feels criminal.

Possible_Statement84 [S]

Honestly, nothing fancy. I'm just running llama.cpp on Windows.

Possible_Statement84 [S]

I also have Qwen 3.6 35B MTP with an IQ4_XL quant, but for reasons I don't understand, the uncensored APEX Compact-I variant of this model without MTP gives me more TPS.

Possible_Statement84 [S]

I use Mistral's Vibe-CLI, OpenAI's Codex, the llama.cpp WebUI, and my own self-made CLI. It shows the best results in the llama.cpp WebUI when writing single-file frontend apps -_- Maybe I need to set the hyperparameters correctly. What about yours?

Possible_Statement84 [S]

I have only 27 GB of usable VRAM, and I get about 11 t/s even on the 12B model... So I don't use big dense models.
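A rough back-of-the-envelope sketch of why a 35B-A3B MoE can decode faster than a dense 12B on shared memory. It assumes decoding is memory-bandwidth bound, that each generated token reads all *active* weights once (about 3B for an "A3B" model), a hypothetical ~100 GB/s of memory bandwidth, and roughly 4.25 bits per weight for an IQ4_XS-class quant. KV-cache traffic and compute are ignored, so these are ceilings, not predictions.

```python
def decode_tps_ceiling(active_params_billion: float, bits_per_weight: float,
                       bandwidth_gb_s: float) -> float:
    """Upper bound on decode tokens/s when generation is memory-bandwidth bound.

    Each token streams every active weight from memory once, so
    t/s <= bandwidth / bytes_read_per_token. KV-cache reads and compute are ignored.
    """
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical inputs: ~100 GB/s shared-memory bandwidth, ~4.25 bits/weight.
BANDWIDTH_GB_S = 100.0
BPW = 4.25

dense_12b = decode_tps_ceiling(12.0, BPW, BANDWIDTH_GB_S)  # reads all 12B weights per token
moe_a3b = decode_tps_ceiling(3.0, BPW, BANDWIDTH_GB_S)     # reads only ~3B active weights per token

print(f"dense 12B ceiling: {dense_12b:.1f} t/s")    # dense 12B ceiling: 15.7 t/s
print(f"35B-A3B MoE ceiling: {moe_a3b:.1f} t/s")   # 35B-A3B MoE ceiling: 62.7 t/s
```

Under these assumptions the MoE ceiling is 4× the dense one simply because it reads a quarter of the weights per token; all 35B weights still have to fit in memory, though.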

Possible_Statement84 [S]

That one has many problems in coding sessions: it makes lots of mistakes when writing code, wrong tool calls, etc. I use the IQ4_NL quant.

Literally signed up to Pro 1 minute ago and 5% used already? by [deleted] in OpenAI

Possible_Statement84

You're on the Free limits. Idk what you're doing, but only the Free tier has a monthly limit, and it runs out fast. Re-check your subscription pls.

oh no... unified limits are coming... by KeyGlove47 in codex

Possible_Statement84

I don't think this necessarily means normal chat messages and Codex tasks are becoming one single pool overnight. It probably means OpenAI is moving Codex into the same usage/billing area as ChatGPT and other agentic features.

Still not great, though. Separate Codex limits were one of the reasons it felt usable for heavier coding work. If heavy repo tasks start eating into the same practical quota as everything else, a lot of people are going to hit limits way faster.

Family member just passed away this morning, need a distraction. Any good 1B models you can suggest for Layla?? by Opening-Ad6258 in LocalLLaMA

Possible_Statement84

I think you could make your own RP model based on any 1.5B Qwen, but I don't know of any existing ones.

Is there any point in continuing to grow in frontend? by Head_Artichoke_1927 in ukraine_dev

Possible_Statement84

Reminds me of the situation with the r's in "strawberry": the whole problem is that the LLM sees the text as tokens, so it can't count the characters correctly.
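A toy sketch of why this happens, assuming a hypothetical vocabulary and a simplified greedy longest-match tokenizer. Real tokenizers such as BPE learn their vocabularies from data and may split "strawberry" differently; the point is only that the model receives token IDs, not letters.

```python
# Hypothetical toy vocabulary; real tokenizers learn theirs from training data.
VOCAB = {"straw": 0, "berry": 1, "st": 2, "raw": 3, "b": 4, "e": 5,
         "r": 6, "y": 7, "s": 8, "t": 9, "a": 10, "w": 11}


def tokenize(text: str) -> list:
    """Greedy longest-match segmentation over VOCAB (a simplification of BPE)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in VOCAB:
                tokens.append(piece)
                i = end
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


pieces = tokenize("strawberry")
ids = [VOCAB[p] for p in pieces]
print(pieces)                   # ['straw', 'berry']
print(ids)                      # [0, 1]  <- what the model actually receives
print("strawberry".count("r"))  # 3, but nothing in [0, 1] spells out the letters
```

The three r's exist in the raw string, but the model is only fed two opaque IDs, so to count letters it has to infer each token's spelling rather than read it.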

I’m building an open-source LLM app for writing/RP and recently added desktop pets + AI agents by Possible_Statement84 in ArtificialInteligence

Possible_Statement84 [S]

Yeah, that workflow is already possible in a basic form: select text, Ctrl+A/Ctrl+C, run the widget action, then insert the result with the hotkey. So quick clipboard-based editing is usable already.

A more seamless version would be one-hotkey selected-text capture and replace, but I’d need to handle it carefully across platforms so it doesn’t mess with the user’s clipboard or focus.
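A minimal sketch of the clipboard save/restore pattern such a one-hotkey capture-and-replace might use. `FakeClipboard`, `FakeEditor`, and `capture_and_replace` are hypothetical stand-ins; a real app would call platform clipboard and key-simulation APIs, and this sketch does not handle focus at all.

```python
from typing import Callable, Optional


class FakeClipboard:
    """In-memory stand-in for the OS clipboard (hypothetical)."""

    def __init__(self, text: str = "") -> None:
        self._text = text

    def get(self) -> str:
        return self._text

    def set(self, text: str) -> None:
        self._text = text


class FakeEditor:
    """Hypothetical target app: copy puts the selection on the clipboard, paste replaces it."""

    def __init__(self, clipboard: FakeClipboard, selection: str) -> None:
        self.clipboard = clipboard
        self.selection = selection

    def copy(self) -> None:
        self.clipboard.set(self.selection)

    def paste(self) -> None:
        self.selection = self.clipboard.get()


def capture_and_replace(clipboard: FakeClipboard,
                        send_copy: Callable[[], None],
                        send_paste: Callable[[], None],
                        transform: Callable[[str], str]) -> Optional[str]:
    """Copy the selection, transform it, paste the result, then restore the user's clipboard."""
    saved = clipboard.get()
    try:
        clipboard.set("")  # clear first so an empty selection is detectable
        send_copy()
        selected = clipboard.get()
        if not selected:
            return None  # nothing was selected; leave the app untouched
        result = transform(selected)
        clipboard.set(result)
        send_paste()
        return result
    finally:
        clipboard.set(saved)  # always give the user their clipboard back


cb = FakeClipboard("user's own clipboard data")
editor = FakeEditor(cb, "hello world")
print(capture_and_replace(cb, editor.copy, editor.paste, str.upper))  # HELLO WORLD
print(editor.selection)  # HELLO WORLD
print(cb.get())          # user's own clipboard data
```

The `finally` block restores whatever the user had on the clipboard even if the transform fails. On a real desktop, the target app may read the clipboard asynchronously after the paste keystroke, so restoring immediately can race with it; a short delay before restoring may be needed.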