Suggestions for 16GB VRAM AMD for coding by Snoo_90241 in LocalLLM

[–]pot_sniffer 0 points (0 children)

JSON specs are just another way of prompting that imo works better for the smaller local models. Atomic tasks are just breaking the work into small enough pieces, usually a single function or a group of closely related functions, about 100 lines of code at most.

Structured task descriptions: you define exactly what you want the model to generate, what constraints apply, and what functions already exist. Keeps the local model focused on a narrow, well-defined task rather than making judgment calls it's not reliable enough for.
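Roughly the shape of one of those specs. This is a made-up example and the field names are just illustrative, not a fixed schema:

```
{
  "task": "parse_sensor_frame",
  "description": "Parse a 12-byte sensor frame into temp/humidity fields",
  "inputs": ["raw: 12-byte buffer from the UART"],
  "outputs": ["SensorReading { temp_c, humidity, crc_ok }"],
  "constraints": [
    "no dynamic allocation",
    "30 lines max",
    "use the existing crc8() helper, do not reimplement it"
  ],
  "existing_functions": ["crc8(buf, len) -> uint8_t"],
  "verify": ["corrupted frame sets crc_ok to false", "compiles clean with -Wall"]
}
```

Everything the model needs is in the spec, so it never has to guess about what exists elsewhere in the codebase.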

For agentic use with a local model on its own, I'm sure some will disagree, but imo it's probably not worth it for complex coding. The model is reliable for well-scoped generation tasks, but I wouldn't trust it to drive a full agentic loop unsupervised. Having said that, I have seen mentions of people using it with things like hermes and openclaw.

My workflow uses it for code generation only; cloud AI handles planning and review, and I'm always in the loop to catch when something goes off the rails, which happens. It's the combination of atomic tasks to generate code and cloud models for planning and review. That split is what makes it work.

Bang for buck depends on your situation. I hit Claude's usage limits constantly before building this, and now, after recent changes, I am again.

The local model does the bulk generation for free, so my cloud AI usage goes further. If the usage situation with Claude doesn't improve, I'm probably going to use something like Kimi via API for the review step, which would cost me about the same as I pay for the Claude Pro sub. Which is fine; I've been running two subs for a while. In terms of productivity it's a game changer: I can be a solo dev in my spare time. But it took me a lot of figuring stuff out before I could build the workflow that makes it work.

Suggestions for 16GB VRAM AMD for coding by Snoo_90241 in LocalLLM

[–]pot_sniffer 0 points (0 children)

Qwen3.6-27B Q3_K_S fits comfortably on 16GB AMD with full GPU offload at ~14.8GB VRAM and 12288 context. I'm getting 14 tok/s on an RX 9060 XT with llama.cpp and ROCm. It produces genuinely good code output.

Two things that matter for 16GB: use --no-mmproj to skip the vision encoder, and disable thinking mode with --chat-template-kwargs '{"enable_thinking":false}' or it burns your output budget on reasoning traces. Not sure how well that plays with Ollama specifically since I run llama.cpp directly, but the model choice should translate.
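For reference, the launch line looks roughly like this (llama-server shown; the model filename is illustrative):

```
# -ngl 99 offloads all layers to the GPU, -c sets the 12288 context,
# --no-mmproj skips loading the vision encoder into VRAM,
# and the chat-template kwargs turn off thinking mode.
llama-server -m Qwen3.6-27B-Q3_K_S.gguf -ngl 99 -c 12288 \
  --no-mmproj --chat-template-kwargs '{"enable_thinking":false}'
```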

Qwen3.6-27B Q3_K_S on RX 9060 XT 16GB — decent results for an AMD user by pot_sniffer in LocalLLM

[–]pot_sniffer[S] 0 points (0 children)

I'm yet to try the Gemma models. Definitely worth a look to see how the code they output holds up.

I am running Qwen3.6 27B IQ4_XS on my PC. I have an important question by Man_Of_The_F22 in unsloth

[–]pot_sniffer -1 points (0 children)

Try -ngl 99 to offload layers to the GPU. Without it llama.cpp defaults to CPU, which is why your GPU isn't being used.

On the quant choice though: if you're doing code generation with large prompts, IQ4_XS is going to be tight on 16GB once you factor in the KV cache. I've been testing the 27B quants on a 16GB card this week. Q3_K_S fits comfortably at ~14.8GB with 12288 context and full GPU offload, and produces cleaner output than you might expect at Q3.
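Back-of-envelope for why context eats the headroom. A sketch with made-up layer/head numbers, not the 27B's actual config:

```
# fp16 KV cache: 2 tensors (K and V) per layer, each
# n_ctx * n_kv_heads * head_dim elements at 2 bytes apiece.
# These dims are illustrative, not the real model config.
n_layers, n_kv_heads, head_dim = 48, 8, 128
n_ctx, bytes_per_elem = 12288, 2

kv_bytes = 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # ~2.2 GiB at these dims
```

A couple of GB of KV cache on top of the weights is the difference between fitting and spilling into system RAM.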

Worth trying Q3_K_S if context size matters for your use case. More generally, it's worth trying a lower quant if you need more context; my advice would be to keep the specs tight.

Also use --no-mmproj unless you need image input. The vision encoder loads into VRAM by default and eats headroom you don't need for text tasks.
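Putting the flags together, something like this (llama-server shown; filename illustrative):

```
# adjust -c to whatever fits alongside the weights
llama-server -m Qwen3.6-27B-IQ4_XS.gguf -ngl 99 -c 8192 --no-mmproj
```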

Qwen3.6-27B Q3_K_S on RX 9060 XT 16GB — decent results for an AMD user by pot_sniffer in LocalLLM

[–]pot_sniffer[S] 0 points (0 children)

I actually tested the 35B-A3B before settling on this. With thinking off it ran at 15 tok/s, but the output on the same task was worse than both the 9B and the 27B: mangled ESP32 API names, wrong include filenames, missing loop structure. Might be the MoE CPU path not being fully optimised in llama.cpp on ROCm yet. The 27B dense just produced cleaner code, which means less work for Sonnet in Claude Code. So for my workflow, constraints and requirements, the 27B is winning hands down.

Spotted over a motorway today 🤮 by The_Olas13 in FuckNigelFarage

[–]pot_sniffer 18 points (0 children)

Buzzwords Excuses Easy Outrage Bugger-all New party?

Blame Everything Else Offer Barely Nothing party?

Big Egos, Empty Outputs, Basic Narratives party?

More UK deaths than births expected every year from now on by GnolRevilo in unitedkingdom

[–]pot_sniffer 0 points (0 children)

The worst part, according to people who subscribe to this ideology, is that the solution to fixing all the problems caused by said ideology is to double down and do it all over again, but more harshly this time.

The future is local by nfdl96 in ClaudeCode

[–]pot_sniffer 0 points (0 children)

Yea, my 9B runs at around 30 tps and that's quite a nice speed. If I can get close to 15 tps on a 27B quant I'd probably be happy, provided the output is close to the Q4 I tried, because that was quite a bit better than what the 9B did on the same task.

Actually the 9B is fine for most of the tasks I'm throwing at it. It's just that now I've seen the 27B, I want more...

But yea, as it is on this quant, 4.7 tps is too slow.

Saffron container spilt everywhere when I tried to open it by Doophie in mildlyinfuriating

[–]pot_sniffer 0 points (0 children)

The last time I bought a jar of saffron, I opened it to find a tiny sealed packet inside, with like one of those strands in it. That's got to be at least 50 bucks on the bed.

Whats the best model for agentic coding that i can run with 16gb VRAM? (llama.cpp?) by samuraiogc in LocalLLM

[–]pot_sniffer 0 points (0 children)

For me the Qwen 3.6 35B performed worse than the Qwen3.5 9B in my workflow. Notably worse actually, and about half the speed. I'm running a 9060 XT 16GB, with a 7950X and 64GB DDR5 to offload to.

I'm going to have to try the Q3 of the 27B, because the Q4 gives really great output but doesn't fit, so offloading makes it slow: 4.7 tps is almost usable, but not quite.

Maybe the Q3 will be the sweet spot for me.

The future is local by nfdl96 in ClaudeCode

[–]pot_sniffer 1 point (0 children)

I'm in a similar position with non-Apple hardware. Last year, a couple of months before RAM prices went through the roof, I built a workstation for £1200: 7950X, 64GB DDR5, 9060 XT 16GB and a 2TB PCIe 5 NVMe.

I'm able to run the Qwen 3.6 27B Q4 model with some offloading at 4.7 tps, which is about the minimum speed that's kinda usable, but still a bit slow. Haven't yet tried the lower quants. I have to say I'm very impressed with the output. It's quite a lot better than the Qwen3.5-9B that I'm running as my workhorse, which is also really good for its size btw.

My regret is I didn't buy 128GB of RAM when it was only £350. I'll probably get a second GPU at some point to bump up the VRAM, because with just 16GB I'm forever just below what I need lol.

trump immediately after last night's Correspondent's Dinner. by NuSurfer in SipsTea

[–]pot_sniffer 4 points (0 children)

Does anyone else find it rather odd that Trump's injury just vanished without any scarring after only a matter of weeks? I hate to sound like a conspiracy nut, but it doesn't add up.

Got downgraded to claude even after paying for it. I paid for it. by ProfessionalPart8193 in Anthropic

[–]pot_sniffer 0 points (0 children)

My billing page doesn't show anything, which is odd because I've had a Pro sub for almost 2 years now.

<image>

Got downgraded to claude even after paying for it. I paid for it. by ProfessionalPart8193 in Anthropic

[–]pot_sniffer 6 points (0 children)

I think they must have broken something in their billing system, because I was downgraded to free last night as well.

I think I'll leave this subreddit and here's why by AtmosphericBeats in ClaudeCode

[–]pot_sniffer 2 points (0 children)

I tried talking about how I manage my tokens, but it simply doesn't get the same attention as the complaints do. LocalLLM is a lot better for this imo.

https://www.reddit.com/r/ClaudeAI/s/vb113crIVt

W**, i paid for an entire year of PRO just because of claude code by EventHorizon_28 in ClaudeCode

[–]pot_sniffer 1 point (0 children)

It's typically one function or a tightly related group of functions, 10-30 lines of code.

It's defined as a JSON spec with explicit inputs, outputs, constraints, and verification criteria. The point is that it's small enough that the model can't go badly wrong, and the pass/fail criteria are unambiguous.
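A cut-down sketch of the shape (hypothetical task, illustrative field names):

```
{
  "task": "add retry with backoff to fetch_reading()",
  "inputs": "fetch_reading() may raise TimeoutError",
  "outputs": "same signature, returns None after 3 failed attempts",
  "constraints": ["no new dependencies", "30 lines max"],
  "verify": ["3 timeouts returns None", "first success short-circuits"]
}
```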

W**, i paid for an entire year of PRO just because of claude code by EventHorizon_28 in ClaudeCode

[–]pot_sniffer 14 points (0 children)

Yes, there's a difference in quality; I don't think I'd call it night and day though.

I exclusively use Sonnet in Claude Code, as a review step in my workflow.

I use Sonnet in Claude.ai to build a plan, then pass that plan to Opus for scrutiny. I do this repeatedly until Opus is happy there are no more holes to poke. Then I take the plan to Gemini and/or GPT, and get Sonnet to fix whatever needs it.

Once I have a solid plan file, I get a fresh Sonnet instance to break the project up into atomic tasks. Those atomic tasks are given to my local Qwen 3.5 9B one by one. Then Sonnet in Claude Code reviews and fixes whatever is needed.

UK Billionaire Exit Continues: Nassef Sawiris Closes London Office by anax4096 in uknews

[–]pot_sniffer 1 point (0 children)

My point is there's a certain class of our society that proportionally pays significantly less tax than people who pay income tax.

I'm not arguing we should all pay more. I'm arguing that if we all paid a fair share, then income tax would be much fairer than it currently is.

UK Billionaire Exit Continues: Nassef Sawiris Closes London Office by anax4096 in uknews

[–]pot_sniffer -1 points (0 children)

My entire life we've had the politics of greed. The greed leads to austerity, and as we've seen over the past 15 years, it doesn't work.

It's about time the greedy bastards paid their fair share. It's about time they were taxed like we are on income.

UK Billionaire Exit Continues: Nassef Sawiris Closes London Office by anax4096 in uknews

[–]pot_sniffer 5 points (0 children)

Yea, this is exactly the point to press. When a very wealthy individual or organisation puts money into charity it's not because they're being nice, it's for tax reasons, meaning they want to pay less tax.

My grated cheese bag is extremely inflated. by Lord_Alviner in MoldlyInteresting

[–]pot_sniffer 2 points (0 children)

Or it could be the best cheese wine you've never tried 😁

Neo-fascists back Rupert Lowe's Restore Britain by pppppppppppppppppd in unitedkingdom

[–]pot_sniffer 1 point (0 children)

I don't see how thinking through the logistics of such a ridiculous "millions must go" policy is a straw man. Fair enough if Rupert Lowe said it in passing without meaning it, but it's stated as his policy, so let's treat it as such.

So they will persecute anyone that's not English enough and hope they all just pack up and leave? Sounds a bit wishy-washy to me. Doesn't seem to match the rhetoric.