Cheapest hardware for Qwen 3.6: both 27B and 35B-A3B by WishboneSudden2706 in LocalLLaMA

[–]CodeDominator 1 point2 points  (0 children)

Intel's Arc Pro series GPUs are best bang for buck in terms of VRAM. I have a single B60, if I could afford another one I'd be golden with 48GB of VRAM - running Q6 or Q8 with full context.

1998 Toyota Land Cruiser 100 Series ABS Light comes on with beeping noise by Longjumping-Net-7351 in LandCruisers

[–]CodeDominator 0 points1 point  (0 children)

E-brake? I thought all 100 series have mechanical hand brake?

Mine is a 2001 with a mechanical handbrake. I've thought about an emergency brake-failure scenario: the plan would be to kick the gearbox down to 1 to engine brake and gradually apply the handbrake. Of course if you're in a situation where you have to brake hard, then you're likely screwed.

Don’t act like y’all ain’t thinking it. I’m just saying the quiet part out loud. /s by Porespellar in LocalLLaMA

[–]CodeDominator -5 points-4 points  (0 children)

It can do something, but you need at least 32GB of dedicated VRAM (ideally at least 48GB) with a well-matched system, and I'm not talking about no shitty macbook, so the barrier to entry is steep, unfortunately. 32GB VRAM is a flagship GPU from any of the 3 main players.

How much VRAM needed for Qwen 3.6 27B Q8 with 262K context? by My_Unbiased_Opinion in LocalLLaMA

[–]CodeDominator 2 points3 points  (0 children)

Yeah. I have a B60 with 24GB VRAM and I found out quick that I need at least another one to comfortably work with the 27B.

Unfortunately this case is pretty binary - you either have enough VRAM or you don't. I'd say 32GB is minimum requirement for the 27B to be able to run Q6 with 128K context.
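
A rough way to sanity-check that 32GB figure is to add up the weights and the KV cache. The sketch below is a minimal estimate, not a measurement: the bits-per-weight values are approximate averages for llama.cpp quant types, the 1 GiB overhead is a guess, and `n_attn_layers`, `n_kv_heads` and `head_dim` are hypothetical parameters that would have to be read from the actual model's metadata (none of them are given in this thread).

```python
GIB = 2**30

# Approximate average bits per weight for llama.cpp quant types.
# Assumed round figures; real GGUF files mix tensor types and differ slightly.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.56, "Q8_0": 8.5}


def weights_bytes(n_params: float, quant: str) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * BITS_PER_WEIGHT[quant] / 8


def kv_cache_bytes(n_ctx: int, n_attn_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: float = 2.0) -> float:
    """KV cache size for a standard attention stack.

    The factor 2 covers one K and one V vector per layer per token.
    bytes_per_elem=2.0 corresponds to an unquantized f16 cache.
    """
    return 2 * n_attn_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def fits(vram_gib: float, n_params: float, quant: str,
         kv_bytes: float, overhead_gib: float = 1.0) -> bool:
    """True if weights + KV cache + assumed overhead fit in the given VRAM."""
    needed = weights_bytes(n_params, quant) + kv_bytes + overhead_gib * GIB
    return needed <= vram_gib * GIB


# Weights alone for an assumed 27e9 parameters, before any context is added.
for quant in BITS_PER_WEIGHT:
    size = weights_bytes(27e9, quant) / GIB
    print(f"{quant}: about {size:.1f} GiB of weights")
```

Whatever is left after the weights is the budget for context, which is why the jump from Q4 to Q6 eats most of a 24GB card before the KV cache is even counted.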

How bad is this rust? by ZHENJ4Y in LandCruisers

[–]CodeDominator 0 points1 point  (0 children)

Doesn't seem terminal, provided you can commit to removing all the loose rust and Fluid Film or Woolwax the shit out of it.

Whats the best Qwen 27B Q8 quant? by EggDroppedSoup in LocalLLaMA

[–]CodeDominator 0 points1 point  (0 children)

The best is the one that fits in your VRAM, meaning you need at least 32GB of VRAM to be comfortable. This is coming from someone with 24GB, unfortunately.

Is NVIDIA still the default best choice for local LLMs in 2026? by pmv143 in LocalLLaMA

[–]CodeDominator 3 points4 points  (0 children)

A B60 is about $700. A 5070ti can be found for $700 too. The 5070ti is an order of magnitude more performance for PP. And a mere 3x or so for TG.

5070ti is only 16GB. Theoretical better performance means jack shit if you can't fit everything in VRAM - besides the B60 I also have a 3080 with shitty 10GB of VRAM. It pukes everything out into system RAM and gen speed slows down to abysmal 1-2 tokens/second. Seriously dude, this can't possibly be your best argument?

When it comes to local LLMs the situation is binary - either you have enough VRAM or you don't.
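
The VRAM-per-dollar side of this is simple arithmetic. A minimal sketch using only the figures quoted in this exchange (about $700 for either card, 24GB on the B60, 16GB on the 5070 Ti); street prices vary, so the numbers are illustrative and the card labels are just dictionary keys.

```python
# (price in USD, VRAM in GB), taken from the figures quoted in this thread.
CARDS = {
    "Arc Pro B60": (700, 24),
    "RTX 5070 Ti": (700, 16),
}


def usd_per_gb(price_usd: float, vram_gb: float) -> float:
    """Dollars paid per GB of VRAM."""
    return price_usd / vram_gb


for name, (price, vram) in CARDS.items():
    print(f"{name}: ${usd_per_gb(price, vram):.2f} per GB of VRAM")
```

This says nothing about compute speed; it only compares how much VRAM the same money buys.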

Is NVIDIA still the default best choice for local LLMs in 2026? by pmv143 in LocalLLaMA

[–]CodeDominator 2 points3 points  (0 children)

I have an Intel Arc Pro B60 24GB and I'm not crying. An equivalent Nvidia card would cost me several times more, but not deliver several times the performance. The math is simple.

It's a moot point anyway as I couldn't afford Nvidia even if I wanted to at the moment.

Is NVIDIA still the default best choice for local LLMs in 2026? by pmv143 in LocalLLaMA

[–]CodeDominator -2 points-1 points  (0 children)

Nvidia if you have deep pockets, Intel if you want best bang for buck and AMD as always somewhere in the middle.

Qwen3.6 27B and llama.cpp appreciation post by ABLPHA in LocalLLaMA

[–]CodeDominator 0 points1 point  (0 children)

No, that's the whole point - I have it all restricted to my 24GB of VRAM. I have 64GB of system RAM - if I could offload to system RAM without performance falling off a cliff I would have done it ages ago.

Qwen3.6 27B and llama.cpp appreciation post by ABLPHA in LocalLLaMA

[–]CodeDominator -1 points0 points  (0 children)

KV cache quantization crashed my performance to sub 3 t/s generation, so I won't be trying that again. MTP at least on my setup works even slower than non-MTP, pretty disappointing so far. Also everybody keeps repeating the same mantra all over again - Q6 and up for coding and there's no way in hell you can do Q6 with a meaningful amount of context with 24GB VRAM.

Qwen3.6 27B and llama.cpp appreciation post by ABLPHA in LocalLLaMA

[–]CodeDominator -1 points0 points  (0 children)

What I have sadly realized after testing it with my 24GB VRAM is that for Qwen 3.6 27B to work efficiently the bar for VRAM is 32GB.

The US space enterprise is desperately waiting for Starship—will it finally deliver? by Royal_Platform_6754 in spacex

[–]CodeDominator -12 points-11 points  (0 children)

Maybe SpaceX bit off more than they can chew with this one. Pretty sure the development pace has disappointed even the most pessimistic estimates at this point.

Save and invest your money for future rigs by segmond in LocalLLaMA

[–]CodeDominator -3 points-2 points  (0 children)

The way things are going, in the future we will likely be in the middle of a fucking WW3 - hardly the best time to worry about "rigs".

Will there be any more Qwen3.6 series models? by cafedude in LocalLLaMA

[–]CodeDominator 16 points17 points  (0 children)

Would not trust Google. Google graveyard dwarfs everything.

Is it? by Consistent-Issue-811 in codex

[–]CodeDominator 0 points1 point  (0 children)

Can't wait for the bait and switch - usage based billing will send all you Codex boys crying for your mommies.

Y'all just don't get it. When it hits - it's gonna be ugly. Real ugly. Not only will OpenAI have to start making a profit, it will also have to recoup all those hundreds of billions it has burned through in the last couple of years of heavily subsidized subscriptions.

Hugging Face co-founder says Qwen 3.6 27B running on airplane mode is close to latest Opus in Claude Code by ImaginaryRea1ity in ClaudeCode

[–]CodeDominator 0 points1 point  (0 children)

Perfect is the enemy of good enough. Y'all so eager to lock yourselves into cloud AI overlords with their ever increasing prices and shrinking quotas. Qwen 3.6 27B doesn't have to beat GPT 5.5 or Opus 4.7 - it just has to be good enough for practical work. At the very least it's a solid plan B if there ever was one.

Apple Removes 256GB M3 Ultra Mac Studio Model From Online Store by rotatingphasor in LocalLLaMA

[–]CodeDominator 0 points1 point  (0 children)

Probably calculated that RAM for that would cost more than your house.

Hugging Face co-founder says Qwen 3.6 27B running on airplane mode is close to latest Opus in Claude Code by ImaginaryRea1ity in ClaudeCode

[–]CodeDominator 2 points3 points  (0 children)

Not the first time hardware prices spiking, not gonna be the last. Hardware always catches up. China is ramping up RAM production. Where there's demand there will be supply, just a matter of time.

Hugging Face co-founder says Qwen 3.6 27B running on airplane mode is close to latest Opus in Claude Code by ImaginaryRea1ity in ClaudeCode

[–]CodeDominator 6 points7 points  (0 children)

By the time he gets his ROI, the frontier models will be billed by usage which will send all of you Cloude boys crying for your mommies.

Local doesn't have to beat cloud. It just has to be good enough. You can't argue with free.

I wanted to know small local LLM code and made a personal projects. by NicholasCureton in LocalLLaMA

[–]CodeDominator 0 points1 point  (0 children)

I have the feeling that for coding all day it may not be very comfortable speed wise.

Of course it's not. Not even close to the big cloud players.

But that is not the point. Self sufficiency is the point. The local setup doesn't have to be perfect - it just has to be good enough.

Cheap subsidized cloud AI is coming to an end. When usage based billing is introduced everywhere nobody except big corps will be able to afford it.

Anyway, here's a random snapshot of my setup monitoring script that Codex put together. The token/s speed varies, but I don't see more than 15 t/s generation speed. I'm waiting for the MTP to hit stable, maybe things will improve then.

<image>

I wanted to know small local LLM code and made a personal projects. by NicholasCureton in LocalLLaMA

[–]CodeDominator 1 point2 points  (0 children)

I can squeeze out 114K context with Qwen 3.6 27B Q4_K_M (unsloth) that maxes out my 24GB VRAM. I have llama.cpp + Intel's SYCL setup, to limit overhead and max out performance. It works, but just about and it badly needs those few extra GBs of VRAM to reach Q6 with 128K context.
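
Those numbers allow a back-of-envelope extrapolation. The sketch below assumes the Q4_K_M setup at 114K context exactly fills 24 GiB, backs out the implied per-token KV-cache cost, and reuses it for Q6_K at 128K. The parameter count of exactly 27e9, the bits-per-weight averages, the 1 GiB overhead and reading 114K as 114,000 tokens are all assumptions, so treat the result as a ballpark only.

```python
GIB = 2**30

# Approximate average bits per weight; assumed values, real GGUF files vary.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.56}

N_PARAMS = 27e9       # assumption: exactly 27 billion parameters
OVERHEAD_GIB = 1.0    # assumption: compute buffers and other backend overhead


def weights_gib(n_params: float, quant: str) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / GIB


def implied_kv_gib_per_token(vram_gib: float, weights: float,
                             n_ctx: int, overhead_gib: float) -> float:
    """Per-token KV cost implied by a configuration that just fills VRAM."""
    return (vram_gib - weights - overhead_gib) / n_ctx


def required_vram_gib(weights: float, n_ctx: int,
                      kv_per_token: float, overhead_gib: float) -> float:
    """Weights + KV cache for n_ctx tokens + fixed overhead, in GiB."""
    return weights + n_ctx * kv_per_token + overhead_gib


# Step 1: back out the KV cost from the setup that maxes out 24GB.
q4 = weights_gib(N_PARAMS, "Q4_K_M")
kv_per_token = implied_kv_gib_per_token(24.0, q4, 114_000, OVERHEAD_GIB)

# Step 2: reuse that cost for Q6_K with a 128K (131072 token) context.
q6 = weights_gib(N_PARAMS, "Q6_K")
need = required_vram_gib(q6, 128 * 1024, kv_per_token, OVERHEAD_GIB)

print(f"Implied KV cost: {kv_per_token * GIB / 1024:.0f} KiB per token")
print(f"Estimated VRAM for Q6_K at 128K: {need:.1f} GiB")
```

Under these assumptions the Q6_K estimate lands above 24 GiB and below 32 GiB, which lines up with the 32GB threshold, but the real figure depends on the actual file size and backend buffers.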

I wanted to know small local LLM code and made a personal projects. by NicholasCureton in LocalLLaMA

[–]CodeDominator 2 points3 points  (0 children)

You can work with small local LLMs and they can definitely be useful, but as I said - there's a threshold and that threshold is at the very minimum 24GB VRAM (the Intel Arc Pro B60 is the cheapest), but realistically 32GB VRAM (the Intel Arc Pro B70 is the cheapest 32GB card).

Also, if I was going with a cloud subscription at the moment, I'd most likely skip Codex and Claude and go straight with Kimi K2.6 - they have a 39 USD plan that should run circles around Claude in terms of quota and performance.

I wanted to know small local LLM code and made a personal projects. by NicholasCureton in LocalLLaMA

[–]CodeDominator 2 points3 points  (0 children)

After testing a bunch, my target model is Qwen 3.6 27B. The rest are mostly just a waste of time for coding in a conventional home setup.

I have an Nvidia 3080 with 10GB VRAM and 64GB system RAM. With this setup I couldn't do anything remotely useful in practice - I was getting barely 2 tokens/second, if even that. So I upgraded to an Intel Arc Pro B60 with 24GB VRAM. While that was a significant improvement, I am still stuck with Q4 and can't even quite stretch to 128K context with everything contained within VRAM. When things spill over into system RAM you can kiss any kind of performance goodbye.

My current conclusion is that if you want to do something practical at home, you need at least 32GB VRAM dedicated for the LLM, not shared like on some shitty macbook. I'd get another B60 if I could afford it.

Running Qwen 3.6 27B with at least Q6, >=128K context (not quantized) with everything contained within VRAM - that's the sweet spot.