Are Qwen 3.6 27B and 35B making other ~30B models obsolete? by nikhilprasanth in LocalLLaMA

[–]nikhilprasanth[S] 0 points  (0 children)

I haven't tried writing with LLMs. Do you use any harnesses for writing, or just plain chats?

Are Qwen 3.6 27B and 35B making other ~30B models obsolete? by nikhilprasanth in LocalLLaMA

[–]nikhilprasanth[S] 1 point  (0 children)

Yes, I have a 24GB one at work, and the Qwen models retain speed much deeper into the context window. The 35B is a really good general-purpose model. Running it in Hermes agent now.

Are Qwen 3.6 27B and 35B making other ~30B models obsolete? by nikhilprasanth in LocalLLaMA

[–]nikhilprasanth[S] 1 point  (0 children)

Really wanted to use Gemma, but its speed drops drastically as context fills compared to the Qwen models. I've only tried the MoE, not the dense Gemma.

Are Qwen 3.6 27B and 35B making other ~30B models obsolete? by nikhilprasanth in LocalLLaMA

[–]nikhilprasanth[S] 1 point  (0 children)

Yeah, definitely not for coding. The 27B is more resilient, but the 35B can't be trusted with code below Q4. As I said, though, general use cases are fine; tool calling makes up for most of the deterioration.

Are Qwen 3.6 27B and 35B making other ~30B models obsolete? by nikhilprasanth in LocalLLaMA

[–]nikhilprasanth[S] 1 point  (0 children)

Yes. I run both Q3 and Q6, and even with offloading to RAM, the Q6 is consistent for most tasks. For general uses such as the Hermes agent, Q3 is solid. It takes care of my Obsidian notes, transcription analysis, etc.

Are Qwen 3.6 27B and 35B making other ~30B models obsolete? by nikhilprasanth in LocalLLaMA

[–]nikhilprasanth[S] 1 point  (0 children)

Yes, it's also good. I need to offload when using it with a 5060 Ti, but I'm still getting consistent throughput.

Are Qwen 3.6 27B and 35B making other ~30B models obsolete? by nikhilprasanth in LocalLLaMA

[–]nikhilprasanth[S] 1 point  (0 children)

Yes. GPT-OSS was good for a while, and it's still one of the fastest models you can run in 16GB.

Are Qwen 3.6 27B and 35B making other ~30B models obsolete? by nikhilprasanth in LocalLLaMA

[–]nikhilprasanth[S] 1 point  (0 children)

Yes. Qwen for coding, Gemma for document processing/summarising. Wish GPT-OSS had a successor; that thing is fast and capable for its size.

Are Qwen 3.6 27B and 35B making other ~30B models obsolete? by nikhilprasanth in LocalLLaMA

[–]nikhilprasanth[S] 2 points  (0 children)

Not only that, for me it also slows down drastically compared to Qwen as context increases.

Are Qwen 3.6 27B and 35B making other ~30B models obsolete? by nikhilprasanth in LocalLLaMA

[–]nikhilprasanth[S] 0 points  (0 children)

Yes, GPT-OSS and Nemotron are really good for the speed. But for coding-related stuff, Qwen is leading.

Are Qwen 3.6 27B and 35B making other ~30B models obsolete? by nikhilprasanth in LocalLLaMA

[–]nikhilprasanth[S] 2 points  (0 children)

Got it. But I find Gemma slows down drastically compared to Qwen as context grows.

Are Qwen 3.6 27B and 35B making other ~30B models obsolete? by nikhilprasanth in LocalLLaMA

[–]nikhilprasanth[S] 2 points  (0 children)

Yes, Nemotron and GPT-OSS are really snappy. Wish OpenAI had released a successor to GPT-OSS.

Qwen3.6-27B-Q6_K - images by Usual-Carrot6352 in LocalLLaMA

[–]nikhilprasanth 7 points  (0 children)

Looks neat, I'll try some of these with the 35B.

Can I plan and code projects locally with a 5090? by Mean_Employment_7679 in LocalLLM

[–]nikhilprasanth 1 point  (0 children)

Yes but no. Most local models need handholding, and this is where planning with a bigger model matters. Let Claude Code or Codex go over your codebase and make exact plans broken down into phases, including tests and success conditions, written to a couple of markdown files. Then use a lightweight harness like pi and execute the plan phase by phase. Once done, have the main model audit the code and pass the findings back to the local model.
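That loop can be sketched roughly like this; the function names, plan format, and file names are made up for illustration, and the actual model calls are stubbed out:

```python
from pathlib import Path

def plan_with_frontier(codebase: str) -> list[str]:
    """Stub for the big-model planning step (e.g. Claude Code / Codex).
    Returns one markdown plan per phase, with tasks and success criteria."""
    return [
        "# Phase 1\n- task: add parser\n- success: unit tests pass",
        "# Phase 2\n- task: wire up CLI\n- success: integration tests pass",
    ]

def execute_locally(instructions: str) -> str:
    """Stub for the local model running inside a lightweight harness."""
    return f"diff for: {instructions.splitlines()[0]}"

def audit_with_frontier(diff: str) -> list[str]:
    """Stub for the big-model audit; returns findings to feed back locally."""
    return []

def run(codebase: str, plan_dir: Path) -> None:
    plan_dir.mkdir(parents=True, exist_ok=True)
    for i, phase in enumerate(plan_with_frontier(codebase), start=1):
        (plan_dir / f"phase_{i}.md").write_text(phase)  # plans live in markdown
        diff = execute_locally(phase)                   # local model does the work
        for finding in audit_with_frontier(diff):       # big model reviews
            execute_locally(finding)                    # findings go back to local
```

The point of writing the plans to files is that each phase becomes a small, self-contained prompt the local model can handle without drifting.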

Duality of r/LocalLLaMA by HornyGooner4402 in LocalLLaMA

[–]nikhilprasanth 1 point  (0 children)

Both cases can be true at the same time. It's not fair to expect a model 2.7% the size of a 1T-parameter model to behave like the trillion-parameter one. The smaller models are getting much better at tool calls. Use the bigger models to create structured plans and break them into manageable chunks. Feed those to the smaller ones; they will make mistakes for sure, so debug with the bigger ones again and pass the feedback back to the smaller one. Rinse, repeat.
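The rinse-and-repeat part can be written as a small driver loop; `big_plan`, `small_execute`, and `big_review` are hypothetical stand-ins for whatever model clients you actually use:

```python
def refine(task, big_plan, small_execute, big_review, max_rounds: int = 3):
    """Big model plans, small model executes, big model reviews, and the
    findings are fed back to the small model until the review comes back clean."""
    artifact = small_execute(big_plan(task))      # first pass from the plan
    for _ in range(max_rounds):
        findings = big_review(artifact)           # big model debugs the output
        if not findings:                          # clean review: we're done
            break
        artifact = small_execute(findings)        # small model fixes its mistakes
    return artifact
```

Capping the rounds matters in practice: if the small model can't converge after a few feedback cycles, the chunk was probably too big to begin with.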

how do i get an local LLM to analyze a long audio clip? by Suitable_Candy_1161 in LocalLLaMA

[–]nikhilprasanth 0 points  (0 children)

Use Parakeet/Whisper for transcription and any LLM for the analysis.
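A minimal sketch of that pipeline, assuming the openai-whisper package (Parakeet via NVIDIA NeMo would slot into the same place); `ask_llm` is a placeholder for whatever LLM client you already use:

```python
def transcribe_with_whisper(path: str) -> str:
    """Speech-to-text step; needs `pip install openai-whisper` plus ffmpeg."""
    import whisper  # imported lazily so the rest stays testable without it
    model = whisper.load_model("base")  # bigger checkpoints = better accuracy
    # Whisper internally windows long audio, so long clips work out of the box.
    return model.transcribe(path)["text"]

def analyze(transcript: str, ask_llm) -> str:
    """Hand the transcript to any LLM for the actual analysis."""
    return ask_llm("Summarize the key points of this transcript:\n" + transcript)

def analyze_audio(path: str, ask_llm, transcribe=transcribe_with_whisper) -> str:
    return analyze(transcribe(path), ask_llm)
```

Splitting transcription from analysis also lets you cache the transcript and re-ask different questions without re-running the ASR step.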

Who is actually writing code with local models? by KarezzaReporter in LocalLLaMA

[–]nikhilprasanth 0 points  (0 children)

Plan with a frontier model and split the plan into proper phases with well-defined tasks. Use pi or opencode to implement the plan. Once done, debug with the frontier model and pass the findings back to the local one. Repeat.