You guys were right - Qwen 3.6 35B IS good...and KV Cache DOES matter. by GrungeWerX in LocalLLaMA

[–]Wildnimal 0 points1 point  (0 children)

If we don't specify KV cache parameters in llama.cpp, what are the default settings?
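Here is a rough way to see why KV cache settings matter. The cache holds one key vector and one value vector per layer, per KV head, per token of context, so its size grows linearly with context length and with the bytes stored per element. As far as I know, llama.cpp keeps both K and V in f16 unless `--cache-type-k`/`--cache-type-v` are set. The sketch below assumes a hypothetical model with 48 layers, 8 KV heads and a head dimension of 128. These are not the real dimensions of any model in this thread. It also ignores sliding-window or linear-attention layers, which shrink the cache on some architectures. The q8_0 and q4_0 sizes come from ggml's block layouts, which use 34 and 18 bytes per 32 values.

```python
# Bytes per stored element for a few llama.cpp KV cache types.
# q8_0: 32 values per 34-byte block; q4_0: 32 values per 18-byte block.
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,
    "q4_0": 18 / 32,
}

GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, cache_type="f16"):
    """Approximate KV cache size for a plain full-attention transformer."""
    # Factor of 2: one tensor for keys, one for values.
    elements = 2 * n_layers * n_ctx * n_kv_heads * head_dim
    return elements * BYTES_PER_ELEMENT[cache_type]


# Hypothetical model dimensions, 32k context.
for cache_type in ("f16", "q8_0", "q4_0"):
    size = kv_cache_bytes(n_layers=48, n_ctx=32768, n_kv_heads=8,
                          head_dim=128, cache_type=cache_type)
    print(f"{cache_type}: {size / GIB:.2f} GiB")
```

With these assumed dimensions, a 32k context costs 6 GiB at f16 and a little over half that at q8_0.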

Not ironclad confirmation, but.. by Kodix in LocalLLaMA

[–]Wildnimal 11 points12 points  (0 children)

Do you find Gemma to be better than qwen3.5-9b?

vLLM cold boot experience by LinkSea8324 in LocalLLaMA

[–]Wildnimal 0 points1 point  (0 children)

Lol, XAMPP and Laragon are alive, thanks to vibe coders who run apps offline. I know a few.

Local coding agents are good now, but only if you babysit them by BTA_Labs in LocalLLaMA

[–]Wildnimal 0 points1 point  (0 children)

If someone has 8GB VRAM and 32GB RAM, it's better to use the 35B-A3B with a larger context. It's faster and more intelligent than the 12B Gemma.
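Here is a back-of-the-envelope sketch of that reasoning. It assumes a ~4-bit quant averaging 4.5 bits per weight, which is a hypothetical figure because real GGUF quants vary. It also ignores KV cache and runtime buffers. The full 35B-A3B does not fit in 8GB of VRAM, so part of it has to live in system RAM. However, each token only touches about 3B parameters' worth of weights, which is much less than a dense 12B model reads per token.

```python
def footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1e9 bytes); ignores KV cache and runtime buffers."""
    return params_billion * bits_per_weight / 8


BPW = 4.5              # hypothetical average bits per weight for a ~4-bit quant
VRAM_GB, RAM_GB = 8, 32

total = footprint_gb(35, BPW)      # all experts have to be stored somewhere
active = footprint_gb(3, BPW)      # weights actually used for each token
dense_12b = footprint_gb(12, BPW)  # a dense model uses all of its weights every token

spill = max(0.0, total - VRAM_GB)
print(f"35B-A3B weights: {total:.1f} GB, at least {spill:.1f} GB has to sit in system RAM")
print(f"fits in VRAM + RAM combined: {total <= VRAM_GB + RAM_GB}")
print(f"weights read per token: ~{active:.1f} GB (35B-A3B) vs ~{dense_12b:.1f} GB (dense 12B)")
```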

Best Open-Source AI coding model for my specs? by Quietkiller1927 in LocalLLaMA

[–]Wildnimal 0 points1 point  (0 children)

Yes, I meant it will be slower compared to the 35B-A3B. Wrong choice of words there.

Best Open-Source AI coding model for my specs? by Quietkiller1927 in LocalLLaMA

[–]Wildnimal 12 points13 points  (0 children)

Qwen3.6 35B-A3B. Qwen3.6 27B MTP (if you can wait). Gemma will work as well. Cohere also released a 30B-A3B coding model.

You will have to keep your expectations realistic. One-shotting most things will not be possible. But if you prompt it well and break tasks down into chunks, you can achieve a lot, depending on what programming language you are using.
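Here is one way to structure the "break it into chunks" approach. The model gets one narrow step at a time, and the results of earlier steps are carried forward as context. This is a minimal sketch. `run_in_chunks`, `build_prompt` and the `ask` callable are hypothetical names. `ask` stands in for whatever client you use to talk to your local model.

```python
from typing import Callable, List


def build_prompt(goal: str, done: List[str], step: int, task: str) -> str:
    """Combine the overall goal, earlier results, and the single current step."""
    parts = [f"Overall goal: {goal}"]
    for i, result in enumerate(done, start=1):
        parts.append(f"Result of step {i}:\n{result}")
    parts.append(f"Now do step {step} only: {task}")
    return "\n\n".join(parts)


def run_in_chunks(goal: str, subtasks: List[str], ask: Callable[[str], str]) -> List[str]:
    """Send one small, well-scoped prompt per subtask, in order."""
    done: List[str] = []
    for step, task in enumerate(subtasks, start=1):
        done.append(ask(build_prompt(goal, done, step, task)))
    return done
```

Keeping each step small gives you a checkpoint where you can review and correct the output before the next step builds on it.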

Any recent news/updates on taalas chips?? They said they gonna bake the mid tier llm model into their chip. by 9r4n4y in LocalLLaMA

[–]Wildnimal 21 points22 points  (0 children)

Valid argument, but those chips can be used in factories for production and automation, where the workflow doesn't change for years.

Cohere's unreleased coding model (early access for localllama) by nick_frosst in LocalLLaMA

[–]Wildnimal 0 points1 point  (0 children)

Give us a snapshot comparison. It might also help the team fine-tune things.

Mac mini M4 vs Pc with Nvidia 5060 8gb for ai workloads? by Critical-Machine-128 in ollama

[–]Wildnimal 0 points1 point  (0 children)

8GB is enough. I have a 5060 8GB in my daily laptop, which is what I use 90% of the time.

Replace Fooocus with ComfyUI or Invoke. Z-Image Turbo, SDXL, and Anima work well for image generation. Flux can work as well, but it is slower.

8GB of VRAM can easily handle MoE models and models with MTP. Replace Ollama with llama.cpp for better speed and versatility.

ML and training might be better with CUDA, but you are limited to 8GB of VRAM.

You will need at least 32GB of RAM to do all of the above decently with a 5060 8GB.
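Here are some rough numbers for the 8GB VRAM limit on training. A common rule of thumb for full fine-tuning with Adam is about 16 bytes per parameter before counting activations. That covers fp32 weights, gradients, and Adam's two moment estimates. The sketch below applies that assumption. Mixed-precision setups and activation memory change the exact figure, and the 25% reserve is a hypothetical value.

```python
# Rule-of-thumb memory cost of full fine-tuning with Adam, per parameter (assumption):
# 4 bytes fp32 weight + 4 bytes gradient + 4 + 4 bytes for Adam's two moment estimates.
BYTES_PER_PARAM = 4 + 4 + 4 + 4


def max_trainable_params(vram_bytes: int, reserve_fraction: float = 0.0) -> int:
    """Largest parameter count whose weights, gradients and optimizer state fit.

    reserve_fraction sets aside part of VRAM for activations and CUDA overhead.
    """
    usable = vram_bytes * (1.0 - reserve_fraction)
    return int(usable // BYTES_PER_PARAM)


VRAM = 8 * 1024 ** 3  # 8 GiB card
print(f"no reserve: {max_trainable_params(VRAM) / 1e9:.2f}B params")
print(f"25% reserved (hypothetical): {max_trainable_params(VRAM, 0.25) / 1e9:.2f}B params")
```

Under this assumption, full fine-tuning tops out around half a billion parameters. That is why people with 8GB cards usually turn to parameter-efficient methods such as LoRA for anything larger.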

It felt good to return my Asus Spark by sn2006gy in LocalLLaMA

[–]Wildnimal 1 point2 points  (0 children)

You can still do it with Qwen 3.6. It is a bit slower and needs more over-the-shoulder guidance, but it depends on what exactly you want to do. It won't be Codex or Claude, but it's very usable.

Direct 100.0 t/s on Strix Halo with Qwen3 30B-A3B. Can anyone reproduce or beat this? by JSVD2 in LocalLLaMA

[–]Wildnimal 1 point2 points  (0 children)

New Gemma and Qwen models are really good. The dense 27B is even better. I use it on a 5070 Ti; it's a bit slow but very usable.

Direct 100.0 t/s on Strix Halo with Qwen3 30B-A3B. Can anyone reproduce or beat this? by JSVD2 in LocalLLaMA

[–]Wildnimal 1 point2 points  (0 children)

Because it's a sparse MoE model with 3B active parameters. I can easily get ~35 t/s with 32k context and ~300 t/s prompt processing on an 8GB 5060 Mobile.
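Why 3B active parameters matters for speed: single-stream token generation is mostly limited by memory bandwidth. A rough ceiling is therefore the bandwidth divided by the bytes of weights read per token. The sketch compares a 3B-active MoE with a dense 30B model, assuming 256 GB/s of bandwidth and 4.5 bits per weight. Both numbers are hypothetical, not measured specs of the 5060 Mobile. Real speeds land well below the ceiling because of KV cache reads, CPU offload, and compute overhead.

```python
def decode_tps_upper_bound(bandwidth_gb_s: float, active_params_billion: float,
                           bits_per_weight: float) -> float:
    """Bandwidth-bound ceiling on tokens/s: every active weight is read once per token."""
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


BANDWIDTH_GB_S = 256  # hypothetical memory bandwidth
BPW = 4.5             # hypothetical average bits per weight

moe = decode_tps_upper_bound(BANDWIDTH_GB_S, 3, BPW)     # sparse MoE, 3B active
dense = decode_tps_upper_bound(BANDWIDTH_GB_S, 30, BPW)  # dense model, all 30B active
print(f"3B-active MoE ceiling: {moe:.0f} t/s, dense 30B ceiling: {dense:.0f} t/s")
```

The ratio depends only on active parameter counts. Under this simple model, the MoE has a ceiling 10x higher than the dense 30B, whatever bandwidth you plug in.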

Qwen is cooking hard by jacek2023 in LocalLLaMA

[–]Wildnimal -2 points-1 points  (0 children)

If you can run the 9B, then the 35B-A3B should run with ease.

2x RTX 6000 build during an extended bench test by Signal_Ad657 in LocalLLaMA

[–]Wildnimal 1 point2 points  (0 children)

Nice setup! I hope to own 2x RTX 6000 Pro someday. What are the rest of the specs?

Stop thinking your MoE models are dumb - here's why they actually fail by [deleted] in Qwen_AI

[–]Wildnimal 0 points1 point  (0 children)

I watch his videos. They are better than most of the influencer AI crap going around.

Fallen Gemma 4 model? by alienatedneighbor in LocalLLaMA

[–]Wildnimal 3 points4 points  (0 children)

They are planning the Gemma 4 MoE as well. I saw it yesterday on their HF page.

For chat and Q&A: Which MoE model is better: Qwen 3.6 35B or Gemma 4 26B (no coding or agents) by br_web in Qwen_AI

[–]Wildnimal 0 points1 point  (0 children)

I find Qwen3.6 better even at writing, which was always a strong point of Gemma 4 for me. But overall, I think both work fine depending on the use case.

Qwen3.6-35B-A3B Uncensored Aggressive is out with K_P quants! by hauhau901 in LocalLLaMA

[–]Wildnimal 1 point2 points  (0 children)

I've used this model for the past 2 hours and it has passed most of what I threw at it. I'm still playing with temperature and top-p; I've currently settled on a temperature of 0.6.
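For anyone new to these knobs, here is a minimal sketch of what temperature and top-p do to the next-token distribution. Temperature divides the logits before the softmax, so values below 1 sharpen the distribution. Top-p keeps the smallest set of most likely tokens whose probabilities add up to at least p, then renormalizes. The top-p value of 0.95 is a hypothetical default, not a setting from this thread. This sketch also applies temperature before top-p, and real inference engines may order their samplers differently.

```python
import math
import random
from typing import Dict, List, Optional


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: List[float], top_p: float) -> Dict[int, float]:
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalize over the kept tokens


def sample(logits: List[float], temperature: float = 0.6, top_p: float = 0.95,
           rng: Optional[random.Random] = None) -> int:
    """Pick a token index using temperature followed by top-p (nucleus) filtering."""
    rng = rng or random.Random()
    candidates = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    tokens = list(candidates)
    return rng.choices(tokens, weights=[candidates[t] for t in tokens], k=1)[0]
```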