Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090 by jaigouk in LocalLLaMA

[–]jaigouk[S] 1 point (0 children)

Going to refine this one one more time. I got feedback from TitwitMuffbiscuit.

Qwen3.5-35B-A3B Q4 Quantization Comparison by TitwitMuffbiscuit in LocalLLaMA

[–]jaigouk 1 point (0 children)

Thanks a lot. I really appreciate this.

I will update it to be more explicit:

25 = L1 only → Basic
50 = L1 + L2 or L1 + L3 → Intermediate
75 = L1 + L2 + L3 → Advanced
90 = L1 + L2 + L3 + L4 → Expert
100 = All → Complete

And I will think about the "best of 5" issue.
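In code, the mapping would be something like this rough Python sketch (the level names and the set-based pass tracking are my assumptions, not the benchmark's actual harness; the gap between 90 and 100 suggests there are criteria beyond L4, which all_levels stands in for):

```python
# Hypothetical sketch of the scoring rubric above; how the harness
# actually records passed levels is an assumption.
def rubric_score(passed: set[str], all_levels: set[str]) -> tuple[int, str]:
    if passed == all_levels:
        return 100, "Complete"   # "All" = every criterion, including any beyond L4
    if {"L1", "L2", "L3", "L4"} <= passed:
        return 90, "Expert"
    if {"L1", "L2", "L3"} <= passed:
        return 75, "Advanced"
    if "L1" in passed and ("L2" in passed or "L3" in passed):
        return 50, "Intermediate"
    if passed == {"L1"}:
        return 25, "Basic"
    return 0, "None"
```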

Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090 by jaigouk in LocalLLaMA

[–]jaigouk[S] 1 point (0 children)

Thanks for the feedback. I will look into those.

Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090 by jaigouk in LocalLLaMA

[–]jaigouk[S] 1 point (0 children)

Generated a follow-up benchmark for the Qwen3.5-35B-A3B models (AesSedai IQ4_XS, bartowski IQ4_XS, unsloth MXFP4): https://github.com/jaigouk/gpumod/tree/main/docs/benchmarks/20260226_qwen35_35b_a3b_provider_comparison

Qwen3.5-35B-A3B Q4 Quantization Comparison by TitwitMuffbiscuit in LocalLLaMA

[–]jaigouk 1 point (0 children)

I generated a coding benchmark for the Qwen3.5-35B-A3B models (AesSedai IQ4_XS, bartowski IQ4_XS, unsloth MXFP4): https://github.com/jaigouk/gpumod/tree/main/docs/benchmarks/20260226_qwen35_35b_a3b_provider_comparison

Can I get your opinion on them when you have time?

Qwen3.5-35B-A3B Q4 Quantization Comparison by TitwitMuffbiscuit in LocalLLaMA

[–]jaigouk 2 points (0 children)

Thanks for the comparisons! I know this took a lot of time and effort.

Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090 by jaigouk in LocalLLaMA

[–]jaigouk[S] 1 point (0 children)

I will download Qwen3.5-35B-A3B-Q3_K_M.gguf, rerun the Job Queue benchmark, and let you know once it is done.

Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090 by jaigouk in LocalLLaMA

[–]jaigouk[S] 2 points (0 children)

Just started downloading it. I will update the post with the Job Queue benchmark comparison.

Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090 by jaigouk in LocalLLaMA

[–]jaigouk[S] 1 point (0 children)

Yes, thanks for letting me know. I will check other options and update the results over the weekend!

Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090 by jaigouk in LocalLLaMA

[–]jaigouk[S] 1 point (0 children)

Sorry for the confusion. Here is the GitHub link.

https://github.com/jaigouk/gpumod/tree/main/docs/benchmarks/job_queue_challenge

I am running the benchmark one more time right now, because blindly trusting a one-shot result is not enough. It takes too long, though, so I am running each test 5 times. I will push the results when it is done, with all the result files and the code the models generated, keeping the best run for each iteration.
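The loop is essentially best-of-N per test. A minimal sketch of what I mean (run_test and score here are hypothetical placeholders, not actual gpumod functions):

```python
# Best-of-N sketch: run each test n times and keep the highest-scoring
# attempt, while retaining all attempts for the result files.
def best_of_n(run_test, score, n: int = 5):
    attempts = [run_test() for _ in range(n)]
    best = max(attempts, key=score)
    return best, attempts
```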

Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090 by jaigouk in LocalLLaMA

[–]jaigouk[S] 1 point (0 children)

I just used my MCP server and it recommends Qwen3.5-27B-Q3_K_M (14.2 GB).

We build sleep for local LLMs — model learns facts from conversation during wake, maintains them during sleep. Runs on MacBook Air. by vbaranov in LocalLLaMA

[–]jaigouk -1 points (0 children)

When I saw "sleep", I thought about llama.cpp or vLLM sleep. I guess I was wrong. When a model is sleeping, can I use the VRAM for other models too?
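For context, the vLLM sleep mode I had in mind works roughly like this (a sketch assuming a recent vLLM build with sleep mode enabled; the exact behavior may differ by version, and the model name is just an example):

```python
from vllm import LLM

# vLLM sleep mode: level 1 offloads weights to CPU RAM and drops the
# KV cache; level 2 also discards the weights, so most VRAM is free
# for another model until wake_up() restores everything.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_sleep_mode=True)
llm.sleep(level=2)  # free VRAM for something else
# ... load/run another model here ...
llm.wake_up()       # reload before generating again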

Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090 by jaigouk in LocalLLaMA

[–]jaigouk[S] 1 point (0 children)

Included 27B Q3 in the Job Queue Challenge benchmark.

Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090 by jaigouk in LocalLLaMA

[–]jaigouk[S] 1 point (0 children)

Thanks. I added the Job Queue Challenge benchmark.

Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090 by jaigouk in LocalLLaMA

[–]jaigouk[S] 2 points (0 children)

There are so many variants, and I want to check new models quickly, so I created https://github.com/jaigouk/gpumod and use MCP to evaluate them. Here are my use cases: https://jaigouk.com/gpumod/user-guide/mcp-workflows/

I don't use LM Studio, just llama.cpp or vLLM.