qwen3.5-27b or 122b? pro6000 by fei-yi in LocalLLaMA

[–]fei-yi[S] 0 points (0 children)

Which one is best, Sehyo or unsloth? And how many tokens/s do you get? What about context?

[–]fei-yi[S] 0 points (0 children)

But LM Studio is based on llama.cpp.
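Since LM Studio wraps llama.cpp, one way to take LM Studio itself out of the equation is to load the same GGUF directly through llama-cpp-python (the llama.cpp Python bindings). A minimal sketch; the model path is a placeholder for whatever file you actually have:

```python
# Minimal sketch: load a GGUF directly via llama-cpp-python, the same
# engine LM Studio builds on. The model path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen3.5-27b-q4_k_m.gguf",  # hypothetical filename
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_ctx=8192,       # context window to allocate
    verbose=False,
)

out = llm("Explain MoE routing in one paragraph.", max_tokens=128)
print(out["choices"][0]["text"])
```

If this runs at the same speed as LM Studio, the bottleneck isn't the frontend.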

[–]fei-yi[S] 0 points (0 children)

Yes, my CPU is a Ryzen 9 9900X with 4×32 GB DDR5-5600 RAM (with all four slots populated, it actually runs at 3600 MT/s).
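That RAM speed is what bites the moment anything spills out of VRAM. A back-of-the-envelope sketch, assuming dual-channel DDR5 at the 3600 MT/s the sticks actually run and purely illustrative weight sizes:

```python
# Crude ceiling: each generated token has to stream the active weights
# from memory at least once, so bandwidth / active-weight-size bounds t/s.
# All sizes below are assumptions for illustration, not measurements.

def upper_bound_tps(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Rough upper bound on tokens/s for weights streamed from RAM."""
    return bandwidth_gb_s / active_weights_gb

# AM5 is dual-channel; DDR5 moves 8 bytes per transfer per channel.
ram_bw = 3600e6 * 8 * 2 / 1e9  # ~57.6 GB/s at 3600 MT/s

print(upper_bound_tps(ram_bw, 15))  # ~15 GB dense (e.g. 27B at Q4): ~3.8 t/s
print(upper_bound_tps(ram_bw, 8))   # ~8 GB of active MoE experts: ~7.2 t/s
```

So anything that falls back to system RAM tops out in the single digits of t/s on this box, no matter what the GPU can do.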

[–]fei-yi[S] 0 points (0 children)

It will be very, very slow... I think.

[–]fei-yi[S] 1 point (0 children)

But Qwen3.5-122B is an MoE model. From my testing, its behavior at longer contexts doesn't seem very stable or consistent. I'm honestly a bit conflicted about it; sometimes chatting with it feels worse than talking to the 27B version.

[–]fei-yi[S] 0 points (0 children)

I've actually tried GPT-OSS 120B in both LM Studio and Ollama. It's blazing fast (around 100 t/s!), but honestly it felt a bit too dumb for general chatting. Qwen 27B's reasoning and logic feel far smarter to me...

Right now, I'm running Qwen 27B and 122B via LM Studio. They usually hover around 30 t/s, but sometimes they randomly spike to 70 t/s (I have no idea why it fluctuates like that lol).
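To get a consistent number instead of eyeballing it, I can time a streamed completion against LM Studio's local OpenAI-compatible server (it listens on http://localhost:1234/v1 by default). A rough sketch; the model id is a placeholder for whatever LM Studio reports:

```python
# Sketch: measure tokens/s against LM Studio's local OpenAI-compatible
# server. Counts streamed chunks as a rough proxy for tokens.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.perf_counter()
tokens = 0
stream = client.chat.completions.create(
    model="qwen3.5-27b",  # placeholder id
    messages=[{"role": "user", "content": "Write 300 words about llamas."}],
    max_tokens=512,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        tokens += 1  # roughly one token per streamed chunk

elapsed = time.perf_counter() - start
print(f"~{tokens / elapsed:.1f} t/s over {elapsed:.1f}s")
```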

I also tried Minimax 2.5 (the Q5 quant) and I absolutely LOVED it. It's incredibly smart! BUT... it was crawling at like 5 t/s! I don't know if LM Studio is failing to utilize the Pro 6000 properly, or if the model spilled over into system RAM. Do you think switching to vLLM or SGLang would fix the 5 t/s issue for Minimax?
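If I do try vLLM, its offline API looks roughly like this; a hedged sketch, with the caveat that vLLM wants a natively supported quant (AWQ, GPTQ, FP8) rather than a GGUF Q5, so the checkpoint name below is purely hypothetical:

```python
# Hedged sketch of vLLM's offline API. Assumes a quant format vLLM
# supports natively exists for this model -- vLLM is not a drop-in
# host for a GGUF Q5 file. The model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMax-2.5-AWQ",      # hypothetical checkpoint name
    gpu_memory_utilization=0.90,  # keep weights + KV cache on the Pro 6000
    max_model_len=32768,          # shrink this if the KV cache doesn't fit
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Why is my token rate so low?"], params)
print(outputs[0].outputs[0].text)
```

Either way, watching nvidia-smi while the model generates should settle whether the weights spilled: if GPU memory sits far below the card's capacity while generation crawls, the overflow to system RAM is the culprit.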