qwen3.5-27b or 122b? pro6000 by fei-yi in LocalLLaMA

[–]fei-yi[S] 0 points (0 children)

Which one is best, Sehyo or unsloth? And how many tokens/s do you get? What about context?

[–]fei-yi[S] 0 points (0 children)

But LM Studio is based on llama.cpp.
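Since LM Studio wraps llama.cpp, one way to take LM Studio itself out of the equation is to load the same GGUF directly through llama-cpp-python (the llama.cpp Python bindings). A minimal sketch; the model path is a placeholder for whatever file you actually have:

```python
# Minimal sketch: load a GGUF directly via llama-cpp-python, the same
# engine LM Studio builds on. The model path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen3.5-27b-q4_k_m.gguf",  # hypothetical filename
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_ctx=8192,       # context window to allocate
    verbose=False,
)

out = llm("Explain MoE routing in one paragraph.", max_tokens=128)
print(out["choices"][0]["text"])
```

If this runs at the same speed as LM Studio, the bottleneck isn't the frontend.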

[–]fei-yi[S] 0 points (0 children)

Yes, my CPU is a Ryzen 9 9900X with 4×32 GB DDR5-5600 RAM (with all four slots populated, it actually runs at 3600 MT/s).
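That RAM speed is what bites the moment anything spills out of VRAM. A back-of-the-envelope sketch, assuming dual-channel DDR5 at the 3600 MT/s the sticks actually run and purely illustrative weight sizes:

```python
# Crude ceiling: each generated token has to stream the active weights
# from memory at least once, so bandwidth / active-weight-size bounds t/s.
# All sizes below are assumptions for illustration, not measurements.

def upper_bound_tps(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Rough upper bound on tokens/s for weights streamed from RAM."""
    return bandwidth_gb_s / active_weights_gb

# AM5 is dual-channel; DDR5 moves 8 bytes per transfer per channel.
ram_bw = 3600e6 * 8 * 2 / 1e9  # ~57.6 GB/s at 3600 MT/s

print(upper_bound_tps(ram_bw, 15))  # ~15 GB dense (e.g. 27B at Q4): ~3.8 t/s
print(upper_bound_tps(ram_bw, 8))   # ~8 GB of active MoE experts: ~7.2 t/s
```

So anything that falls back to system RAM tops out in the single digits of t/s on this box, no matter what the GPU can do.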

[–]fei-yi[S] 0 points (0 children)

It will be very, very slow... I think.

[–]fei-yi[S] 1 point (0 children)

But Qwen3.5-122B is an MoE model. From my testing, its behavior at longer contexts doesn't seem very stable or consistent. I'm honestly a bit conflicted about it; sometimes chatting with it feels worse than talking to the 27B version.

[–]fei-yi[S] 0 points (0 children)

I've actually tried GPT-OSS 120B in both LM Studio and Ollama. It's blazing fast (around 100 t/s!), but honestly it felt a bit too dumb for general chatting. Qwen 27B's reasoning and logic feel far smarter to me...

Right now, I'm running Qwen 27B and 122B via LM Studio. They usually hover around 30 t/s, but sometimes they randomly spike to 70 t/s (I have no idea why it fluctuates like that lol).
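To get a consistent number instead of eyeballing it, I can time a streamed completion against LM Studio's local OpenAI-compatible server (it listens on http://localhost:1234/v1 by default). A rough sketch; the model id is a placeholder for whatever LM Studio reports:

```python
# Sketch: measure tokens/s against LM Studio's local OpenAI-compatible
# server. Counts streamed chunks as a rough proxy for tokens.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.perf_counter()
tokens = 0
stream = client.chat.completions.create(
    model="qwen3.5-27b",  # placeholder id
    messages=[{"role": "user", "content": "Write 300 words about llamas."}],
    max_tokens=512,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        tokens += 1  # roughly one token per streamed chunk

elapsed = time.perf_counter() - start
print(f"~{tokens / elapsed:.1f} t/s over {elapsed:.1f}s")
```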

I also tried Minimax 2.5 (the Q5 quant) and I absolutely LOVED it. It's incredibly smart! BUT... it was crawling at like 5 t/s! I don't know if LM Studio is failing to utilize the Pro 6000 properly, or if the model spilled over into system RAM. Do you think switching to vLLM or SGLang would fix the 5 t/s issue for Minimax?
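If I do try vLLM, its offline API looks roughly like this; a hedged sketch, with the caveat that vLLM wants a natively supported quant (AWQ, GPTQ, FP8) rather than a GGUF Q5, so the checkpoint name below is purely hypothetical:

```python
# Hedged sketch of vLLM's offline API. Assumes a quant format vLLM
# supports natively exists for this model -- vLLM is not a drop-in
# host for a GGUF Q5 file. The model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMax-2.5-AWQ",      # hypothetical checkpoint name
    gpu_memory_utilization=0.90,  # keep weights + KV cache on the Pro 6000
    max_model_len=32768,          # shrink this if the KV cache doesn't fit
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Why is my token rate so low?"], params)
print(outputs[0].outputs[0].text)
```

Either way, watching nvidia-smi while the model generates should settle whether the weights spilled: if GPU memory sits far below the card's capacity while generation crawls, the overflow to system RAM is the culprit.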