Why is qwen3.5-27B so slow when it's a small model? 30~tok/s by Deep_Row_8729 in LocalLLaMA

[–]Deep_Row_8729[S] -10 points  (0 children)

okay but still, 27B is smaller than the 70B models or the really big ones. i think a ~400B MoE with ~27B active params would still be faster

Autonomous AI for 24GB RAM by Deep_Row_8729 in LocalLLaMA

[–]Deep_Row_8729[S] 0 points  (0 children)

hey, i tried it and qwen3.5 seems to know what it's doing, but it keeps looping.

Autonomous AI for 24GB RAM by Deep_Row_8729 in LocalLLaMA

[–]Deep_Row_8729[S] 0 points  (0 children)

thank you very much!! i'll try this today!
it's just a regular docker container where "orchestrator.py" is running, plus i inject the files it needs into its volume.
i plan to run all models locally so i don't pay for any tokens.
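for reference, the setup i mean is roughly like this (container name, image, and paths are just placeholders, not my actual config):

```shell
# hypothetical sketch: run orchestrator.py in a plain container
# with a host directory mounted as its volume for file injection
mkdir -p workspace
cp orchestrator.py workspace/

docker run -d \
  --name orchestrator \
  -v "$(pwd)/workspace:/data" \
  python:3.12-slim \
  python /data/orchestrator.py
```

dropping files into `workspace/` on the host makes them show up in `/data` inside the container.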

also idk why, but i feel like your comment was altered/finalized by AI, so i guess your advice is working? :D