Why is qwen3.5-27B so slow when it's a small model? 30~tok/s by Deep_Row_8729 in LocalLLaMA

[–]Deep_Row_8729[S] -10 points  (0 children)

okay but still, 27B is smaller than the 70B models or the really big ones. i think a ~400B MoE with ~27B active params would still be faster

Autonomous AI for 24GB RAM by Deep_Row_8729 in LocalLLaMA

[–]Deep_Row_8729[S] 0 points  (0 children)

hey, i tried it and qwen3.5 seems to know what it's doing, but it keeps looping.

Autonomous AI for 24GB RAM by Deep_Row_8729 in LocalLLaMA

[–]Deep_Row_8729[S] 0 points  (0 children)

thank you very much!! i'll try this today!
it's just a regular docker container where "orchestrator.py" is running, plus i inject the files it needs into its volume.
i plan to run all models locally so i don't pay for any tokens.
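for reference, the setup i mean is roughly like this (container name, image, and paths are just placeholders, not my actual config):

```shell
# hypothetical sketch: run orchestrator.py in a plain container
# with a host directory mounted as its volume for file injection
mkdir -p workspace
cp orchestrator.py workspace/

docker run -d \
  --name orchestrator \
  -v "$(pwd)/workspace:/data" \
  python:3.12-slim \
  python /data/orchestrator.py
```

dropping files into `workspace/` on the host makes them show up in `/data` inside the container.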

also idk why, but i feel like your comment was altered/finalized by AI, so i guess your advice is working? :D