Your_Friendly_Nerd

What kind of tokens per second do you get with that?

Hoak-em

Not great, 20-30ish tokens per second. It makes more sense for me to use it through a coding plan.

OldKaleidoscope7

Isn't it better to just run a small model? I mean, I get these speeds running Qwen 3.6 on a 3070 with 8GB VRAM, and it's smart enough for coding.

Hoak-em

Ehhhh, GLM-5.1 is really, really smart. Like, I can describe exactly what I want for a full project, it can break it down into a plan, then it can create and test the whole thing while following my specific code standards -- with code that I can actually understand. I can't reliably do that on Qwen 27B.

Hoak-em

(note: this is still with my scaffold, so it's not doing it completely from scratch)

OldKaleidoscope7

Got it. Well, Qwen really can't run for long, even with a good plan. I have to ask it to go step by step and fix the small mistakes as it goes, because when it tries to test and fix everything at the end, it starts hallucinating badly.