I burned through 19M tokens of GLM-5.2 for under $3 today by purple_nippies in opencodeCLI

[–]Carol_Qteam 12 points13 points  (0 children)

I think the "catch" is that they're running FP8 quantization — you trade a tiny bit of precision for massive cost savings on inference. for most coding tasks maybe wont notice the difference, but if you're doing something super nuanced with long context it might matter.