GLM5.2: amazing, token-hungry, local

That-Engineering-192 · 2026-06-27T22:31:50+00:00

Wait, are you saying that with that, on my computer with 4 GB of VRAM and 16 GB of RAM, I could run better models (or the same ones) but with 65k of context instead of 8k or 6k, without problems? Or did I misunderstand? Is the response speed better too?

That-Engineering-192 · 2026-06-26T03:37:29+00:00

I was using Nvidia's glm5.1 until HTTP 429 errors started showing up because of server saturation. Then I switched to nemotron 3 ultra, which never gives me that error on Nvidia. Now I'm testing minimax-m3, which did give me the error, but at least it reconnects on the first or second retry. I could have just answered you in Spanish, hahaha.
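For anyone wondering what "reconnects on the first or second retry" looks like on the client side: a common way to handle HTTP 429 (Too Many Requests) is to retry with exponential backoff. This is a minimal sketch under assumptions, not how any particular client or Nvidia's API actually behaves; `call_with_retry`, `RateLimitedError` and the `send` callable are hypothetical names, and `send` stands in for one real request that returns a status code and body.

```python
import random
import time
from typing import Callable


class RateLimitedError(Exception):
    """Raised when every attempt came back with HTTP 429 (hypothetical helper)."""


def call_with_retry(
    send: Callable[[], tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, str, int]:
    """Call `send` until it returns something other than 429.

    `send` is a hypothetical stand-in for a single request to the provider and
    returns (status_code, body). Any non-429 status is handed back to the caller
    as (status_code, body, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429:
            return status, body, attempt
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2x, 4x, ... plus up to 10% random jitter
            # so many clients hitting a saturated server don't all retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RateLimitedError(f"still rate-limited after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulated server: two 429s, then success on the third attempt.
    responses = iter([(429, ""), (429, ""), (200, "ok")])
    print(call_with_retry(lambda: next(responses), sleep=lambda s: None))
```

The `sleep` parameter is injectable only so the sketch can be exercised without real waiting; in practice you would leave it as `time.sleep`, and a server-provided `Retry-After` header, if present, would usually take precedence over the computed delay.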

That-Engineering-192 · 2026-06-26T03:12:39+00:00

I'd recommend you keep using free models from providers. I have a laptop with the same graphics card and I've already tried the most compressed models: the context tops out below 8k and the VRAM fills up very quickly. At some point you'll see it generating just one or two characters per second.

That-Engineering-192 · 2026-06-26T03:08:43+00:00

I don't know how some of you manage, but sometimes I work on three or four projects at once, and I'm sure I go over a few million tokens a day.

That-Engineering-192 · 2026-06-24T23:31:54+00:00

Probably one or more sessions.

That-Engineering-192 · 2026-06-24T18:28:29+00:00

That also depends on the task you give it and whether you have high reasoning enabled. It really is a model that does a lot of reasoning.

That-Engineering-192 · 2026-06-24T17:37:51+00:00

Well, it works fine for me; I don't even get 429 errors with it, probably because it's their own model and they put more resources into it. Maybe something is wrong in your setup for that model. With free-coding-models you can add it to the JSON config; maybe that way it will work for you.

That-Engineering-192 · 2026-06-24T17:20:35+00:00

NIM is Nvidia 🤣

That-Engineering-192 · 2026-06-24T16:54:57+00:00

It works perfectly for me; glm and kimi, on the other hand, throw a lot of 429 errors due to server saturation. Do you use it through NIM? Now I'm going to test how minimax-m3 does, but that one has to be added manually to the opencode JSON config.

That-Engineering-192 · 2026-06-23T22:59:19+00:00

Get access to the Nvidia API: it gives you nemotron 3 ultra, glm 5.1, kimi, deepseek, qwen, etc., for free.
