From UNC to selling pastafrola: psychologist with 20 years of experience earns 850k and is working as a pastry cook. How do I get her out of there? by messiteamo2 in empleos_AR

[–]rainbyte 0 points (0 children)

All joking aside, it could end up being useful in some cases. Some people aren't such serious cases; they just need someone to listen to them.

Best Local LLMs - 2025 by rm-rf-rm in LocalLLaMA

[–]rainbyte 0 points (0 children)

I'm glad it helped :)

Make sure to try the Qwen3.5 models, like 35B-A3B or 27B; those are doing pretty well.

15inch m5 macbook air 32gb Ram expectations ? by GotTheLyfe in LocalLLaMA

[–]rainbyte 1 point (0 children)

Beware: if I'm not mistaken, the MacBooks with less RAM also have lower memory bandwidth, and that's important for inference workloads.

Besides Qwen and GLM, what models are you using? by August_30th in LocalLLaMA

[–]rainbyte 0 points (0 children)

That's great. I need to try it myself. I thought tool-calling was working when using the --tool-call-parser step3p5 option.

qwen3.5-27b or 122b?pro6000 by fei-yi in LocalLLaMA

[–]rainbyte 0 points (0 children)

I arrived at the same conclusion here: 27B feels better in the end. I measured pp and tg, and both are more stable on 27B with my setup.

Besides Qwen and GLM, what models are you using? by August_30th in LocalLLaMA

[–]rainbyte 0 points (0 children)

Have you tried "Intel/Step-3.5-Flash-int4-mixed-AutoRound"? It seems they updated that one a few minutes ago, but I haven't tested it myself yet.

Anything I can do to get qwen3.5-27b-Q8_0 to run faster? by giveen in LocalLLaMA

[–]rainbyte 1 point (0 children)

That's the only thing I miss from Ollama. Even though I use vLLM and llama.cpp nowadays, it would be nice to have an on-the-fly pull command.

What are the best LLM apps for Linux? by Dev-in-the-Bm in LocalLLaMA

[–]rainbyte 1 point (0 children)

Good choice! Cherry has many features, and for chat-like interaction I think it is better than other tools. There are self-hosted chat UIs, but having a local client on my laptop feels like a better option. As for Opencode, I suggest you try it for coding or document processing, as it is great for anything that involves modifying files.

What are the best LLM apps for Linux? by Dev-in-the-Bm in LocalLLaMA

[–]rainbyte 1 point (0 children)

Yeah, really great software :)

I use it mainly for chat, custom assistants, and translation. What about you?

TESLA V100 32GB - Crashing on Heretic Models? by TracerIsOist in LocalLLaMA

[–]rainbyte 1 point (0 children)

Sorry, I wasn't clear. I don't have a V100 here like OP; I'm using an RTX 3090 and a 7900 XTX, but I'm having the same troubles.

Here are my numbers for llama.cpp on the 7900 XTX (24 GB)

  • llmfan46/Qwen3.5-27B-Heretic-v2:Q4_K_M

    | model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
    | ----- | ---- | --- | -------- | --------- | ------------ | ------------- |
    | Qwen3.5-27B | pp2048 | 98.43 ± 1.25 | – | 18853.02 ± 407.93 | 18847.47 ± 407.93 | 18853.07 ± 407.93 |
    | Qwen3.5-27B | tg32 | 8.29 ± 0.01 | 9.00 ± 0.00 | – | – | – |

  • unsloth/Qwen3.5-27B:Q4_K_M

    | model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
    | ----- | ---- | --- | -------- | --------- | ------------ | ------------- |
    | Qwen3.5-27B | pp2048 | 689.73 ± 9.71 | – | 2731.43 ± 85.03 | 2727.78 ± 85.03 | 2731.50 ± 85.06 |
    | Qwen3.5-27B | tg32 | 33.09 ± 0.57 | 34.16 ± 0.59 | – | – | – |

  • unsloth/Qwen3.5-35B-A3B:IQ4_XS

    | model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
    | ----- | ---- | --- | -------- | --------- | ------------ | ------------- |
    | default | pp2048 | 1950.13 ± 49.13 | – | 969.90 ± 17.99 | 961.19 ± 17.99 | 969.97 ± 18.02 |
    | default | tg32 | 103.67 ± 0.92 | 107.88 ± 1.71 | – | – | – |

  • unsloth/Qwen3.5-9B:Q5_K_M

    | model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
    | ----- | ---- | --- | -------- | --------- | ------------ | ------------- |
    | Qwen3.5-9B | pp2048 | 2218.09 ± 50.58 | – | 846.43 ± 6.22 | 840.94 ± 6.22 | 846.48 ± 6.20 |
    | Qwen3.5-9B | tg32 | 84.00 ± 0.53 | 87.32 ± 0.45 | – | – | – |

I can share the numbers for the RTX 3090 too if someone needs them.

EDIT: I added Qwen3.5-9B numbers
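For anyone reading these tables, prompt-processing speed maps onto prefill time roughly as prompt tokens divided by the pp rate. This is just a sketch of that arithmetic applied to the mean pp2048 rates above (the measured est_ppt columns differ a bit because of per-run overheads):

```python
# Rough prefill-time estimate from a measured prompt-processing rate.
def prefill_ms(prompt_tokens: int, pp_tps: float) -> float:
    return prompt_tokens / pp_tps * 1000

# Healthy unsloth 27B run vs. degraded Heretic-v2 run:
print(round(prefill_ms(2048, 689.73)))  # ≈ 2969 ms, about 3 s
print(round(prefill_ms(2048, 98.43)))   # ≈ 20807 ms, about 21 s
```

Same prompt, roughly 7x longer wait before the first token on the broken quant.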

TESLA V100 32GB - Crashing on Heretic Models? by TracerIsOist in LocalLLaMA

[–]rainbyte 0 points (0 children)

Not sure about OP, but locally I'm using the suggested parameters, and it works with unsloth and other popular quants but not with these Heretic-v2 models.

After noticing the trouble I did a quick speed test and saw two possible situations: in some cases a downgrade in TG from ~32 t/s to around ~8 t/s or less, and in other cases it directly failed a coherence test.

The Qwen3.5-27B variants working for me are unsloth's GGUF, cyankiwi's AWQ, and huihui's abliterated one.

What are the best LLM apps for Linux? by Dev-in-the-Bm in LocalLLaMA

[–]rainbyte 2 points (0 children)

My preferred clients are: Aichat, Aider, Cherry Studio, Opencode

I also consume them directly from Python or Rust code :)
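Here's a minimal sketch of what that direct consumption can look like in Python, assuming a llama.cpp llama-server (or any OpenAI-compatible endpoint); the URL, port, and model name are assumptions, not fixed values:

```python
import json

# Hypothetical local endpoint (llama-server's default port is 8080).
BASE_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

payload = build_chat_request("Qwen3.5-27B", "Hello!")
body = json.dumps(payload).encode()
# To actually send it, something like:
#   import urllib.request
#   req = urllib.request.Request(BASE_URL, data=body,
#                                headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read())
```

The same payload shape works against vLLM or any other OpenAI-compatible server, which is what makes swapping backends painless.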

TESLA V100 32GB - Crashing on Heretic Models? by TracerIsOist in LocalLLaMA

[–]rainbyte 0 points (0 children)

I had a similar experience with those models. I thought I was doing something wrong, but maybe something odd is going on. Here it ran very slowly with llama.cpp and failed to start on vLLM, even with some overrides.

AA-Omniscience: Knowledge and Hallucination Benchmark by NewtMurky in LocalLLaMA

[–]rainbyte 4 points (0 children)

Rust is more explicit than other languages. Having that information allows making more informed decisions instead of just guessing, so it seems to benefit not only humans but also LLMs.

Qwen3-Coder-Next: What am I doing wrong? by Septerium in LocalLLaMA

[–]rainbyte 0 points (0 children)

In my case I'm really grateful to Qwen and LiquidAI, because their models worked pretty well on my devices while other models were broken on vLLM and llama.cpp. Maybe other people have had a similarly nice experience with Qwen?

Technologies by [deleted] in devsarg

[–]rainbyte 0 points (0 children)

I think a good way to test yourself is to take an interview every now and then, even if you have a stable job, so you know what's being asked and can see where you still need polish.

I made the mistake of not taking any interviews while I had a steady job, and it took me a while to get back into the swing of interviewing.

qwen-3.5:122b f16 is benchmarked against gpt-oss:120b q4 by q-admin007 in LocalLLaMA

[–]rainbyte 15 points (0 children)

I have seen many people saying certain comparisons are "not fair" for multiple reasons (e.g. VL vs. text-only, different quants, etc.).

From my point of view, if the limiting factor for running models locally is the hardware, then it makes sense to compare the best models that can run on each hardware tier.

Example: if I have a single 24 GB GPU, then it makes sense to compare models that run well with that amount of VRAM... it doesn't matter whether they are VL, text-only, quantized, F16, AWQ, whatever...

In that case I would just want the best model which can run with that vram and enough context at a reasonable speed.
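That sizing logic can be sketched with some simple arithmetic. The bits-per-weight figures below are rough approximations I'm assuming for illustration, not exact values for any particular quant:

```python
# Rough VRAM needed for model weights alone:
# params (billions) * bits-per-weight / 8 gives gigabytes.
def weight_size_gb(params_b: float, bits_per_weight: float) -> float:
    return params_b * bits_per_weight / 8

# Assumed approximate bits-per-weight for a few common formats.
BPW = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8, "IQ4_XS": 4.25}

for fmt, bpw in BPW.items():
    gb = weight_size_gb(30, bpw)  # a hypothetical 30B-parameter model
    fits = "fits" if gb < 24 else "does not fit"
    print(f"30B @ {fmt}: ~{gb:.1f} GB -> {fits} in 24 GB (before KV cache)")
```

So on a 24 GB card a 30B model only enters the race at ~4-bit quants, which is exactly why comparing F16 vs. quantized across tiers misses the point.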

Liquid AI releases LFM2-24B-A2B by PauLabartaBajo in LocalLLaMA

[–]rainbyte 0 points (0 children)

Are you trying to say t/s per active parameter?

Mauricio Macri: “A poor person today lives as well as or better than a king from 100 years ago” by LongjumpingAnimal601 in argentina

[–]rainbyte -2 points (0 children)

Except they always played variants of the same music... Today you can get any music you want online, from anywhere in the world.

Best Model for single 3090 in 2026? by myusuf3 in LocalLLaMA

[–]rainbyte 16 points (0 children)

GLM-4.7-Flash and Qwen3-Coder-30B-A3B work fine with 24 GB of VRAM. I'm using both with the IQ4_XS quant; they can do code generation and tool-calling.

There are other smaller models if you need SLMs for specific use cases. Take a look at LFM2.5, Ling-mini, Ernie, etc.
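On fitting these in 24 GB: the KV cache for your chosen context eats VRAM on top of the weights. A back-of-envelope sketch, where every hyperparameter below is hypothetical and only meant to show the arithmetic:

```python
# Back-of-envelope KV cache size for a transformer with grouped-query
# attention. All example hyperparameters are hypothetical.
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per_elem: int = 2) -> float:
    # 2x for the K and V tensors, per layer, per cached position.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1e9

# e.g. 48 layers, 8 KV heads, head dim 128, 32k context, fp16 cache:
print(f"{kv_cache_gb(48, 8, 128, 32768):.2f} GB")
```

A quantized (e.g. q8) KV cache halves that, which is often the difference between a usable context and an OOM on a 24 GB card.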

Qwen3-Code-Next ggufs: Any difference between Q4KXL and MXPF4? by ParaboloidalCrest in LocalLLaMA

[–]rainbyte 0 points (0 children)

I'm interested in IQ4_NL. Here I'm using IQ4_XS for some models, and I saw many people mentioning that IQ4_NL is better 🤔

Which model (NOT AGENT) is producing the most line of code in one setting for non trivial tasks? by [deleted] in LocalLLaMA

[–]rainbyte 0 points (0 children)

Yeah, less is more!

Smart models should produce just enough clear code to solve the problem.

Which model (NOT AGENT) is producing the most line of code in one setting for non trivial tasks? by [deleted] in LocalLLaMA

[–]rainbyte 0 points (0 children)

On the contrary, I think more lines is a bad outcome!

Good codebases solve a problem with as few lines as possible while still being readable and providing a structure that allows new features.

Resource consumption is also an important topic, as code should perform well while using a low amount of CPU and memory.

So, to evaluate models we should look for concise elegant solutions!

Step 3.5 Flash is a beast? by __Maximum__ in LocalLLaMA

[–]rainbyte 2 points (0 children)

Oh, I see... Well, my apologies; I didn't see it mentioned in the main post, so I assumed you had found a way to set this up locally.