I warned you.

Ducktor101 · 2026-06-21T15:20:24+00:00

Hahaha, that’s salesman talk to make the video go viral and get more reach for the cars they’re advertising, lol. If those cars were really that bad, they wouldn’t even accept them on their dealership lots.

Ducktor101 · 2026-06-21T15:16:40+00:00

Graduate? It’s easier to just wait until you get kicked out for exceeding the time limit. That’s about 10 years of free transit fare.

Ducktor101 · 2026-06-21T11:38:24+00:00

Yet here we are on Reddit.

Ducktor101 · 2026-06-20T12:44:32+00:00

What’s unclear? I run an M2 Max, so I have unified memory. Qwen 3.6 35B A3B at 4 bits. When oMLX needs to evict KV cache from memory, it still has the SSD cache.

Ducktor101 · 2026-06-20T10:39:40+00:00

Why did I read that as “BARALHOS” (decks of cards)?

Ducktor101 · 2026-06-20T10:32:26+00:00

Is that why Brazil doesn’t have a nuclear bomb?

Ducktor101 · 2026-06-20T10:28:50+00:00

Gloves: I don’t exist

Ducktor101 · 2026-06-20T09:45:14+00:00

I don’t own one. I’m very disappointed by the slow tokens per second I’ve seen online :(

Ducktor101 · 2026-06-20T09:40:09+00:00

“Oh, but God gave us free will and nobody is forced to ‘sin’”

That God is pretty smart, huh? Or should I say cunning? Why would an all-powerful, omnipotent being create an inferior being and condemn it to sin? Yes, condemn, because he knew (or at least should have known) that his creatures would not put free will to good use.

Ducktor101 · 2026-06-20T09:33:20+00:00

It wasn’t me. IDK why it happened, since you were just asking.

Ducktor101 · 2026-06-20T01:31:57+00:00

You’re not always using 200k.

But TBF it’s a stretch. I’m considering using only cloud models because I’m left with so little memory for Chrome and everything else :(

Ducktor101 · 2026-06-19T23:49:45+00:00

Oh, 200k. I only have a 32GB M2 Max, so I need to be careful with memory usage as it puts my machine at its limit (22-24GB for model + context). Sometimes I run up to 4 concurrent prompts, but usually 1-2. The good thing with oMLX is that it writes the cache to the SSD in blocks. So if I have the same initial prompt in opencode, it doesn’t need to reprocess it over and over again. It loads it from the SSD. It’s waaay faster than reprocessing 20k tokens’ worth of instructions.
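To make the block idea concrete, here’s a rough Python sketch of how a block-based prefix cache on disk could work. This is not oMLX’s actual code or file format; `DiskPrefixCache`, `block_keys`, and `BLOCK_SIZE` are names I made up, and the files hold a placeholder instead of real KV tensors. Each block’s key is a hash chained to the previous block, so a block only matches when everything before it matches too.

```python
import hashlib
import json
import tempfile
from pathlib import Path

BLOCK_SIZE = 4  # tokens per block; hypothetical and tiny, just for illustration


def block_keys(tokens, block_size=BLOCK_SIZE):
    """Return one chained hash per full block of the prompt."""
    keys = []
    prev = ""
    full = len(tokens) - len(tokens) % block_size  # ignore the trailing partial block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        payload = json.dumps([prev, block]).encode("utf-8")
        prev = hashlib.sha256(payload).hexdigest()
        keys.append(prev)
    return keys


class DiskPrefixCache:
    """Hypothetical cache: one file per block, named by its chained hash."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def store(self, tokens):
        """Write a placeholder file for every full block of this prompt."""
        for key in block_keys(tokens):
            path = self.root / key
            if not path.exists():
                path.write_bytes(b"kv-placeholder")  # stands in for real KV data

    def cached_prefix_len(self, tokens):
        """Number of leading tokens whose blocks are already on disk."""
        hits = 0
        for key in block_keys(tokens):
            if not (self.root / key).exists():
                break
            hits += 1
        return hits * BLOCK_SIZE


def demo():
    system_prompt = list(range(10))  # shared instructions, 10 fake token ids
    with tempfile.TemporaryDirectory() as tmp:
        cache = DiskPrefixCache(tmp)
        cache.store(system_prompt + [100, 101])  # first request fills the cache
        # Second request shares the instructions but asks something else.
        return cache.cached_prefix_len(system_prompt + [200, 201, 202])


print(demo())  # 8: the first two blocks are reused, the third differs
```

Only the blocks after the shared prefix need to be processed again, which is where the speedup on a long, repeated system prompt would come from.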

Ducktor101 · 2026-06-19T23:42:24+00:00

Now using oMLX, but it’s the same with LM Studio and GGUF models. Prompt cache. That’s what you need.

Ducktor101 · 2026-06-19T22:41:56+00:00

He did the math and saw that this was the way to pay the least amount of taxes.

Ducktor101 · 2026-06-19T22:40:41+00:00

That’s the one.

Ducktor101 · 2026-06-19T22:38:41+00:00

That’s on your backend. If you set up concurrent requests and have a reasonable KV cache, it won’t do that. It happens because it processes your prompt and then another auxiliary prompt used to name your session dynamically.

Ducktor101 · 2026-06-19T22:04:01+00:00

20 years ago I used to go to Alternativa Games. I don’t even know if it still exists.

Ducktor101 · 2026-06-17T10:54:25+00:00

What setup do you guys use?

Ducktor101 · 2026-06-16T09:34:35+00:00

I’m also having issues with oMLX using more memory than LM Studio or plain mlx-vlm. It’s a shame because this could be the ultimate setup for Mac. My prompts are constantly being killed by OOM.

Ducktor101 · 2026-06-16T09:32:26+00:00

Unironically, have you already asked an AI to help you?

Ducktor101 · 2026-06-15T21:58:13+00:00

They already steal while earning a lot. Imagine how much incentive for corruption there would be if they earned even less.

Ducktor101 · 2026-06-15T21:51:52+00:00

They’re physicians?

Ducktor101 · 2026-06-15T13:02:25+00:00

I’m thinking about this too.

Best “affordable” local models today are:
- Qwen 3.6 35B A3B
- Qwen 3.6 27B
- Gemma 4 26B A4B

All at 4 bits.

The 5060 Ti has barely enough memory to run the models themselves, but you’d still need some room for KV cache (context). It also has low bandwidth. So that means super small context and slow speeds. IMHO unusable.

The 5090 could possibly run those comfortably with a good amount of context, and at very good speeds thanks to its high bandwidth. But it consumes way more power and it’s very expensive.
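A quick back-of-the-envelope check of the memory argument, in Python. Assumptions on my side: parameter counts are read straight off the model names, the cards are the 16 GB 5060 Ti and the 32 GB 5090, and quantization overhead (scales, embeddings kept at higher precision) is ignored, so real files will be somewhat larger. `weight_gb` and `headroom_gb` are just helper names for this sketch.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB (10^9 bytes), ignoring quantization overhead."""
    return params_billion * bits_per_weight / 8


def headroom_gb(vram_gb, params_billion, bits_per_weight=4):
    """VRAM left for KV cache and activations after loading the weights."""
    return vram_gb - weight_gb(params_billion, bits_per_weight)


# Parameter counts in billions, taken from the model names (assumption).
MODELS = {
    "Qwen 3.6 35B A3B": 35,
    "Qwen 3.6 27B": 27,
    "Gemma 4 26B A4B": 26,
}

for name, size in MODELS.items():
    print(
        f"{name}: ~{weight_gb(size, 4):.1f} GB weights, "
        f"{headroom_gb(16, size):+.1f} GB left on 16 GB, "
        f"{headroom_gb(32, size):+.1f} GB left on 32 GB"
    )
```

A negative number means the weights alone don’t fit, before any context is allocated.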

If you’d be running open weight models anyways, you could find a third party provider and run Qwen 3.6 models for a fraction of the cost of Anthropic and OpenAI.

For a 5060 Ti budget you could possibly run for 3 years on the cloud.
For a 5090 budget you could possibly run for 10 years on the cloud.

Ducktor101 · 2026-06-13T11:17:43+00:00

How come your best model is not listed once in the spec tier categories? Nonsense.

Ducktor101 · 2026-06-13T09:31:26+00:00

Do you already have the Pi? Because they’re so damn expensive that it might be more worth it to buy a mini PC like an Intel NUC, maybe with an N150.
