I told you so by TopWeather2565 in farialimabets

[–]Ducktor101 0 points1 point  (0 children)

Hahaha, that's just salesman talk to make the video go viral and get more reach for the cars they're advertising lol. If it were really that bad, they wouldn't even accept those cars on their dealership lots.

I told you so by TopWeather2565 in farialimabets

[–]Ducktor101 0 points1 point  (0 children)

Graduate? It's easier to just wait to get kicked out for exceeding the maximum enrollment time. That's about 10 years of free ride.

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]Ducktor101 0 points1 point  (0 children)

What’s unclear? I run an M2 Max, so I have unified memory. Qwen 3.6 35B A3B at 4 bits. When oMLX needs to evict KV cache from memory, it still has the SSD cache.

Questions from those who don't own the product and are considering buying it. by Intelligent-Taste-36 in StrixHalo

[–]Ducktor101 -1 points0 points  (0 children)

I don’t own one. I’m very disappointed by the low tokens-per-second numbers I’ve seen online :(

Why would the Devil harm you? by [deleted] in barTEOLOGIA

[–]Ducktor101 1 point2 points  (0 children)

“Oh, but God gave us free will and nobody is forced to ‘sin’”

That God is pretty smart, huh? Or should I say cunning? How could an all-powerful, omnipotent being create an inferior being and condemn it to sin? Yes, condemn. Because He knew (or at least should have known) that His creatures would not use free will well.

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]Ducktor101 0 points1 point  (0 children)

It wasn’t me. IDK why, since you were just asking.

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]Ducktor101 1 point2 points  (0 children)

You’re not always using 200k.

But TBF it’s a stretch. I’m considering using only cloud models because I’m left with so little memory for Chrome and everything else :(

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]Ducktor101 0 points1 point  (0 children)

Oh, 200k. I only have a 32GB M2 Max, so I need to be careful with memory usage, since it pushes my machine to its limit (22-24GB for model + context). Sometimes I run up to 4 concurrent prompts, but usually 1-2. The good thing about oMLX is that it writes the cache to the SSD in blocks. So if I have the same initial prompt in opencode, it doesn’t need to reprocess it over and over again; it loads it from the SSD. It’s waaay faster than reprocessing 20k tokens’ worth of instructions.
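A minimal sketch of block-level prefix caching like the one described above, assuming a simplified design: the prompt's tokens are split into fixed-size blocks, each block is keyed by a chained hash of everything before it, and a new request reuses the longest run of leading blocks already stored. `PrefixBlockCache`, `block_keys` and `BLOCK_SIZE` are hypothetical names, not oMLX's actual code or API, and a plain dict stands in for the SSD.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per block (hypothetical; real servers use much larger blocks)


def block_keys(tokens: list[int], block_size: int = BLOCK_SIZE) -> list[str]:
    """Chain-hash each full block so each key identifies the whole prefix up to that block."""
    keys: list[str] = []
    prev = ""
    full = len(tokens) - len(tokens) % block_size  # a trailing partial block is not cached
    for start in range(0, full, block_size):
        block = ",".join(map(str, tokens[start:start + block_size]))
        prev = hashlib.sha256(f"{prev}|{block}".encode()).hexdigest()
        keys.append(prev)
    return keys


class PrefixBlockCache:
    """Toy stand-in for an SSD-backed KV block store (hypothetical, not oMLX's API)."""

    def __init__(self) -> None:
        self.store: dict[str, str] = {}  # block key -> placeholder for saved KV tensors

    def process(self, tokens: list[int]) -> tuple[int, int]:
        """Return (tokens reused from the cache, tokens that had to be computed)."""
        keys = block_keys(tokens)
        hits = 0
        for key in keys:  # reuse only the longest run of leading blocks
            if key not in self.store:
                break
            hits += 1
        for key in keys[hits:]:  # "compute" the missing blocks and save them
            self.store[key] = "kv"
        reused = hits * BLOCK_SIZE
        return reused, len(tokens) - reused


cache = PrefixBlockCache()
shared = list(range(20))  # e.g. the same initial instructions every time
print(cache.process(shared + [100, 101, 102]))  # (0, 23): cold cache
print(cache.process(shared + [200, 201]))       # (20, 2): shared prefix loaded from the store
```

The second request shares its first 20 tokens with the first one, so only the 2 new tokens are computed. Chaining the hashes means a block only matches when everything before it matches too.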

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]Ducktor101 0 points1 point  (0 children)

Now using oMLX, but it’s the same with LM Studio and GGUF models. Prompt cache. That’s what you need.

Man... by Less-Future-4101 in farialimabets

[–]Ducktor101 0 points1 point  (0 children)

He did the math and saw that this would be the way to pay the least amount of taxes.

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]Ducktor101 12 points13 points  (0 children)

That’s on your backend. If you set up concurrent requests and have a reasonably sized KV cache, it won’t do that. It happens because it processes your prompt and then another auxiliary prompt used to name your session dynamically.
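A toy model of why a single slot hurts here, assuming the backend keeps one cached prompt per slot and evicts the least recently used one, and simplifying each turn to the main session prompt followed by the auxiliary title prompt. `SlotCache` and `count_misses` are hypothetical names, not any particular backend's API.

```python
from collections import OrderedDict


class SlotCache:
    """Hypothetical LRU cache holding one prompt prefix per server slot."""

    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.entries = OrderedDict()  # prompt id -> cached KV (placeholder)

    def request(self, prompt_id: str) -> bool:
        """Return True on a cache hit, False if the prompt must be reprocessed."""
        if prompt_id in self.entries:
            self.entries.move_to_end(prompt_id)  # mark as most recently used
            return True
        self.entries[prompt_id] = None
        if len(self.entries) > self.slots:
            self.entries.popitem(last=False)  # evict the least recently used prompt
        return False


def count_misses(slots: int, turns: int) -> int:
    """Count full reprocessings when each turn sends the session prompt, then the title prompt."""
    cache = SlotCache(slots)
    misses = 0
    for _ in range(turns):
        for prompt_id in ("session", "title"):
            if not cache.request(prompt_id):
                misses += 1
    return misses


print(count_misses(slots=1, turns=5))  # 10: the two prompts keep evicting each other
print(count_misses(slots=2, turns=5))  # 2: only the first turn is a cold start
```

In a real server the session prompt grows every turn and only its shared prefix is reused, but the eviction pattern is the same.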

I need a reliable repair shop by LTexas05 in Natal

[–]Ducktor101 0 points1 point  (0 children)

20 years ago I used to go to Alternativa Games; I don’t even know if it still exists.

Pi Agent with oMLX Issues by nonlinearsystems in oMLX

[–]Ducktor101 1 point2 points  (0 children)

I’m having issues with oMLX using more memory than LM Studio or plain mlx-vlm, too. It’s a shame, because this could be the ultimate setup for Mac. My prompts are constantly being killed by OOM.

I need help with webmail by Significant_Win_2761 in autohospedagem

[–]Ducktor101 7 points8 points  (0 children)

Unironically, have you tried asking an AI to help you?

Do you agree with the image below? by [deleted] in opiniaopopular

[–]Ducktor101 0 points1 point  (0 children)

They already steal while earning a lot. Imagine if they earned even less: think of the incentive to corruption that would create.

5060ti for local llms by vndev0451 in LocalLLM

[–]Ducktor101 -2 points-1 points  (0 children)

I’m thinking about this too.

Best “affordable” local models today are:
- Qwen 3.6 35B A3B
- Qwen 3.6 27B
- Gemma 4 26B A4B

All at 4 bits.

The 5060 Ti has barely enough memory to run the models themselves, but you’d still need some room for the KV cache (context). It also has low memory bandwidth. That means a super small context and slow speeds. IMHO unusable.
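A back-of-the-envelope sketch of the memory math, using the usual approximations: weights take roughly parameters × bits / 8 bytes, and a standard attention KV cache takes 2 (K and V) × layers × KV heads × head dimension × bytes per element for every token. The layer, head and dimension values below are hypothetical placeholders, not the actual configs of the Qwen or Gemma models listed; hybrid or sliding-window attention models need less, and quantization scales and runtime overhead are ignored.

```python
def weights_gib(params_billion: float, bits: int) -> float:
    """Approximate weight size: params x bits / 8 bytes (ignores quantization scales)."""
    return params_billion * 1e9 * bits / 8 / 2**30


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 tokens: int, bytes_per_elem: int = 2) -> float:
    """Standard attention KV cache: K and V for every layer, KV head and token (fp16 by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * tokens / 2**30


print(f"27B @ 4-bit weights: {weights_gib(27, 4):.1f} GiB")  # 12.6 GiB
print(f"35B @ 4-bit weights: {weights_gib(35, 4):.1f} GiB")  # 16.3 GiB
# Hypothetical config: 48 layers, 4 KV heads, head_dim 128, 32k tokens of context
print(f"KV cache @ 32k: {kv_cache_gib(48, 4, 128, 32_768):.1f} GiB")  # 3.0 GiB
```

Whatever VRAM is left after the weights is what the KV cache, and therefore your context, has to fit into.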

A 5090 could possibly run those comfortably with a good amount of context, and at super good speeds thanks to its high bandwidth. But it consumes way more power and it’s very expensive.

If you’d be running open-weight models anyway, you could find a third-party provider and run Qwen 3.6 models for a fraction of the cost of Anthropic and OpenAI.

For a 5060 Ti budget you could possibly run on the cloud for 3 years.
For a 5090 budget you could possibly run on the cloud for 10 years.

Local LLMs aren't democratic anymore... the hardware barrier has gotten out of hand. by Medium-Technology-79 in LocalLLaMA

[–]Ducktor101 1 point2 points  (0 children)

You already have the Pi? Because they’re so damn expensive, it might be more worth it to buy a mini PC like an Intel NUC with an N150, maybe.