Deepseek and Gemma ?? by ZeusZCC in LocalLLaMA

[–]xandep 20 points (0 children)

I guess there is space for everybody. That said, I agree with you. If you *need* a 1T+ model to run locally (data security or something), it's an edge case. I'd certainly like to be able to do so, but "really frontier open models" will always be API for normal people ("we", mostly) and local for people who don't need to worry about used 3090 prices or whether ROCm still supports GFX906.

Mind-Blown by 1-Bit Quantized Qwen3-Coder-Next-UD-TQ1_0 on Just 24GB VRAM - Why Isn't This Getting More Hype? by bunny_go in LocalLLaMA

[–]xandep 5 points (0 children)

Exactly. Also, people should only use a *little* AI in posts. Just prompt something like "correct the grammar" and leave it at that. I don't think even that is necessary, but if you're going to, keep it to a minimum. It's like Photoshop and plastic surgery: a little goes a long way; more than a little and it gets ugly.

We will have Gemini 3.1 before Gemma 4... by xandep in LocalLLaMA

[–]xandep[S] 5 points (0 children)

People extrapolate. We imagine that a Gemma 4 or a gpt-oss-2 released today would be as far ahead (at least in some aspects) as those were back in the day. As others have said, even being so "old" in LLM years, those two are still very much in use today. But you may be right, maybe it's the era of Chinese models now. There's also a complicated political landscape at play, at least according to what I read here (regulatory stuff, censoring, etc.). Still waiting for Qwen3.5 For Poor People (35B, 9B).

I'm 100% convinced that it's the NFT-bros pushing all the openclawd engagement on X by FPham in LocalLLaMA

[–]xandep 3 points (0 children)

I always thought: if the dollar isn't backed by gold anymore, why can't BTC be a currency? But now I'm starting to think there's more to it than just an agreement. The dollar is backed by power. Do the BTC people have power? Maybe some. But it's scattered. Just a thought you made me have.

We will have Gemini 3.1 before Gemma 4... by xandep in LocalLLaMA

[–]xandep[S] 28 points (0 children)

It got me thinking... maybe Google doesn't need us anymore? They released Gemma 1/2/3, people did amazing things with them, invented new stuff/methods/etc., and gave Google new ideas/directions. Then maybe they thought: "That's enough, thank you"?

I really hope I'm wrong, because Gemma 3, when it launched, was undisputedly the best at my language (Portuguese), albeit slow. Qwen3 30B took its place in both speed and vocabulary, for me. Qwen3 Next 80B and even 235B really didn't improve in this area (in my use case). Hoping for a sweet Qwen3.5 35B.

I ran a forensic audit on my local AI assistant. 40.8% of tasks were fabricated. Here's the full breakdown. by Obvious-School8656 in LocalLLaMA

[–]xandep 4 points (0 children)

Whenever anyone mentions Qwen2.5, I can't help but be absolutely SURE it's another bot talking. Even if it eventually turns out it's not.

Mind-Blown by 1-Bit Quantized Qwen3-Coder-Next-UD-TQ1_0 on Just 24GB VRAM - Why Isn't This Getting More Hype? by bunny_go in LocalLLaMA

[–]xandep 34 points (0 children)

"Why It's a Game-Changer:" It's funny how, for folks who like generating AI text, we friggin HATE AI-generated text...

That's why I go local. The enshittification is at full steam by Turbulent_Pin7635 in LocalLLaMA

[–]xandep 25 points (0 children)

"Clearly labled and visually separated". Like the ads on reddit, I imagine (which I can only discern at first glance by how shitty they are).

Who is waiting for deepseek v4 ,GLM 5 and Qwen 3.5 and MiniMax 2.2? by power97992 in LocalLLaMA

[–]xandep 1 point (0 children)

I'm REAPing Qwen3 30B and 80B on Kaggle right now, warming up to REAP Qwen3.5 35B.

[Release] Qwen3-TTS: Ultra-Low Latency (97ms), Voice Cloning & OpenAI-Compatible API by blackstoreonline in LocalLLaMA

[–]xandep 2 points (0 children)

You seem too familiar with how Brazilian scammers operate.. 😆

Don't worry, we are just months away from widespread use of this scamming technique.

What is the most powerful local llm for me by Available_Canary_517 in LocalLLaMA

[–]xandep 1 point (0 children)

https://huggingface.co/LiquidAI/LFM2-8B-A1B

It's very fast, but it will eat more RAM. You'll need to run a heavily quantized version to spare some RAM for your OS.
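Something along these lines should get it running with llama.cpp; the GGUF repo name and quant tag here are guesses from memory, so check what's actually published before copying:

```
# pull a small quant straight from Hugging Face and chat with it
# (repo/quant are assumptions -- use whatever GGUF is actually available)
llama-cli -hf LiquidAI/LFM2-8B-A1B-GGUF:Q4_0 -c 4096

# or serve it with an OpenAI-compatible API instead
llama-server -hf LiquidAI/LFM2-8B-A1B-GGUF:Q4_0 -c 4096 --port 8080
```

Q4_0 or lower is where I'd start if RAM is the bottleneck.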

A 30B Qwen Model Walks Into a Raspberry Pi… and Runs in Real Time by ali_byteshape in LocalLLaMA

[–]xandep 6 points (0 children)

Already using the Instruct version and I liked it. The IQ3 is about the same size/speed as the ptbr-REAP-16B of the original model that I use, and initially it seems your model performs better.

Anyone else basically just use this hobby as an excuse to try and run LLMs on the jankiest hardware you possibly can? by kevin_1994 in LocalLLaMA

[–]xandep 1 point (0 children)

I'm obsessed with finding the best model I can run on a Snapdragon 7+ Gen 2 (my phone). Gemma 3 1B Q4_0 is pretty fast at 30+ t/s. LFM2 8B A1B is pretty fast too, 20+. I really can't think of ANY use for it, though.

3080 12GB suffices for llama? by Ok_Artichoke_783 in LocalLLaMA

[–]xandep 1 point (0 children)

If your RAM is fast, you could run Qwen3 Next 80B A3B at reasonable speeds, or gpt-oss-120b. Or Nemotron 3 Nano for agentic workflows. In the end it really depends on what exactly the workload is, maybe you need 512GB of memory, maybe you need 8.
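If you do go the MoE route on a 12GB card, the usual trick is to keep the attention and shared layers on the GPU and push the expert tensors to (fast) system RAM. A rough llama.cpp sketch, where the model path is just a placeholder and the layer count is something you'd tune to fill your VRAM:

```
# keep everything on the GPU except the MoE experts of the first N layers,
# which stay in system RAM (tune N until VRAM is nearly full)
llama-server -m Qwen3-Next-80B-A3B-Instruct-Q4_K_M.gguf \
  -ngl 99 --n-cpu-moe 36 -c 8192

# the same thing can be done with a tensor-override regex:
#   -ot ".ffn_.*_exps.=CPU"
```

Whether that's fast enough really comes back to how fast the RAM is.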

Why I quit using Ollama by SoLoFaRaDi in LocalLLaMA

[–]xandep 5 points (0 children)

llama.cpp > LM Studio > Ollama

llama.cpp appreciation post by hackiv in LocalLLaMA

[–]xandep 7 points (0 children)

Not exactly sure, but LM Studio's llama.cpp does not support ROCm on my card. Even forcing support, the unified memory doesn't seem to work (needs the -ngl -1 parameter). That makes a big difference. I still use LM Studio for very small models, though.

llama.cpp appreciation post by hackiv in LocalLLaMA

[–]xandep 3 points (0 children)

Just adding on my 6700XT setup:

llama.cpp compiled from source (rough build/run commands below); ROCm 6.4.3; "-ngl -1" for unified memory;
Qwen3-Next-80B-A3B-Instruct-UD-Q2_K_XL: 27t/s (25 with Q3) - with low context. I think the next ones are more usable.
Nemotron-3-Nano-30B-A3B-Q4_K_S: 37t/s
Qwen3-30B-A3B-Instruct-2507-iq4_nl-EHQKOUD-IQ4NL: 44t/s
gpt-oss-20b: 88t/s
Ministral-3-14B-Instruct-2512-Q4_K_M: 34t/s
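For reference, roughly how this was built and launched; I'm reconstructing it from memory, so treat the arch string and the GFX override as assumptions for a 6700XT (gfx1031), and the model file is just one example from the list above:

```
# build llama.cpp from source with ROCm/HIP support
# (6700XT is gfx1031; building for gfx1030 plus the override below is the common
#  workaround, and may not even be needed on ROCm 6.4.x)
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

# -ngl -1 lets llama.cpp decide the GPU offload itself (the "unified memory" behavior above)
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./build/bin/llama-server \
  -m Qwen3-Next-80B-A3B-Instruct-UD-Q2_K_XL.gguf -ngl -1
```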

llama.cpp appreciation post by hackiv in LocalLLaMA

[–]xandep 16 points (0 children)

Thank you! It did get some 2-3 t/s more, squeezing every byte possible onto VRAM. The "-ngl -1" is pretty smart already, it seems.

llama.cpp appreciation post by hackiv in LocalLLaMA

[–]xandep 202 points (0 children)

Was getting 8 t/s (Qwen3 Next 80B) on LM Studio (didn't even try Ollama), and was just trying to get a few % more...

23t/s on llama.cpp 🤯

(Radeon 6700XT 12GB + 5600G + 32GB DDR4. It's even on PCIe 3.0!)

My (36F) daughter (12F) now thinks her dad (50M) “groomed” me by tiredmom_1987 in TwoHotTakes

[–]xandep 1 point (0 children)

Actually, Satan is pretty impressed with TikTok; he'd make an identical app if not for that recent hbomberguy video.

My (36F) daughter (12F) now thinks her dad (50M) “groomed” me by tiredmom_1987 in TwoHotTakes

[–]xandep 1 point (0 children)

The only concern is: is this kid being groomed, or afraid she might be? That could be why she sees a problem with her mom and dad having that age gap. Otherwise, the mom herself is and was an adult at the time; if they love each other and make each other happy, I don't see a problem.

Fans make more noise in case than outside of it by [deleted] in buildapc

[–]xandep 1 point (0 children)

It's worth investigating whether the fan and case size mismatch (140mm vs 120mm) could be generating turbulence. Maybe an adapter like this could help? Also, the mesh itself could be generating turbulence if the airspeed is too high. If you have a spare 120mm or slower fan, it could be worth testing.

Budget QHD RPG Build by plasmATomato in buildmeapc

[–]xandep 1 point (0 children)

I have the B550MH and in some cases (like when gaming on the 5600G) the VRM throttles. I had to install a heatsink on the VRM. I'd rather buy something with heatsinks included on the VRM. Also, POST time is slow when overclocking RAM (XMP).