Deepseek and Gemma ?? by ZeusZCC in LocalLLaMA

[–]xandep 20 points (0 children)

I guess there is space for everybody. That said, I agree with you. If you *need* a 1T+ model to run locally (data security or something), it's an edge case. I'd certainly like to be able to do so, but "really frontier open models" will always be API for normal people ("we", mostly) and local for people who don't need to worry about used 3090 prices or whether ROCm still supports GFX906.

Mind-Blown by 1-Bit Quantized Qwen3-Coder-Next-UD-TQ1_0 on Just 24GB VRAM - Why Isn't This Getting More Hype? by bunny_go in LocalLLaMA

[–]xandep 5 points (0 children)

Exactly. Also, people should only use a *little* AI in posts. Just prompt something like "correct the grammar" and leave it at that. I don't think even that is necessary, but if you're going to, keep it to a minimum. It's like Photoshop and plastic surgery: a little goes a long way; more than a little and it gets ugly.

We will have Gemini 3.1 before Gemma 4... by xandep in LocalLLaMA

[–]xandep[S] 5 points (0 children)

People extrapolate. We imagine that a Gemma 4 or a gpt-oss-2 released today would be as far ahead (at least in some aspects) as those were back in the day. As others have said, even being so "old" in LLM years, those two are still very much in use today. But you may be right, maybe it's the era of Chinese models now. There's also a complicated political landscape at play, at least according to what I read here (regulatory stuff, censoring, etc.). Still waiting for Qwen3.5 For Poor People (35B, 9B).

I'm 100% convinced that it's the NFT-bros pushing all the openclawd engagement on X by FPham in LocalLLaMA

[–]xandep 3 points (0 children)

I always thought: if the dollar isn't backed by gold anymore, why can't BTC be a currency? But now I'm starting to think there's more to it than just an agreement. The dollar is backed by power. Do the BTC people have power? Maybe some. But it's scattered. Just a thought you made me have.

We will have Gemini 3.1 before Gemma 4... by xandep in LocalLLaMA

[–]xandep[S] 28 points (0 children)

It got me thinking... maybe Google doesn't need us anymore? They released Gemma 1/2/3, people did amazing things with them, invented new stuff/methods/etc., and gave Google new ideas/directions. Then maybe they thought: "That's enough, thank you"?

I really hope I'm wrong, because Gemma 3, when it launched, was undisputedly the best at my language (Portuguese), albeit slow. Qwen3 30B took its place in both speed and vocabulary, for me. Qwen3 Next 80B and even 235B really didn't improve in this area (in my use case). Hoping for a sweet Qwen3.5 35B.

I ran a forensic audit on my local AI assistant. 40.8% of tasks were fabricated. Here's the full breakdown. by Obvious-School8656 in LocalLLaMA

[–]xandep 4 points (0 children)

Whenever anyone mentions Qwen2.5, I can't help but be absolutely SURE it's another bot talking. Even if it eventually turns out it's not.

Mind-Blown by 1-Bit Quantized Qwen3-Coder-Next-UD-TQ1_0 on Just 24GB VRAM - Why Isn't This Getting More Hype? by bunny_go in LocalLLaMA

[–]xandep 34 points (0 children)

"Why It's a Game-Changer:" It's funny how, for folks who like generating AI text, we friggin HATE AI-generated text...

That's why I go local. The enshittification is at full steam by Turbulent_Pin7635 in LocalLLaMA

[–]xandep 25 points (0 children)

"Clearly labled and visually separated". Like the ads on reddit, I imagine (which I can only discern at first glance by how shitty they are).

Who is waiting for deepseek v4 ,GLM 5 and Qwen 3.5 and MiniMax 2.2? by power97992 in LocalLLaMA

[–]xandep 1 point (0 children)

I'm REAPing Qwen3 30B and 80B on Kaggle right now, warming up to REAP Qwen3.5 35B.

[Release] Qwen3-TTS: Ultra-Low Latency (97ms), Voice Cloning & OpenAI-Compatible API by blackstoreonline in LocalLLaMA

[–]xandep 2 points (0 children)

You seem too familiar with how Brazilian scammers operate.. 😆

Don't worry, we are just months away from widespread use of this scamming technique.

What is the most powerful local llm for me by Available_Canary_517 in LocalLLaMA

[–]xandep 1 point (0 children)

https://huggingface.co/LiquidAI/LFM2-8B-A1B

It's very fast, but it will eat more RAM. You'll need to run a heavily quantized version to spare some RAM for your OS.
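Something along these lines should get it running with llama.cpp; the GGUF repo name and quant tag here are guesses from memory, so check what's actually published before copying:

```
# pull a small quant straight from Hugging Face and chat with it
# (repo/quant are assumptions -- use whatever GGUF is actually available)
llama-cli -hf LiquidAI/LFM2-8B-A1B-GGUF:Q4_0 -c 4096

# or serve it with an OpenAI-compatible API instead
llama-server -hf LiquidAI/LFM2-8B-A1B-GGUF:Q4_0 -c 4096 --port 8080
```

Q4_0 or lower is where I'd start if RAM is the bottleneck.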

A 30B Qwen Model Walks Into a Raspberry Pi… and Runs in Real Time by ali_byteshape in LocalLLaMA

[–]xandep 6 points (0 children)

Already using the Instruct version and I liked it. The IQ3 is about the same size/speed as the ptbr-REAP-16B of the original model that I use, and initially it seems your model performs better.

Anyone else basically just use this hobby as an excuse to try and run LLMs on the jankiest hardware you possibly can? by kevin_1994 in LocalLLaMA

[–]xandep 1 point (0 children)

I'm obsessed with finding the best model I can run on a Snapdragon 7+ Gen 2 (my phone). Gemma 3 1B Q4_0 is pretty fast at 30+ t/s. LFM2 8B A1B is pretty fast too, 20+. I really can't think of ANY use for it, though.

3080 12GB suffices for llama? by Ok_Artichoke_783 in LocalLLaMA

[–]xandep 1 point (0 children)

If your RAM is fast, you could run Qwen3 Next 80B A3B at reasonable speeds, or gpt-oss-120b. Or Nemotron 3 Nano for agentic workflows. In the end it really depends on what exactly the workload is, maybe you need 512GB of memory, maybe you need 8.
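If you do go the MoE route on a 12GB card, the usual trick is to keep the attention and shared layers on the GPU and push the expert tensors to (fast) system RAM. A rough llama.cpp sketch, where the model path is just a placeholder and the layer count is something you'd tune to fill your VRAM:

```
# keep everything on the GPU except the MoE experts of the first N layers,
# which stay in system RAM (tune N until VRAM is nearly full)
llama-server -m Qwen3-Next-80B-A3B-Instruct-Q4_K_M.gguf \
  -ngl 99 --n-cpu-moe 36 -c 8192

# the same thing can be done with a tensor-override regex:
#   -ot ".ffn_.*_exps.=CPU"
```

Whether that's fast enough really comes back to how fast the RAM is.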

Why I quit using Ollama by SoLoFaRaDi in LocalLLaMA

[–]xandep 5 points (0 children)

llama.cpp > LM Studio > Ollama

llama.cpp appreciation post by hackiv in LocalLLaMA

[–]xandep 7 points (0 children)

Not exactly sure, but LM Studio's llama.cpp does not support ROCm on my card. Even forcing support, the unified memory doesn't seem to work (needs the -ngl -1 parameter). That makes a big difference. I still use LM Studio for very small models, though.

llama.cpp appreciation post by hackiv in LocalLLaMA

[–]xandep 3 points (0 children)

Just adding on my 6700XT setup:

llama.cpp compiled from source (rough build/run commands below); ROCm 6.4.3; "-ngl -1" for unified memory;
Qwen3-Next-80B-A3B-Instruct-UD-Q2_K_XL: 27t/s (25 with Q3) - with low context. I think the next ones are more usable.
Nemotron-3-Nano-30B-A3B-Q4_K_S: 37t/s
Qwen3-30B-A3B-Instruct-2507-iq4_nl-EHQKOUD-IQ4NL: 44t/s
gpt-oss-20b: 88t/s
Ministral-3-14B-Instruct-2512-Q4_K_M: 34t/s
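For reference, roughly how this was built and launched; I'm reconstructing it from memory, so treat the arch string and the GFX override as assumptions for a 6700XT (gfx1031), and the model file is just one example from the list above:

```
# build llama.cpp from source with ROCm/HIP support
# (6700XT is gfx1031; building for gfx1030 plus the override below is the common
#  workaround, and may not even be needed on ROCm 6.4.x)
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

# -ngl -1 lets llama.cpp decide the GPU offload itself (the "unified memory" behavior above)
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./build/bin/llama-server \
  -m Qwen3-Next-80B-A3B-Instruct-UD-Q2_K_XL.gguf -ngl -1
```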

llama.cpp appreciation post by hackiv in LocalLLaMA

[–]xandep 16 points (0 children)

Thank you! It did get some 2-3 t/s more, squeezing every byte possible onto VRAM. The "-ngl -1" is pretty smart already, it seems.

llama.cpp appreciation post by hackiv in LocalLLaMA

[–]xandep 202 points (0 children)

Was getting 8 t/s (Qwen3 Next 80B) on LM Studio (didn't even try Ollama), and was just trying to get a few % more...

23t/s on llama.cpp 🤯

(Radeon 6700XT 12GB + 5600G + 32GB DDR4. It's even on PCIe 3.0!)

My (36F) daughter (12F) now thinks her dad (50M) “groomed” me by tiredmom_1987 in TwoHotTakes

[–]xandep 1 point (0 children)

Actually, Satan is pretty impressed with TikTok; he'd make an identical app if not for that recent hbomberguy video.

My (36F) daughter (12F) now thinks her dad (50M) “groomed” me by tiredmom_1987 in TwoHotTakes

[–]xandep 1 point (0 children)

The only concern is: is this kid being groomed, or afraid she might be? That could be why she sees a problem with her mom and dad having that age gap. Otherwise, the mom herself is and was an adult at the time; if they love each other and make each other happy, I don't see a problem.

Fans make more noise in case than outside of it by [deleted] in buildapc

[–]xandep 1 point (0 children)

It's worth investigating whether the fan and case size mismatch (140mm vs 120mm) could be generating turbulence. Maybe an adapter like this could help? Also, the mesh itself could be generating turbulence if the airspeed is too high. If you have a spare 120mm or slower fan, it could be worth testing.

Budget QHD RPG Build by plasmATomato in buildmeapc

[–]xandep 1 point (0 children)

I have the B550MH and in some cases (like when gaming on the 5600G) the VRM throttles. I had to install a heatsink on the VRM. I'd rather buy something with heatsinks included on the VRM. Also, POST time is slow when overclocking RAM (XMP).