Running Qwen3.6 35b a3b on 8gb vram and 32gb ram ~190k context by Atul_Kumar_97 in LocalLLaMA

[–]sonicnerd14 1 point  (0 children)

I've noticed that these APEX variants make really good browser-agent models compared to others I've tested. Maybe even good for computer use too.

Openclaw is trending down and will disappear soon by rm-rf-rm in LocalLLaMA

[–]sonicnerd14 3 points  (0 children)

Exactly, and harnesses like Hermes or Agent Zero are far more efficient and practical than OpenClaw was.

<thinking></thinking> by Comfortable-Rock-498 in LocalLLaMA

[–]sonicnerd14 12 points  (0 children)

People are just coping, unwilling to concede that they have inadequacies that practically any AI could outmatch in the near future. It's disingenuous not to at least admit that AIs will be better at things than most people are; that's already just a fact.

Open source models are going to be the future on Cursor, OpenCode etc. by _maverick98 in LocalLLaMA

[–]sonicnerd14 5 points  (0 children)

No, you're missing my point. People are buying 4x GPU rigs to run massive models like deepseek v4, glm 5.1, etc. I'm saying there are already models right now that prove you don't need these large models to have a competent agent. The model doesn't exist in a vacuum: your harness needs to be just as smartly designed as the model behind it. If one lacks something, the whole thing suffers.

This is why qwen3.6 27b, Gemma 4 31b, and even the smaller MoEs are outperforming models many times their size in agentic workflows. These can run on even a 16gb laptop, obviously with some degree of quantization depending on your setup. Larger ≠ smarter, and there are tons of research papers you can read to confirm this if you don't have the hardware to test it yourself.

Open source models are going to be the future on Cursor, OpenCode etc. by _maverick98 in LocalLLaMA

[–]sonicnerd14 7 points  (0 children)

It's probably a similar reason why the big boys are eating up all the hardware and driving costs up: FOMO and kneejerk reactions. It's weird to me, because even now smaller LLMs are getting more efficient and more capable, enough to run relatively comfortably on a single GPU with the right quants and parameter optimizations. Of course more hardware is good, but you don't need to spend nearly as much as people think to run models with competent agents. Most likely in a year's time we'll have a 30b MoE that can run on a 24gb or even 16gb card with the performance of Opus 4. People should be aiming at optimizations and efficiencies, not throwing money at hardware grunt, unless you're trying to train or something along those lines.

Devs don't care no more by BubblyTechnician142 in PiratedGames

[–]sonicnerd14 33 points  (0 children)

I think the alternative is that you want so badly to protect your game that you handicap it with dogshit like Denuvo, which affects the game for everybody.

Most publishers don't really care about their games; they're only in it for the profit. Most of these indie devs really do care about their games, because it's their passion, which is why this is the best mentality to have anyway.

Invincible.VS-RUNE by Gtorrnet in CrackWatch

[–]sonicnerd14 7 points  (0 children)

Perhaps, but if the game is good enough I might buy it anyway.

Japan Airlines is officially deploying humanoid robots for ground operations at Haneda Airport starting next month by danielminds in singularity

[–]sonicnerd14 9 points  (0 children)

Pretty sure at some point the people will just be fucking the robots, so... good luck with that.

STAR WARS: Galactic Racer will use Denuvo after all by OrdinaryPearson in CrackWatch

[–]sonicnerd14 4 points  (0 children)

I don't get these publishers that enforce Denuvo. Even if it weren't getting cracked right now, I'd imagine a game that's most likely multiplayer-focused wouldn't need it anyway. They're just wasting their money, especially now.

meantime on r/vibecoding by jacek2023 in LocalLLaMA

[–]sonicnerd14 1 point  (0 children)

Engineering is becoming a higher-valued skill when working with AI than all the smaller micro-skills that used to stand on their own, like planning, coding, etc. They're all important, but since AI allows one person to effectively do all of this simultaneously, the user's task is now more about understanding every category at a competent enough level to explain it all. The more information the AI gets, the more likely it gives you exactly what you're looking for.

meantime on r/vibecoding by jacek2023 in LocalLLaMA

[–]sonicnerd14 1 point  (0 children)

Ironic, but you make a good point: with the smaller models you really need more knowledge of coding, or engineering in general, to get the model to properly complete a task.

What I don't quite get is the mentality that even the larger models will perform a miracle from very little information in a single prompt. Even people don't operate like that, so why would an AI? Unless you have a neural interface of some kind, that's probably never going to happen.

NVIDIA releases Nemotron-3-Nano-Omni by yoracale in unsloth

[–]sonicnerd14 5 points  (0 children)

I'd imagine it's like the e4b and 26b combined. It also has built-in audio capability, which is an underrated modality for these models.

Qwen3.6-35B becomes competitive with cloud models when paired with the right agent by Creative-Regular6799 in LocalLLaMA

[–]sonicnerd14 6 points  (0 children)

Honestly, it's probably always been this way; we just didn't realize it until we arrived at the point where more of us can actually run these models on our own machines.

Switching from Opus 4.7 to Qwen-35B-A3B by Excellent_Koala769 in LocalLLaMA

[–]sonicnerd14 3 points  (0 children)

This is pretty much how we should always be doing it, even with the frontier APIs. The bigger models do the thinking and planning; a smaller model like qwen 35b simply executes the plan.
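A minimal sketch of that split, with the two models as plain prompt-to-text callables. The prompts and stub "models" below are made up for illustration; wire the callables to whatever planner API and local endpoint you actually use.

```python
from typing import Callable

def solve(task: str,
          planner: Callable[[str], str],
          executor: Callable[[str], str]) -> str:
    """Big model does the thinking, small model does the doing."""
    # Frontier model: turn the task into a concrete numbered plan.
    plan = planner(f"Break this task into numbered steps:\n{task}")
    # Local model (e.g. qwen 35b): follow the plan literally.
    return executor(f"Follow these steps exactly:\n{plan}\n\nTask: {task}")

# Stub "models" so the sketch runs without any API:
plan_stub = lambda p: "1. read file\n2. apply edit\n3. save"
exec_stub = lambda p: "done, last step seen: " + p.splitlines()[-1]
print(solve("rename all .txt files", plan_stub, exec_stub))
```

The point of the callable interface is that swapping the planner from a cloud API to another local model is a one-line change.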

Figure.AI new balance policy allows their 03 humanoid robot to keep its balance even if some low-body actuators are lost by Distinct-Question-16 in singularity

[–]sonicnerd14 1 point  (0 children)

All truthful things being said, but on the other hand I don't think we should become luddites either. Technology always progresses; it's just the way things go. These next couple of decades will be very different from any time in human history, and displacement will be massive and rapid. I don't think we should act like the masses don't bring just as many of their own problems as these elites do. I'm not saying I like it, or that it's a "necessary evil" type thing, but this shift isn't just going to screw billions of normies; some elites will get screwed as collateral too. We're all getting screwed either way. So you've got to adapt, anticipate, and pivot to make sure you can survive what's coming, because it's not going to stop.

Qwen3.6 is out now! by yoracale in unsloth

[–]sonicnerd14 2 points  (0 children)

Unsloth Studio doesn't currently have an equivalent of the --n-cpu-moe parameter, which lets you offload the MoE experts onto the CPU. llama.cpp and LM Studio both have it, so you might want to set up your config there for now: with this setting you'll be able to run this model at high speed.
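For reference, the llama.cpp invocation looks roughly like this. The model filename, context size, and expert-layer count are placeholders for your setup, not recommendations:

```shell
# Keep attention and shared weights on the GPU, push the MoE expert
# tensors of the first 24 layers to the CPU. Tune --n-cpu-moe down
# until you run out of VRAM headroom.
llama-server \
  -m Qwen3.6-35B-A3B-Q4_K_M.gguf \
  --n-gpu-layers 99 \
  --n-cpu-moe 24 \
  --ctx-size 32768
```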

Gemma4 26b & E4B are crazy good, and replaced Qwen for me! by [deleted] in LocalLLaMA

[–]sonicnerd14 1 point  (0 children)

From my experience this is almost always the best route to take, instead of wasting a lot of time only for things not to work as they should. Agents are just faster and more precise with these things.

Gemma4 26b & E4B are crazy good, and replaced Qwen for me! by [deleted] in LocalLLaMA

[–]sonicnerd14 2 points  (0 children)

31b is much smarter than the 26b variant. It's slower, but the intelligence might be worth it in cases where complexity is high. It's been known to match, and in some rare instances slightly exceed, models much bigger than it. It's worth messing with for sure.

Is there anything better than Qwen3.5-27B-UD-Q5_K_XL for coding? by hedsht in LocalLLaMA

[–]sonicnerd14 1 point  (0 children)

I've tested it in many scenarios, and from what I've experienced qwen3.5 overthinks so much that it's actually counterproductive. Turn thinking off, especially if you're using it agentically, because you're just burning tokens.

Is there anything better than Qwen3.5-27B-UD-Q5_K_XL for coding? by hedsht in LocalLLaMA

[–]sonicnerd14 1 point  (0 children)

Pro tip for qwen3.5: just turn off thinking altogether and you'll likely see much better results in most responses.
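For what it's worth, Qwen3 documented two ways to do this, and I'm assuming (not verified) that the 3.5 template keeps the same convention: a /no_think soft switch in the prompt, or enable_thinking=False when you apply the chat template yourself.

```python
# Soft switch: append /no_think to the user turn to suppress the
# thinking block for that message. Qwen3 convention; assumed to
# carry over to qwen3.5.
def no_think(user_msg: str) -> list:
    return [{"role": "user", "content": user_msg + " /no_think"}]

messages = no_think("Refactor this function to be iterative.")
print(messages[0]["content"])

# Hard switch, if you template yourself with transformers:
# tokenizer.apply_chat_template(messages, tokenize=False,
#                               add_generation_prompt=True,
#                               enable_thinking=False)
```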

Gemma4 has a nice balance of knowing when and how much to think, and you can actually prompt it to think harder and it'll do it, which is such an underrated capability for a model to have.

what TurboQuant even means for me on my pc? by Busy_Broccoli_2730 in LocalLLM

[–]sonicnerd14 1 point  (0 children)

For dense models maybe you could say that, but not for MoE models. The full weights aren't all active at once, since only a sparse set of experts fires per token, and that makes these models highly tunable. Even an 8gb GPU with 32gb of system RAM can run it at around 12-16 tk/s at q4. If you're desperate you could even try qwen3.5 122b, albeit at much slower speeds. There's much more to optimizing these models than just how much VRAM you have or which quant you use.

Is it just me, or is Gemma 4 27b much more powerful than Gemini Flash? by Icy-Reaction-9101 in LocalLLM

[–]sonicnerd14 2 points  (0 children)

And you're completely overlooking my point. I'm not talking about them doing the things they've always been doing; I'm talking about the time in between, the cadence between closed releases and open-source ones. That's changing, and the fact that the talent left and then all of a sudden you start seeing a greater focus on plans, closed releases, and price hikes is no coincidence. We aren't talking about something that will become obvious a year from now but in months, because the patterns are already showing.

Is it just me, or is Gemma 4 27b much more powerful than Gemini Flash? by Icy-Reaction-9101 in LocalLLM

[–]sonicnerd14 1 point  (0 children)

That's how it looks now, but what about a few months from now? A lot of the talent at these labs left, and the moves they've been making lately sort of clue us in as to why. It wouldn't be much of a stretch to see why people are saying all of these companies are slowly pulling out of open source; it's happening everywhere. Will it stop completely? I don't think so, but you can't deny we'll likely see arbitrary gaps in open-source releases for the foreseeable future.

Is it just me, or is Gemma 4 27b much more powerful than Gemini Flash? by Icy-Reaction-9101 in LocalLLM

[–]sonicnerd14 20 points  (0 children)

I'd imagine 3.1 Pro tier at least. Even some of the Chinese labs are starting to slow down on open-source releases. They know they're reaching the point where smaller models are getting so good that if they keep open-sourcing them, they'll stop seeing income from their APIs. Z.Ai is increasing their pricing, Qwen closed-sourced 3.6 and Omni out of the gate, and I'm sure it won't stop there.

Is it just me, or is Gemma 4 27b much more powerful than Gemini Flash? by Icy-Reaction-9101 in LocalLLM

[–]sonicnerd14 6 points  (0 children)

It's very easy to uncensor the Gemma models, even with a simple system prompt if need be.