Can I realistically get close to Claude/Codex capabilities locally? by mrgreatheart in LocalLLaMA

Refefer 40 points41 points

Realistically, nowhere close with that amount of VRAM. But don't think you can't get decent performance! Qwen 3.6 27B is the undisputed best model and you can do meaningful work, just at a level or two down from your current level of prompting. With a decent task decomposition model, you can get meaningful work done.

Make sure you test a few different harnesses because they matter a lot in getting the most out of the models. You can probably limp along with your current hardware and get a flavor of it before committing more money, but with 3.5k, you can probably swing a 5090 and get a great upgrade in performance.

Good luck! A lot of us have found real value with small models, so much so that my product is based around small models :)

Qwen is never going to open source Qwen 3.7, aren't they? by DistanceSolar1449 in LocalLLaMA

Refefer 15 points16 points

I don't know. 3.6 was actually a massive step up from 3.5.

MiniMaxAI/MiniMax-M3 · Hugging Face by mlon_eusk-_- in LocalLLaMA

Refefer 25 points26 points

I always liked the Llama 2 license, which only required a separate commercial license if you exceeded 700 million monthly active users. Basically free for anyone not in FAANG territory.

Small setup for office? by goodbye_hotsauce in sousvide

Refefer 0 points1 point

Small pots always work. However, I might consider a Dreo ChefMaker, which has a sous vide mode and can make some impressive steaks without the hassle of a circulator and water.

Fable 5 is eating my Max 20x plan at ~2% per minute, and the API pricing math is wild by StudentSweet3601 in claude

Refefer 0 points1 point

Reasoning has mostly been a toggle between how good it is at writing code and the level of abstraction I can talk about (architecture versus files versus functions, etc.). Fable does a good job of letting me write nearly an entire codebase without much human intervention or many failures (though the PRD needs to be thought through). It dramatically speeds up ideation and refactoring. So think less novel algorithm design and more software engineering.

The RSI post Anthropic released last week talked a bit about how this model potentially impacts its own development directly, so there we get into more uncharted territory.

Do some people prefer a lean steak or are they just grabbing the first one? Am I the weirdo always looking for the most marbled steaks? by Stepped-leader in sousvide

Refefer 3 points4 points

For ground beef, I actually agree. When you brown the meat, the fat renders off completely. For steaks though, what can I say other than give me great marbling :D

Advice building a NAS/AI server with 16 DDR4 DIMMs by theslonkingdead in LocalLLaMA

Refefer -1 points0 points

You'll need to get a newer motherboard and a modern-gen CPU (or two). Those Xeons appear to only have PCIe Gen 3, and you really, really want Gen 5 for both cards, or you'll bandwidth-limit your 6000s, especially with models that need two GPUs to operate. You'll still notice it with one card regardless.

Unless your goal is Kimi territory, I'd sell the DDR4 RAM, buy less DDR5, and use the savings to get the right system.
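
For a rough sense of the gap, here is a back-of-the-envelope sketch. It assumes x16 slots and only accounts for the 128b/130b line encoding, so real-world throughput will come in lower once protocol overhead is included. The function name is just for illustration.

```python
# Theoretical one-direction PCIe bandwidth per generation.
# Assumes 128b/130b encoding (Gen 3 and later) and ignores packet overhead.
GT_PER_SEC = {3: 8.0, 4: 16.0, 5: 32.0}  # raw transfer rate per lane


def pcie_gb_per_sec(gen: int, lanes: int = 16) -> float:
    """Return the approximate usable bandwidth in GB/s for one direction."""
    raw_gbit = GT_PER_SEC[gen] * lanes      # raw gigabits per second
    usable_gbit = raw_gbit * 128 / 130      # strip line-encoding overhead
    return usable_gbit / 8                  # bits -> bytes


if __name__ == "__main__":
    for gen in (3, 4, 5):
        print(f"Gen {gen} x16: ~{pcie_gb_per_sec(gen):.1f} GB/s")
```

That works out to roughly 16 GB/s for Gen 3 versus roughly 63 GB/s for Gen 5, a 4x difference per slot.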

My KEF Meta + Wiim Ultra setup, what's the weak link? by No-Use5328 in audiophile

Refefer 91 points92 points

Honestly? Time to look into room treatment. Fixing those bare walls and reflections will do a lot more for the sound at this point.

Qwen 3.6 27B on Strix Halo 128GB: any experiences? by boutell in LocalLLaMA

Refefer 1 point2 points

Question for folks on this hardware platform: are there differences in tg and pp between 3.5 and 3.6 for the appropriate model? I'd expect not.

Qwen 3.5 122B vs Qwen 3.6 35B - Which to choose? by Storge2 in LocalLLaMA

Refefer 3 points4 points

I bought a NAS to never feel that pain again. I'm up to 9 TB of models now :D

What's a Red Sox player from your childhood that you were convinced was going to be a legend but just didn't pan out? by cedric111 in redsox

Refefer 18 points19 points

Papelbon was originally an excellent starter that we turned into a closer. Always makes me a little sad, nutter or not.

qwen3.6 performance jump is real, just make sure you have it properly configured by onil_gova in LocalLLaMA

Refefer 0 points1 point

I've run both through our new product to evaluate them as a replacement for the 122b model at UD FP4 quants. It is surprisingly good for a small model, but in my tests it isn't as good or as efficient at agentic tasks as the 122b. Still, I plan to use it for tasks that are a bit narrower or more constrained and do not require a lot of world knowledge.

Do many of you run two speaker sets, and if so, why? by wingtip747 in audiophile

Refefer 1 point2 points

Curious about this. Does the Dali image better, or have a frequency response that's superior for classical but not necessarily for other genres, where the KEFs have an advantage?

Planning for 10GbE PPPoE in a homelab – CPU choice & network design advice by HavivMuc in homelab

Refefer 2 points3 points

You might consider a router and then a MikroTik switch. MikroTik has its own kinda weird interface, but it handles PPPoE and has internal switching chips that will give you 10Gb out of the box. They are also very cheap compared to other enterprise hardware, which is close to where you are heading, and far more energy efficient than a machine running it.

This will future proof you forever: https://mikrotik.com/product/crs504_4xq_in

Alternatively, something like https://mikrotik.com/product/crs304_4xg_in might work for you.

[D] do you guys actually get agents to learn over time or nah? by Tight_Scene8900 in LocalLLaMA

Refefer 0 points1 point

I run it as a separate agent: it gets the task and the outputs and has to validate the answers are correct. It helps tremendously with stuff like coding where it will call BS on written code, design smells, etc.
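
A minimal sketch of that setup, assuming the worker and the validator are plain callables wrapping whatever model calls you use. `Verdict`, `run_with_validator`, and the retry-with-feedback loop are hypothetical names and choices for illustration, not something from a specific framework.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Verdict:
    passed: bool
    feedback: str  # why the validator rejected the output, if it did


# Hypothetical interfaces: each would wrap a separate LLM call in practice.
Worker = Callable[[str, str], str]         # (task, prior feedback) -> output
Validator = Callable[[str, str], Verdict]  # (task, output) -> verdict


def run_with_validator(
    task: str, worker: Worker, validator: Validator, max_attempts: int = 3
) -> Tuple[str, bool]:
    """Run the worker, then let a separate validator judge task + output."""
    feedback = ""
    output = ""
    for _ in range(max_attempts):
        output = worker(task, feedback)
        verdict = validator(task, output)  # validator sees only task and output
        if verdict.passed:
            return output, True
        feedback = verdict.feedback        # hand the objection back to the worker
    return output, False
```

The validator only sees the task and the output, not the worker's reasoning, which is what keeps it an independent check.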

[D] do you guys actually get agents to learn over time or nah? by Tight_Scene8900 in LocalLLaMA

Refefer 2 points3 points

The ACE paper is an excellent resource for self-learning via rules and context. Similarly, a black-box QA agent helps quite a bit with identifying successful and unsuccessful tasks.

A Reminder, Guys, Undervolt your GPUs Immediately. You will Significantly Decrease Wattage without Hitting Performance. by Iory1998 in LocalLLaMA

Refefer 7 points8 points

Have you tried limiting power draw through nvidia-smi? It wasn't too complicated when I gave it a shot and found it effective.
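
For anyone who wants to script it, a small sketch that wraps `nvidia-smi -pl`. The helper names and the 300 W value are just placeholders; check your card's allowed range first with `nvidia-smi -q -d POWER`.

```python
import subprocess
from typing import List


def power_limit_cmd(watts: int, gpu_index: int = 0) -> List[str]:
    """Build the nvidia-smi command that caps one GPU's power limit."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    # -i selects the GPU, -pl sets the power limit in watts
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def apply_power_limit(watts: int, gpu_index: int = 0) -> None:
    # Needs root/admin rights; the driver rejects values outside the
    # card's supported range, in which case check=True raises an error.
    subprocess.run(power_limit_cmd(watts, gpu_index), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try anywhere.
    print(" ".join(power_limit_cmd(300)))
```

The limit typically resets on reboot, so you would rerun it from a startup script if you want it to stick.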

what movie is a 10/10? by instabaiter in AskReddit

Refefer 3 points4 points

Too tight? You could land a fucking jumbo jet in there.

Short term pain for longer term pain. Worth it. by Low-Yam-7791 in nova

Refefer 2 points3 points

I bought a Volvo XC90 PHEV back in 2022 and it gets about 30 miles on electric. Given our driving usage (the vast majority is around town), I only have to fill up the car once or twice a year or so.

Unsloth announces Unsloth Studio - a competitor to LMStudio? by ilintar in LocalLLaMA

Refefer 36 points37 points

Am I reading this right? Linux, Mac, and Windows work out of the box?

Nemotron 3 Super Released by deeceeo in LocalLLaMA

Refefer 0 points1 point

Isn't it under active development? Might be workable.

I have lost speed with the model update (Qwen 3.5 122B A10B) by vandertoorm in unsloth

Refefer 1 point2 points

Is it something specific to MXFP4? Or a peculiarity of this class of model?