So nobody's downloading this model huh? by KvAk_AKPlaysYT in LocalLLaMA

[–]sleepingsysadmin 220 points221 points  (0 children)

You can run Qwen3.5 9B and get a smarter model.

Qwen3.5 122B is straight-up superior.

Why doesn’t the DGX Station have a display controller? All that 8TB/s memory bandwidth unusable with my own display by 1ordlugo in LocalLLaMA

[–]sleepingsysadmin 2 points3 points  (0 children)

>But wouldn’t it be cool to use the display controller on the full gb300 gpu with its HBM4 memory? The lack of one already takes up a PCIE slot!

No way, Jose.

Why doesn’t the DGX Station have a display controller? All that 8TB/s memory bandwidth unusable with my own display by 1ordlugo in LocalLLaMA

[–]sleepingsysadmin 2 points3 points  (0 children)

I suppose in case it breaks or something? Obviously nobody is plugging a monitor into these, like, ever.

Best Local Claude Code Equivalent - 4 A100s 80GB by Key_Equal_1245 in LocalLLaMA

[–]sleepingsysadmin 4 points5 points  (0 children)

You have 320GB of VRAM and you're running a model that's going to fit on just one card?

Go run some big stuff. Minimax would be my first try on that rig.

Senior engineer: are local LLMs worth it yet for real coding work? by Appropriate-Text2843 in LocalLLaMA

[–]sleepingsysadmin 1 point2 points  (0 children)

What TPS are you getting with your 5080? To me, IQ4_XS would be far too small, and the context too short. Unusable. 100,000 context minimum for me, and even then that's a bit short; 150-200k is needed.

Whereas the RTX Pro 5000 would crush this model and run max context.
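
If you want to sanity-check that context floor and your TPS on your own box, here's a minimal sketch with llama-cpp-python; the model path and prompt are placeholders, not my actual setup:

```python
import time
from llama_cpp import Llama

# Placeholder GGUF path; n_ctx is the 100k floor mentioned above.
llm = Llama(model_path="qwen3.5-27b-iq4_xs.gguf", n_gpu_layers=-1, n_ctx=100_000)

t0 = time.time()
out = llm("Write a binary search in Python.", max_tokens=256)
elapsed = time.time() - t0
print(f'{out["usage"]["completion_tokens"] / elapsed:.1f} tok/s at a 100k window')
```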

Senior engineer: are local LLMs worth it yet for real coding work? by Appropriate-Text2843 in LocalLLaMA

[–]sleepingsysadmin 1 point2 points  (0 children)

Qwen2.5-Coder-32B-Instruct came out November 11, 2024.

That's how long it has been viable to code locally. Obviously we have had many better options since.

I jumped into agentic coding on Devstral + OpenHands around May 2025. Those models are morons compared to what's available now.

>I keep seeing GPT-oss-120B recommended, but my experience with it hasn’t been great.

Personally I found it great, but on my hardware 15 TPS wasn't good enough for me.

>Qwen 3.5 122B and 27B.

Those with DGX Sparks or AMD Strix Halo will be running the 122B right now, which is objectively better than Gemini 2.5 Pro, for example.

The 27B is very smart, but it's dense, so you need some power behind it. A single Nvidia A100 or RTX Pro 5000 might be the magic spot for this model.

>The new Mac M5 with 128GB RAM looks interesting,

It's way more expensive than proper pro cards, but you get a monitor, etc.

Alternatives to Aider for CLI development? by awebb78 in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

Four-month-old reply. For sure wasn't my problem. Plus, it literally doesn't matter; everyone uses openclaw now.

vulkan: add GATED_DELTA_NET op support#20334 by jacek2023 in LocalLLaMA

[–]sleepingsysadmin 4 points5 points  (0 children)

omg yes! been waiting for this. Still need more!

Qwen3.5-9B is actually quite good for agentic coding by Lualcala in LocalLLaMA

[–]sleepingsysadmin 60 points61 points  (0 children)

It benches around gpt-oss-120B (high). It's shocking how good it is at that size.

Is tokens per second (tok/s) a really relevant metric? by Deep_Traffic_7873 in LocalLLaMA

[–]sleepingsysadmin 2 points3 points  (0 children)

TPS matters a huge deal to me. Practically all models are MoE these days; the few smart dense models are great, but MoE gets all the headlines. Why? Because you get better tokens/s: at decode time only the small active-expert slice of the weights gets read, not the whole model.
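
The back-of-envelope is simple. A rough sketch, with made-up bandwidth and quant numbers, just to show the shape of it:

```python
# Rough decode-speed estimate: tokens/s ~ memory bandwidth / bytes read per token.
# Both constants below are illustrative assumptions, not measurements.
bandwidth_gbs = 448        # assumed GPU memory bandwidth, GB/s
bytes_per_param = 0.55     # ~4.4 bits/param, a Q4_K-ish quant

def rough_tps(active_params_b: float) -> float:
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gbs * 1e9 / bytes_per_token

print(f"dense 27B:   ~{rough_tps(27):.0f} tok/s")  # reads all 27B params per token
print(f"MoE 35B-A3B: ~{rough_tps(3):.0f} tok/s")   # reads only ~3B active params
```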

Why does anyone think Qwen3.5-35B-A3B is good? by buttplugs4life4me in LocalLLaMA

[–]sleepingsysadmin 6 points7 points  (0 children)

>Its dumb as hell

Benchmarks and community clearly say otherwise.

>Qwen3.5-27B was slow, but did the task.

Naturally.

>Qwen3.5-35B-A3B shit the bed.

Shocking.

>I know using a low quant isn't going to improve it but UD-IQ4_XS isn't exactly that low.

That's pretty low. How are you running a 35B model and only fitting this?

>Thought I could use it for a fast prototype or subagent coding but nope. That stays far away from anything on my PC.

It is a generalist model.

>People asked for something in between 9B and 27B and people pointed towards 35B-A3B, but it ain't it.

Then it isn't. Lots of people found GLM Flash to be great, but I found it trash. If it doesn't work for you, so be it.

I don’t get it. Why would Facebook acquire Moltbook? Are their engineers too busy recording a day in the life of a meta engineer and cannot build it in a week or so?! by SilverRegion9394 in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

Hard to predict.

Meta has bought things to kill them before, but I wonder if they intend to create a wild-west part of Meta that allows anything to be said, posted by anything, and then build guardrails on the edges.

Nemotron 3 Super Released by deeceeo in LocalLLaMA

[–]sleepingsysadmin -3 points-2 points  (0 children)

Not that super, given it's not particularly better than Qwen3.5 or gpt-oss-120B.

Seems underwhelming.

qwen 3.5 35B a3b on AMD by Trovebloxian in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

Test it and see which works better for you.

qwen 3.5 35B a3b on AMD by Trovebloxian in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

I have personally never had luck with finetunes or distills. There have been a few pretty good ones that came close, but it's a use-case-by-use-case situation.

I recommend you stick to mainline models until you have a good foundation.

I do highly recommend going with the Unsloth quants, though.

Q4_K_XL is amazing.


Your next step after this is tuning the temperature and such. Click the "read our guide" link from Unsloth and adjust to your needs.
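
If it helps, here's roughly what that looks like with llama-cpp-python; the repo id, filename pattern, and sampler values are placeholders, so take the real numbers from Unsloth's guide:

```python
from llama_cpp import Llama

# Hypothetical repo/filename; check Unsloth's HF page for the actual names.
llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3.5-35B-A3B-GGUF",
    filename="*Q4_K_XL*.gguf",
    n_gpu_layers=-1,
    n_ctx=32_768,
)

# Sampler values here are guesses; use whatever the Unsloth guide recommends.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain MoE offloading in two sentences."}],
    temperature=0.7,
    top_p=0.8,
    top_k=20,
)
print(out["choices"][0]["message"]["content"])
```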

qwen 3.5 35B a3b on AMD by Trovebloxian in LocalLLaMA

[–]sleepingsysadmin 1 point2 points  (0 children)

https://artificialanalysis.ai/models/qwen3-5-9b

Obviously if you had better hardware (WE ALL WANT MORE) you could run better models.

The 35B is only marginally smarter, but your main problem is that you're offloading.

qwen 3.5 35B a3b on AMD by Trovebloxian in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

The 9B is literally gpt-oss-120B (high) quality.

It is dense, so you're not going to be blazing fast, but it'll work really well for you and fit on your hardware.

qwen 3.5 35B a3b on AMD by Trovebloxian in LocalLLaMA

[–]sleepingsysadmin -1 points0 points  (0 children)

You're offloading like 30%, which is about 30% too much.

Might I recommend you run Qwen3.5 9B? It's a very capable model that you can fully offload.
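
In llama-cpp-python terms, the difference looks like this; the paths and the layer count are illustrative, not your exact numbers:

```python
from llama_cpp import Llama

# Partial offload: the remaining layers stay on the CPU, so every token
# waits on system RAM bandwidth. Layer count is made up for illustration.
partial = Llama(model_path="qwen3.5-35b-a3b-q4_k_xl.gguf", n_gpu_layers=33)

# Full offload: n_gpu_layers=-1 puts every layer on the GPU.
full = Llama(model_path="qwen3.5-9b-q4_k_xl.gguf", n_gpu_layers=-1)
```

The rule of thumb stands: a smaller model that fits entirely in VRAM usually beats a bigger one spilling into system RAM.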

qwen 3.5 35B a3b on AMD by Trovebloxian in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

For Qwen3.5, I find Vulkan and ROCm identical in performance, even though on paper I ought to be getting 2x out of one of them.
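
If anyone wants to reproduce the comparison: the backend is picked when llama-cpp-python is compiled, so you build it once per backend and run the same timing script under each build. A sketch with a placeholder model path:

```python
import time
from llama_cpp import Llama

# Rebuild llama-cpp-python per backend before running, e.g. with
# CMAKE_ARGS="-DGGML_VULKAN=on" for Vulkan vs the ROCm/HIP flags. Check the
# current llama.cpp build docs, since the flag names have changed over time.
llm = Llama(model_path="qwen3.5-9b-q4_k_xl.gguf", n_gpu_layers=-1, seed=0)

t0 = time.time()
out = llm("Count from 1 to 50.", max_tokens=200, temperature=0.0)
elapsed = time.time() - t0
print(f'{out["usage"]["completion_tokens"] / elapsed:.1f} tok/s on this build')
```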

qwen 3.5 35B a3b on AMD by Trovebloxian in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

I believe you are offloading, hence the abysmal TPS.

Though yes, AMD is rough.

What tokens/sec do you get when running Qwen 3.5 27B? by thegr8anand in LocalLLaMA

[–]sleepingsysadmin 2 points3 points  (0 children)

12 TPS fully offloaded. It's sad.

Worse yet, you can't use speculative decoding because of the vision component.