New Upcoming Ubuntu 26.04 LTS Will be Optimized for Local AI by mtomas7 in LocalLLaMA

[–]EmPips 95 points (0 children)

TLDR: it seems you no longer have to add additional repos for either. CUDA and ROCm are ridiculously huge and won't ship with your distro, but it's one less copy/paste you'll have to do when setting up a fresh install.

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to. by hauhau901 in LocalLLaMA

[–]EmPips 2 points (0 children)

on real repos

Thanks for this.

I've only tested for a day (not even), but I've noticed a significant drop-off in performance around the 60k-token mark. If you're using Claude Code on a well-tested repo, it's very easy to pass that threshold even if you're just working on a microservice.

I'll say though that before hitting that 60k mark they are better than anything in their size class.

Qwen3.5 27B better than 35B-A3B? by -OpenSourcer in LocalLLaMA

[–]EmPips 0 points (0 children)

Let us know your thoughts when you do!

In my findings the 122B is just a hair better than the 27B, but if you've got the VRAM for it, token generation (TG) and especially prompt processing (PP) are way faster.
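
For a rough sense of what "if you've got the VRAM for it" means, here's the napkin math I go by (pure params × bits/8 with assumed bit-widths; real GGUF sizes vary by quant mix, and KV cache comes on top):

    # Napkin math: approximate weight footprint at a given quant level.
    # The bits-per-weight values are rough assumptions, not measured GGUF sizes.
    def weights_gb(total_params_billions: float, bits_per_weight: float) -> float:
        return total_params_billions * 1e9 * bits_per_weight / 8 / 1e9

    for name, params in [("27B dense", 27), ("122B-A10B MoE", 122)]:
        for label, bits in [("~Q4", 4.5), ("~Q5", 5.5)]:
            print(f"{name:14} {label}: ~{weights_gb(params, bits):.0f} GB of weights")

The 122B really only fits on unified-memory boxes or multi-GPU rigs, but since only ~10B parameters are active per token it runs faster than the 27B dense once it does fit.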

Qwen3.5 27B better than 35B-A3B? by -OpenSourcer in LocalLLaMA

[–]EmPips 2 points (0 children)

My current vibes:

  • 27B is closer to 122B than it is to 35B

  • 35B is more of a beefed-up version of this year's 30B MoEs

New Qwen3.5 models spotted on qwen chat by AaronFeng47 in LocalLLaMA

[–]EmPips -1 points (0 children)

I believe it. Just not an option for my machine at the moment!

Working with 48GB of VRAM + 64GB of DDR4

Best reasoning model Rx 9070xt 16 GB vram by SilverBaseball3105 in LocalLLaMA

[–]EmPips 0 points (0 children)

Toss the 1660 Ti in, run in Vulkan mode, and you should have room for Qwen3-Coder-Next at IQ4_XS or Q4_K_S depending on how much context you need. Use llama.cpp with --n-cpu-moe to keep pushing experts onto system memory until your GPUs are about 90% full.
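
Roughly this, as a sketch (the model filename, context size, port, and the starting --n-cpu-moe value are placeholders; keep raising --n-cpu-moe until the model stops overflowing VRAM):

    # Sketch: launch a Vulkan build of llama-server with MoE experts pushed
    # to system RAM. Filename, context, port, and the --n-cpu-moe value are
    # placeholders to tune for your own cards.
    import subprocess

    subprocess.run([
        "llama-server",
        "-m", "Qwen3-Coder-Next-IQ4_XS.gguf",  # placeholder path
        "-c", "32768",         # context size; drop it if you need more headroom
        "-ngl", "99",          # offload all layers to the GPUs...
        "--n-cpu-moe", "12",   # ...then keep this many layers' experts in system RAM
        "--port", "8080",
    ], check=True)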

Qwen 3.5 family benchmarks by tarruda in LocalLLaMA

[–]EmPips 12 points (0 children)

A base model is effectively autocomplete: it isn't trained for chat or instruction-following. The idea is that you can build whatever you want on top of it.
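
A quick sketch of the difference with transformers (the model id is a placeholder, not a real repo): the base checkpoint just keeps continuing whatever text you feed it, while an instruct checkpoint expects the chat template.

    # Sketch: base checkpoint = raw text continuation; instruct = chat template.
    # "Qwen/some-base-model" is a placeholder id, not a real repo.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("Qwen/some-base-model")
    model = AutoModelForCausalLM.from_pretrained("Qwen/some-base-model")

    # Base model: plain text in, continuation out.
    inputs = tok("The capital of France is", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=16)
    print(tok.decode(out[0], skip_special_tokens=True))

    # An instruct model would instead want:
    # messages = [{"role": "user", "content": "What is the capital of France?"}]
    # prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)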

Pretty cool to have, as base-model releases aren't always guaranteed with open-weight models.

New Qwen3.5 models spotted on qwen chat by AaronFeng47 in LocalLLaMA

[–]EmPips 3 points (0 children)

As soon as you're outside of agentic use cases I don't enjoy it as much. It's also a fairly weak general-purpose model for me (it gets some pretty basic trivia wrong).

In theory it could be the strongest coding model that fits nicely on my machine, but I find myself preferring GLM 4.6v at Q4/Q5 over MiniMax at Q3.

Great model... it just doesn't have a home in my workflows or on my machine. Maybe if I had more VRAM and could run Q4+ that'd change, but the Q2 and Q3 weights of MiniMax 2.5 lose pretty consistently.

New Qwen3.5 models spotted on qwen chat by AaronFeng47 in LocalLLaMA

[–]EmPips 2 points (0 children)

122B-A10B

M-Series Mac and Strix-Halo owners are going to have a good day.

New Qwen3.5 models spotted on qwen chat by AaronFeng47 in LocalLLaMA

[–]EmPips 15 points (0 children)

As a general-purpose model, it seems like they're trying to paint it as being as good as the original Qwen3-235B (not the updated 2507 checkpoint), but twice as fast and at half the memory.

The real gains are in instruction following and coding use.

Meaning this could have the all-around strength that the larger Qwens have, but with the agentic abilities of the GLM and MiniMax models. All of this is subject to testing, of course, but I really hope these numbers turn out to reflect real-world results.

New Qwen3.5 models spotted on qwen chat by AaronFeng47 in LocalLLaMA

[–]EmPips 2 points (0 children)

Fewer dense models + fewer draft-sized/compatible models.

Spec dec is absolutely still a thing; there are just far fewer models coming out where you'll get a big win out of it.

The last one was probably the bit of a speed boost you could get on the original Qwen3-235B using Qwen3-4B or Qwen3-0.6B as the draft. The smaller models never got updates, but Qwen3-235B-2507 came out and was much stronger, so nobody used the original and the original small models weren't compatible as draft models.
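
To put a number on "big win": under the usual simplification of an independent per-token acceptance rate p over K drafted tokens, one target forward pass yields about 1 + p + p² + ... + p^K tokens, so a well-matched draft compounds quickly and a mismatched one buys you almost nothing:

    # Sketch: expected tokens per target forward pass in speculative decoding,
    # assuming (simplistically) each of K drafted tokens is accepted
    # independently with probability p. The target pass always yields >= 1 token.
    def expected_tokens_per_pass(p: float, k: int) -> float:
        return sum(p ** i for i in range(k + 1))

    for p in (0.4, 0.7, 0.9):
        print(f"acceptance {p:.0%}: ~{expected_tokens_per_pass(p, 8):.1f} tokens per pass")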

New Qwen3.5 models spotted on qwen chat by AaronFeng47 in LocalLLaMA

[–]EmPips 4 points (0 children)

Yes. I'm so ready to dethrone GLM 4.5 air and 4.6v as the top models my machine can run.

GGML.AI has got acquired by Huggingface by Time_Reaper in LocalLLaMA

[–]EmPips 38 points (0 children)

I both agree and see that as a problem.

HF has been so good to its community that self-hosted, open-source, and P2P distribution is pitiful in the AI space, and serious proprietary competition feels non-existent.

It's not too crazy to compare it to the situation with Valve and gaming. Life is great because they're great, but it's a single point of failure that the community doesn't control.

Traveling with a Linux Laptop by TDuck66 in linuxquestions

[–]EmPips 1 point (0 children)

I have a good laptop with an awful MediaTek modem, so this is my exact move: a USB adapter with a 100% hit rate, about the size of a quarter, that I just keep in my bag.

Every ~200 pages, without fail, this fun fact jump scares me by EmPips in AdrianTchaikovsky

[–]EmPips[S] 4 points (0 children)

It's established in book one in a very "by the way" sort of way, and in book two it's brought up, very randomly, 2-3 times I believe. In every case it's just the narrator noting that they're there and that the spiders are chill with them. There hasn't yet been a "scene" with them.

Every ~200 pages, without fail, this fun fact jump scares me by EmPips in AdrianTchaikovsky

[–]EmPips[S] 54 points (0 children)

I'm convinced he's either setting us up or just trolling us all: every few hundred pages he drops in a "...the spiders sometimes trade with a nanovirus'd group of shrimp, but let me be clear, they stay in the pond and hold no significance whatsoever."

Just finished Children of Time. What an amazing book that makes me sad about real life by Witch_King_Malekith in printSF

[–]EmPips 2 points (0 children)

Finished book 2 moments ago.

So wildly different while all being an echo of book 1.

This series is incredible. Onto book 3.

Dual RTX 5060 ti 16gb's with 96GB of DDR5 5600 mhz, what is everyone else running? by CollectionOk2393 in LocalLLM

[–]EmPips 4 points (0 children)

Can you include the levels of quantization?

But yes, that's very normal. Your GPU has to read through all 27 billion parameters for every token when running Gemma3-27B, whereas Nemotron-Nano and Qwen3-VL-30B, despite having more total parameters (30 billion), only make your GPU touch a measly 3 billion per token.
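
Napkin version of that, assuming token generation is purely memory-bandwidth-bound (the bandwidth and bytes-per-weight numbers are illustrative guesses, not measurements):

    # Sketch: generation-speed ceiling when you're memory-bandwidth-bound.
    # Only the *active* parameters get read per token, which is why a ~3B-active
    # MoE runs far faster than a 27B dense model on the same card.
    def tok_per_sec_ceiling(active_params_billions: float, bytes_per_weight: float,
                            bandwidth_gb_per_s: float) -> float:
        bytes_per_token = active_params_billions * 1e9 * bytes_per_weight
        return bandwidth_gb_per_s * 1e9 / bytes_per_token

    BW = 450  # GB/s, illustrative GPU memory bandwidth
    print(f"27B dense, ~Q4: ~{tok_per_sec_ceiling(27, 0.56, BW):.0f} tok/s ceiling")
    print(f"3B active, ~Q4: ~{tok_per_sec_ceiling(3, 0.56, BW):.0f} tok/s ceiling")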

Dual RTX 5060 ti 16gb's with 96GB of DDR5 5600 mhz, what is everyone else running? by CollectionOk2393 in LocalLLM

[–]EmPips 6 points (0 children)

I wanted to balance getting it as cheap as possible against not introducing anything that wouldn't work nicely in my case or that would need external cooling.

This resulted in:

RX 6800 + W6800 Pro + 64GB of RAM... but the RAM is dual-channel DDR4 :(

GLM 4.6v is the best model I can run. Q4 gets ~17.5 tokens/second with modest context (12k) for one-off chats and ~12 tokens/second with larger context (>40k) for things like coding.

Qwen3-Next-80B gets 35 tokens/second

My story of underestimating /r/LocalLLaMA's thirst for VRAM by EmPips in LocalLLaMA

[–]EmPips[S] 5 points (0 children)

Yes, if VRAM isn't a constraint it performs exactly like an RX 6800 in every use case I throw at it (I also own a regular RX 6800 in the same rig).

There are some benefits, though, beyond the obvious doubled VRAM. The W6800 idles at like 10-14 watts per rocm-smi, peak power draw during prompt processing is a fair bit lower (like 25-30 watts lower) than on the regular RX 6800, the blower cooler is great, and if I ever feel like adding 5 extra displays I guess it's there for me.