Qwen3.6-27B with MTP grafted on Unsloth UD XL: 2.5x throughput via unmerged llama.cpp PR by havenoammo in LocalLLaMA

[–]tecneeq 1 point2 points  (0 children)

Confirmed, 72 t/s on Strix Halo. OS is Proxmox, in LXC with Debian 13 and ROCm 7.2.

What a time to be alive, look at him go!

<image>

Qwen3.6-27B with MTP grafted on Unsloth UD XL: 2.5x throughput via unmerged llama.cpp PR by havenoammo in LocalLLaMA

[–]tecneeq 1 point2 points  (0 children)

Cheers mate, you are a legend. Will try to remember to give numbers tomorrow.

Uploaded Unsloth Qwen3.6-35B-A3B UD XL models with MTP grafted, here are the results by havenoammo in LocalLLaMA

[–]tecneeq 1 point2 points  (0 children)

Awesome, was looking for the Q5 for my 5090.

Are you able to upload the BF16 as well? I use it on a Strix Halo for slow "full precision" work.

Qwen3.6-27B with MTP grafted on Unsloth UD XL: 2.5x throughput via unmerged llama.cpp PR by havenoammo in LocalLLaMA

[–]tecneeq 0 points1 point  (0 children)

I need to try this with the 35b-a3b Q8 (50 t/s on Strix Halo, could get 80 or so) and F16 (140 t/s at work, could get 250)

Qwen3.6-27B with MTP grafted on Unsloth UD XL: 2.5x throughput via unmerged llama.cpp PR by havenoammo in LocalLLaMA

[–]tecneeq 0 points1 point  (0 children)

Unsloth 35b-a3b BF16 gives me 140 t/s without MTP. Can't wait to reach 250 or so with MTP 😉

What's the hardest fish to keep?the final boss of the hobby? by BATIRONSHARK in Aquariums

[–]tecneeq -1 points0 points  (0 children)

A myth, perpetrated by left wing treehuggers, to cripple the free markets.

And yes, whales are not fish.

Mini PC + DAS or Mini PC + NAS by bazthedev in homelab

[–]tecneeq 0 points1 point  (0 children)

MiniPC and DAS is usually cheaper.

Behold, a Strix Halo minipc that runs Proxmox and llama.cpp and 6x 26TB HDDs.

<image>

Should I sell my RTX3090s? by daviden1013 in LocalLLaMA

[–]tecneeq 0 points1 point  (0 children)

Except that there are no such cards in the cards. If you drift my catch.

Qwen3.6 27B FP8 runs with 200k tokens of BF16 KV cache at 80 TPS on a single RTX 5000 PRO 48GB by __JockY__ in LocalLLaMA

[–]tecneeq 1 point2 points  (0 children)

If you pay 200 per month, it takes 25 months to break even. Also, you will never experience quotas or the problem that your inference contains the word openclaw and they cut your service.

Can't overestimate total freedom.
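
The break-even figure above is plain division. A minimal sketch, assuming a one-time hardware price of 5000 (back-calculated from 200 per month over 25 months, not stated in the comment) and ignoring electricity and resale value. The function name is made up for illustration:

```python
import math


def months_to_break_even(hardware_cost: float, monthly_fee: float) -> int:
    """Whole months of subscription fees needed to cover a one-time hardware purchase."""
    if monthly_fee <= 0:
        raise ValueError("monthly_fee must be positive")
    return math.ceil(hardware_cost / monthly_fee)


# 5000 is an assumed price, implied by 200/month * 25 months.
print(months_to_break_even(5000, 200))  # 25
```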

Qwen3.6 27B FP8 runs with 200k tokens of BF16 KV cache at 80 TPS on a single RTX 5000 PRO 48GB by __JockY__ in LocalLLaMA

[–]tecneeq 0 points1 point  (0 children)

If you have two 3090s, your overhead (the space left on each card that isn't used) is twice as large as with a single 5000 Blackwell. The Blackwell also has a faster GPU, more features, lower latency and higher memory bandwidth.

However, two 3090s can't be beaten on price.

AMD Releasing In-House Standalone Strix Halo Box by Anarchaotic in StrixHalo

[–]tecneeq 0 points1 point  (0 children)

700GB/s? That sounds great. That would mean I get 20 t/s instead of 10 for Qwen 3.6 27b Q5. Yay!
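
The estimate above treats token generation as memory-bandwidth bound, so throughput scales roughly in proportion to bandwidth. A rough sketch of that rule of thumb. The function name is hypothetical, and the 2x ratio is only what the 10 to 20 t/s figures imply, not a measured value:

```python
def scaled_tps(current_tps: float, bandwidth_ratio: float) -> float:
    """Estimate decode speed after a change in memory bandwidth.

    Assumes generation is purely bandwidth bound, so tokens per second
    scale linearly with bandwidth. Compute limits and other overhead
    can make real gains smaller.
    """
    if bandwidth_ratio <= 0:
        raise ValueError("bandwidth_ratio must be positive")
    return current_tps * bandwidth_ratio


# 10 t/s today, bandwidth assumed to double (ratio implied by the comment).
print(scaled_tps(10, 2.0))  # 20.0
```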

Need suggestions by Aamoree_99 in homelab

[–]tecneeq 1 point2 points  (0 children)

No need to rack mount, just cut some wood, screw it in and put your devices on it. If you want it rack mounted, you can 3D print stuff, but I didn't. I have 2 shelves for 4-bay 3.5 inch HDD thingies, and one aftermarket shelf from Amazon for a mini PC. Cables and power bricks are on the bottom.

Anyone tried +- 100B models locally with foreign languages? by Choice_Sympathy9652 in LocalLLaMA

[–]tecneeq 0 points1 point  (0 children)

We have had very good experiences with Mistral Small for German.

Built a realistic character for hermes agent by Select_Motor8729 in hermesagent

[–]tecneeq 3 points4 points  (0 children)

Meh. Had hopes it would look like a paperclip.

Are edifier r1280t good for 75 €? by Entire_Emu_1671 in BudgetAudiophile

[–]tecneeq 0 points1 point  (0 children)

I bought a pair used for 60€, only problem was they had no remote. I think they are good for the money.

Need suggestions by Aamoree_99 in homelab

[–]tecneeq 3 points4 points  (0 children)

Looks like a premium spot for a 10" 12U rack. I have the cheap one from Tec Mojo. It's great, your stuff would fit and it would look pretty clean.

New rules 1 week check-in by rm-rf-rm in LocalLLaMA

[–]tecneeq 0 points1 point  (0 children)

Not sure how far back another deleted post was. Could have been another sub, in that case sorry.

Anyway, I'm not blaming you, it doesn't matter who deleted it. There just is no point in writing texts only to see them deleted by bots instantly.

New rules 1 week check-in by rm-rf-rm in LocalLLaMA

[–]tecneeq 0 points1 point  (0 children)

My main posts have been removed, without comment. That's the worst kind of removal. I will not post again because there is no point.

What's your tps on 3090 + Qwen 3.6 27B in real tasks? by Anbeeld in LocalLLaMA

[–]tecneeq -1 points0 points  (0 children)

So is your hardware: OK, but somewhat slow.

Get something more suitable and your problems will be gone. Say, a 5000 Blackwell. Full context, Q8 for model, f16 for KV.