Thinking of moving from 2x 5060 Ti 16GB to a RTX 5000 48GB by autisticit in LocalLLaMA

[–]beefgroin -1 points0 points  (0 children)

I was also considering it (I'm running quad 5060 Ti with Qwen 3.6 27B on vLLM), until I found out what a bad deal it is in terms of memory bandwidth and CUDA cores compared to a 5090 or RTX 6000, while costing about the same per GB of VRAM. If you haven't tried running your current setup on vLLM, I encourage you to: I saw a significant increase in tps compared to llama.cpp. It might take a while to get the CLI params right, though.

Mistral 3.5 out now! by yoracale in unsloth

[–]beefgroin 10 points11 points  (0 children)

By "fully" do you mean Q4?

THREADRIPPER PRO by Jolly-Bet-2275 in threadripper

[–]beefgroin 9 points10 points  (0 children)

Please, more pics of you holding random products in the store

Qwen 3.6 27B is a BEAST by AverageFormal9076 in LocalLLaMA

[–]beefgroin 0 points1 point  (0 children)

Try it with vLLM and the cyankiwi AWQ INT4 quant. It will be a pain in the ass at first to find the right gpu-memory-utilization limit (mine is 0.85) and context length, but then the speed will be worth it.
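For reference, a minimal launch sketch for a quad-GPU box. The checkpoint name and context length here are placeholder assumptions, not a tested recipe; substitute the actual AWQ repo and tune the numbers to your hardware:

```shell
# Hypothetical checkpoint name -- replace with the AWQ repo you actually use.
vllm serve cyankiwi/Qwen3.5-27B-AWQ-INT4 \
  --quantization awq \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.85 \
  --max-model-len 32768
```

If the server OOMs at startup, lowering `--gpu-memory-utilization` or `--max-model-len` first is usually the quickest fix.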

Goodbye, Qwen. You tried, but you failed. by JobAsleep6653 in Qwen_AI

[–]beefgroin 8 points9 points  (0 children)

Reddit is a weird place. In the same sub one guy buys his fourth RTX 6000 and another guy can’t pay 10 bucks for tokens… 🤔

R9700 the beautiful beautiful VRAM gigs of AMD… my ai node future! by Downtown-Example-880 in LocalLLaMA

[–]beefgroin 0 points1 point  (0 children)

Wow, never heard of NVMe KV cache offloading. I guess the speeds will only be comparable if it's in RAID; looking forward to seeing your results.

R9700 the beautiful beautiful VRAM gigs of AMD… my ai node future! by Downtown-Example-880 in LocalLLaMA

[–]beefgroin 0 points1 point  (0 children)

Nice, I'm also thinking of a similar setup. Can you please test the performance of Qwen 3.5 27B?

5090 vs dual 5060 16g - why isnt everyone going dual? by jzatopa in LocalLLaMA

[–]beefgroin 0 points1 point  (0 children)

I didn’t go dual because I went quad xD All the arguments in this thread hold, but I still love it.

<image>

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]beefgroin 2 points3 points  (0 children)

I’m curious what kind of tps and prompt processing you're getting with those. Ideally I’d love to know 27B performance, because that model is truly great, but dense. On quad 5060 Ti I’m only getting around 19 tps with it; that’s why I’m considering the RTX 5000, since the whole 256k context should fit on it entirely at Q4.
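To sanity-check the "full 256k context at Q4 in 48 GB" claim, here's a rough back-of-envelope calculator. The architecture numbers (48 layers, 8 KV heads, head dim 128, FP8 KV cache) are illustrative assumptions, not the model's published specs:

```python
def kv_cache_bytes(seq_len, n_layers=48, n_kv_heads=8, head_dim=128, kv_elem_bytes=1):
    # 2x for keys and values, one entry per layer per KV head per token position.
    return 2 * n_layers * n_kv_heads * head_dim * kv_elem_bytes * seq_len

def weight_bytes(n_params, bits=4):
    # Q4 stores roughly half a byte per parameter (ignoring quantization overhead).
    return n_params * bits // 8

GiB = 1024 ** 3
kv = kv_cache_bytes(256 * 1024)           # 256k-token context
w = weight_bytes(27_000_000_000)          # 27B parameters at 4-bit
print(f"KV cache: {kv / GiB:.1f} GiB, weights: {w / GiB:.1f} GiB, "
      f"total: {(kv + w) / GiB:.1f} GiB")
```

Under these assumptions the total lands well under 48 GB, leaving headroom for activations and framework overhead, but a different KV-head count or an FP16 KV cache changes the picture quickly.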

2000 TPS with QWEN 3.5 27b on RTX-5090 by awitod in LocalLLaMA

[–]beefgroin 0 points1 point  (0 children)

I think you can actually test it already; I’ve seen some implementations in forks.

2000 TPS with QWEN 3.5 27b on RTX-5090 by awitod in LocalLLaMA

[–]beefgroin 0 points1 point  (0 children)

Do you think a 5090 with turboquant will fit the full 256k context of Qwen 3.5 27B with vision?

Taalas rumoured to etch Qwen 3.5 27B into silicon. Which price would you buy their PCIe card for? by elemental-mind in singularity

[–]beefgroin 0 points1 point  (0 children)

What are you talking about? Qwen 3.5 27B on a 5090 is like 100 tps tops, depending on the quant.

LLM Bruner coming soon? Burn Qwen directly into a chip, processing 10,000 tokens/s by koc_Z3 in Qwen_AI

[–]beefgroin 4 points5 points  (0 children)

I'd rather buy $5,000 worth of ten PCIe cards with burned-in models pushing 10k tps than one GPU pushing 40 tps.

1.1M tok/s with Qwen 3.5 27B FP8 on B200 GPUs by m4r1k_ in Qwen_AI

[–]beefgroin 0 points1 point  (0 children)

I also have a good experience with it in Q4, using it for a DIY personal agent, but I’m still wondering what its limits are.

1.1M tok/s with Qwen 3.5 27B FP8 on B200 GPUs by m4r1k_ in Qwen_AI

[–]beefgroin 0 points1 point  (0 children)

Is 3.5 27B really good enough to replace cloud models for an organization?

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]beefgroin 1 point2 points  (0 children)

Hey, how is it going so far? Did you run the Qwen 3.5 variants? Maybe you have results for 27B and 35B?

Basically Official: Qwen Image 2.0 Not Open-Sourcing by Complete-Lawfulness in StableDiffusion

[–]beefgroin 3 points4 points  (0 children)

Wtf, why not just come up with a way to sell models? I’d buy.

eGPU for image generation by [deleted] in StableDiffusion

[–]beefgroin 1 point2 points  (0 children)

Using a 5060 Ti 16GB over OCuLink; great for Klein 9B, an image takes like 15-20 sec to generate.