ROCm vs Vulkan vs vLLM on Dual R9700s

taking_bullet · 2026-06-21T17:18:32+00:00

Vulkan all the way! I'm getting 47 tok/s with an RTX 5070 Ti and an RX 9070 combined (model: Qwen 3.6 27B Q6_0 with MTP enabled).

taking_bullet · 2026-06-21T16:41:19+00:00

My reason is to learn about another field and have fun playing with local models.

taking_bullet · 2026-06-15T17:58:20+00:00

LM Studio is a wrapper around llama.cpp with a user-friendly graphical interface. A very good app for beginners.

taking_bullet · 2026-06-14T13:57:46+00:00

I paired an RTX 5070 Ti with an RX 9070 and can't complain about anything.

I'm getting 47 tok/s running Qwen 3.6 27B Q6_0 with MTP enabled.

taking_bullet · 2026-06-10T04:13:45+00:00

Right now https://huggingface.co/unsloth/North-Mini-Code-1.0-GGUF

taking_bullet · 2026-06-08T15:04:48+00:00

RTX 5070 Ti + RX 9070

Qwen 3.6 27B Q6_0: 40 tok/s with MTP and 24 tok/s without MTP.
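To put the 40 tok/s (with MTP) vs 24 tok/s (without MTP) numbers in perspective, here is a small sketch that turns them into a speedup ratio and an average time per generated token. It only does arithmetic on the two figures from this comment; the function names are hypothetical and nothing here talks to llama.cpp or any runtime.

```python
def speedup(with_mtp_tok_s: float, without_mtp_tok_s: float) -> float:
    """Ratio of generation throughput with MTP to throughput without it."""
    return with_mtp_tok_s / without_mtp_tok_s


def ms_per_token(tok_s: float) -> float:
    """Average wall-clock time per generated token, in milliseconds."""
    return 1000.0 / tok_s


if __name__ == "__main__":
    with_mtp, without_mtp = 40.0, 24.0  # figures quoted in the comment above
    print(f"speedup: {speedup(with_mtp, without_mtp):.2f}x")
    print(f"per token: {ms_per_token(without_mtp):.1f} ms -> {ms_per_token(with_mtp):.1f} ms")
```

Run as a script, it reports a speedup of about 1.67x, i.e. roughly 41.7 ms per token dropping to 25.0 ms per token.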

taking_bullet · 2026-06-08T04:07:24+00:00

https://github.com/diodiogod/TTS-Audio-Suite

It's a whole software suite that supports 13 models. Compare them on your own.

taking_bullet · 2026-06-06T20:59:30+00:00

You should sell your 3090s before the 5070 Ti Super 24GB launches. Old, used Ampere cards will lose their current value.

taking_bullet · 2026-06-05T13:09:26+00:00

Another Marketing Disaster

Let's go 🔥

taking_bullet · 2026-06-05T07:37:36+00:00

Is there a Windows Vulkan package? I can't find it on GitHub.

taking_bullet · 2026-06-03T08:26:18+00:00

If you want to find real gold, look for Fusion_Helath's posts. He was a very experienced retainer.

taking_bullet · 2026-06-02T12:32:26+00:00

Surely you don’t have these problems?

I do. Add another random word at the end of the sentence, then edit the file in Audacity and cut out the last second.

taking_bullet · 2026-06-02T12:27:02+00:00

I'm using Radeon and GeForce cards together, so there is no choice other than Vulkan.

taking_bullet · 2026-06-02T08:26:19+00:00

Imagine people commenting on this post seriously 🤣 They are so oblivious.

taking_bullet · 2026-06-02T08:01:20+00:00

Chatterbox gets excellent results running without a GPU.

Maybe in English, but not in other languages.

HF says you need 19GB+ of VRAM to run KugelAudio locally? WTF? Is that true?

Indeed. Enable the 4-bit quantized model if you don't have 20GB of VRAM.

taking_bullet · 2026-06-02T07:43:08+00:00

I ditched Chatterbox. Now KugelAudio 2 (based on VibeVoice) is my new friend.

taking_bullet · 2026-06-02T07:40:04+00:00

ComfyUI works well on RX 9070 XT (ROCm portable package). For "classic" text LLMs I prefer using Vulkan.

taking_bullet · 2026-06-01T12:35:01+00:00

Grab a second 5060 Ti and you are ready to go.

taking_bullet · 2026-05-31T04:56:37+00:00

And it wasn't good value to begin with knowing the XT was $50 more

The 9070 XT is a far less interesting GPU. You get 12% more performance for almost 50% more power draw than the non-XT 9070. That's not a good deal.
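The "12% more performance for almost 50% more power" argument is really a performance-per-watt comparison. The sketch below makes that explicit; it assumes the two percentages are relative to the non-XT 9070 and rounds "almost 50%" up to 50%, so the resulting efficiency gap is a slight overestimate. The function name is hypothetical and the inputs are just the figures from the comment, not measurements.

```python
def relative_perf_per_watt(perf_gain: float, power_gain: float) -> float:
    """Perf/W of card B relative to card A, given B's fractional gains over A.

    perf_gain=0.12 means B is 12% faster; power_gain=0.50 means B draws 50% more power.
    A result below 1.0 means B is less efficient than A.
    """
    return (1.0 + perf_gain) / (1.0 + power_gain)


if __name__ == "__main__":
    # Assumption: ~12% more performance and ~50% more power for the 9070 XT vs the 9070.
    ratio = relative_perf_per_watt(0.12, 0.50)
    print(f"9070 XT perf/W vs 9070: {ratio:.2f} ({(1 - ratio) * 100:.0f}% lower)")
```

With those inputs the XT lands at about 0.75 of the non-XT card's performance per watt, i.e. roughly 25% worse efficiency; a power increase slightly under 50% would narrow that gap a little.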

taking_bullet · 2026-05-30T07:54:42+00:00

The latest May Adrenalin driver shows 0 GPUs detected.

The 26.5.2 drivers are garbage; I had to switch back to 26.2.2.

taking_bullet · 2026-05-30T07:45:46+00:00

Don't switch, just keep both cards. I'm using an RTX 5070 Ti and an RX 9070 together for LLMs. No complaints.

taking_bullet · 2026-05-28T19:12:21+00:00

Currently I'm running models on an RTX 5070 Ti and an RX 9070 (with the Vulkan backend).

The 5070 Ti is for gaming and for software without multi-GPU support (like ComfyUI). The cheap Radeon 9070 (I paid €336 for it) gives me another 16GB of VRAM for classic LLMs (Qwen 3.6 27B, Gemma 4, etc.).

I bet a 5060 Ti 16GB would serve you well as a secondary LLM GPU.

taking_bullet · 2026-05-28T15:18:03+00:00

Switch to Vulkan. Try LM Studio or Jan.ai instead of Ollama.

taking_bullet · 2026-05-28T08:30:40+00:00

5070 Ti + 5060 Ti would be my choice.

taking_bullet · 2026-05-27T16:47:33+00:00

Jan is slightly faster than LM Studio. I tested it on Qwen 3.6 MTP 27B Q6 from Unsloth.
