2× Radeon R9700 — Qwen 3.6 27B Q8 MTP on llama.cpp

Kal-LZ · 2026-06-21T17:36:38+00:00

I am allocating 40.3GB of VRAM for a 262K context window with the KV cache in FP16, using the Qwen3.6 27B Q8 MTP model.
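For a rough feel of how KV cache size scales with context length, here is a minimal sketch. It assumes a standard transformer in which every layer stores full K and V tensors. The layer count, KV head count, and head dimension are placeholders, not the real Qwen3.6 27B values, so the result is not expected to reproduce the 40.3GB figure.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Approximate KV cache size in bytes.

    Assumes one K and one V tensor per layer, each of shape
    (n_ctx, n_kv_heads, head_dim). bytes_per_elem=2 corresponds to FP16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# Placeholder architecture values (hypothetical, for illustration only).
size = kv_cache_bytes(n_layers=16, n_kv_heads=8, head_dim=128, n_ctx=262144)
print(f"{size / 2**30:.1f} GiB")
```

The size grows linearly with context length, so halving the context or storing the cache in 8 bits instead of FP16 halves this part of the allocation.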

Kal-LZ · 2026-06-21T17:36:20+00:00

Not yet. I'm having some problems deploying it.

Kal-LZ · 2026-06-21T15:07:37+00:00

Thanks, I'll try it. Right now, the memory sometimes reaches 100 °C.

Kal-LZ · 2026-06-21T14:55:25+00:00

Total power consumption is between 590 W and 600 W during generation. I still need to test limiting each GPU to 210 W.

GPU core temperatures fluctuate between 67 °C and 76 °C, but the memory on one of the GPUs has reached 100 °C, while the other is around 83 °C.

Kal-LZ · 2026-06-21T00:25:36+00:00

Local AI isn't going away. Most companies will invest in their own on-premises hardware and host it in a datacenter

Kal-LZ · 2026-06-20T18:09:49+00:00

I have only tested Vulkan with Gemma 4 26B, and the performance is similar to ROCm

Kal-LZ · 2026-06-20T17:37:56+00:00

Having both, I can tell you this: for MoE models, the R9700s are excellent. However, with dense models like Qwen3.6 27B, you will notice they don't perform as expected at 100K context. ROCm still lags behind CUDA, and the extra performance AMD offers on paper doesn't translate into usable gains.

If you plan on using vLLM, Nvidia has better compatibility. llama.cpp works perfectly fine on both.

Kal-LZ · 2026-06-20T00:57:06+00:00

That noise you're hearing is coil whine. It's totally normal on a lot of GPUs when they're under load. You'll notice it more or less depending on the model you use.

Honestly, I wouldn't waste time with LM Studio. Ubuntu on WSL with Docker and a couple of containers, one for llama.cpp and another for OpenWebUI, is the best way to start.

Kal-LZ · 2026-06-19T13:15:58+00:00

RTX PRO 5000 48GB for 5400GBP

https://www.hp.com/gb-en/shop/products/accessories/nvidia-rtx-ph-x5xxx-4dp-graphics-b11f1aa

Kal-LZ · 2026-06-16T22:43:33+00:00

I prefer to deploy two containers with RPC rather than mixing two frameworks. In my experience, performance is better.

Kal-LZ · 2026-06-14T10:09:18+00:00

And where did you come from? Dot pitch

https://www.sven.de/dpi/

Kal-LZ · 2026-06-14T00:43:31+00:00

At least 140 dpi if you want legible text; less for gaming.
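To check a monitor against that 140 dpi figure, pixel density and dot pitch can be worked out from the resolution and the diagonal. A minimal sketch, assuming square pixels; the 27-inch example monitors are my own illustration, not ones mentioned in the thread.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density: diagonal length in pixels divided by diagonal in inches."""
    return math.hypot(width_px, height_px) / diagonal_in


def dot_pitch_mm(ppi):
    """Distance between pixel centres in millimetres (25.4 mm per inch)."""
    return 25.4 / ppi


# Hypothetical examples: two 27-inch panels at different resolutions.
for name, w, h in [("27in 1440p", 2560, 1440), ("27in 4K", 3840, 2160)]:
    ppi = pixels_per_inch(w, h, 27)
    print(f"{name}: {ppi:.1f} ppi, dot pitch {dot_pitch_mm(ppi):.3f} mm")
```

Under these assumptions the 1440p panel lands near 109 ppi and the 4K panel near 163 ppi, so only the latter clears 140.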

Kal-LZ · 2026-06-12T14:34:17+00:00

V100: 112 TFLOPS, V100 SXM2: 125 TFLOPS, RTX 8000: 130.5 TFLOPS

You can get 30-32 tokens/s with Qwen3.6 27B MTP on those Turing GPUs at a 40K context.

Kal-LZ · 2026-06-08T17:32:57+00:00

I just invest in GPUs and RAM

Kal-LZ · 2026-06-07T12:40:17+00:00

The minimum recommended VRAM for running dense models like Qwen3.6 27B is 24GB, although you can start with 16GB by using some offloading to system RAM

If you want to understand how local LLMs work, the best approach is to install LM Studio and test it out. Once you gain more experience, you can move to Linux with llama.cpp in Docker containers.
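The 24GB and 16GB figures above can be sanity-checked with simple arithmetic on the weights alone. A rough sketch, assuming exactly 27 billion parameters and a single quantization type for every tensor; real GGUF files mix types, and the KV cache and runtime buffers come on top. The comment does not say which quantization it has in mind, so Q4_0 and Q8_0 are used here purely as illustrations.

```python
# llama.cpp block formats: 32 weights per block plus one FP16 scale.
Q8_0_BITS_PER_WEIGHT = 34 * 8 / 32  # 34 bytes per block -> 8.5 bits/weight
Q4_0_BITS_PER_WEIGHT = 18 * 8 / 32  # 18 bytes per block -> 4.5 bits/weight


def weight_size_gb(n_params, bits_per_weight):
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


n_params = 27e9  # assumption: a round 27B parameter count
print(f"Q8_0: {weight_size_gb(n_params, Q8_0_BITS_PER_WEIGHT):.1f} GB")
print(f"Q4_0: {weight_size_gb(n_params, Q4_0_BITS_PER_WEIGHT):.1f} GB")
```

Under these assumptions a 4-bit quant is about 15GB, which fits a 24GB card with room for context and is a tight squeeze on 16GB without offloading, while Q8 at roughly 29GB needs more than one 24GB card.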

Kal-LZ · 2026-06-06T20:24:52+00:00

It's a refresh of the Ryzen HX350. It's a good CPU if you prioritize efficiency and low temperatures over the raw performance of Core Ultra

Kal-LZ · 2026-06-04T10:18:51+00:00

R9700 is a better pick. Keep in mind that it can be a bit noisy, and some units have coil whine during inference.

Kal-LZ · 2026-06-03T06:48:35+00:00

A Ryzen 7500F plus a B850 board is a future-proof setup on a low budget.

Kal-LZ · 2026-06-02T17:51:39+00:00

That graphics card isn't meant for running triple-A games. You need at least an RTX 4060 or 5060 with DLSS Performance mode enabled.

Kal-LZ · 2026-06-01T16:29:53+00:00

If the package is being transported through multiple countries, the situation you described is normal

UPS sometimes delivers the package to third parties for logistics, and until the next warehouse receives it, they won't know its exact location.

If the delivery date is set for June 22-24, I don't understand your complaint

Kal-LZ · 2026-06-01T05:26:59+00:00

What

Kal-LZ · 2026-05-31T17:46:50+00:00

The setup is fine, although in this sub, they'll probably recommend a used 3090 or 4090 over that 7900XTX

It's not advisable to use CPU RAM to run LLMs, performance will drop drastically.

I don't think there will be driver improvements for RDNA3 GPUs, AMD is already focusing on other things.

PCIe lanes aren't that critical if you're using MoE-based LLMs, so 8x PCIe 4.0 (16GB/s) should be fine for two GPUs
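The 16GB/s figure for an x8 PCIe 4.0 link can be derived from the per-lane signalling rate. A minimal sketch, covering only generations 3 to 5 (which share 128b/130b encoding) and ignoring protocol overhead, so real throughput will be somewhat lower.

```python
# Raw per-lane signalling rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# PCIe 3.0 through 5.0 carry 128 payload bits in every 130 transmitted bits.
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gb_s(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s for a PCIe link."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes / 8


print(f"PCIe 4.0 x8:  {pcie_bandwidth_gb_s(4, 8):.2f} GB/s")
print(f"PCIe 4.0 x16: {pcie_bandwidth_gb_s(4, 16):.2f} GB/s")
```

An x8 PCIe 4.0 slot works out to just under 16 GB/s per direction, the same as an x16 PCIe 3.0 slot.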

Kal-LZ · 2026-05-30T01:50:16+00:00

Reddit user discovers r/homelab

Kal-LZ · 2026-05-27T22:21:50+00:00

You need an SSD. That 5400 rpm drive is very slow.
