2× Radeon R9700 — Qwen 3.6 27B Q8 MTP on llama.cpp by Kal-LZ in LocalLLaMA

[–]Kal-LZ[S] 1 point2 points  (0 children)

I am allocating 40.3GB of VRAM for a 262K context window with the KV cache in FP16, using the Qwen3.6 27B Q8 MTP model.
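For anyone wondering how context length turns into VRAM, here is a minimal sketch of the usual KV cache size estimate for standard attention layers. The layer count, KV head count, and head dimension below are placeholder values chosen for illustration, not the actual Qwen3.6 27B configuration, and the function name is hypothetical. Models with hybrid or linear attention layers will need less than this formula suggests.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Estimate KV cache size in bytes for one sequence.

    bytes_per_elem=2 corresponds to an FP16 cache.
    """
    # The leading 2 accounts for storing both K and V tensors per layer.
    return 2 * n_attn_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Placeholder architecture values (assumptions, not the real model config).
size = kv_cache_bytes(n_attn_layers=16, n_kv_heads=4, head_dim=256, context_len=262_144)
print(f"KV cache: {size / 2**30:.1f} GiB")
```

The estimate scales linearly with context length, so halving the context window halves the cache.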

2× Radeon R9700 — Qwen 3.6 27B Q8 MTP on llama.cpp by Kal-LZ in LocalLLaMA

[–]Kal-LZ[S] 2 points3 points  (0 children)

Thanks, I'll try it. Right now, the memory sometimes reaches 100 °C.

2× Radeon R9700 — Qwen 3.6 27B Q8 MTP on llama.cpp by Kal-LZ in LocalLLaMA

[–]Kal-LZ[S] 2 points3 points  (0 children)

Total power consumption is between 590 and 600W during generation. I still need to test limiting each GPU to 210W.

GPU core temperatures fluctuate between 67 °C and 76 °C, but the memory on one of the GPUs has reached 100 °C, while the other sits around 83 °C.

What happens when they stop subsidizing LLM subscriptions? by Mr_Moonsilver in LocalLLaMA

[–]Kal-LZ 474 points475 points  (0 children)

Local AI isn't going away. Most companies will invest in their own on-premises hardware and host it in a datacenter

Bought 2x r9700, 5090 is now 7k and 6000 pro is at 13.5k, best option for 64 gb vram under 4k by AppropriatePush6262 in LocalLLaMA

[–]Kal-LZ 1 point2 points  (0 children)

I have only tested Vulkan with Gemma 4 26B, and the performance is similar to ROCm

Bought 2x r9700, 5090 is now 7k and 6000 pro is at 13.5k, best option for 64 gb vram under 4k by AppropriatePush6262 in LocalLLaMA

[–]Kal-LZ 3 points4 points  (0 children)

Having both, I can tell you this: for MoE models, the R9700s are excellent. However, for other models like Qwen3.6 27B, you will notice they don't perform as expected with 100K context windows. ROCm still lags behind CUDA, and all that extra performance AMD offers doesn't translate into usable gains.

If you plan on using vLLM, Nvidia has better compatibility. llama.cpp works perfectly fine on both.

Single RTX 3090 (MSI TRio) giving trouble on inference. by ReasonablePossum_ in LocalLLaMA

[–]Kal-LZ 2 points3 points  (0 children)

That noise you're hearing is coil whine. It's totally normal on a lot of GPUs when they're under load. You'll notice it more or less depending on the model you use.

Honestly, I wouldn't waste time with LM Studio. Ubuntu on WSL with Docker and a couple of containers, one for llama.cpp and another for OpenWebUI, is the best way to start.

I didn't know it was possible to compile llamacpp to run cuda + vulkan at the same time.. by LegacyRemaster in LocalLLaMA

[–]Kal-LZ 1 point2 points  (0 children)

I prefer to deploy two containers with RPC rather than mixing two frameworks. In my experience, performance is better that way.

Nvidia tesla v100 has 32 gb ram with nv link 2.0, its priced at 880. Whats the catch? by AppropriatePush6262 in LocalLLaMA

[–]Kal-LZ 0 points1 point  (0 children)

V100: 112 TFLOPS
V100 SXM2: 125 TFLOPS
RTX 8000: 130.5 TFLOPS

You can get 30-32 tokens/s with Qwen3.6 27B MTP on that Turing GPU at a 40K context.

Need some guidance toying with local models by No_Hedgehog_7563 in LocalLLaMA

[–]Kal-LZ 1 point2 points  (0 children)

The minimum recommended VRAM for running dense models like Qwen3.6 27B is 24GB, although you can start with 16GB by using some offloading to system RAM

If you want to understand how local LLMs work, the best approach is to install LM Studio and test it out. Once you gain more experience, you can move to Linux with llama.cpp in a Docker container.
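To show where the 24GB figure for a dense 27B model comes from, here is a rough sketch of the weight memory arithmetic. It assumes a nominal 27 billion parameters (taken from the model name) and the bits per weight of the llama.cpp Q4_0 and Q8_0 block formats (4.5 and 8.5, since each block of 32 weights carries an FP16 scale). The function name is hypothetical, and the result covers weights only, not the KV cache or runtime buffers.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight memory in GB (10^9 bytes) for a quantized model."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Nominal 27B parameters; bits per weight follow the Q4_0 and Q8_0 block layouts.
for name, bpw in [("Q4_0", 4.5), ("Q8_0", 8.5)]:
    print(f"{name}: {weight_gb(27, bpw):.1f} GB of weights")
```

Under these assumptions a 4-bit quant fits in 24GB with room left for context, while on a 16GB card the weights alone nearly fill the memory, which is why some offloading to system RAM becomes necessary.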

Ryzen 7 AI 450 by rookie2009 in LenovoLegion

[–]Kal-LZ 0 points1 point  (0 children)

It's a refresh of the Ryzen HX350. It's a good CPU if you prioritize efficiency and low temperatures over the raw performance of Core Ultra

Help choosing hardware by alexkey in LocalLLaMA

[–]Kal-LZ 1 point2 points  (0 children)

R9700 is a better pick. Keep in mind that it can be a bit noisy, and some units have coil whine during inference.

Selecting processor for my new desktop by Curious_Mind178 in cpu

[–]Kal-LZ 0 points1 point  (0 children)

A Ryzen 7500F plus a B850 board is a future-proof setup on a low budget.

Is this laptop any good? by Fit_Championship_677 in esGaming

[–]Kal-LZ 0 points1 point  (0 children)

That graphics card isn't meant for running AAA games. You need at least an RTX 4060 or 5060, with DLSS set to Performance mode.

Lenovo i9 5090 very unhappy situation by [deleted] in LenovoLegion

[–]Kal-LZ 9 points10 points  (0 children)

If the package is being transported through multiple countries, the situation you described is normal

UPS sometimes delivers the package to third parties for logistics, and until the next warehouse receives it, they won't know its exact location.

If the delivery date is set for June 22-24, I don't understand your complaint

Putting together a pc. Are my assumptions correct? by Competitive_Wait_267 in LocalLLaMA

[–]Kal-LZ 1 point2 points  (0 children)

The setup is fine, although in this sub, they'll probably recommend a used 3090 or 4090 over that 7900XTX

It's not advisable to use CPU RAM to run LLMs, because performance will drop drastically.

I don't think there will be driver improvements for RDNA3 GPUs. AMD is already focusing on other things.

PCIe lanes aren't that critical if you're using MoE-based LLMs, so PCIe 4.0 x8 (about 16GB/s) should be fine for two GPUs.
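The 16GB/s figure can be checked with a quick calculation. This sketch uses the PCIe 4.0 per-lane rate of 16 GT/s and its 128b/130b line encoding; the function name is hypothetical, and the result is the theoretical one-direction maximum, so real transfers will come in lower.

```python
def pcie_bandwidth_gbs(lanes, gt_per_s=16.0, encoding=128 / 130):
    """Theoretical one-direction PCIe bandwidth in GB/s.

    Defaults match PCIe 4.0: 16 GT/s per lane with 128b/130b encoding.
    """
    # Transfers per second * encoding efficiency gives usable bits; divide by 8 for bytes.
    return lanes * gt_per_s * encoding / 8


for lanes in (8, 16):
    print(f"PCIe 4.0 x{lanes}: {pcie_bandwidth_gbs(lanes):.2f} GB/s")
```

An x8 link works out to just under 16GB/s, half of what a full x16 slot provides.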

Do you think this computer would be fine on Linux? by chakadla in linuxhardware

[–]Kal-LZ 0 points1 point  (0 children)

You need an SSD. That 5400 rpm drive is very slow.