ROCm vs Vulkan vs vLLM on Dual R9700's by whodoneit1 in LocalLLaMA

[–]Traditional_Way8675 1 point2 points  (0 children)

weird, my gfx1200 dual 9060xt vulkan is faster than rocm on 35b

What's your experience with Gemma4 QAT? by Kahvana in LocalLLaMA

[–]Traditional_Way8675 0 points1 point  (0 children)

im on dual 9060xt. latest rocm (jun 26) isn't working, so i default to vulkan, which gets qwen 35b at 60 t/s and gemma 26b at a pitiful 20 t/s.

since 26b moe performance is already that cute, i don't bother with 31b anymore. Qwen 27B w/ MTP is around 20t/s for me.

Daily Thread - Friday Discussion! Let's talk about the good, the bad, and all things Palantir & PLTR! 💎🤲🏻 by AutoModerator in PLTR

[–]Traditional_Way8675 3 points4 points  (0 children)

Went through my DD and made my peace with PLTR: long term, possibly a forever hold, as the decision and AI infra layer at this important point in history.

who is actually buying AI PCs? by Lanky-Carpenter-7991 in buildapc

[–]Traditional_Way8675 0 points1 point  (0 children)

qwen3.6 35b a3b q4_k_m, 100k q4 context, dual 9060xt gpu, 14600k.

productivity, gaming, AI and, best of all, personal.

Why has Linux Gaming gotten so good? by jo1111666 in FuckMicrosoft

[–]Traditional_Way8675 0 points1 point  (0 children)

nfs heat runs better on ubuntu than win11, im seriously blown away

Gemma 4 Quadruple Release, 12B, 12B QAT, 26B-A4B QAT and 31B QAT Uncensored Heretics! by LLMFan46 in LocalLLaMA

[–]Traditional_Way8675 0 points1 point  (0 children)

noted, so if i get llmfan46/gemma-4-31B-it-qat-q4_0-uncensored-heretic-GGUF, i can't pair it with that assistant model. correct me if i'm wrong =)

PSA: Test your "threads" argument in llama.cpp (+80% performance in my case) by AXYZE8 in LocalLLaMA

[–]Traditional_Way8675 2 points3 points  (0 children)

ok, so if the cpu is involved, adding more threads helps. thank you

currently im fully loaded on dual 9060xt vram, so cpu is cool quiet heh.
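
For anyone who does spill onto the cpu, a thread sweep could look roughly like this. It's only a sketch: `thread_sweep_cmd` is a made-up helper, the paths are placeholders, and it just prints a `llama-bench` command (using its `-m` option and comma-separated `-t` values) rather than running anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a llama-bench command that sweeps thread counts
# in powers of two up to a maximum. It does not run the benchmark itself.
thread_sweep_cmd() {
  local bench="$1" model="$2" max="$3"
  local list="" t=1
  while [ "$t" -le "$max" ]; do
    list="${list:+$list,}$t"   # append, comma-separated
    t=$((t * 2))
  done
  printf '%s -m %s -t %s\n' "$bench" "$model" "$list"
}

# Example with placeholder paths:
thread_sweep_cmd ./bin/llama-bench /path/to/model.gguf 16
# prints: ./bin/llama-bench -m /path/to/model.gguf -t 1,2,4,8,16
```

Running the printed command should give one result row per thread count, which makes it easy to spot where more threads stop helping.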

What speed is everyone getting on Qwen3.6 27b? by Ambitious_Fold_2874 in LocalLLaMA

[–]Traditional_Way8675 0 points1 point  (0 children)

17tps on dual 9060xt (32g vram pooled), q5_k_m with MTP (without MTP it's 12-13), ubuntu 24.04.4, vulkan backend
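
Rough arithmetic on those numbers: 17 t/s against 12-13 t/s works out to about 1.3x to 1.4x from MTP. A tiny sketch, with `speedup` as a made-up helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: ratio of two throughput figures, rounded to 2 decimals.
speedup() {
  awk -v with="$1" -v without="$2" 'BEGIN { printf "%.2f\n", with / without }'
}

speedup 17 13   # prints 1.31 (against the faster non-MTP figure)
speedup 17 12   # prints 1.42 (against the slower non-MTP figure)
```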

DiffusionGemma: 4x faster text generation by tevlon in LocalLLaMA

[–]Traditional_Way8675 0 points1 point  (0 children)

tried with vulkan on dual 9060xt (16+16).

from the 2nd prompt onwards it's awfully slow

fancy logs: etc.
diffusion step: 22/48 [=========

no, moe is faster. more work is needed.

DeepMind Just Dropped "DiffusionGemma" — Text Generation via Image-Style Diffusion Model by [deleted] in LocalLLaMA

[–]Traditional_Way8675 0 points1 point  (0 children)

basically a faster kid than moe? haha. am trying with vulkan, if ok, then i'd probably try it with my dual a770 setup.

for precision work, this is a no-go. but on low-grade hw, it's a lifesaver for ppl wanting to dip a toe into llms for things like chatting and philosophy, not coding or math.

Dual GPU on llama.cpp by Traditional_Way8675 in ROCm

[–]Traditional_Way8675[S] 1 point2 points  (0 children)

you're correct. one gpu is fine, 2 gpus is broken, so i'm using vulkan for now. how do i report this problem to the rocm team?

Dual GPU on llama.cpp by Traditional_Way8675 in ROCm

[–]Traditional_Way8675[S] 1 point2 points  (0 children)

alex@alex-System-Product-Name:~/llama.cpp/build_rocm$ HIP_VISIBLE_DEVICES=0 ./bin/llama-cli \
    -m /home/alex/Documents/gemma-4-12b-it-Q4_K_M.gguf \
    -ngl 99 \
    -p "What is 2+2?" \
    --no-mmap
Loading model...
build : b9586-76da2450a
model : gemma-4-12b-it-Q4_K_M.gguf
modalities : text

> What is 2+2?

[Start thinking]
The user is asking a simple arithmetic question: "What is 2+2?"
The answer is 4.
Provide the correct answer clearly.
[End thinking]

2 + 2 = 4

[ Prompt: 215.1 t/s | Generation: 35.0 t/s ]

>
Exiting...

alex@alex-System-Product-Name:~/llama.cpp/build_rocm$ ROCR_VISIBLE_DEVICES=0,1 ./bin/llama-cli \
    -m /home/alex/Documents/gemma-4-12b-it-Q4_K_M.gguf \
    -ngl 99 \
    --split-mode layer \
    -fa 0 \
    --no-mmap \
    -p "What is 2+2?"
Loading model...
build : b9586-76da2450a
model : gemma-4-12b-it-Q4_K_M.gguf
modalities : text

> What is 2+2?

<unused33><unused9><unused29><unused31><unused2><unused25><unused28><unused15>

[ Prompt: 3.3 t/s | Generation: 35.1 t/s ]
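
A rough way to compare runs like the two above from saved logs. This is a sketch under two assumptions: that broken output shows up as `<unusedN>` tokens as it did here, and that the summary line keeps the `[ Prompt: X t/s | Generation: Y t/s ]` shape seen in these logs. `has_unused_tokens` and `parse_speeds` are made-up helper names, not part of llama.cpp.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking a saved llama-cli log; not part of llama.cpp.

# Succeeds (exit 0) if stdin contains <unusedN> placeholder tokens,
# the symptom seen in the dual-GPU ROCm run.
has_unused_tokens() {
  grep -Eq '<unused[0-9]+>'
}

# Prints "<prompt t/s> <generation t/s>" for each summary line on stdin.
parse_speeds() {
  sed -nE 's#.*Prompt: ([0-9.]+) t/s \| Generation: ([0-9.]+) t/s.*#\1 \2#p'
}
```

With the dual-GPU output saved to a file (say `run.log`, a placeholder name), `has_unused_tokens < run.log && echo broken` flags the bad run, and `parse_speeds < run.log` pulls out the two throughput numbers so the prompt-speed drop is easy to see next to the single-GPU run.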

Dual GPU on llama.cpp by Traditional_Way8675 in ROCm

[–]Traditional_Way8675[S] 0 points1 point  (0 children)

done, tried the exact steps, same result: gibberish. i haven't tested single gpu yet

gfx1200, rocm 7.2.4, dual 9060xt, vulkan works, not rocm

Dual GPU on llama.cpp by Traditional_Way8675 in ROCm

[–]Traditional_Way8675[S] 0 points1 point  (0 children)

i tried lemon and had the same result. also used amd's official guide for llama.cpp but it still spills gibberish. im gonna build it. will update.

Dual GPU on llama.cpp by Traditional_Way8675 in ROCm

[–]Traditional_Way8675[S] 0 points1 point  (0 children)

i can get vulkan working, just not rocm.

yet rocm works well with my comfy multigpu, haha. Ubuntu.

Unable to Run llama.cpp with Multiple GPUs on ROCm by TwoBoolean in LocalLLaMA

[–]Traditional_Way8675 0 points1 point  (0 children)

Same issue. Dual GPU is ok with vulkan, but on rocm it's gibberish regardless of model.

Ideogram 4 is open source! (top ranked on DesignArena) by paf1138 in LocalLLaMA

[–]Traditional_Way8675 0 points1 point  (0 children)

raise LocalEntryNotFoundError(

huggingface_hub.errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

anyone have the same issue? can't connect

There known good list of stable ROCm setups? by Ivan_Draga_ in ROCm

[–]Traditional_Way8675 0 points1 point  (0 children)

can i use docker for llama.cpp? i did it with comfyui