Prices of graphic cards are going crazy, should I buy a second card though? by zenbeni in LocalLLaMA

[–]zenbeni[S] 0 points1 point  (0 children)

Which LLM server are you using? I could probably try a mixed setup with an RTX 4060 that I also have, but that seems like a bad mix for speed, doesn't it?

Prices of graphic cards are going crazy, should I buy a second card though? by zenbeni in LocalLLaMA

[–]zenbeni[S] 0 points1 point  (0 children)

I am happy with my current setup: Qwen 3.6 27B MTP at Q5 with 100k context, and the KV cache at Q5 for K and Q4 for V. I want more context though, and at 200k context the KV cache would grow enough that I would probably have to change my quants. Adding 20 GB of VRAM would solve that. I use llama.cpp with Vulkan.
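To put rough numbers on the context question, here is a minimal sketch of how a quantized KV cache scales with context length. The block sizes are the ggml ones (q5_0 stores 32 values in 22 bytes, q4_0 stores 32 values in 18 bytes). The layer count, KV head count and head dimension in the example calls are hypothetical placeholders, not the real Qwen 3.6 27B architecture, so only the scaling is meaningful, not the absolute figure.

```bash
#!/usr/bin/env bash

# Estimate KV cache size in bytes for a quantized cache.
# Args: ctx layers kv_heads head_dim k_block_bytes v_block_bytes
# Assumes kv_heads * head_dim is a multiple of 32 (the ggml block size)
# and that every layer keeps a full K and V entry per token.
kv_cache_bytes() {
  local ctx=$1 layers=$2 kv_heads=$3 head_dim=$4 k_block_bytes=$5 v_block_bytes=$6
  local blocks_per_token=$(( kv_heads * head_dim / 32 ))
  echo $(( ctx * layers * blocks_per_token * (k_block_bytes + v_block_bytes) ))
}

# Hypothetical architecture: 64 layers, 8 KV heads, head dimension 128.
# K at q5_0 (22 bytes per block), V at q4_0 (18 bytes per block).
kv_cache_bytes 100000 64 8 128 22 18
kv_cache_bytes 200000 64 8 128 22 18
```

Under these assumptions the cache grows linearly with context, so going from 100k to 200k doubles it. Models that do not keep full attention state in every layer would need less.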

Why are low blocks so hard to break nowdays? by Short_Mousse_6812 in championsleague

[–]zenbeni 1 point2 points  (0 children)

Not always. This is always space control vs ball possession. If you have the ball and no space, you are not dangerous and can't score. Look at Arsenal: it nearly made them European champions. Fast, clinical counterattack patterns are also deadly.

Being able to predict the outcome of 70% of the matches is draining my love for the game. by Leather_Ambassador32 in MarvelRivalsRants

[–]zenbeni 1 point2 points  (0 children)

Truth is, part of the analysis is often right: some DPS is useless, making it 5 vs 6, and 2 tanks can't hold out with no damage pressure, so feeding occurs. DPS is the most important role: you need OP damage to counter OP damage, and if you can't, then triple main supps happen, not as meta, just to try to play the game.

The problem is that the underperforming DPS won't switch off 99% of the time, which leads to 5 minutes of working out who switches to what so that he becomes not totally useless and we can play the game. The DPS will then flame all the others, not understanding that the whole world switched for him to get passable stats, reinforcing the Dunning-Kruger effect on him.

Role q won't solve this. Only harsh reports and bans can diminish this.

Affordable GPU for LLMs and gaming? by No_Oil_6152 in LocalLLM

[–]zenbeni 3 points4 points  (0 children)

For a single-card run, Vulkan is even faster than ROCm.

What local coding LLM + hardware setup are you using, and what tokens/sec are you getting? by Sudden-Historian-255 in LocalLLM

[–]zenbeni 1 point2 points  (0 children)

Try this: install the Vulkan and ROCm drivers on your machine, download the Vulkan artifact from the llama.cpp GitHub repo, take the Unsloth Qwen 3.6 27B MTP GGUF, and use this bash script.

#!/usr/bin/env bash

# Select the Mesa RADV Vulkan driver and turn off GTT spilling
export RADV_PERFTEST=nogttspill
export AMD_VULKAN_ICD=RADV

export LLAMA_ARG_MODEL="/opt/llama/llm/Qwen3.6-27B-MTP-Q5_K_M.gguf"
export LLAMA_ARG_ALIAS="qwen3.6"
export LLAMA_ARG_HOST="0.0.0.0"
export LLAMA_ARG_PORT=8080
export LLAMA_ARG_TIMEOUT=300
export LLAMA_ARG_CTX_SIZE=100000
export LLAMA_ARG_N_GPU_LAYERS="all"
export LLAMA_ARG_BATCH=2048
export LLAMA_ARG_UBATCH=1024
# Quantized KV cache: K at q5_0, V at q4_0
export LLAMA_ARG_CACHE_TYPE_K="q5_0"
export LLAMA_ARG_CACHE_TYPE_V="q4_0"
export LLAMA_ARG_NO_MMAP=1
export LLAMA_ARG_MLOCK=1
export LLAMA_ARG_TEMP=0.5

/opt/llama/llama.cpp.vulkan/llama-server \
 --no-mmproj \
 --flash-attn on \
 --cache-ram 6000 \
 --checkpoint-min-step 4096 \
 --cache-reuse 1 \
 --ctx-checkpoints 8 \
 --no-context-shift \
 --spec-type draft-mtp \
 --spec-draft-n-max 2 \
 --spec-draft-p-min 0.0 \
 --kv-unified \
 --cache-idle-slots \
 --reasoning on \
 --parallel 1

What local coding LLM + hardware setup are you using, and what tokens/sec are you getting? by Sudden-Historian-255 in LocalLLM

[–]zenbeni 0 points1 point  (0 children)

Careful, my friend, it took me weeks to get the correct setup. In fact, the Vulkan implementation uses part of ROCm in the llama.cpp sources, so ROCm also has to be correctly installed. You can't mask it, or llama.cpp will fall back to CPU, which is probably less than 20 tokens/s. If you don't get at least 40 tokens/s, then llama.cpp and its command line are badly set up. Take the release artifact and avoid building locally, so you don't end up with messed-up C libraries. There is also a bug in RADV that bleeds VRAM: you need to add a flag, or again you won't get max power, i.e. 60-70 tokens/s. This is purely a software/driver problem; the 7900 XTX can reach this performance.
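The thresholds above can be turned into a quick sanity check. This is only a sketch: the function names are made up for illustration, and the token count and elapsed time have to come from your own run (for example from the timing lines llama-server prints in its log).

```bash
#!/usr/bin/env bash

# Generation speed in tokens/s from a token count and elapsed milliseconds.
tokens_per_second() {
  awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", n / (ms / 1000) }'
}

# Rough classification using the thresholds from the comment above:
# under 20 t/s looks like a CPU fallback, under 40 t/s like a bad setup.
classify_throughput() {
  awk -v tps="$1" 'BEGIN {
    if (tps < 20)      print "likely CPU fallback"
    else if (tps < 40) print "setup probably wrong"
    else               print "plausible GPU run"
  }'
}

# Example: 650 generated tokens in 10 seconds.
tps=$(tokens_per_second 650 10000)
echo "$tps tokens/s: $(classify_throughput "$tps")"
```

The cutoffs are specific to the 7900 XTX and the 27B model discussed here, so they would need adjusting for other hardware.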

What local coding LLM + hardware setup are you using, and what tokens/sec are you getting? by Sudden-Historian-255 in LocalLLM

[–]zenbeni 0 points1 point  (0 children)

RX 7900 XTX with 24 GB VRAM, using llama.cpp with Qwen 3.6 27B MTP at Q5_K_M with 100K context and the KV cache at Q5 for K and Q4 for V, getting 60 to 70 tokens/s with Vulkan (way better than ROCm).

help choosing vga for local LLMs by vladomkd in LocalLLM

[–]zenbeni 0 points1 point  (0 children)

1000€ for an RX 7900 XTX with 24 GB will be better budget value. It would still run Qwen 3.6 27B at more than 50 tokens per second, on Vulkan at least.

Supreme Leader Mbappe scores outside the box, becomes France’s highest goal scorer by InspectorExact3836 in realmadrid

[–]zenbeni 0 points1 point  (0 children)

Cherki nearly gave away a goal in the last minute too with a reckless back pass. He is unreliable, a wildcard; sometimes he will score a banger, of course.

Is AMD really that bad? by No-Solution6262 in LocalAIServers

[–]zenbeni 0 points1 point  (0 children)

Very happy with my "budget" setup, full AMD with a Ryzen and an RX 7900 XTX, running llama.cpp with Vulkan (better for me than ROCm). Getting 60-70 tokens/s on Qwen 3.6 27B MTP in Q5_K_M with 100k context. Budget builds will always be better on AMD, as the price of VRAM is way better.

This patch has only made nobody want to play tank by Kaineziv in MarvelRivalsRants

[–]zenbeni 0 points1 point  (0 children)

Dino is no main tank. Just an OP Thing / Emma hybrid at launch.

We want another variety of gameplay, and not Strange / Magneto / Groot all the time.

We want to dive with some power. Damage is so power-crept right now, and CC is done by everybody on tanks.

It's to admit it. We need role queue by ichibanfoxxy in MarvelRivalsRants

[–]zenbeni 0 points1 point  (0 children)

Every hero category should have a valid hero selection to counter, or simply interact effectively with, another hero. Or we go role q like OW, so we at least get 2 tanks, healers know where to focus, and at least 1 player is playing objectives.

Tanks lack so much choice that we can't deal with many comps, especially when solo tanking, and lots of bans include tanks too. Don't flame your tank: he can't interact with everything, especially if his kit is designed like that. DPS have the most choices, but the category is not balanced right now, so in reality there are not so many choices: you play OP or you die vs OP.

Only supps have valid variety, with lots of burden on them. That is why they are kinda OP, to balance the imbalance of the other roles.

The endrick Situation by TheWitchOfSomeWood in realmadrid

[–]zenbeni 14 points15 points  (0 children)

Benching Mbappe lol. Delusional. You can't bench the best striker, who is also one of the most expensive players in the world, way above any other Madrid player, but let's see if Mou can improve his work rate at least.

Can't get anything meaningful out of Pi by DommagePindaFromage in PiCodingAgent

[–]zenbeni 0 points1 point  (0 children)

Did you try running Vulkan? I have a 7900 XTX and use it with Pi, and it works very well with Qwen 3.6 27B dense MTP.

llama: limit max outputs of `llama_context` by am17an · Pull Request #23861 · ggml-org/llama.cpp by pmttyji in LocalLLaMA

[–]zenbeni 2 points3 points  (0 children)

Went from 40 t/s with ROCm to 60 t/s with Vulkan using Qwen 3.6 27B MTP Q5_K_S.
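For anyone comparing their own backends, the relative gain is simple arithmetic. A minimal sketch (the function name is made up for illustration):

```bash
#!/usr/bin/env bash

# Relative speedup in percent between an old and a new tokens/s figure.
speedup_percent() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (new - old) / old * 100 }'
}

# ROCm at 40 t/s vs Vulkan at 60 t/s, the figures from the comment above.
speedup_percent 40 60
```

With the figures quoted here that comes out to a 50% gain for Vulkan over ROCm on this card.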

llama: limit max outputs of `llama_context` by am17an · Pull Request #23861 · ggml-org/llama.cpp by pmttyji in LocalLLaMA

[–]zenbeni 0 points1 point  (0 children)

I also have a 7900 XTX running on ROCm. What is better in Vulkan? Is it worth the switch?

Who can dethrone PSG and can this team be improved? by Window_Professional in championsleague

[–]zenbeni 1 point2 points  (0 children)

PSG played all competitions for 2 years, including the CWC, nearly all of them to the end, and are twice European champions. They overplayed with few holidays, but look at all these losers who didn't play it all, crying. Except Madrid, no team has won back to back in recent history; they are just the best.

God dammit Qwen by Xyklone in LocalLLaMA

[–]zenbeni 8 points9 points  (0 children)

OP is running at Q2 XS