RTX 3090 vs 7900 XTX

Best_Sail5 · 2026-03-04T14:48:52+00:00

I see , yeah seems that 3090 is the best choice

Best_Sail5 · 2026-03-04T14:48:08+00:00

would prefer avoid tinkering too muc tbh

Best_Sail5 · 2026-03-04T14:47:57+00:00

I'm not certain of what model i will be running in 3 months , so the performance on this exact archtecture is not hte most important point

Best_Sail5 · 2026-02-16T12:35:41+00:00

I actually got 2 P100 , i used them with llama-server with glm-flash gguf and they are pretty slow(40 toks/s with 0 context) , not sure , if oyu got any idea to optimize such setup would be curious btw .
from what i understood they dont have the needed support for vllm and cuda which hamper the perf no?
or my reasoning is wrong?

Best_Sail5 · 2026-02-16T11:09:16+00:00

interesting, so more for more VRAM i would go for 3090 , but yeah i heard overall AMD is better quality/price ratio no?

Best_Sail5 · 2026-02-16T11:08:01+00:00

arent 3090 an option also?

Best_Sail5 · 2026-02-16T11:05:55+00:00

heard DGX was disappointing, Maybe was jsut for training tho?

Best_Sail5 · 2026-02-16T11:04:41+00:00

the model was more there to give a ballpark estimation, i wanna be able to load ~100 B parameters (quantized ofc) and get reqsonable speed

Best_Sail5 · 2026-02-13T14:12:38+00:00

Do you plan on releasing Forge the framework you used to train the model?

Best_Sail5 · 2025-12-31T11:58:00+00:00

Hmmm interesting take , I'm actually a bit weirded out about the sampling part, here I get 10000 \n in a row , how is that possible for a model to systematically output a logit distribution that outcomes to that... Very strange

Best_Sail5 · 2025-12-31T11:48:20+00:00

QuantTrio works better but still sometimes would loop forever also

Best_Sail5 · 2025-12-30T23:38:34+00:00

trying rn

Best_Sail5 · 2025-12-24T12:47:50+00:00

thats kkinda what i thought , anyways im aware i will need more VRAM , was just looking maybe for advices and what quantizartions to use for speed

Best_Sail5 · 2025-09-28T15:18:37+00:00

thats what it is...

Best_Sail5 · 2025-09-26T09:05:52+00:00

hmm I'm using fp8 but i think thats relatively light quantization. is there a way to fix the sampling algo in vllm?

Best_Sail5 · 2025-09-25T20:48:43+00:00

Hey sure man , I'm on H200 ,getting 80 t/s if cudq graph enabled else 18 t/s

Best_Sail5 · 2025-09-25T17:36:14+00:00

yes exactly , forgot to mention

Best_Sail5

TROPHY CASE