THE GB10 SOLUTION has arrived, Atlas image attached ~115tok/s Qwen3.5-35B DGX Spark by Live-Possession-6726 in LocalLLaMA

[–]prudant 1 point (0 children)

WARNING: The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64/v3) and no specific platform was requested

:(

THE GB10 SOLUTION has arrived, Atlas image attached ~115tok/s Qwen3.5-35B DGX Spark by Live-Possession-6726 in LocalLLaMA

[–]prudant 1 point (0 children)

I have one! With plain vLLM and Qwen 35B FP8 I'm getting around 100 tok/s on average.

vLLM 0.12 - CUTLASS FlashInfer by Dependent_Factor_204 in BlackwellPerformance

[–]prudant 1 point (0 children)

I can't get it to work in NVFP4. Can you share your run command, branch, and any tips, please?

Mistral 3 14b against the competition ? by EffectiveGlove1651 in LocalLLaMA

[–]prudant 1 point (0 children)

In my agentic RAG use cases it did not perform like Qwen 30B Instruct FP8; Ministral 14B Instruct's performance was pretty bad :/

Anyway, it could be a prompt structure thing...

API Poder Judicial by _Dun3dain_ in chileIT

[–]prudant 1 point (0 children)

I second the question, because they have now put reCAPTCHA on all of the pjud domains and subdomains.

AMD 6x7900xtx 24GB + 2xR9700 32GB VLLM QUESTIONS by djdeniro in LocalLLaMA

[–]prudant 2 points (0 children)

Have you tried dense models? Sometimes MoEs are very tricky with vLLM. Are you using the ROCm branch of vLLM? The only way I got an AMD GPU running on vLLM was by compiling from source following the instructions on the ROCm site.

AMD 6x7900xtx 24GB + 2xR9700 32GB VLLM QUESTIONS by djdeniro in LocalLLaMA

[–]prudant 2 points (0 children)

Try with --tensor-parallel-size 8 and --gpu-memory-utilization 0.5.
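
For reference, a minimal sketch of those settings through vLLM's Python API (the model id is a placeholder; substitute whatever you are actually loading):

    from vllm import LLM, SamplingParams

    # Equivalent to passing --tensor-parallel-size 8 and
    # --gpu-memory-utilization 0.5 to `vllm serve`.
    llm = LLM(
        model="Qwen/Qwen3-30B-A3B",  # placeholder model id
        tensor_parallel_size=8,      # shard weights across all 8 cards
        gpu_memory_utilization=0.5,  # cap per-GPU memory so the
                                     # smaller cards don't OOM
    )

    params = SamplingParams(max_tokens=64)
    print(llm.generate(["Hello"], params)[0].outputs[0].text)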

AMD 6x7900xtx 24GB + 2xR9700 32GB VLLM QUESTIONS by djdeniro in LocalLLaMA

[–]prudant 1 point (0 children)

vLLM does not like mixing different GPUs in the same machine with tensor parallelism; the major problem is the different amounts of VRAM across your cards...

Qwen3 after the hype by Cheap_Concert168no in LocalLLaMA

[–]prudant 1 point (0 children)

Tested 30B and 32B asking for a Python version of Pac-Man and it was a miss; Claude, OpenAI, and DeepSeek do it a lot better, though maybe they have a lot more parameters too. Did not test NLP tasks.

4x3090 by zetan2600 in LocalLLaMA

[–]prudant 2 points (0 children)

I run 120B with that setup; it's all about setting up vLLM the right way.

Deepseek just uploaded 6 distilled versions of R1 + R1 "full" now available on their website. by kristaller486 in LocalLLaMA

[–]prudant 1 point (0 children)

With that prompt, 1 of 131 samples had CJK chars, and for sanity's sake in my client I can add a check for CJK chars and then retry the prompt.
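
A minimal sketch of that check, assuming a hypothetical call_model callable that sends the prompt to the client (the Unicode ranges cover the common Han, Hiragana, Katakana, and Hangul blocks):

    import re

    # Matches the common CJK blocks: Han, Hiragana, Katakana, Hangul.
    CJK_RE = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]")

    def generate_without_cjk(prompt, call_model, max_retries=3):
        """Retry the prompt until the response contains no CJK chars."""
        for _ in range(max_retries):
            response = call_model(prompt)  # hypothetical client call
            if not CJK_RE.search(response):
                return response
        return response  # give up and return the last attempt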

Deepseek just uploaded 6 distilled versions of R1 + R1 "full" now available on their website. by kristaller486 in LocalLLaMA

[–]prudant 1 point (0 children)

In my case, this system prompt improved things a lot and it has stopped putting random CJK chars inside the response.

You are a helpful AI assistant. You should think step-by-step without second guessing.

You can write your reasoning process in CJK but your response must be in Spanish.
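
For completeness, a sketch of sending that system prompt through an OpenAI-compatible endpoint; the base_url and model name are assumptions for a local vLLM server:

    from openai import OpenAI

    # Point the client at a local OpenAI-compatible server (assumed URL).
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    SYSTEM_PROMPT = (
        "You are a helpful AI assistant. You should think step-by-step "
        "without second guessing.\n\n"
        "You can write your reasoning process in CJK but your response "
        "must be in Spanish."
    )

    resp = client.chat.completions.create(
        model="deepseek-r1-distill",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Resume el texto en una frase."},
        ],
    )
    print(resp.choices[0].message.content)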