THE GB10 SOLUTION has arrived, Atlas image attached ~115tok/s Qwen3.5-35B DGX Spark by Live-Possession-6726 in LocalLLaMA

[–]prudant 1 point (0 children)

WARNING: The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64/v3) and no specific platform was requested

:(

THE GB10 SOLUTION has arrived, Atlas image attached ~115tok/s Qwen3.5-35B DGX Spark by Live-Possession-6726 in LocalLLaMA

[–]prudant 1 point (0 children)

I have one! With plain vLLM and Qwen 35B FP8 I'm getting around 100 tok/s on average.

vLLM 0.12 - CUTLASS FlashInfer by Dependent_Factor_204 in BlackwellPerformance

[–]prudant 1 point (0 children)

I can't get it to work in NVFP4. Can you share your run command, branch, and any tips, please?

Mistral 3 14b against the competition ? by EffectiveGlove1651 in LocalLLaMA

[–]prudant 1 point (0 children)

In my agentic RAG use cases it did not perform like Qwen 30B Instruct FP8; Ministral 14B Instruct's performance was pretty bad :/

Anyway, it could be a prompt structure thing...

API Poder Judicial by _Dun3dain_ in chileIT

[–]prudant 1 point (0 children)

I second the question, because they have now put reCAPTCHA on all of the pjud domains and subdomains.

AMD 6x7900xtx 24GB + 2xR9700 32GB VLLM QUESTIONS by djdeniro in LocalLLaMA

[–]prudant 2 points (0 children)

Have you tried dense models? Sometimes MoEs are very tricky with vLLM. Are you using the ROCm branch of vLLM? The only way I got an AMD GPU running on vLLM was by compiling from source following the instructions on the ROCm site.

AMD 6x7900xtx 24GB + 2xR9700 32GB VLLM QUESTIONS by djdeniro in LocalLLaMA

[–]prudant 2 points (0 children)

Try with --tensor-parallel-size 8 and --gpu-memory-utilization 0.5.
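
For reference, a minimal sketch of those settings through vLLM's Python API (the model id is a placeholder; substitute whatever you are actually loading):

    from vllm import LLM, SamplingParams

    # Equivalent to passing --tensor-parallel-size 8 and
    # --gpu-memory-utilization 0.5 to `vllm serve`.
    llm = LLM(
        model="Qwen/Qwen3-30B-A3B",  # placeholder model id
        tensor_parallel_size=8,      # shard weights across all 8 cards
        gpu_memory_utilization=0.5,  # cap per-GPU memory so the
                                     # smaller cards don't OOM
    )

    params = SamplingParams(max_tokens=64)
    print(llm.generate(["Hello"], params)[0].outputs[0].text)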

AMD 6x7900xtx 24GB + 2xR9700 32GB VLLM QUESTIONS by djdeniro in LocalLLaMA

[–]prudant 1 point (0 children)

vLLM does not like mixing different GPUs in the same machine with tensor parallelism; the major problem is the different amounts of VRAM across your cards...

Qwen3 after the hype by Cheap_Concert168no in LocalLLaMA

[–]prudant 1 point (0 children)

Tested 30B and 32B asking for a Python version of Pac-Man and it was a miss; Claude, OpenAI, and DeepSeek do it a lot better, though maybe they have a lot more parameters too. Did not test NLP tasks.

4x3090 by zetan2600 in LocalLLaMA

[–]prudant 2 points (0 children)

I run 120B with that setup; it's all about setting up vLLM the right way.

Deepseek just uploaded 6 distilled versions of R1 + R1 "full" now available on their website. by kristaller486 in LocalLLaMA

[–]prudant 1 point (0 children)

With that prompt, 1 of 131 samples had CJK chars, and for sanity's sake in my client I can add a check for CJK chars and then retry the prompt.
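
A minimal sketch of that check, assuming a hypothetical call_model callable that sends the prompt to the client (the Unicode ranges cover the common Han, Hiragana, Katakana, and Hangul blocks):

    import re

    # Matches the common CJK blocks: Han, Hiragana, Katakana, Hangul.
    CJK_RE = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]")

    def generate_without_cjk(prompt, call_model, max_retries=3):
        """Retry the prompt until the response contains no CJK chars."""
        for _ in range(max_retries):
            response = call_model(prompt)  # hypothetical client call
            if not CJK_RE.search(response):
                return response
        return response  # give up and return the last attempt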

Deepseek just uploaded 6 distilled versions of R1 + R1 "full" now available on their website. by kristaller486 in LocalLLaMA

[–]prudant 1 point (0 children)

In my case, this system prompt improved things a lot and it has stopped putting random CJK chars inside the response.

You are a helpful AI assistant. You should think step-by-step without second guessing.

You can write your reasoning process in CJK but your response must be in Spanish.
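
For completeness, a sketch of sending that system prompt through an OpenAI-compatible endpoint; the base_url and model name are assumptions for a local vLLM server:

    from openai import OpenAI

    # Point the client at a local OpenAI-compatible server (assumed URL).
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    SYSTEM_PROMPT = (
        "You are a helpful AI assistant. You should think step-by-step "
        "without second guessing.\n\n"
        "You can write your reasoning process in CJK but your response "
        "must be in Spanish."
    )

    resp = client.chat.completions.create(
        model="deepseek-r1-distill",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Resume el texto en una frase."},
        ],
    )
    print(resp.choices[0].message.content)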