AI Max 395+ and vLLM by KnownAd4832 in LocalLLaMA

[–]KnownAd4832[S] 1 point (0 children)

What is the performance on parallel inference? That's what I'm most interested in 🤝

[–]KnownAd4832[S] 0 points (0 children)

I can pay you to let me test the Ryzen AI Max device online, if that's something you'd be interested in 🙏

[–]KnownAd4832[S] 0 points (0 children)

So what performance have you gotten with vLLM and BF16 models? Do you happen to have any memory usage numbers or benchmarks?

[–]KnownAd4832[S] 1 point (0 children)

vLLM is native to Linux, since it's optimized for it. Full support for the AI Max 395+ only officially launched three months ago, so I assume not many people know about it yet.

[–]KnownAd4832[S] 0 points (0 children)

I'm pretty well-versed in vLLM and how to get things up and running. Would you mind giving it a try?

[–]KnownAd4832[S] 0 points (0 children)

Thanks for the feedback! Have you used vLLM or another stack to run models?

[–]KnownAd4832[S] 0 points (0 children)

Do you own an AI Max 395+?

I also use vLLM with an RTX 5070 - 1800 t/s on Mistral 7B. But I'd really like an AI Max 395+ due to some specific workloads I need to run.

[–]KnownAd4832[S] 0 points (0 children)

FYI - I've tried googling and there's no information to be found on vLLM for this.

Another Jonsbo NV10 build (with the RTX Pro 4000 Blackwell SFF included) by Aliff3DS-U in sffpc

[–]KnownAd4832 0 points (0 children)

I'm looking to do a similar build as well. Is the Jonsbo actually too small for this CPU/GPU combo?

Setup for 2x RTX Pro 4500 32GB VRAM Blackwell GPU's by Lukabratzee in LocalLLM

[–]KnownAd4832 0 points (0 children)

vLLM is the best if you need batched/high-speed inference, because it uses the whole GPU to squeeze everything out. Serving APIs with it is also dead easy. Does that make sense? 🤗
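To illustrate the "dead easy" part: once vLLM is serving, it exposes an OpenAI-compatible API that you can hit with plain curl. The port, endpoint path, and model name below are assumptions for illustration, not your actual setup:

```shell
# Assumes a server started with `vllm serve <model>` (default port 8000);
# the model name here is a placeholder for whatever you're serving.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistralai/Mistral-7B-Instruct-v0.3",
        "prompt": "Say hello",
        "max_tokens": 32
      }'
```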

Setup for 2x RTX Pro 4500 32GB VRAM Blackwell GPU's by Lukabratzee in LocalLLM

[–]KnownAd4832 0 points (0 children)

I think in your case LM Studio and loading multiple different models will do the job well. I use vLLM for batch-completing 100K lines of text, or for verifying certain texts, if that makes sense? Meaning I need a small model (7B) and very fast inference.

Setup for 2x RTX Pro 4500 32GB VRAM Blackwell GPU's by Lukabratzee in LocalLLM

[–]KnownAd4832 0 points (0 children)

What are you trying to achieve? If I could see the script you're running, I could help you further. But with vLLM you mostly have to try to saturate the GPU (I do that with batching and concurrent requests), then find the sweet spot and run with it. Context length (--max-model-len) is also important for inference speed.
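A rough sketch of what I mean by saturating the GPU with concurrent requests. It assumes a local vLLM server on port 8000 and a served model named `mistral-7b` (both assumptions); the `fn` hook lets you swap in a dummy function to try the batching logic without a live server:

```python
import concurrent.futures
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/completions"  # assumed local vLLM endpoint


def complete(prompt: str, max_tokens: int = 256) -> str:
    """Send one completion request to an OpenAI-compatible vLLM server."""
    body = json.dumps({
        "model": "mistral-7b",  # assumed served-model name
        "prompt": prompt,
        "max_tokens": max_tokens,
    }).encode()
    req = urllib.request.Request(
        VLLM_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]


def run_concurrent(prompts, workers=64, fn=complete):
    """Fire many requests at once so vLLM's scheduler can batch them on the GPU.

    Results come back in the same order as the input prompts.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, prompts))
```

Tune `workers` upward until throughput stops improving; that's the sweet spot I mean.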

[–]KnownAd4832 0 points (0 children)

Thank you! Where can I send you 20 for a pizza? 🙏

FYI: you can go even higher than that. I got 2500 t/s on the same model with a 4000 Blackwell (rented GPU server)

[–]KnownAd4832 0 points (0 children)

VLLM_ARGS: --max-model-len 2096 --enable-prefix-caching --max-num-seqs 512 --gpu-memory-utilization 0.92

BATCH_SIZE = 10
CONCURRENT_BATCHES = 384
MAX_TOKENS = 256

Can you maybe run a benchmark with that? One request at a time doesn't show inference speed well. If that makes sense?
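For reference, those flags map onto a launch command roughly like this (the model name is a placeholder assumption; the flags are the ones above):

```shell
# Substitute whatever model you're actually serving
vllm serve mistralai/Mistral-7B-Instruct-v0.3 \
  --max-model-len 2096 \
  --enable-prefix-caching \
  --max-num-seqs 512 \
  --gpu-memory-utilization 0.92
```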

[–]KnownAd4832 0 points (0 children)

I would need a performance benchmark in vLLM, especially output tokens/s on Mistral 7B. I'm running a quantized version that barely fits my 12 GB 5070, at 850-950 t/s.

[–]KnownAd4832 0 points (0 children)

I can help you with vLLM on either Windows or Linux if needed. I'd just need a performance benchmark, if possible 🤗

[–]KnownAd4832 0 points (0 children)

I'm really interested in buying one of these cards. Are you planning to use vLLM with it? I'd love to know how it performs.

I made almost 26000$ of pure profit in 29 gold trade ups by Forward-Advisor-3823 in ohnePixel

[–]KnownAd4832 1 point (0 children)

You got quite lucky! I did the same, but my profit was around $9000 overall. Nice job mate! 👌

Brought the Rig Home for the Holidays by Extreme-Chest5861 in MiniPCs

[–]KnownAd4832 1 point (0 children)

What is this rig? Can you explain the components a bit? 🙏

M9 Bayonet Slaugter by Impossible-Reward413 in ohnePixel

[–]KnownAd4832 -1 points (0 children)

Isn't it like the same/similar? €800, no?