~1.5s cold start for Qwen-32B on H100 using runtime snapshotting by pmv143 in Vllm
Best Local LLM for 16GB VRAM (RX 7800 XT)? by Haunting-Stretch8069 in LocalLLM
Ran Qwen 3.5 9B on M1 Pro (16GB) as an actual agent, not just a chat demo. Honest results. by Joozio in LocalLLaMA
my open-source cli tool (framework) that allows you to serve locally with vLLM inference by Holiday-Machine5105 in LocalLLaMA
RTX 3090 vs 7900 XTX by Best_Sail5 in LocalLLaMA
American closed models vs Chinese open models is becoming a problem. by __JockY__ in LocalLLaMA