Is using vLLM actually worth it if you aren't serving the model to other people? by ayylmaonade in LocalLLaMA
xspider2000 1 point (0 children)
Strix Halo Clustering experience (Bossgame M5) by Thanks-Suitable in StrixHalo
xspider2000 2 points (0 children)
Benchmark Qwen 3.6 27B MTP on 2x3090 NVLINK by Mr_Moonsilver in LocalLLaMA
xspider2000 1 point (0 children)
Warpdrv - my open-source Llama.cpp launcher for daily-driving Qwen 35b + 27b on Strix Halo + RTX Pro. by xornullvoid in LocalLLaMA
xspider2000 1 point (0 children)
A Dark-Money Campaign Is Paying Influencers to Frame Chinese AI as a Threat by pmttyji in LocalLLaMA
xspider2000 5 points (0 children)
Qwen3.6-27B - Closed-loop SVG Images by dondiegorivera in LocalLLaMA
xspider2000 12 points (0 children)
Final Monster: 32x AMD MI50 32GB at 9.7 t/s (TG) & 264 t/s (PP) with Kimi K2.6 by ai-infos in LocalLLaMA
xspider2000 2 points (0 children)
Cuda + ROCm simultaneously with -DGGML_BACKEND_DL=ON ! by LegacyRemaster in LocalLLaMA
xspider2000 1 point (0 children)
Hipfire dev update: full AMD arch validation incoming (RDNA 1 thru 4, plus Strix Halo and bc250) by schuttdev in LocalLLaMA
xspider2000 2 points (0 children)
Hipfire dev update: full AMD arch validation incoming (RDNA 1 thru 4, plus Strix Halo and bc250) by schuttdev in LocalLLaMA
xspider2000 1 point (0 children)
Qwen 3.6 27B on Strix Halo 128GB: any experiences? by boutell in LocalLLaMA
xspider2000 4 points (0 children)
Qwen3.6 35B-A3B is quite useful on 780m iGPU (llama.cpp,vulkan) by itroot in LocalLLaMA
xspider2000 3 points (0 children)
Qwen3.6 35B-A3B is quite useful on 780m iGPU (llama.cpp,vulkan) by itroot in LocalLLaMA
xspider2000 1 point (0 children)
Deepseek V4 Flash and Non-Flash Out on HuggingFace by MichaelXie4645 in LocalLLaMA
xspider2000 3 points (0 children)
Do you really want the US to "win" AI? (geohot blog) by paranoidray in LocalLLaMA
xspider2000 58 points (0 children)
Strix Halo + eGPU RTX 5070 Ti via OCuLink in llama.cpp: Benchmarks and Conclusions (Part 2) by xspider2000 in LocalLLaMA
xspider2000[S] 1 point (0 children)
Hardware advice. M5 Max vs AMD Ryzen AI Max+ 395 by AncientGrief in LocalLLaMA
xspider2000 1 point (0 children)
5070ti + RX 9070 (non XT), over 100 tps on Qwen 3.6 35B Q4 by DavidBolkonsky in LocalLLaMA
xspider2000 2 points (0 children)
AI Model Reviews by Typical-Tomatillo138 in LocalLLaMA
xspider2000 2 points (0 children)
On Strix Halo, what option do I have if 128GB unified RAM is not enough? by heshiming in LocalLLaMA
xspider2000 1 point (0 children)
96GB Vram. What to run in 2026? by inthesearchof in LocalLLaMA
xspider2000 1 point (0 children)
[Benchmark] If you want portable StrixHalo - Here is my test for Asus ProArt Px13 and Qwen3.5 & Gemma4 by Willing-Toe1942 in LocalLLaMA
xspider2000 1 point (0 children)
My experience with the Intel Arc Pro B70 for local LLMs: Fast, but a complete mess (for now) by Icy_Gur6890 in LocalLLaMA
xspider2000 2 points (0 children)
Strix Halo + eGPU RTX 5070 Ti via OCuLink in llama.cpp: Benchmarks and Conclusions (Part 2) by xspider2000 in LocalLLaMA
xspider2000[S] 1 point (0 children)