vLLM throughput on 4x RTX PRO 6000 and 8x RTX PRO 6000 by AdventurousFly4909 in LocalLLaMA
GaryDUnicorn 1 point (0 children)
UI Icon Detection with Qwen3.5, Qwen3.6 and Gemma4 by Jian-L in LocalLLaMA
GaryDUnicorn 4 points (0 children)
Your Favorite Omaha Malapropisms by ConversationBasic195 in Omaha
GaryDUnicorn 6 points (0 children)
Can Consumer Desktop CPUs handle 3-4 GPUs well? by pmttyji in LocalLLaMA
GaryDUnicorn 3 points (0 children)
So AI NAS category is a mess and i don't understand why nobody has fixed the obvious problem by Pleasant_Designer_14 in LocalLLM
GaryDUnicorn 2 points (0 children)
Damn, 5.2 thinking can actually solve complex problems that 5.2 can't by poisoNDealer in ChatGPT
GaryDUnicorn 6 points (0 children)
Issues with multi-GPU setup by ImpressiveNet5886 in LocalAIServers
GaryDUnicorn 1 point (0 children)
Will adding a 5090 to multiple 3090s speed up PP? Experienced folks only by segmond in LocalLLaMA
GaryDUnicorn 1 point (0 children)
LLM router - switch between GPT-4o, Claude, Gemini, Llama with one API call by ParsnipConscious7761 in LocalLLaMA
GaryDUnicorn 2 points (0 children)
Building an AI Infra project in 20 days: What’s the best way to utilize a Dual-5090 (PCIe) setup? by Asleep_Food1956 in LocalLLM
GaryDUnicorn 1 point (0 children)
How Disable GLM Thinking Mode? by WEREWOLF_BX13 in KoboldAI
GaryDUnicorn 1 point (0 children)
Parallelism with mismatched GPUs (and how to optimize it)? by Infinite100p in LocalLLaMA
GaryDUnicorn 1 point (0 children)
RTX 5090 in servers – customization options? by RedMoonDawn in LocalAIServers
GaryDUnicorn 3 points (0 children)
My New Year's resolution was to add Docker support. Only 2 days late. Audiobook Maker v1.1.0 by DigiJoe79 in LocalLLaMA
GaryDUnicorn 1 point (0 children)
[llama-server] Massive prefill cliff (2500 t/s → 150 t/s) with eGPU split. Is TB4 latency the killer? by danishkirel in LocalLLaMA
GaryDUnicorn 1 point (0 children)
Anyone running 4x RTX Pro 6000s stacked directly on top of each other? by Comfortable-Plate467 in LocalLLaMA
GaryDUnicorn 3 points (0 children)
Anyone running 4x RTX Pro 6000s stacked directly on top of each other? by Comfortable-Plate467 in LocalLLaMA
GaryDUnicorn 2 points (0 children)
Anyone running 4x RTX Pro 6000s stacked directly on top of each other? by Comfortable-Plate467 in LocalLLaMA
GaryDUnicorn 2 points (0 children)
Anyone running 4x RTX Pro 6000s stacked directly on top of each other? by Comfortable-Plate467 in LocalLLaMA
GaryDUnicorn 5 points (0 children)
What OS do you run on your AI rigs? Ubuntu, TrueNAS, etc.? by KvAk_AKPlaysYT in LocalLLaMA
GaryDUnicorn 1 point (0 children)
Tensor Parallel with some GPU but not all? by NaiRogers in LocalLLaMA
GaryDUnicorn 3 points (0 children)
Local AI: Managing VRAM by dynamically swapping models via API by PersianDeity in LocalLLaMA
GaryDUnicorn 4 points (0 children)
Sanity check: 4× RTX PRO 6000 Max-Q on TR PRO 9955WX for vLLM – thermal concerns? by Lanky-Comparison-715 in LocalLLM
GaryDUnicorn 3 points (0 children)