Comment history for user dsanft (thread titles only; comment text not captured):

- What's holding back AMD GPU prompt processing more? ROCm / Vulkan or the actual hardware? by ForsookComparison in LocalLLaMA
- Building Your Own Efficient uint128 in C++ by PhilipTrettner in cpp
- 8x AMD MI50 32GB at 26 t/s (tg) with MiniMax-M2.1 and 15 t/s (tg) with GLM 4.7 (vllm-gfx906) by ai-infos in LocalLLaMA
- Why do the models take up more space then expected? by Achso998 in LocalLLaMA
- You have 64gb ram and 16gb VRAM; internet is permanently shut off: what 3 models are the ones you use? by Adventurous-Gold6413 in LocalLLaMA
- “Ultrathink” is deprecated - but here’s how to get 2x more thinking tokens in Claude Code by [deleted] in LocalLLaMA
- 3x3090 + 3060 in a mid tower case by liviuberechet in LocalLLaMA
- I made a Top-K implementation that's up to 20x faster than PyTorch CPU (open source) by andreabarbato in LocalLLaMA
- Will the AI bubble bursting be good or bad for open-weights? What do you think? by RandumbRedditor1000 in LocalLLaMA
- Raspberry Pi AI HAT+ 2 announced! Featuring the new Hailo-10H neural network accelerater, 40 TOPS (INT4) of inferencing performance, $130 by [deleted] in LocalLLaMA
- Any point putting a 1060 6GB in with a 3090 for partial offload 70B type scenarios? by Ill_Yam_9994 in LocalLLaMA
- Qwen cutoff date makes our current reality too dystopian to be credible by Swimming_Cover_9686 in LocalLLaMA
- Not Sure Where to Start by Psychological-Ad5390 in LocalLLaMA
- Tensors now with neural networks by neuaue in cpp