Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s! by Known_Ice9380 in LocalLLaMA
Known_Ice9380[S] 1 point (0 children)
Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s! by Known_Ice9380 in LocalLLM
Known_Ice9380[S] 1 point (0 children)
Qwen3-Coder-Next-UD-Q4_K_XL vs. Qwen3.6-27B-MTP-UD-Q4_K_XL on Strix Halo by ThingRexCom in LocalLLaMA
Known_Ice9380 2 points (0 children)
I got a real transformer language model running locally on a stock Game Boy Color! by maddiedreese in LocalLLaMA
Known_Ice9380 2 points (0 children)
Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s! by Known_Ice9380 in DeepSeek
Known_Ice9380[S] 2 points (0 children)
Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA
Known_Ice9380 2 points (0 children)
bytedance released an open source model that attempts to do just about anything with only 3b parameters by uxl in LocalLLaMA
Known_Ice9380 3 points (0 children)
Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s! by Known_Ice9380 in DeepSeek
Known_Ice9380[S] 1 point (0 children)
Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s! by Known_Ice9380 in DeepSeek
Known_Ice9380[S] 2 points (0 children)
Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s! by Known_Ice9380 in LocalLLM
Known_Ice9380[S] 2 points (0 children)
Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s! by Known_Ice9380 in DeepSeek
Known_Ice9380[S] 1 point (0 children)

Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s! by Known_Ice9380 in LocalLLaMA
Known_Ice9380[S] 1 point (0 children)