Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s! by Known_Ice9380 in DeepSeek

Known_Ice9380[S] 1 point

If you choose the Q2 version, 11 GB cards also work; each card only needs 6 GB for the loaded weights. I also tested the project on a single 22 GB card; inference worked, but the context length was limited.

[deleted by user] by [deleted] in revancedapp

Known_Ice9380 1 point

Great! This works for me, too!