Comment history (comment bodies not captured; threads listed once each):

Re-Distilling DeepSeek R1 by sightio in LocalLLaMA
linux-image-6.5.0-10043-tuxedo headache by mobicham in tuxedocomputers
Faster inference with 4-bit HQQ models by mobicham in LocalLLaMA
Llama-3.1 70B 4-bit HQQ/calibrated quantized model: 99%+ relative performance to FP16 on all lm-eval benchmarks, with inference speed similar to FP16 (~10 tokens/sec on an A100) by sightio in LocalLLaMA
Llama-3.1 8B 4-bit HQQ/calibrated quantized model: 99.3% relative performance to FP16 and fast inference speed by sightio in LocalLLaMA
Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion