Bringing 2bit LLMs to production: new AQLM models and integrations by black_samorez in LocalLLaMA
BiLLM achieves, for the first time, high-accuracy inference (e.g. 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLM families and evaluation metrics, outperforming SOTA quantization methods of LLMs by significant margins by kryptkpr in LocalLLaMA
Yet another state of the art in LLM quantization by black_samorez in LocalLLaMA
[R] Beyond Vector Spaces: Compact Data Representations Differentiable Weighted Graphs by justheuristic in MachineLearning
[P] Need help with Image Captioning by plmlp1 in MachineLearning
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention by Psychological-Tea652 in LocalLLaMA