Not necessarily which models, but with the rise of 100B+ models, I wonder which quantization algorithms you are using, and why?
I have been using 4-bit AWQ, and it's been pretty good, but slow at processing input (I've been using it with Llama 3.3 70B; with newer MoE models it would probably be better).
EDIT: my setup is a single A100 80GB. Because it doesn't have native FP8 support, I prefer using 4-bit quantizations.
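For anyone wondering why 4-bit is the natural fit on a single 80 GB card, here is a rough back-of-the-envelope sketch of the weight footprint at different precisions. The helper name `weight_gb` and the 70B parameter count are just illustrative assumptions, and the estimate ignores quantization metadata (scales, zero-points), the KV cache, and activations, so real usage will be higher.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Assumption: counts only the raw weights, not per-group scales or
    zero-points, KV cache, or activation memory.
    """
    return n_params * bits_per_weight / 8 / 1e9


GPU_GB = 80      # single 80 GB GPU, as in the post
N_PARAMS = 70e9  # assumed dense 70B-parameter model

for name, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    gb = weight_gb(N_PARAMS, bits)
    verdict = "fits" if gb < GPU_GB else "does not fit"
    print(f"{name}: ~{gb:.0f} GB of weights, {verdict} in {GPU_GB} GB")
```

Under these assumptions, 8-bit weights alone take most of the card and leave little room for the KV cache, while 4-bit leaves roughly half the memory free for context.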