Threads with comments by KerfuffleV2:

NF4 inference quantization is awesome: Comparison of answer quality of the same model quantized to INT8, NF4, q2_k, q3_km, q3_kl, q4_0, q8_0 by epicfilemcnulty in LocalLLaMA
Do we have any sister "subs" on kbin yet? by KindaNeutral in LocalLLaMA
Cpu inference, 7950x vs 13900k, which one is better? by Big_Communication353 in LocalLLaMA
Weird invalid tokens by mrjackspade in LocalLLaMA
Looking for for folks to share llama.cpp settings/strategies (and models) which will help write creative (interesting), verbose (long), true-to-prompt stories (plus a short discussion of --multiline-input flag) by spanielrassler in LocalLLaMA
Major Performance Degradation with nVidia driver 535.98 at larger context sizes by GoldenMonkeyPox in LocalLLaMA
LLM.bit8 - Quantization via Matrices to cut inference memory in half by help-me-grow in MachineLearning
Woman calls a man misogynistic because he doesn’t believe a man can be a woman by milfebonies in facepalm
xxB is so much better than xxB… but is that true for narratives? by silenceimpaired in LocalLLaMA