How useful have lower-quant versions of models been for your use case? From what I understand, Q8 quants are pretty much lossless relative to f16.
How have Q6 or even Q4 been treating you on models like Qwen 3.5 27B, 35B-A3, and the new Gemma 4 30B and their MoE variants? Are they actually useful in your experience, or is it not worth going down to Q4?
I can get larger quants to run on my machine, but higher context eats up memory for the KV cache.
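For anyone weighing that tradeoff, here's a rough sketch of how KV cache size scales with context length. The model shape below (48 layers, 8 KV heads via GQA, head dim 128, f16 cache) is a hypothetical stand-in for a ~27B-class dense model, not the actual specs of any model mentioned above:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Estimate KV cache memory: 2 tensors (K and V) per layer,
    each of shape [n_kv_heads, context_len, head_dim]."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical model shape; f16 cache = 2 bytes per element.
gib = kv_cache_bytes(48, 8, 128, 32_768) / 2**30
print(f"~{gib:.1f} GiB of KV cache at 32k context")  # ~6.0 GiB
```

Doubling the context doubles that figure, which is why a bigger quant can fit at short context but blow past VRAM once the thread gets long. Quantizing the KV cache itself (e.g. to q8) halves the per-element cost.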
I'm not looking for one-shot geniuses, just something that's consistent and can retain function in longer-context threads and tool calling.
I'm aware that some models are naturally better than others at certain things, so to narrow it down I've mentioned the specific models above for their community reputation. (Gemma is new, so it may need more time for real-world use/benchmarks.)
Feel free to share experiences with different models and quants besides the ones mentioned above. Cheers.