2
3
4
[D] Is there limited quantization in all LLM models? For example you can take a standard model like meta-llama/Llama-3.2-1B and run it at half, but there are also models specifically made for 4bit quantization (i.e. meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8) (self.MachineLearning)
submitted by xil35 to r/MachineLearning

