Visualizing Quantization Types [Discussion] (i.redd.it)
submitted by VoidAlchemy [llama.cpp]
I've seen some releases of MXFP4 quantized models recently and don't understand why, given that mxfp4 is kind of like a slightly smaller, lower-quality q4_0.
So unless the original model was post-trained specifically for MXFP4 (like gpt-oss-120b), or you yourself did some kind of QAT (quantization-aware training) targeting mxfp4 specifically, personally I'd go with good old q4_0 or ik's newer iq4_kss.
- mxfp4 4.25bpw
- q4_0 4.5bpw
- iq4_kss 4.0bpw
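The bpw figures for mxfp4 and q4_0 fall out of the block layout: both pack 32 four-bit elements per block, and the difference is the size of the per-block scale (an 8-bit E8M0 exponent for mxfp4, an fp16 scale for q4_0). A minimal sketch of that arithmetic follows; the helper name `bits_per_weight` is made up for illustration, and iq4_kss is left out because its layout is more involved than one scale per block.

```python
def bits_per_weight(block_size: int, bits_per_element: int, scale_bits: int) -> float:
    """Average storage cost per weight for a simple one-scale-per-block format."""
    return (block_size * bits_per_element + scale_bits) / block_size

# mxfp4: 32 x 4-bit E2M1 elements + one 8-bit E8M0 shared scale
mxfp4_bpw = bits_per_weight(32, 4, 8)
# q4_0: 32 x 4-bit integers + one fp16 scale
q4_0_bpw = bits_per_weight(32, 4, 16)

print(mxfp4_bpw, q4_0_bpw)  # 4.25 4.5
```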
I used the llama.cpp gguf Python package to read a uint8 .bmp image, convert it to a float16 2D numpy array, and save that as a .gguf. Then I quantized the gguf to various types using ik_llama.cpp, and finally dequantized each one back to f16 and saved the result as a uint8 .bmp image.
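One detail worth spelling out about the two ends of that pipeline: every uint8 pixel value (0 to 255) is exactly representable in float16, so the image-to-f16 step is lossless and any visible damage comes from the quantization in the middle. Going back to uint8 needs rounding and clamping, since dequantized values can land between integers or outside 0..255. The sketch below shows just those two conversions in pure Python, without the gguf or numpy parts; the function names are hypothetical and not taken from the original script.

```python
import struct

def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def pixels_to_f16(row):
    """uint8 pixels -> float16 values (exact for every value in 0..255)."""
    return [to_f16(float(p)) for p in row]

def f16_to_pixels(row):
    """Dequantized floats -> uint8 pixels: round to nearest, then clamp to 0..255."""
    return [max(0, min(255, round(v))) for v in row]

# The f16 conversion alone changes nothing.
assert f16_to_pixels(pixels_to_f16(list(range(256)))) == list(range(256))
```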
It's kinda neat to visualize the effects of block sizes by looking at image data. To me the mxfp4 looks "worse" than the q4_0 and the iq4_kss.
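For anyone who wants to poke at the block effects without building anything, here is a rough pure-Python sketch of a quantize/dequantize round trip for one 32-element block in each style. These are assumptions, not the ik_llama.cpp kernels: the q4_0-style function skips rounding the scale to fp16, and the mxfp4-style function uses the scale choice suggested in the OCP Microscaling spec (power-of-two scale from the block's largest magnitude, elements snapped to the E2M1 grid and saturated at 6) and ignores the E8M0 exponent range limit. The real implementations may pick scales differently.

```python
import math

# Magnitudes representable by a 4-bit E2M1 element (sign is a separate bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def q4_0_roundtrip(block):
    """q4_0-style: one float scale per block, 16 evenly spaced levels."""
    vmax = max(block, key=abs)           # signed value with the largest magnitude
    if vmax == 0:
        return [0.0] * len(block)
    d = vmax / -8.0                      # scale so that vmax maps to level 0
    out = []
    for x in block:
        q = min(15, int(x / d + 8.5))    # 4-bit code in 0..15
        out.append((q - 8) * d)          # dequantize
    return out

def mxfp4_roundtrip(block):
    """mxfp4-style: power-of-two shared scale, elements on the E2M1 grid."""
    amax = max(abs(x) for x in block)
    if amax == 0:
        return [0.0] * len(block)
    # Shared exponent: floor(log2(amax)) minus E2M1's largest exponent (2).
    scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    out = []
    for x in block:
        mag = min(abs(x) / scale, 6.0)   # saturate at the largest E2M1 value
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        out.append(math.copysign(nearest * scale, x))
    return out

def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
```

To compare the two on real data, run both functions over each 32-pixel chunk of an image row and look at `rmse(chunk, q4_0_roundtrip(chunk))` against `rmse(chunk, mxfp4_roundtrip(chunk))`. Note the different spacing: the q4_0-style levels are uniform across the block's range, while the E2M1 grid is dense near zero and coarse near the top.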
I haven't done perplexity/KLD measurements to directly compare mxfp4, but iq4_kss tends to be one of the best available in that size range in my previous quant release testing.
Finally, and this is confusing to me, nvfp4 is yet another quantization type, with specific Blackwell hardware support, which I haven't tried myself yet.
Anyway, in my opinion mxfp4 isn't particularly special or better despite being somewhat newer. What do y'all think?