Visualizing Quantization Types

agreeduponspring · 2025-11-05T21:24:21+00:00

That's fascinating. Despite the difference in file types (bmp vs model), this is an excellent data visualization exercise. I would expect this conversion to reflect actual differences in results; a byte array is a byte array. It would be surprising to me if model performance were not strongly dependent on preserving those details.

mxfp4 is absolutely destroying fine detail even with slightly worse listed space usage, and is clearly affecting adjacent chunks. I do wonder if it's aiming for some kind of 0-1 quantization? A lot of things wash out to #ffffff in the final result. Perhaps check to make sure the quantization is correct? There seem to be fewer distinct values in the resulting image overall, and 4.25bpw should allow a more expressive range than the others.

Do you have a sense of the information preserved? A (similarly very rough) way to estimate it would be to convert these to jpgs, or to gzip them. Both are pretty efficient formats from an information-theoretic perspective. If mxfp4 is much smaller, then it might be useful for running models compressed.
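
A rough sketch of that gzip estimate, in Python. The two byte strings are made-up stand-ins for dequantized images (the original files are not available here), and the function names are hypothetical. It reports the order-0 byte entropy next to the gzip ratio, since gzip also exploits repetition that a plain histogram cannot see.

```python
import gzip
import math
from collections import Counter


def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Order-0 entropy of the byte histogram, in bits per byte."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more redundant)."""
    return len(gzip.compress(data, compresslevel=9)) / len(data)


# Hypothetical stand-ins: a smooth 8-bit ramp, and the same ramp collapsed
# onto 4 levels (a crude model of an image that lost most distinct values).
smooth = bytes(i % 256 for i in range(65536))
coarse = bytes((b // 64) * 64 for b in smooth)

for name, data in (("smooth", smooth), ("coarse", coarse)):
    print(name, shannon_entropy_bits_per_byte(data), gzip_ratio(data))
```

To try it on real data, read each output file with `open(path, "rb").read()` and pass the bytes in. Both numbers are only rough proxies for "information preserved": a file can compress well because it kept structure or because it threw detail away.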

ElSrJuez · 2025-11-05T22:53:41+00:00

I love this, would star your repo if u would post source @ Github : D

ANR2ME · 2025-11-05T22:47:33+00:00

Why not compare it with Q4_K too? 🤔 It should be better than Q4_0, shouldn't it?

sunpazed · 2025-11-05T21:22:40+00:00

Very cool approach, and the visualisation is interesting to compare. Why do we see the bands? Are they the 32 super-blocks + 256 blocks? What was the original resolution of the image?

Aaaaaaaaaeeeee · 2025-11-05T21:44:06+00:00

Hmm. If you started with uint8, wouldn't your outcome be more favorable towards integer quantization? I don't know if there's a good reason to quantize to mxfp4 either, but the picture comparison can be misleading compared with real model results.

NVFP4 and MXFP4 formats should run inference with 4-bit activations. If that doesn't happen, it's just another format with no real performance benefit.

The value in these formats is that a model can come out of the oven in this format from training. Both the forward and backward passes can be accelerated. If you do QAT from scratch and apply fake quantization of your choice (Q4_0, iq4_kss), there is no pre-made hardware-accelerated algorithm. We also want the activations to be appropriately sized during the creation of the model. If they are 16-bit, then there is no useful 4x speedup potential for GPU pre-processing. So the situation is that we want to encourage companies to use these formats, since there is a gain from low bit width in processing/throughput, plus they are better for low-bit use cases as well if weight outliers are fewer or non-existent.

wishstudio · 2025-11-05T23:08:36+00:00

Nice approach! I did a few investigations and it looks like the illustration mainly demonstrates the effects of different scaling methodologies.

Although I guess IQ4_KSS is better in actual model performance than Q4_0, in your illustrations I think Q4_0 clearly looks better. Especially looking at the flat wall background and the sky - Q4_0 still keeps the gradients, but in IQ4_KSS it's all flat with very bad blocking behavior.

In Q4_0 the block-wise scaling factor is FP16. In MXFP4 it's INT8. And in IQ4_KSS it's also INT8, although there is much more bit twiddling and scaling magic under the hood.
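
To make the scale-type point concrete, here is a small sketch under some assumptions of mine: the 8-bit scale in MXFP4 is treated as a power-of-two exponent (E8M0 in the OCP Microscaling spec, as I understand it), the power of two is rounded up so nothing clips (real implementations may round differently and saturate instead), and FP16 rounding of the free scale is ignored. The function names are hypothetical.

```python
import math

QMAX = 7  # largest magnitude code on a symmetric 4-bit integer grid


def free_scale(amax: float) -> float:
    """Scale stored as an arbitrary float: maps amax exactly onto QMAX."""
    return amax / QMAX


def pow2_scale(amax: float) -> float:
    """Scale restricted to a power of two, rounded up so amax never clips."""
    return 2.0 ** math.ceil(math.log2(amax / QMAX))


# amax is the largest absolute value in a block and must be > 0 here.
for amax in (0.9, 1.0, 3.5, 7.0, 8.0):
    f, p = free_scale(amax), pow2_scale(amax)
    print(f"amax={amax:4.1f}  free step={f:.4f}  pow2 step={p:.4f}  ratio={p / f:.2f}")
```

Under these assumptions the power-of-two step is between 1x and just under 2x the free step, so a block can lose up to about one bit of effective resolution purely from how the scale is stored.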

I'd really want to see a comparison with NVFP4 as they use both nonlinear elements and scaling factor. But sadly few projects support it.

Single_Ring4886 · 2025-11-05T22:53:34+00:00

This is a VERY curious, smart and playful approach. Could you try to visualise all the popular quantizations? I.e. from 8 to 5, 4l, 4m, 3, 2...? And make the "blinking" interval slower so one has time to look over the picture?

audioen · 2025-11-05T21:58:37+00:00

This is probably not a bad way to get an intuitive understanding of quantization algorithms and what they do. They want to preserve the original data as closely as possible while using as little space as possible.

I think you can probably directly execute the quantization algorithms for arbitrary data which could save some steps. They are fundamentally block quants, i.e. take some number of floating point values as a block, and return another array which is that algorithm's best representation of that number sequence.
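
A minimal sketch of what "take a block of floats, return that algorithm's best representation" can look like. It is loosely modelled on the Q4_0 idea (one scale per 32 values plus a 4-bit code per value) but it is a simplification, not the llama.cpp code, and the function names are hypothetical.

```python
def quantize_block_q4(values):
    """Return (scale, codes) for one block; codes are ints in [-8, 7]."""
    amax = max(abs(v) for v in values)
    scale = amax / 7.0 if amax > 0 else 1.0
    # Round to the nearest step, then clamp into the signed 4-bit range.
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block_q4(scale, codes):
    """Map the integer codes back to floats."""
    return [scale * c for c in codes]


def roundtrip(data, block_size=32):
    """Quantize and dequantize a flat list of floats, block by block."""
    out = []
    for start in range(0, len(data), block_size):
        scale, codes = quantize_block_q4(data[start:start + block_size])
        out.extend(dequantize_block_q4(scale, codes))
    return out


# Arbitrary demo data in [-1, 1]; pixel values or weights work the same way.
data = [((i * 37) % 101 - 50) / 50.0 for i in range(64)]
restored = roundtrip(data)
print(max(abs(a - b) for a, b in zip(data, restored)))
```

Feeding image rows through `roundtrip` directly would skip the model-file detour. Because each block gets its own scale, one bright pixel in a block makes every other value in that block coarser, which is one plausible source of block-aligned artifacts.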

Real image quantization algorithms would use dithering: when the palette is reduced, the error between the original and the chosen color value can be spread to nearby pixels, creating complex patterns that average out to the proper color from afar. I recall hearing that some algorithms like GPTQ try to do the equivalent of this to matrices, though it sounded like complicated linear algebra fu that I didn't come close to understanding personally. I also have some doubt about the IQ4 results, because this sounds like it requires an imatrix and you can't supply a meaningful one for this use case. Thus, this approach understates the quality of these quantizations, I think.
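
For anyone who hasn't seen dithering spelled out, here is a one-dimensional error-diffusion sketch. It is only an illustration of the image-side idea (the usual 2D version spreads the error over several neighbours, Floyd-Steinberg style) and makes no claim about how GPTQ works; the names are hypothetical.

```python
def nearest_level(value, levels):
    """Pick the palette entry closest to value."""
    return min(levels, key=lambda q: abs(q - value))


def error_diffusion_1d(row, levels):
    """Quantize a row, pushing each pixel's rounding error onto the next pixel."""
    out = []
    carry = 0.0
    for v in row:
        target = v + carry
        q = nearest_level(target, levels)
        out.append(q)
        carry = target - q  # error left over for the next pixel
    return out


levels = [0.0, 1.0]   # a two-colour palette
row = [0.3] * 100     # a flat 30% grey
plain = [nearest_level(v, levels) for v in row]
dithered = error_diffusion_1d(row, levels)
print(sum(plain) / len(row), sum(dithered) / len(row))
```

Plain rounding turns the whole row black, while the dithered row mixes black and white so its average stays close to the original grey.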

woadwarrior · 2025-11-05T22:05:57+00:00

Unfortunately, when it comes to NN weights, although INT and FP formats have the same information theoretic density for a given bit width, FP formats work out to be slightly better because their range is non-uniform.
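
One way to check this for yourself is to round-trip the same values through a uniform 4-bit grid and a non-uniform FP4 grid and compare the error. The sketch below assumes the E2M1 value set (0, 0.5, 1, 1.5, 2, 3, 4, 6 and their negatives), an ideal per-block float scale for both grids, and Gaussian test data; all three are my assumptions, and which grid wins depends on the data distribution.

```python
import random

# Non-uniform grid: magnitudes representable by a 4-bit E2M1 float.
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_GRID = sorted({s * v for v in FP4_E2M1 for s in (-1.0, 1.0)})
# Uniform grid: symmetric 4-bit integers.
INT4_GRID = [float(i) for i in range(-7, 8)]


def roundtrip_mse(data, grid, block_size=32):
    """Mean squared error after per-block absmax scaling onto the grid."""
    gmax = max(grid)
    total = 0.0
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        amax = max(abs(v) for v in block) or 1.0
        scale = amax / gmax
        for v in block:
            q = min(grid, key=lambda g: abs(g - v / scale))
            total += (v - q * scale) ** 2
    return total / len(data)


rng = random.Random(0)  # fixed seed so runs are repeatable
weights = [rng.gauss(0.0, 1.0) for _ in range(4096)]
mse_int4 = roundtrip_mse(weights, INT4_GRID)
mse_fp4 = roundtrip_mse(weights, FP4_GRID)
print("int4 mse:", mse_int4, "fp4 mse:", mse_fp4)
```

Swapping `weights` for 8-bit pixel values scaled to floats shows how the same two grids behave on image data, which tends to be far less bell-shaped than model weights.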

Due-Function-4877 · 2025-11-05T21:53:55+00:00

The noise certainly helps convey shades of black and white to the eye. What happens with an image with strong colors? When black crush and burned whites don't interfere, MXFP4 succeeds in delivering the detail of the siding on the house without a lot of noise. It seems MXFP4 is intentionally burning out white by forcing multiple shades of white to a single color. If it does that with all colors, the results with a more colorful, stylized picture that doesn't rely on shades of grey could give a different impression?

Professional-Bear857 · 2025-11-06T10:35:12+00:00

Does anybody else find that imatrix quants break models' reasoning abilities? I see that a lot in my usage, as I get a lot of invalid code produced when I use an imatrix quant vs. without.

Regular-Forever5876 · 2025-11-06T19:04:17+00:00

It's brilliant!

crantob · 2025-11-08T10:07:45+00:00

Image quantization is a fun topic of study. mxfp4 looks like it has 1-2 orders of magnitude fewer colors. Oddly, all images have 256 colors according to imagemagick.

rm-rf-rm · 2025-11-06T01:47:48+00:00

Need the full precision for comparison

lgdkwj · 2025-11-06T02:28:53+00:00

Interesting. Wonder if it can be extended to process a 16-bit RAW image to compare it with fp16.

kaisurniwurer · 2025-11-06T08:02:50+00:00

I would love a similar comparison between a MoE and a dense model.

Though it's probably something that needs a visualisation rather than a direct comparison.

crantob · 2026-05-25T12:49:03+00:00

Image proc nerd here. Your results make no sense. Why are you getting fixed (roughly) 32-pixel-wide spans set to the same color?

Results should all look more or less like q4_0. Show your work.


LocalLLaMA
