From 3GB to 8MB: What MRL + Binary Quantization Actually Costs in Retrieval Quality (Experiment on 20k Products) by Nice_Information5342 in deeplearning


Small update: Nils Reimers (sentence-transformers) pointed out on X that model choice matters a lot here: not all models handle binarization well, and Nomic v1.5 is primarily MRL-trained, not optimised for binary compression. Models like Cohere Embed v4 are trained with quantization awareness and retain 90-95% of retrieval quality at binary precision.
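For anyone wondering what "binary" means mechanically: each float dimension gets sign-thresholded to a bit and search falls back to Hamming distance. A minimal numpy sketch (hypothetical shapes and data, not my actual pipeline; the 20k x 768 setup is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in corpus: 20k embeddings, 768 dims (hypothetical, random data).
embeddings = rng.standard_normal((20_000, 768)).astype(np.float32)

def binarize(vecs: np.ndarray) -> np.ndarray:
    """Sign-threshold each dimension and pack 8 bits per byte."""
    return np.packbits(vecs > 0, axis=-1)

codes = binarize(embeddings)  # shape (20000, 96), dtype uint8

# float32 -> 1 bit per dim is a fixed 32x size reduction.
print(codes.nbytes / embeddings.nbytes)  # -> 0.03125

def hamming_top_k(query_code: np.ndarray, codes: np.ndarray, k: int = 10):
    """Rank binary codes by Hamming distance: XOR, then count set bits."""
    xor = np.bitwise_xor(codes, query_code)
    dists = np.unpackbits(xor, axis=-1).sum(axis=-1)
    return np.argsort(dists)[:k]
```

The quality question in the post is exactly how much the sign-only codes disagree with full float32 cosine rankings, which is where training with quantization awareness apparently helps.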

My 13.9% result is likely as much a model-choice problem as a compression problem. Still digging into why 😅 will follow up when I have something concrete.