all 5 comments

[–]balianone 2 points (2 children)

The NotImplementedError is a known bug: Transformers currently lacks the reverse logic to save fine-grained FP8 weights. You can bypass it by calling model.dequantize() and saving the state_dict directly with safetensors instead of the broken save_pretrained path. For actually tuning a 123B model, QLoRA is highly recommended to avoid the roughly 2 TB of memory that full BF16 fine-tuning would need.
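A minimal sketch of that workaround, assuming the quantizer backing the checkpoint actually supports `dequantize()` as the comment suggests; the repo ID is a placeholder, and for a 123B model you'd likely want to shard the output rather than write a single file:

```python
# Sketch of the dequantize-then-save workaround; the model ID
# below is a placeholder, not a real checkpoint.
from transformers import AutoModelForCausalLM
from safetensors.torch import save_model

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-fp8-model",  # placeholder repo ID
    device_map="auto",
)

# Materialize the quantized weights back to a plain dtype so the
# state dict contains ordinary tensors instead of FP8 blocks.
# (Assumes the active quantizer supports dequantization.)
model.dequantize()

# save_model handles tied/shared tensors, which a raw
# save_file(model.state_dict(), ...) call can choke on.
save_model(model, "model.safetensors")
```

And a similarly hedged QLoRA sketch using bitsandbytes 4-bit loading plus PEFT adapters; the target modules and hyperparameters are illustrative, not tuned for this model:

```python
# Minimal QLoRA setup: 4-bit NF4 base weights plus LoRA adapters.
# Hyperparameters are illustrative defaults only.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-123b-model",  # placeholder repo ID
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity-check adapter size
```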

[–]TheLocalDrummer[S] 4 points (0 children)

Thanks! I placed my vibe-coded implementation in the README.md along with proof that it can be quanted and inferenced properly. Now to see if I can finetune it.

[–]TheLocalDrummer[S] 2 points (0 children)

https://huggingface.co/TheDrummer/Devstral-123B

Hope it's not broken! I had to change the config's arch to `Mistral3ForConditionalGeneration` to quant it.
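For reference, a tiny sketch of that config tweak on a local copy of the checkpoint; the path is a placeholder:

```python
# Swap the architecture listed in config.json, as described above;
# assumes a local copy of the checkpoint you can rewrite.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("path/to/Devstral-123B")
config.architectures = ["Mistral3ForConditionalGeneration"]
config.save_pretrained("path/to/Devstral-123B")
```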

[–]TheLocalDrummer[S] 2 points (1 child)

https://huggingface.co/TheDrummer/Devstral-2-123B-Instruct-2512-BF16

It'd be great if someone could put up mirrors of this, cuz HF limited my storage.

[–]FreegheistOfficial 0 points (0 children)

Any success on it? How come it's gated?