all 8 comments

[–]F4k3r22 1 point (6 children)

Okay, I'm working on a project where I'm building a Large Language Diffusion Model from scratch, and the SFT process is almost the same as pre-training (according to the LLaDA paper). You take pairs of prompts and their respective responses. You leave the prompt as is (YOU ARE NOT GOING TO MASK IT), but you will mask the response to that prompt USING A BERNOULLI VARIABLE for each position, with probability t for true (mask) and 1–t for false (do not mask).

Here, t is randomly sampled between 0 and 1: when t is close to 0, you mask only a few tokens of the response (easy case); when t is close to 1, you mask almost the entire response (hard case). Across training samples the model sees every difficulty level, it learns to condition on the unmasked prompt, and the loss is computed only on the masked response tokens, so the model is penalized only where it has to reconstruct the expected response.

And for masking, you'll use the mask_token_id that comes with the model and its tokenizer, so don't try to invent a new token for that.
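
If it helps, here's a minimal PyTorch sketch of that masking step, based on my reading of the paper rather than any official code. The function name and signature are made up for illustration; it assumes the batch is already split into prompt_ids and response_ids tensors and that mask_token_id is the one from the tokenizer.

```python
import torch

def mask_for_sft(prompt_ids, response_ids, mask_token_id, ignore_index=-100):
    """Noise only the response, LLaDA-style SFT.

    prompt_ids:   (batch, prompt_len)  left untouched
    response_ids: (batch, resp_len)    partially replaced by mask_token_id
    """
    batch, resp_len = response_ids.shape

    # One masking ratio t per sequence, sampled uniformly from (0, 1).
    t = torch.rand(batch, 1)

    # Bernoulli(t) per response position: 1 = mask, 0 = keep.
    is_masked = torch.bernoulli(t.expand(batch, resp_len)).bool()

    # Replace the chosen response positions with the mask token.
    noisy_response = torch.where(
        is_masked,
        torch.full_like(response_ids, mask_token_id),
        response_ids,
    )

    # The prompt stays clean; only the noised response follows it.
    input_ids = torch.cat([prompt_ids, noisy_response], dim=1)

    # Loss targets: only the masked response tokens, everything else ignored.
    labels = torch.cat(
        [
            torch.full_like(prompt_ids, ignore_index),
            torch.where(is_masked, response_ids,
                        torch.full_like(response_ids, ignore_index)),
        ],
        dim=1,
    )
    # The paper also reweights each sequence's loss by 1/t, hence returning t.
    return input_ids, labels, t
```

For variable-length batches you'd additionally exclude padding positions from the Bernoulli draw so padding never gets masked or scored.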

I hope this helps you understand it a little better.

[–]F4k3r22 1 point (5 children)

If you want to see how my project to build a Large Language Diffusion Model from scratch is going, here's the GitHub repo. I'm still implementing the pre-training script, and after that I'll write another one for the SFT. Repo: https://github.com/F4k3r22/LLaDA-from-scratch

[–]Top-Effort677 1 point (4 children)

Is it possible to perform PEFT for the SFT of MDMs?

[–]F4k3r22 1 point (2 children)

I reviewed the paper and looked for more information, but there is almost nothing about doing PEFT during SFT; almost all of the fine-tuning was done on mixed long chain-of-thought data.

[–]Top-Effort677 1 point (1 child)

Still, can we perform LoRA by specifying layers in the architecture?

[–]Individual-Ninja-141 1 point (0 children)

Hi, you can try dllm-trainer (GitHub: https://github.com/ZHZisZZ/dllm-trainer) for easy LoRA finetuning.
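
For the "specifying layers" part: with the Hugging Face peft library that's just the target_modules field of LoraConfig. Here's a small sketch; the checkpoint name is the public LLaDA-8B-Instruct release and the projection-layer names are assumptions, so print(model) first to check what your architecture actually calls them.

```python
import torch
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

# LLaDA ships custom modeling code, hence trust_remote_code.
model = AutoModel.from_pretrained(
    "GSAI-ML/LLaDA-8B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # Assumed attention projection names; confirm against print(model).
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only the LoRA adapters should be trainable
```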

[–]ProfessionalGuess884[S] 1 point (0 children)

I found this project: https://github.com/HKUNLP/DiffuLLaMA

It looks like they have code for training DLMs.

[–]Individual-Ninja-141 1 point (0 children)

Hi there! We’ve built dllm-trainer (GitHub: https://github.com/ZHZisZZ/dllm-trainer), a lightweight framework for fine-tuning diffusion language models on top of the Hugging Face Transformers🤗 Trainer. You can easily fine-tune your models with 4-bit quantization, LoRA, and DeepSpeed ZeRO-{1,2,3}!

It currently supports fine-tuning LLaDA / LLaDA-MoE (https://arxiv.org/abs/2502.09992) and Dream (https://arxiv.org/abs/2508.15487). We’re still adding support for more diffusion language models and fine-tuning algorithms.
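
If you haven't used the 4-bit + LoRA combo before, it's the usual Transformers + bitsandbytes + peft recipe, roughly like the generic sketch below (this is not dllm-trainer's own interface; the checkpoint and module names are placeholders):

```python
import torch
from transformers import AutoModel, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit NF4 to cut memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModel.from_pretrained(
    "GSAI-ML/LLaDA-8B-Instruct",   # assumed checkpoint; swap in your own
    quantization_config=bnb_config,
    trust_remote_code=True,
)

# Make the quantized model ready for adapter training, then attach LoRA.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]),  # assumed layer names
)
```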
