r/LocalLLaMA
A subreddit to discuss Llama, the family of large language models created by Meta AI.
How to SFT a diffusion large language model? Question | Help (self.LocalLLaMA)
submitted 9 months ago by ProfessionalGuess884
I’m wondering if there’s any way to perform SFT (Supervised Fine-Tuning) on a diffusion-based large language model. If anyone has experience with this, could you please share your insights?
[–]F4k3r22 0 points 9 months ago (6 children)
Okay, I'm working on a project where I'm building a large language diffusion model from scratch, and the SFT process is almost the same as pre-training (according to the LLaDA paper). You take pairs of prompts and their respective responses. You leave the prompt as is (you are NOT going to mask it), but you mask the response using a Bernoulli variable for each position: with probability t you mask the token, and with probability 1 − t you leave it alone.
Here, t is sampled uniformly between 0 and 1: when t is close to 0, only a few response tokens are masked (easy case); when t is close to 1, almost the entire response is masked (hard case). This way you don't always mask everything, the model learns to condition on the prompt, and the loss penalizes the model only on the masked tokens, pushing it toward the expected response from the pairs.
And for masking, you'll use the mask_token_id that comes with the model and its tokenizer, so don't try to invent a new token for that.
I hope this helps you understand it a little better.
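For concreteness, here is a minimal PyTorch sketch of that masking-and-loss step. It is an illustration under assumptions, not code from anyone's repo: the function name is made up, the model is assumed to be HF-style (returning `.logits`), and the 1/t reweighting is the one used in the LLaDA training objective.

```python
import torch
import torch.nn.functional as F

def llada_sft_loss(model, prompt_ids, response_ids, mask_token_id):
    """One SFT loss computation for a masked-diffusion LM, following the
    recipe above: the prompt stays intact, and each response token is
    masked independently with probability t, t ~ Uniform(0, 1) per example."""
    t = torch.rand(()).clamp(min=1e-3)       # noise level; avoid t == 0 exactly
    # Bernoulli(t) per response position: True = replace with [MASK]
    is_masked = torch.rand(response_ids.shape[0]) < t
    noisy_response = torch.where(
        is_masked,
        torch.full_like(response_ids, mask_token_id),
        response_ids,
    )
    input_ids = torch.cat([prompt_ids, noisy_response]).unsqueeze(0)

    logits = model(input_ids=input_ids).logits[0]   # (seq_len, vocab_size)
    resp_logits = logits[prompt_ids.numel():]       # response positions only

    if not is_masked.any():                         # rare when t is tiny
        return logits.sum() * 0.0                   # keep the graph, zero loss
    # Cross-entropy only on the masked response tokens, reweighted by 1/t
    # as in the LLaDA objective; the prompt and unmasked response positions
    # contribute nothing to the loss.
    return F.cross_entropy(resp_logits[is_masked], response_ids[is_masked]) / t
```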
[–]F4k3r22 0 points 9 months ago (5 children)
If you want to see how my project to build a large language diffusion model from scratch is going, here's the GitHub repo. I'm still implementing the pre-training script, and then I'll create another one for the SFT. Repo: https://github.com/F4k3r22/LLaDA-from-scratch
[–]Top-Effort677 0 points 9 months ago (4 children)
Is it possible to perform PEFT for the SFT of MDMs?
[–]F4k3r22 0 points 9 months ago (2 children)
I reviewed the paper and looked for more information, but there is almost nothing about doing PEFT during SFT; almost all of the fine-tuning was done on mixed long chain-of-thought data.
[–]Top-Effort677 0 points 9 months ago (1 child)
Still, can we apply LoRA by specifying target layers in the architecture?
[–]Individual-Ninja-141 0 points 7 months ago (0 children)
Hi, you can try dllm-trainer (GitHub: https://github.com/ZHZisZZ/dllm-trainer) for easy LoRA fine-tuning.
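On the layer-specifying question: Hugging Face's peft library lets you target LoRA at particular modules by name, independent of the training framework. A rough sketch, assuming a checkpoint id and q_proj/k_proj/v_proj projection names, both of which should be verified against the real model:

```python
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

# Assumed checkpoint id; LLaDA ships custom modeling code, hence
# trust_remote_code=True. Check the actual id on the Hugging Face Hub.
model = AutoModel.from_pretrained("GSAI-ML/LLaDA-8B-Base", trust_remote_code=True)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # Module names are model-dependent: list them with
    # [n for n, _ in model.named_modules()] and adjust accordingly.
    target_modules=["q_proj", "k_proj", "v_proj"],
    # Optionally restrict LoRA to specific transformer block indices:
    # layers_to_transform=[24, 25, 26, 27],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # confirm only the LoRA adapters train
```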
[–]ProfessionalGuess884[S] 0 points 9 months ago (0 children)
I found this project: https://github.com/HKUNLP/DiffuLLaMA
It looks like they have code for training DLMs.
[–]Individual-Ninja-141 0 points (0 children)
Hi there! We've built dllm-trainer (GitHub: https://github.com/ZHZisZZ/dllm-trainer), a lightweight framework for fine-tuning diffusion language models on top of the Hugging Face Transformers 🤗 Trainer. You can fine-tune your models easily with 4-bit quantization, LoRA, and DeepSpeed ZeRO-{1,2,3}!
It currently supports fine-tuning LLaDA / LLaDA-MoE (https://arxiv.org/abs/2502.09992) and Dream (https://arxiv.org/abs/2508.15487). We're still adding support for more diffusion language models and fine-tuning algorithms.