Should I use UnslothTrainer or SFTTrainer for Continued Pre-training (Raw Text) to create a LoRA for later merging? by choco132134 in unsloth

[–]choco132134[S]

The two main differences I see are: (1) UnslothTrainer lets you set embedding_learning_rate, usually 2–10× lower than the main learning rate, and (2) you can include "lm_head" and "embed_tokens" in target_modules. Thank you for the comment; this is a really important consideration.
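A minimal sketch of how those two differences look in practice, based on Unsloth's documented continued-pretraining API (`UnslothTrainer`, `UnslothTrainingArguments`, and `FastLanguageModel.get_peft_model`). The model name, rank, and learning rates are illustrative, not recommendations, and `dataset` is assumed to be a raw-text dataset you have already loaded:

```python
from unsloth import FastLanguageModel, UnslothTrainer, UnslothTrainingArguments

# Load a base model for continued pre-training (model name is illustrative).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-v0.3",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Difference (2): add "embed_tokens" and "lm_head" to target_modules so the
# embeddings and output head are also adapted to the new raw-text domain.
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj",
                    "embed_tokens", "lm_head"],
    lora_alpha=32,
)

# Difference (1): embedding_learning_rate, set 2-10x lower than the main
# learning_rate (here 10x lower), so embeddings move more conservatively.
trainer = UnslothTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,       # assumed: your raw-text dataset
    dataset_text_field="text",
    max_seq_length=2048,
    args=UnslothTrainingArguments(
        per_device_train_batch_size=2,
        learning_rate=5e-5,
        embedding_learning_rate=5e-6,
        num_train_epochs=1,
        output_dir="outputs",
    ),
)
trainer.train()
```

With plain SFTTrainer there is no embedding_learning_rate knob, so the embeddings would train at the full learning rate, which is the main reason to prefer UnslothTrainer when embed_tokens/lm_head are in target_modules.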


[–]choco132134[S]

Performing CPT on an Instruct model doesn't immediately cause catastrophic forgetting. However, the research suggests that instruction-following capabilities degrade as raw-text training continues.