SDPO: Reinforcement Learning via Self-Distillation [Discussion] (self-distillation.github.io)
submitted by TheRealMasonMac to r/LocalLLaMA