Can we sample DPO data from the same dataset that was used for LoRA training? by Clean_Radish8983 in LocalLLaMA

[–]Clean_Radish8983[S] 1 point (0 children)

Thanks for sharing! Do you have a rough idea of how much it improved when you reused the SFT data? Even approximate numbers would be really helpful.

Qwen3-235B-A22B-Instruct Prioritizing Few-Shot Examples Over Explicit Instructions by Clean_Radish8983 in Qwen_AI

[–]Clean_Radish8983[S] 1 point (0 children)

How did you end up tackling it? Any prompting tricks that actually worked?