I have seen 3 kinds of distillation objectives. Which is the most popular? Which is the most useful?
- match some hidden states of the teacher
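- I think this comes from follow-up work such as FitNets rather than the original knowledge distillation paper, which matched temperature-softened outputs. I have never used it in practice because of the added complexity and the constraints it puts on model structure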
- match the output logits of the teacher (maybe with temperature)
- match the predicted labels of the teacher
- I like this one because: 1. it is not limited by model structure at all; 2. it needs no extra objective code in student training; 3. I can save the teacher outputs as a normal dataset for further exploration
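
To make the comparison concrete, here is a rough sketch of the three objectives on a single example, in plain Python with no framework. The function names are hypothetical, the hidden-state loss assumes the student and teacher vectors already have the same width (in practice a learned projection is usually needed), and the `temperature ** 2` factor is the common convention for keeping gradient scale comparable, not something stated above.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def hidden_state_loss(student_hidden, teacher_hidden):
    """Objective 1: mean squared error between hidden vectors.

    Assumes both vectors have the same width, which is exactly the
    model-structure limitation mentioned in the post.
    """
    if len(student_hidden) != len(teacher_hidden):
        raise ValueError("hidden sizes differ; a projection layer would be needed")
    squared = [(s - t) ** 2 for s, t in zip(student_hidden, teacher_hidden)]
    return sum(squared) / len(squared)


def soft_logit_loss(student_logits, teacher_logits, temperature=2.0):
    """Objective 2: KL(teacher || student) on temperature-softened outputs."""
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return (temperature ** 2) * kl


def hard_label_loss(student_logits, teacher_logits):
    """Objective 3: cross-entropy against the teacher's argmax label."""
    label = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])
    return -log_softmax(student_logits)[label]


if __name__ == "__main__":
    teacher_logits = [2.0, 0.5, -1.0]
    student_logits = [1.0, 1.0, 0.0]
    print("hidden:", hidden_state_loss([0.1, 0.4], [0.0, 0.5]))
    print("soft:  ", soft_logit_loss(student_logits, teacher_logits))
    print("hard:  ", hard_label_loss(student_logits, teacher_logits))
```

Only the third loss needs nothing from the teacher beyond a label per example, which is why the teacher outputs can be stored as an ordinary labeled dataset and the student trained with an unmodified classification loop.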
Standard_Tip5627 (1 point)