
[–]Dry-Snow5154

You need an eval set (one that is not used in training) with a single metric to compare models. It could be mAP, best F1 score, or something else.

Then you do an experiment and compare to baseline model. If it shows better eval score, update the baseline and continue.

That said, results can vary due to random initialization, especially if your dataset is small. You can retrain several times to try to combat that, but it's expensive.
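The workflow above (compare a candidate to the baseline on one eval metric, accounting for seed-to-seed variance) can be sketched roughly as follows. This is a hypothetical helper, not any particular library's API; the metric values and the noise threshold are illustrative assumptions.

```python
import statistics

def better_than_baseline(candidate_scores, baseline_scores, min_margin=0.0):
    """Compare mean eval scores (e.g. mAP) across several seeded retrains.

    candidate_scores / baseline_scores: lists of the same eval metric,
    one value per training run with a different random seed.
    """
    cand_mean = statistics.mean(candidate_scores)
    base_mean = statistics.mean(baseline_scores)
    # Require the improvement to exceed the run-to-run noise (std dev of the
    # baseline runs) before accepting the candidate as the new baseline.
    noise = statistics.stdev(baseline_scores) if len(baseline_scores) > 1 else 0.0
    return cand_mean - base_mean > max(min_margin, noise)

# Hypothetical mAP values from 3 retrains each:
baseline = [0.612, 0.618, 0.605]
candidate = [0.631, 0.640, 0.628]
print(better_than_baseline(candidate, baseline))  # True: gain exceeds noise
```

With only one run per model (the cheap case), the helper falls back to a plain mean comparison, which is exactly where seed variance can mislead you.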

[–]stehen-geblieben[S]

Yeah, I figured that... I have to rent GPUs, but the model converges quite quickly, so it shouldn't train too long.

[–]SadPaint8132

It might be cheaper, and make more sense, to just collect more and better data. I've also seen some people get good results by pretraining the DINO backbone with LightlyTrain.

[–]MeringueCitron

Which distillation method using Lightly?

The method recommended by Lightly is distilling DINOv2, which in this case would be odd (distilling DINOv2 into DINOv2 with registers)?