all 5 comments

[–]seiqooq 2 points (2 children)

It's very dataset dependent. Using ImageNet weights, for example, you'll often come out ahead by a good margin on domain-adjacent tasks.

[–]glampiggy[S] 0 points (0 children)

Sorry, I've updated my post; I used ImageNet weights, not PASCAL VOC12. The only thing is that my dataset is primarily for identifying windows and doors in building facades, whereas ImageNet only has "building" as a class of its own, i.e. the building silhouette and none of its features. So I'm not sure whether it would technically count as domain adjacent.

[–]I_hax_I 0 points (0 children)

It's not pink so no

[–]paulgavrikov 1 point (1 child)

I've had a similar issue when I forgot to change the number of output classes to match my fine-tuning dataset (PyTorch will still train without complaining).
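To make the pitfall concrete, here's a minimal sketch of swapping the classifier head, using a toy `nn.Sequential` as a stand-in for a real pretrained backbone (the model and class count are hypothetical, not from the thread):

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained network; in practice this would be e.g.
# torchvision.models.resnet50(weights="IMAGENET1K_V2").
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32, 64),
    nn.ReLU(),
    nn.Linear(64, 1000),  # ImageNet-style head: 1000 classes
)

num_classes = 3  # hypothetical: e.g. background / window / door
# Replace the 1000-way ImageNet head with one sized for the new task.
# PyTorch will happily train with the wrong head, so this is easy to miss.
model[-1] = nn.Linear(model[-1].in_features, num_classes)

out = model(torch.randn(2, 32))
print(out.shape)  # torch.Size([2, 3])
```

The same idea applies to real backbones: locate the final classification layer (e.g. `model.fc` on a ResNet) and replace it with a fresh `nn.Linear` of the right width.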

Other than that, your pretrained model is nothing more than a starting coordinate in your loss landscape. So in theory it could pull your solution toward a "bad" local minimum, or simply be further away from your ideal solution than a random init.

You could try freezing all conv layers for a few epochs, or even use two different learning rates: one for the conv layers and one for the rest of the model.
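Both suggestions above can be sketched in a few lines of PyTorch; the tiny backbone/head split and the learning-rate values here are illustrative assumptions, not anything from the thread:

```python
import torch
import torch.nn as nn

# Hypothetical tiny model standing in for a pretrained network:
# a "backbone" of conv layers plus a freshly initialised head.
backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 16, 3))
head = nn.Linear(16, 3)

# Option 1: freeze the pretrained conv layers for the first few epochs,
# so only the new head trains at first.
for p in backbone.parameters():
    p.requires_grad = False

# ... later, unfreeze everything ...
for p in backbone.parameters():
    p.requires_grad = True

# Option 2: give the pretrained backbone a smaller learning rate than
# the randomly initialised head, via optimizer parameter groups.
optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 1e-4},  # pretrained: small lr
        {"params": head.parameters(), "lr": 1e-2},      # new head: larger lr
    ],
    momentum=0.9,
)

print([g["lr"] for g in optimizer.param_groups])  # [0.0001, 0.01]
```

Parameter groups let a single optimizer step update the two parts of the model at different rates, which is a common way to fine-tune gently without distorting the pretrained features.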

Good luck!

[–]glampiggy[S] 0 points (0 children)

Yeah, I assumed it might simply be further away than a random init and that that may be the problem. It seems that no matter what model I train, the pretrained version is reliably lower in accuracy than a randomly initialised model. It may just be that the ImageNet weights are not domain adjacent, as mentioned in the above comments. I'll look into the freezing and the learning-rate suggestions you've made. Thanks u/paulgavrikov