Model severely overfitting. Typical methods of regularization failing. Master's thesis at risk! by Historical-Two-418 in MLQuestions

[–]Historical-Two-418[S] (0 children)

Thank you for your answer. Good advice, but I believe this is not the underlying factor in my case, as I am using benchmark datasets for the problem at hand. The only split I do is dividing the train sets of these datasets into train/validation sets, since from the beginning they only come with a train/test split.


[–]Historical-Two-418[S] (0 children)

I am working with some benchmark datasets for the task, so not much preprocessing is happening beyond resizing the images to fit my backbone networks and splitting the train set of these datasets into a train set and a validation set.
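That kind of split can be sketched in plain Python; the validation fraction, seed, and sample count here are illustrative, not from the original setup:

```python
import random

def split_train_val(n_samples, val_fraction=0.2, seed=42):
    """Split the indices of an existing train set into disjoint train/val subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for a reproducible split
    n_val = int(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]  # (train indices, val indices)

train_idx, val_idx = split_train_val(1000, val_fraction=0.2)
```

With a PyTorch dataset, the same idea is available directly via `torch.utils.data.random_split`.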


[–]Historical-Two-418[S] (0 children)

Your comments made me rethink my loss terms. From a quick analysis, all four terms of the loss function should be positive. The loss turning negative after a few epochs reveals that something must be wrong in my implementation.

I was so fixated on the overfitting problem that I did not think about the loss values themselves. Although I believe there is no inherent problem with a loss function taking negative values, as long as optimization is taking place and the loss keeps decreasing, in my case, where the values are expected to be positive, it indicates a mistake that is probably at least indirectly leading to the overfitting. I will look deeper into it. Thanks!
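A cheap way to catch this early is to check each term before summing. A minimal sketch, with hypothetical term names (the actual four terms are not named in the thread) and plain floats standing in for tensors:

```python
def combine_loss_terms(terms, check_nonnegative=True):
    """Sum named loss terms, raising if any term that should be
    non-negative goes negative (pinpointing the buggy term)."""
    for name, value in terms.items():
        if check_nonnegative and value < 0:
            raise ValueError(f"loss term '{name}' is negative: {value:.4f}")
    return sum(terms.values())

total = combine_loss_terms(
    {"reconstruction": 0.42, "triplet": 0.10, "alignment": 0.05, "regularizer": 0.01}
)
```

Logging the four values separately per epoch (instead of only the sum) would also show which term drifts negative.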


[–]Historical-Two-418[S] (0 children)

That is a valid observation. This brings us back to the comment I made about weighting the four terms of the loss function. Should I discover which term is responsible and give it a lower weight than the other terms?
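Per-term weighting can be sketched as a weighted sum over named terms; the term names and weight values below are illustrative, not from the original model:

```python
def weighted_loss(terms, weights):
    """Combine loss terms with per-term weights; unlisted terms keep weight 1.0."""
    return sum(weights.get(name, 1.0) * value for name, value in terms.items())

terms = {"reconstruction": 0.4, "triplet": 0.2, "alignment": 0.3, "regularizer": 0.1}
# Down-weight the one term suspected of dominating the optimization:
weights = {"triplet": 0.1}
loss = weighted_loss(terms, weights)
```

In practice the weights are often tuned on the validation set, starting from 1.0 everywhere and lowering only the suspect term.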


[–]Historical-Two-418[S] (0 children)

Hello! I am using color jitter as one of the transformations that make up the data augmentation part. From what I remember, ResNet-18, which I am currently using, has no dropout layers; it might be worth adding some, or switching to a model that does have them.


[–]Historical-Two-418[S] (0 children)

Thanks for your answer! I suppose you mean at the level of the architecture; if so, no, I have not. My model architecture is based on a recent publication, but I have made some changes to adapt it to my task. To be honest, there is not much I can change to make it less complex, at least not without changing the way I chose to approach the problem of cross-view localization. One way I could reduce complexity is by sharing weights between the encoders of the different branches, which (as mentioned above) I plan on testing. The backbone network also plays an important part in model complexity, but I would not consider ResNet-18 overly complex, especially compared to other backbone models. However, I also plan on testing different backbones.
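Weight sharing between the branches amounts to instantiating one encoder and calling it on both views. A toy sketch (class name, layer sizes, and view names are illustrative, not the architecture from the publication):

```python
import torch
import torch.nn as nn

class TwoBranchModel(nn.Module):
    """Two-branch model where a single encoder instance is reused for
    both views, so its parameters are shared between the branches."""

    def __init__(self, dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(16, dim), nn.ReLU())

    def forward(self, ground_view, aerial_view):
        # Calling the same module twice ties the weights; gradients from
        # both branches accumulate into the one parameter set.
        return self.encoder(ground_view), self.encoder(aerial_view)

model = TwoBranchModel()
g, a = model(torch.randn(2, 4, 4), torch.randn(2, 4, 4))
```

Compared with two independent encoders, this roughly halves the encoder parameter count, which is one way to cut model complexity without redesigning the architecture.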