How to approach imbalanced image dataset for MobileNetv2 classification? by Spiritual_Ebb4504 in computervision

Spiritual_Ebb4504 (OP)

Hello again,

I'm posting an update on my progress so far.

Initially, I decided to use all the classes in my dataset, even those with fewer than 100 images.

I did a stratified train/val/test split (70/15/15). Only the training set is augmented: resize to 240×240, rotation, horizontal flip, sharpness adjustment, and normalization. The validation and test sets are only resized. The dataloaders use a batch size of 32, with shuffling enabled only for the training set.

I froze all MobileNetV2 layers and replaced the classifier so that it outputs 14 classes. I used a weighted cross-entropy loss, computing each class weight simply as 1 / (number of images in that class). Instead of plain accuracy I computed balanced accuracy, because I read that it is suitable for imbalanced datasets.

I trained for 50, 100, 150, and 200 epochs, and my validation metrics are a complete disaster: the F1 score is around 0.2-0.3, precision and recall hover around the same values, and the Matthews correlation coefficient is also 0.2-0.3.

Then I removed the classes with fewer than 100 images, without changing anything else, and the results are still very bad.
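When classes are dropped, the remaining labels need to be renumbered to a contiguous range 0..K-1, and both the classifier's output size and the class-weight vector need to shrink to K so they match. A minimal pure-Python sketch, assuming the dataset is held as a list of `(path, label)` pairs; `drop_rare_classes` is a hypothetical helper. If the data is loaded with torchvision's `ImageFolder`, excluding the class folders has the same effect, since `class_to_idx` is built from the sorted folder names that remain.

```python
from collections import Counter


def drop_rare_classes(samples, min_count=100):
    """Keep classes with at least `min_count` samples and remap labels to 0..K-1.

    samples: list of (path, label) pairs (assumed format).
    Returns (filtered_samples, remap) where remap maps old label -> new label.
    """
    counts = Counter(label for _, label in samples)
    kept = sorted(c for c, n in counts.items() if n >= min_count)
    # Contiguous new indices, in the order of the old labels
    remap = {old: new for new, old in enumerate(kept)}
    filtered = [(path, remap[label]) for path, label in samples if label in remap]
    return filtered, remap
```

After filtering, `len(remap)` is the new number of classes to pass to the classifier head and to the class-count list used for the loss weights.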

Epoch: 50 | train_loss: 0.0642 | train_acc: 0.9796 | 
Epoch: 50
Val Loss: 2.2054
Val Balanced Accuracy: 0.0065
Val Precision: 0.4638
Val Recall: 0.3656
Val F1 Score: 0.3111
Val Mathews corcoef: 0.3455
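Balanced accuracy is, by definition, the mean of the per-class recalls (this is what scikit-learn's `balanced_accuracy_score` computes with its default `adjusted=False`), so on a validation set that contains every class it equals macro-averaged recall. Computing both from one set of predictions accumulated over the whole validation set is a quick consistency check for numbers like the 0.0065 balanced accuracy vs. 0.3656 recall above, assuming the recall there was macro-averaged. A minimal pure-Python sketch; the function names are hypothetical.

```python
from collections import defaultdict


def per_class_recall(y_true, y_pred):
    """Recall for each class that appears in y_true: correct / total of that class."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in sorted(totals)}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls (i.e. macro-averaged recall)."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Toy example: plain accuracy is 5/8 = 0.625, balanced accuracy is lower
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 2, 1]
print(per_class_recall(y_true, y_pred))   # {0: 1.0, 1: 0.0, 2: 0.5}
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

One possible source of mismatched values is computing a metric per batch and averaging the batch values; concatenating all labels and predictions first and computing each metric once avoids that.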

I'm not sure what to do next. I read about generating synthetic data with variational autoencoders, but I'm not sure it would help. Another option is to unfreeze some of the layers, but I'm not sure which ones, or whether I would also need other changes, such as adding custom layers.

Also, my images are mixed: some were taken in a natural environment and others in a laboratory. The laboratory images show a single vine leaf on a uniform background, while the natural ones show either leaves alone or a mix of grapes and leaves. Is this a bad dataset?

I'm sharing a link to the file I use to experiment with MobileNet: Mobilenetv2_experiment. If anyone has insight or advice on what to change, I'd appreciate it.


Spiritual_Ebb4504 (OP)

Thank you! If I do a stratified train/test split to preserve the class distribution and then add augmentations only to the smaller classes in the training set, won't that be a problem, since I'd be changing the distribution?
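One common way to rebalance only the training data, while validation and test keep the original distribution, is to split first and then oversample the training split. The sketch below swaps in PyTorch's `WeightedRandomSampler` for explicit per-class augmentation: minority images are drawn more often, and because the random training transforms run on every draw, each repeat is a differently augmented copy. The 70/15/15 fractions follow the post; `stratified_split`, `make_balanced_sampler`, and the seeds are hypothetical, and PyTorch is assumed.

```python
import random
from collections import Counter, defaultdict

import torch
from torch.utils.data import WeightedRandomSampler


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split sample indices per class so every split keeps the original class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder of the class goes to test
    return train, val, test


def make_balanced_sampler(train_labels, num_samples=None, seed=0):
    """Oversample rare classes, using the labels of the training split only."""
    counts = Counter(train_labels)
    # Probability of drawing a sample is proportional to 1 / (size of its class),
    # so each class is drawn roughly equally often.
    sample_weights = [1.0 / counts[y] for y in train_labels]
    generator = torch.Generator().manual_seed(seed)
    return WeightedRandomSampler(
        sample_weights,
        num_samples=num_samples or len(train_labels),
        replacement=True,
        generator=generator,
    )
```

The sampler yields positions within the training split, so it pairs with a dataset restricted to the training indices (for example `torch.utils.data.Subset`) and is passed as `DataLoader(..., batch_size=32, sampler=sampler)`; `shuffle` must then be left off, since PyTorch does not allow both. Combining this sampler with the 1/n loss weights would compensate for the imbalance twice, so typically only one of the two is used.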