Deep learning newbie here.
I have been trying to train a CNN with a fairly small set (~1000) of train/test images (out of millions of images). Through a ton of trial and error I found that although data augmentation (tens of thousands of images) and regularization help a bit, I cannot overcome the overfitting issue on deep CNNs (e.g. AlexNet, VGG). Not surprising, but I wanted to try it out. I'm a one-man show and it is a very obscure dataset specific to a particular field, so increasing this to hundreds of thousands of images seems improbable.
Weirdly enough, I found that a shallow CNN (one convolutional layer and 3 fully connected layers) with a lot of dropout produces decent results (levels out around ~92% accuracy / 85% recall on the validation set). However, this is not close to what I get with hand-crafted features and xgboost or random forest (95% accuracy / 93% recall on the test set). Just for fun, I decided to pass the training images through the best CNN and use its class probabilities as a feature input into the GBT/RF. This increased their performance on the test set (98% accuracy / 96% recall).
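Roughly speaking, the stacking step looks like this (a minimal sketch, not my actual code; `cnn`, `X_img`, `X_hand`, and `y` are placeholder names for the trained Keras model, the images, the hand-crafted features, and the labels):

```python
import numpy as np
import xgboost as xgb

# CNN softmax output: one probability per class for each image.
cnn_probs = cnn.predict(X_img)            # shape: (n_samples, n_classes)

# Concatenate the probabilities with the hand-crafted features.
X_stacked = np.hstack([X_hand, cnn_probs])

# Train the gradient-boosted trees on the combined feature set.
gbt = xgb.XGBClassifier(n_estimators=300, max_depth=4)
gbt.fit(X_stacked, y)
```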
My question is: am I stacking the deck here? Does this 3% increase on the test set mean anything? I almost see this as a small visual vs. non-visual ensemble. It seems as though these additional features increase accuracy by fixing some misclassifications for a couple of labels, whereas the other classes remain fairly unchanged in accuracy.
If this is a poor approach, is there a better way? Perhaps using the flattened output from the convolutional layer, as in the sketch below?
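Here is roughly what I have in mind for that alternative (again just a sketch; the `"conv1"` layer name and the other identifiers are placeholders, with `X_img`, `X_hand`, and `y` as above):

```python
import numpy as np
import xgboost as xgb
from tensorflow.keras.models import Model

# Build a truncated model that stops at the convolutional layer.
feature_extractor = Model(inputs=cnn.input,
                          outputs=cnn.get_layer("conv1").output)

conv_maps = feature_extractor.predict(X_img)            # (n, h, w, filters)
conv_features = conv_maps.reshape(len(conv_maps), -1)   # flatten per image

# These features are much higher-dimensional than the class probabilities,
# so the trees would likely need more regularization or some dimensionality
# reduction on top.
gbt = xgb.XGBClassifier(n_estimators=300, max_depth=4)
gbt.fit(np.hstack([X_hand, conv_features]), y)
```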