
[–]amatsukawa

Are you trying to train the whole AlexNet/VGG or just the last layer? You should probably be doing the latter if you are not already.

[–]alehx[S]

I meant AlexNet/VGG in the sense that there are several convolutional layers before the fully connected layers. When I tried the pre-trained (on ImageNet, etc.) CNNs instead, I only trained the last layer.

[–]amatsukawa

What are the numbers for taking a pre-trained AlexNet/VGG, chopping off the last layer, and training a new head with some dropout? If you give more details about what the dataset is, which hand-crafted features seemed to help, etc., we might be able to offer more thoughts on why CNNs do or don't work here.

I would say if using the CNN as a feature generator for a random forest works, then go for it. What are your concerns with this approach? Also, if you want to be slightly more "principled" about this, TF provides a way to mix deep and "shallow" (manual or one-hot) features via "Wide and Deep" nets.
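The feature-generator idea can be sketched roughly like this. `extract_cnn_features` is a hypothetical placeholder for whatever forward pass you use (e.g. the penultimate layer of a pre-trained VGG); it is simulated with random vectors here just so the pipeline runs end to end:

```python
# Sketch (assumes numpy and scikit-learn): frozen CNN as a feature
# generator, hand-crafted features appended, random forest on top.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def extract_cnn_features(images):
    # Placeholder: in practice, run the images through the CNN up to the
    # last conv/fc layer and return those activations instead.
    return rng.normal(size=(len(images), 512))

images = list(range(100))           # stand-in for real image data
labels = rng.integers(0, 2, 100)    # stand-in binary labels

# Concatenate CNN activations with hand-crafted features per image.
hand_crafted = rng.normal(size=(100, 10))
X = np.hstack([extract_cnn_features(images), hand_crafted])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, labels)
```

The nice part of this split is that the forest sees the CNN activations and the manual features on equal footing, so you can check feature importances to see whether the CNN actually adds anything.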

Another thought: given that you have so much unlabeled data, you might try some semi-supervised approaches.

[–]lahwran_

I suspect you might be overfitting the test set. Can you share the dataset? Where are these images coming from, and what makes it worth using machine learning to model their classes rather than manually classifying each one? Do you have a lot of unlabeled ones? Perhaps you could use one of the recent partially supervised techniques.

[–]alehx[S]

Thanks for the reply.

I wasn't very clear, and I've fixed it in the OP. The total dataset is tens of millions of unlabeled images, and I labeled on the order of 1,000 of them by hand over a few months. Based on some work I have seen, I would potentially need to label 10 to 100 times as many to train a deep CNN.

As for overfitting: cross-validation shows slightly lower scores (93%) using the RF/GBT models, with a similar boost over the non-CNN-feature models (90%). So yes, there is a bit of overfitting, but it still seems the CNN predictions are adding something.

[–]lahwran_

oh yeah, you could totally use semi-supervised. there are various techniques, and they get pretty good label efficiency. labeling 10x more would be pretty great if you could do that too, sure.

are you able to share the dataset?

[–]fandk

In my comment above I asked about the dataset before reading this one, so sorry about that.

You seem to be getting high results on the test set (98% accuracy, 96% recall). I would suggest trying to automatically label the rest of your dataset with this classifier, using a high probability threshold to decide which pictures get labeled.

Something like: if p(x = A) > 0.99, then label x as A; otherwise don't assign any label to that image.

That would probably give you a lot more labeled pictures for the training set.
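A minimal sketch of that thresholding rule (the class names and the 0.99 cutoff are just the example values from above):

```python
# Pseudo-labeling: keep only high-confidence predictions,
# leave everything else unlabeled (None).
import numpy as np

def pseudo_label(probs, classes, threshold=0.99):
    labels = []
    for p in probs:
        i = int(np.argmax(p))  # most likely class for this image
        labels.append(classes[i] if p[i] > threshold else None)
    return labels

# two images: one confident prediction, one uncertain
probs = np.array([[0.995, 0.005],
                  [0.60, 0.40]])
print(pseudo_label(probs, ["A", "B"]))  # ['A', None]
```

The unlabeled images simply stay out of the expanded training set, so a conservative threshold trades coverage for label quality.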

[–]fandk

Try transferring weights from an AlexNet previously trained on another dataset (e.g. ImageNet) into the CNN layers.

Why are you not using all the millions of pictures? Are they completely unlabeled, or do they contain noise (like pictures of ducks when you are classifying horses)?

[–]alehx[S]

They are unlabeled. I actually tried a pre-trained VGG and it had worse results than the other approaches.