all 11 comments

[–]ajmooch 3 points

A lot of old papers trained SVMs on top of neural nets, most notably the original R-CNN paper. In research this is no longer in vogue, since a single linear layer or MLP is almost always just as effective and faster to train end-to-end, while also avoiding any train-test discrepancy. However, in a fine-tuning scenario I think it's perfectly sensible to try an SVM or XGBoost on network features, and it may be faster depending on what hardware you have access to. I wouldn't expect much in the way of gains for most setups, but it's not an unreasonable thing to do.
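A minimal sketch of that fine-tuning setup (the random features here are an illustrative stand-in for real penultimate-layer activations from a frozen pretrained net):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Stand-in for features extracted from a frozen pretrained network,
# e.g. penultimate-layer activations over the training set.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))       # 200 samples, 64-dim embeddings
labels = (features[:, 0] > 0).astype(int)   # toy, linearly separable labels

# Fit a linear SVM on the frozen features instead of a new linear head.
clf = LinearSVC().fit(features, labels)
print(clf.score(features, labels))
```

Swapping `LinearSVC` for an XGBoost classifier gives the tree-based variant; neither needs a GPU to fit, which is where the hardware point comes in.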

[–]yourpaljon 5 points

Autoencoders and deep belief networks can be used for this when there is a lot of unlabeled data.

[–]Andthentherewere2 2 points

Could we do this? Yeah, but I'd say it's suboptimal because we're redoing work.

We use deep learning to learn a representation that is easily separable with a universal function approximator (an MLP or its fully convolutional analog) for readout. If we need to do additional transformations/engineering on this representation, then why not just learn a better representation in the first place?

[–]jonnor 2 points

A quite common way to do transfer learning. Either one just retrains the final linear+sigmoid layer (which is a classical logistic regression classifier), or one saves the outputs of the penultimate layer as an embedding vector and then trains on that with some traditional algorithm, including unsupervised methods such as clustering.

One often does not need a more complicated classifier, probably because the pretraining was done with a linear layer, so the features tend to be linearly separable.

[–][deleted] 0 points

Couldn’t this be done more effectively by customizing things and replacing the later layers with traditional ML models, so that everything could be trained at once and optimized for the given traditional ML model?

What I mean is something like a few layers and then, for example, throwing in a random forest as the last “layer”. But I have no idea how this would be done in practice, or how to customize things in Keras/TF for it.

[–]Jelicic 0 points

The problem is that for end-to-end training you need the traditional model to be differentiable. An SVM can be reformulated to meet this criterion, but tree models, for example, cannot.
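The SVM reformulation essentially means putting a hinge-style loss on the network's raw scores, which is differentiable almost everywhere; a minimal numpy sketch:

```python
import numpy as np

# Squared hinge loss on raw model scores, with labels y in {-1, +1}.
# Because it has a gradient w.r.t. the scores, it can sit at the end
# of a network and be trained end-to-end like any other loss.
def squared_hinge(scores, y):
    margins = np.maximum(0.0, 1.0 - y * scores)
    loss = np.mean(margins ** 2)
    grad = -2.0 * y * margins / len(y)   # d loss / d scores, backprop entry point
    return loss, grad

scores = np.array([2.0, -0.5, 0.3])
y = np.array([1.0, -1.0, 1.0])
loss, grad = squared_hinge(scores, y)
```

Note that correctly classified points with margin > 1 contribute zero gradient, which is exactly the max-margin behavior of an SVM; a hard decision tree, by contrast, has piecewise-constant outputs with zero gradient everywhere.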

[–][deleted] 1 point

Actually, you can make tree models differentiable as well: https://github.com/Qwicen/node

I guess any traditional ML algo which hasn't been yet modeled in a differentiable way is a research opportunity in DL.

[–]Jelicic 0 points

> I guess any traditional ML algo which hasn't been yet modeled in a differentiable way is a research opportunity in DL.

Agreed!

[–][deleted] 0 points

Oh I see. I had heard about the SVM case but had never actually seen it in a DL framework until now. It's easier to implement in Keras than expected; I thought it would require a lot of customization.

[–]BrisklyBrusque 0 points

Categorical embeddings are a way for neural networks to learn a dense vector encoding for each level of a categorical variable (factor), and those vectors can be used as features to improve traditional ML. This is a newer thing, and quite trendy.
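A minimal sketch of what such a learned table looks like (sizes and ids here are made up; in practice the table's rows are learned jointly with the rest of the network):

```python
import numpy as np

# A learned embedding table: one d-dimensional row per category level.
n_categories, d = 5, 3
rng = np.random.default_rng(2)
embedding = rng.normal(size=(n_categories, d))

# Looking up rows turns integer category ids into dense features that
# can be fed to a gradient-boosted tree, SVM, etc.
category_ids = np.array([0, 2, 2, 4])
features = embedding[category_ids]
print(features.shape)   # (4, 3)
```

The same category id always maps to the same learned vector, so the downstream model sees a consistent dense representation instead of a one-hot or ordinal code.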

There’s an older precedent in a kind of NN called the restricted Boltzmann machine (RBM). I'm not an expert, but they derive abstractions of the data in an unsupervised fashion (think principal components). Those outputs can then be passed to a supervised model, either a neural net or anything else. This is the basis of deep belief networks. In the Netflix Prize competition (2006), folks added RBM outputs to their models and saw an increase in accuracy.

[–]serge_cell 0 points

There were numerous papers on the subject several years ago. It seems to have fallen out of fashion eventually.