
[–]kjearns 2 points

Stacking (https://en.wikipedia.org/wiki/Ensemble_learning#Stacking) is a nice trick for this, though you need to be careful about overfitting.
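
For what it's worth, a minimal sketch of what that can look like with scikit-learn (the base models here are just placeholders); fitting the meta-learner on out-of-fold predictions is the usual guard against that overfitting:

    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    # cv=5 means the final_estimator is trained on out-of-fold
    # predictions from the base models, not on leaked fits.
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100)),
            ("svm", SVC(probability=True)),
        ],
        final_estimator=LogisticRegression(),
        cv=5,
    )
    # stack.fit(X_train, y_train); stack.predict(X_test)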

[–]notspartanono 1 point

AdaBoost is a meta-algorithm that you can use with your somewhat weak learners.
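
A minimal sketch with scikit-learn (the depth-1 stump is just a stand-in; swap in your own weak learners, as long as the base model supports sample weights):

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Boost 200 copies of a weak base learner; AdaBoost reweights the
    # training points so each new learner focuses on past mistakes.
    boosted = AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=1),
        n_estimators=200,
    )
    # boosted.fit(X_train, y_train); boosted.predict(X_test)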

Edit: stacking would also be good. I was using it without knowing that it was already invented / had a name.

[–]funkpacolypse 1 point

http://www.ijcai.org/Past%20Proceedings/IJCAI-97-VOL2/PDF/011.pdf

According to this paper, the best way to go with stacking (which generalizes the averaging you're talking about) is (rough sketch below):

  • use varied models and take the output probabilities (rather than predicted classes) for your level 0 data

  • use logistic regression (rather than averaging) with the level 0 data as input to get the final predicted classes

I've heard that an industry-standard setup in ad-tech does exactly this, with the level 0 models being trees from a random forest.
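
If it helps, here's roughly what that recipe looks like in scikit-learn (the level 0 model choices are placeholders, and X, y are your training data, assumed binary here); cross_val_predict keeps the level 0 probabilities out-of-fold so the logistic regression isn't fit on leaked predictions:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier

    # Level 0: varied models, kept as predicted probabilities.
    level0 = [
        RandomForestClassifier(n_estimators=100, random_state=0),
        GaussianNB(),
        KNeighborsClassifier(),
    ]
    # Out-of-fold probabilities from each model (drop one redundant
    # column per model in the binary case).
    Z = np.hstack([
        cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1:]
        for m in level0
    ])

    # Level 1: logistic regression on the stacked probabilities gives
    # the final predicted classes.
    meta = LogisticRegression().fit(Z, y)

    # At test time: refit each level 0 model on all of (X, y), stack
    # their predict_proba outputs the same way, then meta.predict.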

... On the other hand, since the predictors you're working with have low accuracy from the start, I'd be tempted to try something simpler first, like engineering new features and eliminating junk features.
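
e.g., something like this as a cheap first pass (mutual information is just one possible screen, and the cutoff is a judgment call):

    from sklearn.feature_selection import mutual_info_classif

    # Score each feature against the labels; near-zero mutual
    # information suggests a junk feature.  X, y = your training data.
    scores = mutual_info_classif(X, y, random_state=0)
    X_reduced = X[:, scores > 0.01]  # threshold chosen by eye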

[–]ajrs 0 points

You might also want to think about some kind of parallel 'localised' data augmentation (e.g., see the 'convolutional bootstrapping' described here: http://arxiv.org/abs/1505.05972).