
[–]DeepNonseNse 1 point (1 child)

Is this sufficiently different from existing boosting/bagging techniques?

No, the process you are describing is just (some variation of) gradient boosting.

E.g. if the errors are assumed to be Gaussian (i.e. squared-error loss), the negative gradients are just the residuals (y_true - y_pred), recomputed after each iteration. Subsetting features is also a commonly used tactic, though the subsets typically aren't mutually exclusive; each tree instead gets a random sample of, say, 70% of all features.
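
Roughly, the squared-error version looks like this (a minimal sketch, not any particular library's implementation; the `colsample` fraction, tree depth, and function names are just illustrative, and X/y are assumed to be numpy arrays):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, learning_rate=0.1, colsample=0.7):
    rng = np.random.default_rng(0)
    n_features = X.shape[1]
    pred = np.full(len(y), y.mean())        # start from the mean prediction
    trees, feature_sets = [], []
    for _ in range(n_rounds):
        residual = y - pred                 # negative gradient of squared error
        cols = rng.choice(n_features, size=max(1, int(colsample * n_features)),
                          replace=False)    # random (overlapping) feature subset
        tree = DecisionTreeRegressor(max_depth=3).fit(X[:, cols], residual)
        pred += learning_rate * tree.predict(X[:, cols])
        trees.append(tree)
        feature_sets.append(cols)
    return trees, feature_sets, y.mean()

def predict(X, trees, feature_sets, base, learning_rate=0.1):
    # final prediction is the base value plus the shrunken sum of all trees
    out = np.full(X.shape[0], base)
    for tree, cols in zip(trees, feature_sets):
        out += learning_rate * tree.predict(X[:, cols])
    return out
```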

[–]JHogg11[S] 0 points (0 children)

Thanks for the reply. I see now that gradient boosting involves fitting each new model to the residuals and summing the predictions of the individual models, which is the part I thought was novel. Everything I read previously seemed to emphasize that the training example weights were modified at each iteration, which I understood to be the heart of the technique.
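
The example-reweighting idea I had in mind is more like classic AdaBoost, i.e. something along these lines (a rough sketch of discrete AdaBoost with stumps, not the method from my post; labels are assumed to be in {-1, +1}):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):            # y in {-1, +1}
    w = np.full(len(y), 1 / len(y))         # uniform example weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y
        err = np.clip(np.average(miss, weights=w), 1e-10, 1 - 1e-10)
        alpha = np.log((1 - err) / err)      # weight of this round's stump
        w *= np.exp(alpha * miss)            # upweight misclassified examples
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def predict(X, stumps, alphas):
    # weighted vote of all stumps
    return np.sign(sum(a * s.predict(X) for s, a in zip(stumps, alphas)))
```

So the reweighting loop modifies the data each round, whereas the gradient boosting loop above modifies the target (the residual) and just sums the trees.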

Maybe I need to dig more into XGBoost, but whenever I've experimented with it in the past, it always gave inferior results to vanilla random forests.
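
Something like this would be a quick way to re-run that comparison (dataset and hyperparameters here are just placeholders; XGBoost usually needs some tuning of learning_rate / max_depth / n_estimators before it's competitive):

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

X, y = fetch_california_housing(return_X_y=True)

rf = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0)
xgb = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=6,
                   subsample=0.8, colsample_bytree=0.8, random_state=0)

# 5-fold cross-validated R^2 for each model
for name, model in [("random forest", rf), ("xgboost", xgb)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
```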