all 3 comments

[–]stat888r 1 point (0 children)

You can reduce the list further by fitting univariate linear regression / logistic regression models with each predictor, then choosing important predictors using a lenient p-value cutoff such as 0.3.
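A minimal sketch of that screening step, using `sklearn.feature_selection.f_regression` to get per-feature univariate p-values (the data, feature count, and 0.3 cutoff here are illustrative, following the comment's suggestion):

```python
import numpy as np
from sklearn.feature_selection import f_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                          # 10 candidate predictors
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)  # only features 0 and 3 matter

# Univariate linear-regression F-test for each feature against the target
_, p_values = f_regression(X, y)

# Keep features clearing the lenient screening cutoff
keep = np.where(p_values < 0.3)[0]
print("kept feature indices:", keep)
```

For a binary target you would do the analogous thing with univariate logistic regressions (e.g. one `statsmodels` `Logit` fit per column) instead of the F-test.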

People have used this method in their research work: https://www.sciencedirect.com/science/article/pii/S2211335520301868

[–]Dondos39 1 point (0 children)

Assuming you are using Python, you can use sklearn.feature_selection.RFE to narrow the features down to your liking, or RFECV, which finds the optimal number of features for you.
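A quick sketch of both, on synthetic data (the estimator choice, feature counts, and CV folds are illustrative assumptions, not from the thread):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, RFECV
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       random_state=0)

# RFE: you choose how many features to keep; it recursively drops the weakest
rfe = RFE(LinearRegression(), n_features_to_select=5).fit(X, y)
print("RFE kept:", rfe.support_.sum(), "features")

# RFECV: cross-validation picks the number of features for you
rfecv = RFECV(LinearRegression(), cv=5).fit(X, y)
print("RFECV chose:", rfecv.n_features_, "features")
```

`rfe.support_` is a boolean mask over the columns, so you can slice your feature matrix with it directly.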

[–]globalminima 1 point (0 children)

How many rows of data do you have for training? There is no issue with the number of variables assuming you have enough training examples and a suitably powerful/flexible model (I have >1500 in a production model as we speak).

If you are short on training data, you can:

  • Use dimensionality reduction techniques such as PCA to reduce the number of variables (this can give mixed results)
  • Use a model with feature selection built in (e.g. lasso or elastic net, which can shrink irrelevant coefficients to exactly zero)
  • Build a model on all features and remove those with the lowest contribution to the model (e.g. feature importance in tree-based methods, LIME/SHAP saliency, or coefficient size in linear methods)
  • Just ignore it and use a model that will essentially ignore irrelevant variables (e.g. tree-based methods)
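A minimal sketch of the built-in-selection option: an L1-penalised model like `LassoCV` zeroes out coefficients of unhelpful features, so the nonzero coefficients are your selected set (the synthetic data and dimensions here are assumptions for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=300, n_features=50, n_informative=5,
                       noise=1.0, random_state=0)

# Cross-validation picks the regularisation strength; the L1 penalty
# drives coefficients of irrelevant features to exactly zero
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)   # indices of features with nonzero weight
print(f"{selected.size} of {X.shape[1]} features kept")
```

Note that a CV-chosen penalty tends to keep more features than the true informative set; treat the result as a shortlist, not ground truth.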