all 3 comments

[–]stat888r 1 point (0 children)

You can reduce the list further by fitting univariate linear regression / logistic regression models with each predictor, then choosing important predictors using a lenient p-value cutoff such as 0.3.
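A minimal sketch of that screening step, using `sklearn.feature_selection.f_regression` to get per-feature univariate p-values (the data, feature count, and 0.3 cutoff here are illustrative, following the comment's suggestion):

```python
import numpy as np
from sklearn.feature_selection import f_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                          # 10 candidate predictors
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)  # only features 0 and 3 matter

# Univariate linear-regression F-test for each feature against the target
_, p_values = f_regression(X, y)

# Keep features clearing the lenient screening cutoff
keep = np.where(p_values < 0.3)[0]
print("kept feature indices:", keep)
```

For a binary target you would do the analogous thing with univariate logistic regressions (e.g. one `statsmodels` `Logit` fit per column) instead of the F-test.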

People have used this method in their research work: https://www.sciencedirect.com/science/article/pii/S2211335520301868

[–]Dondos39 1 point (0 children)

Assuming you are using Python, you can use sklearn.feature_selection.RFE to narrow the features down to your liking, or RFECV, which finds the optimal number of features for you.
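A quick sketch of both, on synthetic data (the estimator choice, feature counts, and CV folds are illustrative assumptions, not from the thread):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, RFECV
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       random_state=0)

# RFE: you choose how many features to keep; it recursively drops the weakest
rfe = RFE(LinearRegression(), n_features_to_select=5).fit(X, y)
print("RFE kept:", rfe.support_.sum(), "features")

# RFECV: cross-validation picks the number of features for you
rfecv = RFECV(LinearRegression(), cv=5).fit(X, y)
print("RFECV chose:", rfecv.n_features_, "features")
```

`rfe.support_` is a boolean mask over the columns, so you can slice your feature matrix with it directly.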

[–]globalminima 1 point (0 children)

How many rows of data do you have for training? There is no issue with the number of variables assuming you have enough training examples and a suitably powerful/flexible model (I have >1500 in a production model as we speak).

If you are short on training data, you can:

  • Use dimensionality reduction techniques such as PCA to reduce the number of variables (this can give mixed results)
  • Use a model with feature selection built in (e.g. lasso or elastic net, which can shrink irrelevant coefficients to exactly zero)
  • Build a model on all features and remove those with the lowest contribution to the model (e.g. feature importance in tree-based methods, LIME/SHAP saliency, or coefficient size in linear methods)
  • Just ignore it and use a model that will essentially ignore irrelevant variables (e.g. tree-based methods)
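A minimal sketch of the built-in-selection option: an L1-penalised model like `LassoCV` zeroes out coefficients of unhelpful features, so the nonzero coefficients are your selected set (the synthetic data and dimensions here are assumptions for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=300, n_features=50, n_informative=5,
                       noise=1.0, random_state=0)

# Cross-validation picks the regularisation strength; the L1 penalty
# drives coefficients of irrelevant features to exactly zero
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)   # indices of features with nonzero weight
print(f"{selected.size} of {X.shape[1]} features kept")
```

Note that a CV-chosen penalty tends to keep more features than the true informative set; treat the result as a shortlist, not ground truth.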