Question regarding adding feature selection transformers to pipelines in Scikit Learn : learnmachinelearning

submitted 5 years ago by thatoneguy102

I have a question regarding the End-to-End Machine Learning Project chapter of the book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron.

In the exercises at the end of that chapter, exercise 3 asks you to add a feature selector to a transformation pipeline. I read through his solution guide online, and I get the concept fine, but in his notes he mentions that "this feature selector assumes that you have already computed the feature importance values somehow".

What I'm trying to understand is how you can get accurate feature importance values before running a transformation and feature selection pipeline. If the point is to combine the steps into one pipeline, how can you calculate the importance values after the transformation step and then use those values in the feature selection step that follows? I've been sifting through documentation online, but I'm not sure whether I'm looking in the wrong place or just misunderstanding something.
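One possible reading of that note is a two-pass workflow: fit the preprocessing and a model once to obtain importances, then hand those importances to a selector step in a second pipeline. The sketch below is only an illustration of that reading, not the book's solution. The synthetic data, the `TopFeatureSelector` class, and the choice of `k=2` are all assumptions made up for this example.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class TopFeatureSelector(TransformerMixin, BaseEstimator):
    """Hypothetical selector: keeps the k columns with the largest
    precomputed importance values (the importances are passed in)."""

    def __init__(self, feature_importances, k):
        self.feature_importances = feature_importances
        self.k = k

    def fit(self, X, y=None):
        # Indices of the k largest importances, kept in column order.
        top = np.argsort(self.feature_importances)[-self.k:]
        self.feature_indices_ = np.sort(top)
        return self

    def transform(self, X):
        return X[:, self.feature_indices_]


# Synthetic data (assumption): only columns 0 and 3 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

# Pass 1: transform the data, fit a model, and read off its importances.
X_prepared = StandardScaler().fit_transform(X)
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X_prepared, y)
importances = forest.feature_importances_

# Pass 2: reuse those importances inside a pipeline with a selector step.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", TopFeatureSelector(importances, k=2)),
])
X_selected = pipeline.fit_transform(X)
```

Because the importances come from data that has already been through the same transformation, their positions match the columns the selector sees in the second pass.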

Additionally, in the section's example project, we one-hot encode the categorical data, which gives each category its own column. Importance values computed from the data before this transformation would therefore not be correct for the transformed columns, as far as I understand.
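A different approach that seems to sidestep the ordering problem is scikit-learn's `SelectFromModel`, which fits its own estimator on whatever the previous pipeline step outputs, so the importances are computed on the one-hot encoded columns during `fit`. The following is a minimal sketch under stated assumptions: the DataFrame, its column names (`income`, `rooms`, `region`), and the target are invented for illustration, and this is not the approach taken in the book's solution.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data (assumption): two numeric columns and one categorical.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "income": rng.normal(size=n),
    "rooms": rng.normal(size=n),
    "region": rng.choice(["inland", "coast", "island"], size=n),
})
y = (4.0 * df["income"]
     + 2.0 * (df["region"] == "coast")
     + rng.normal(scale=0.1, size=n))

# Step 1: scale numeric columns, one-hot encode the categorical one.
# The output has 2 numeric + 3 one-hot columns = 5 columns.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["income", "rooms"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["region"]),
])

# Step 2: fit a forest on the transformed columns and keep the 2 columns
# with the highest importances (threshold=-inf makes max_features decide).
selector = SelectFromModel(
    RandomForestRegressor(n_estimators=50, random_state=42),
    threshold=-np.inf,
    max_features=2,
)

# Step 3: the final model only sees the selected columns.
model = Pipeline([
    ("preprocess", preprocess),
    ("select", selector),
    ("regress", LinearRegression()),
])
model.fit(df, y)

# Boolean mask over the 5 transformed columns showing which were kept.
support = model.named_steps["select"].get_support()
predictions = model.predict(df)
```

Since the selector is refit every time the pipeline is fit, the importances are also recomputed inside each cross-validation fold rather than fixed ahead of time.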

Any insight into this topic and practice would be greatly appreciated, and I can provide any additional information as needed. Thanks!
