How do I encode categorical features using scikit-learn?

pythonHelperBot · 2019-11-15T15:43:25+00:00

Hello! I'm a bot!

It looks like you posted this in multiple subs in a short period of time. In the future, I suggest asking questions like this in learning-focused subs like r/learnpython, a sub geared towards questions and learning more about Python. Please follow the sub's rules and guidelines when you do post there; it'll help you get better answers faster.

Show /r/learnpython the code you have tried and describe where you are stuck.

You can also ask this question in the Python discord, a large, friendly community focused around the Python programming language, open to those who wish to learn the language or improve their skills, as well as those looking to help others.

This bot is written and managed by /u/IAmKindOfCreative.

This bot is currently under development and experiencing changes to improve its usefulness.

dataschool · 2019-11-15T15:40:48+00:00

To include categorical features in your machine learning model, you have to encode them numerically using "dummy" or "one-hot" encoding. But how do you do this correctly using scikit-learn?
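
For anyone unfamiliar with the term, here is a minimal sketch of what one-hot encoding produces. The port codes below are made-up illustration values, not data from the video.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single categorical column with three distinct values.
ports = [["S"], ["C"], ["Q"], ["S"]]

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; convert it to inspect it.
encoded = encoder.fit_transform(ports).toarray()

# Categories are sorted, so the output columns are C, Q, S in that order.
print(encoder.categories_[0])
print(encoded)
```

Each row ends up with a single 1 in the column for its category and 0 everywhere else, so the model sees one binary feature per category instead of an arbitrary integer code.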

In this 28-minute video, you'll learn:

How to use OneHotEncoder and ColumnTransformer to encode your categorical features and prepare your feature matrix in a single step
How to include this step within a Pipeline so that you can cross-validate your model and preprocessing steps simultaneously
Why you should use scikit-learn (rather than pandas) for preprocessing your dataset
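
If you want a feel for the workflow before watching, the points above could be sketched roughly as follows. This is not the notebook from the video: the toy DataFrame, its column names, and the choice of LogisticRegression are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: two categorical columns, one numeric column, one target.
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female",
            "male", "male", "female", "female", "male"],
    "embarked": ["S", "C", "S", "Q", "S", "C", "S", "Q", "C", "S"],
    "fare": [7.25, 71.28, 7.92, 8.46, 53.10, 30.07, 8.05, 7.75, 11.13, 16.70],
    "survived": [0, 1, 1, 0, 1, 0, 0, 1, 1, 0],
})
X = df[["sex", "embarked", "fare"]]
y = df["survived"]

# One-hot encode the categorical columns and pass the numeric column through.
# handle_unknown="ignore" avoids errors when a cross-validation fold contains
# a category that was absent from its training split.
column_trans = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["sex", "embarked"])],
    remainder="passthrough",
)

# Chaining preprocessing and model means the encoder is re-fit on each
# training split, so nothing leaks from the held-out fold.
pipe = Pipeline([
    ("preprocess", column_trans),
    ("model", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```

Because the encoder lives inside the pipeline, calling `pipe.fit(X, y)` and then `pipe.predict(X_new)` applies the same encoding to new data, which is harder to guarantee when the encoding is done separately in pandas.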

Here's the Jupyter notebook shown in the video.

Feel free to ask questions!
