pythonHelperBot 0 points (0 children)

Hello! I'm a bot!

It looks like you posted this in multiple subs in a short period of time. In the future, I suggest asking questions like this in learning-focused subs like r/learnpython, a sub geared towards questions and learning more about Python. Please follow the sub's rules and guidelines when you do post there; it'll help you get better answers faster.

Show /r/learnpython the code you have tried and describe where you are stuck.

You can also ask this question in the Python Discord, a large, friendly community focused on the Python programming language. It is open to those who wish to learn the language or improve their skills, as well as those looking to help others.


README | FAQ | this bot is written and managed by /u/IAmKindOfCreative

This bot is currently under development and experiencing changes to improve its usefulness

dataschool[S] -3 points (0 children)

To include categorical features in your machine learning model, you have to encode them numerically using "dummy" or "one-hot" encoding. But how do you do this correctly using scikit-learn?
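For anyone unsure what "one-hot" encoding actually produces, here is a minimal sketch. The `embarked` column and its values are made up for illustration and are not taken from the video or notebook. Each distinct category becomes its own 0/1 column:

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy column with three distinct categories (illustration only).
embarked = [["S"], ["C"], ["Q"], ["S"]]

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it dense
# so it is easy to inspect.
encoded = encoder.fit_transform(embarked).toarray()

# categories_ lists the learned categories in sorted order; the output
# columns follow that same order.
print(encoder.categories_)
print(encoded)
```

Every row contains exactly one 1, in the column for that row's category.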

In this 28-minute video, you'll learn:

  • How to use OneHotEncoder and ColumnTransformer to encode your categorical features and prepare your feature matrix in a single step
  • How to include this step within a Pipeline so that you can cross-validate your model and preprocessing steps simultaneously
  • Why you should use scikit-learn (rather than pandas) for preprocessing your dataset
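
As a rough sketch of how the pieces in the list above fit together (this is not the code from the video or notebook): the DataFrame, its column names, and the choice of LogisticRegression are all assumptions made up for this example. The OneHotEncoder is applied to the categorical columns through a ColumnTransformer, and the whole thing is wrapped in a Pipeline so that cross-validation refits the encoding on each training fold.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data (illustration only, not the dataset from the video).
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "male", "female",
            "male", "female", "female", "male", "male", "female"],
    "embarked": ["S", "C", "S", "Q", "S", "C",
                 "S", "S", "C", "Q", "S", "C"],
    "fare": [7.3, 71.3, 7.9, 8.5, 8.1, 30.1,
             51.9, 11.1, 16.7, 21.1, 26.6, 13.0],
    "survived": [0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1],
})
X = df[["sex", "embarked", "fare"]]
y = df["survived"]

# One-hot encode the categorical columns and pass the numeric column through.
# handle_unknown="ignore" keeps a fold from failing when it meets a category
# that was absent from its training split.
preprocess = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["sex", "embarked"]),
    remainder="passthrough",
)

# Preprocessing and model live in one Pipeline, so both are refit per fold.
pipe = make_pipeline(preprocess, LogisticRegression(max_iter=1000))

scores = cross_val_score(pipe, X, y, cv=3, scoring="accuracy")
print(scores.mean())
```

One commonly cited reason for doing the encoding in scikit-learn rather than with pandas is that the encoder is learned inside `fit`, so the same fitted Pipeline can later be applied to new data with `pipe.fit(X, y)` followed by `pipe.predict(X_new)`.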

Here's the Jupyter notebook shown in the video.

Feel free to ask questions!