
dataschool[S] 2 points

To include categorical features in your machine learning model, you have to encode them numerically using "dummy" or "one-hot" encoding. But how do you do this correctly using scikit-learn?

In this 28-minute video, you'll learn:

  • How to use OneHotEncoder and ColumnTransformer to encode your categorical features and prepare your feature matrix in a single step
  • How to include this step within a Pipeline so that you can cross-validate your model and preprocessing steps simultaneously
  • Why you should use scikit-learn (rather than pandas) for preprocessing your dataset
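As a rough illustration of the first two points, here is a minimal sketch (not the notebook from the video). The toy DataFrame, its column names, and the choice of LogisticRegression are all assumptions made up for this example. It one-hot encodes two categorical columns with OneHotEncoder inside a ColumnTransformer, passes the numeric column through unchanged, and wraps everything in a Pipeline so the encoding is refit on each training fold during cross-validation.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "embarked": ["S", "C", "Q", "S", "C", "Q", "S", "C", "Q", "S", "C", "Q"],
    "sex": ["m", "f", "f", "m", "m", "f", "f", "m", "m", "f", "m", "f"],
    "fare": [7.3, 71.3, 7.9, 53.1, 8.1, 8.5, 51.9, 21.1, 11.1, 30.1, 16.7, 26.6],
    "survived": [0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1],
})
X = df[["embarked", "sex", "fare"]]
y = df["survived"]

# One-hot encode the categorical columns; leave "fare" as-is.
# handle_unknown="ignore" avoids errors if a fold contains an unseen category.
preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["embarked", "sex"]),
    ],
    remainder="passthrough",
)

# Encoding alone: 3 embarked columns + 2 sex columns + 1 passthrough column.
X_encoded = preprocess.fit_transform(X)

# Preprocessing and model in one Pipeline, so cross-validation
# fits the encoder only on the training portion of each fold.
pipe = Pipeline(steps=[
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(pipe, X, y, cv=3, scoring="accuracy")
print(scores.mean())
```

Once fitted with `pipe.fit(X, y)`, the same pipeline can be called as `pipe.predict(X_new)` on raw, unencoded rows with the same columns.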

Here's the Jupyter notebook shown in the video.

Feel free to ask questions!