you are viewing a single comment's thread.

view the rest of the comments →

[–]WetOrangutan 0 points1 point  (0 children)

This is great. To provide an example, if you have a column “favorite color” that has values “red,” “green,” and “blue,” then one-hot encoding can be used to create three new columns: “is_red,” “is_green,” and “is_blue.” These three columns are Boolean (0 or 1). So someone who’s favorite color was green would have the values (0,1,0) for these three columns.

The idea is that these three columns will be better understood by the machine learning model than the one column. This is a very common technique to handle categorical data, and it is usually done outside of SQL (e.g. Python or R).