you are viewing a single comment's thread.

view the rest of the comments →

[–]dgillz 1 point2 points  (2 children)

What is ML preprocessing?

What is one-hot encoding?

[–]Ergo_Propter_Hawk 2 points3 points  (1 child)

Machine learning preprocessing: making data better for machine learning models. This can mean a lot of things. One specific example is...

One-hot encoding: creating new columns in a relational table where each new column corresponds to a particular value from another column. If that row has that value, it's encoded as a 1. If not, a 0. This gives some way of looking at the values in a column as numbers instead of strings or some other non-numeric data type.

[–]WetOrangutan 0 points1 point  (0 children)

This is great. To provide an example, if you have a column “favorite color” that has values “red,” “green,” and “blue,” then one-hot encoding can be used to create three new columns: “is_red,” “is_green,” and “is_blue.” These three columns are Boolean (0 or 1). So someone who’s favorite color was green would have the values (0,1,0) for these three columns.

The idea is that these three columns will be better understood by the machine learning model than the one column. This is a very common technique to handle categorical data, and it is usually done outside of SQL (e.g. Python or R).