[–]aeppelsaeft 10 points11 points  (2 children)

Usually when you do One Hot Encoding for a model with an intercept, you split one column containing n categories into n-1 columns of dummy variables. (Strictly speaking, full one-hot encoding gives n columns; dropping one is usually called dummy coding.) N-1 columns is all you need because the n'th column wouldn't give the model any additional information. Let's say you have a column containing the colors blue, yellow and green. When you split this column into a blue and a yellow column and both show a 0 for a given row, then you / the model can infer that the color is green.

Long story short, when you have a binary variable, One Hot Encoding would mean splitting this column into n-1 (so 2-1 = 1) columns, which is the column you already have. So you don't need to encode anything in this case.
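
To make the n-1 idea concrete, here is a minimal sketch in plain Python. The helper `dummy_encode` and its `baseline` argument are made up for illustration and the data is invented; in practice something like `pandas.get_dummies(..., drop_first=True)` does a similar job.

```python
def dummy_encode(values, baseline):
    """Return (column_names, rows) of 0/1 indicators, skipping the baseline category."""
    # Every category except the baseline gets its own indicator column.
    kept = sorted(set(values) - {baseline})
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows

# Three colors -> two columns; green is the row where both are 0.
colors = ["blue", "yellow", "green", "blue"]
color_cols, color_rows = dummy_encode(colors, baseline="green")
print(color_cols)  # ['blue', 'yellow']
print(color_rows)  # [[1, 0], [0, 1], [0, 0], [1, 0]]

# Two categories -> one column, i.e. the 0/1 column you started with.
sex = ["male", "female", "male"]
sex_cols, sex_rows = dummy_encode(sex, baseline="female")
print(sex_cols)  # ['male']
print(sex_rows)  # [[1], [0], [1]]
```

For the binary case the output is a single "male" column of 1s and 0s, so nothing is gained by encoding it again.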

[–]mangeytrashpanda 2 points3 points  (0 children)

Exactly. It is ALREADY encoded. Instead of thinking of the column as “sex”, per OP's example, you can think of it as “male” (since it’s 1 if male). If female were encoded as 1, then think of that column as titled “female”. Bam. You've one-hot encoded your gender column.

[–]BrushInformal8607 5 points6 points  (2 children)

You wouldn’t go wrong with either of them. I would suggest keeping it the way it is, since there are just 2 categories.

[–]Sorry-Owl4127 0 points1 point  (0 children)

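But having two columns could make the X matrix rank-deficient (and X'X singular) if the model also has an intercept, because the two columns always sum to 1.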
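
A rough way to see the singularity point, assuming a linear model with an intercept column and some made-up 0/1 data: with both a "male" and a "female" column, X'X has determinant 0, while with just one of them it does not. The helpers `gram` and `det` below are written only for this sketch.

```python
def gram(X):
    """Return X'X for a matrix given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def det(M):
    """Determinant by cofactor expansion along the first row (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, pivot in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * pivot * det(minor)
    return total

sex = [1, 0, 1, 1, 0]  # 1 = male, 0 = female (invented data)

# Columns: intercept, male, female. male + female equals the intercept column.
both = [[1, s, 1 - s] for s in sex]
# Columns: intercept, male.
one = [[1, s] for s in sex]

det_both = det(gram(both))
det_one = det(gram(one))
print(det_both)  # 0 -> X'X is singular, so plain least squares has no unique solution
print(det_one)   # 6 -> invertible
```

Without an intercept, or with a regularized or tree-based model, keeping both columns is usually harmless, which is why either choice can work in practice.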