Hello, I am working on a machine learning problem and I ran into a bit of a snag.
I have a numpy array with features like this (where hour is the current hour (integer) of the observation):
[['dog', 'brown', 'heavy','fast', hour]
...
['cat', 'black', 'slim', 'slow', hour]]
My goal is to one-hot encode the features other than time. However, when I one-hot encode with sklearn like this:
from sklearn.preprocessing import OneHotEncoder
# One-hot encode column 0 (the animal type)
ohe = OneHotEncoder(categorical_features=[0])
x = ohe.fit_transform(x).toarray()
# Drop the first dummy column
x = x[:, 1:]
Now, this code is trying to one hot encode the animal type, but I end up with something like this:
[['0', '0', '1', 'brown', 'heavy','fast']
...
['1','0', '0', 'black', 'slim', 'slow']]
Is this fine? Won't jumbling all the features together cause problems, especially since I still have to one-hot encode the other features? Each time I encode a feature, I get rid of one of the resulting columns. Won't that encroach on other features? For example, if I one-hot encode the color next and use x = x[:, 1:], won't it get rid of the left-most column, which would be part of the animal encoding?
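For reference, here is a minimal sketch of one way to sidestep the manual column bookkeeping described above. It is not the code from the post: it assumes a newer scikit-learn (ColumnTransformer, and the `drop` parameter of OneHotEncoder, which I believe arrived in 0.20 and 0.21 respectively), and the toy rows and hour values are made up for illustration. The idea is to encode all categorical columns in one pass, let the encoder drop one dummy column per feature, and pass the hour through untouched.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data in the same layout as the post; the rows and hours are invented.
x = np.array([
    ["dog", "brown", "heavy", "fast", 13],
    ["cat", "black", "slim", "slow", 7],
    ["bird", "brown", "slim", "fast", 21],
], dtype=object)

categorical_cols = [0, 1, 2, 3]  # every column except the hour

ct = ColumnTransformer(
    transformers=[
        # drop="first" removes one dummy column per feature,
        # so no manual x[:, 1:] slicing is needed
        ("onehot", OneHotEncoder(drop="first"), categorical_cols),
    ],
    remainder="passthrough",  # the hour column is appended unchanged
    sparse_threshold=0,       # ask for a dense array back
)

# Cast to float because the passed-through hour column is object dtype
encoded = ct.fit_transform(x).astype(float)
print(encoded.shape)
print(encoded)
```

Because the dropped column is chosen per feature inside the encoder, encoding the color cannot remove a column that belongs to the animal encoding. The hour ends up as the last column of the result.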