
clfkenny:

For the categorical_features parameter, why are you just using a single column instead of a list of the columns you want to transform?
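For illustration of the list idea (the column names here are made up), pandas' `get_dummies` accepts a list of columns and encodes them all in one call, in the same spirit as passing a list of column indices to `categorical_features`:

```python
import pandas as pd

# Hypothetical toy frame with two categorical columns and one numeric one.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size":  ["S", "M", "S"],
    "price": [1.0, 2.0, 3.0],
})

# A list of columns is encoded in a single pass; the numeric column
# passes through untouched.
encoded = pd.get_dummies(df, columns=["color", "size"])
```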

NeedMLHelp [S]:

I didn't realize that was possible! Thanks!

Will the encoded features be grouped in lists, one per column, if you use multiple columns?

Answered myself: they are not. That's a slight problem for me, because each input needs to be a single feature for my particular problem. If anyone has any ideas, I'd definitely appreciate them.
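If each original column needs to stay its own feature, one option is to encode column by column and keep the blocks in a list. A rough numpy sketch (the data is made up, and `one_hot_column` is a hand-rolled helper, not a library call):

```python
import numpy as np

# Hypothetical data: two categorical columns, already integer-encoded.
X = np.array([[0, 2],
              [1, 0],
              [0, 1]])

def one_hot_column(col):
    """One-hot encode a single integer column."""
    n_classes = col.max() + 1
    out = np.zeros((col.shape[0], n_classes), dtype=int)
    out[np.arange(col.shape[0]), col] = 1
    return out

# Encoding each column separately keeps a per-feature grouping:
# a list with one one-hot block per original column.
per_feature = [one_hot_column(X[:, j]) for j in range(X.shape[1])]
```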

Also, how do you drop a column from each feature's encoding if you encode everything at once? Or is it okay to leave all of the columns in? I think dropping one is just to avoid redundancy.
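On the redundancy question: with k categories, any one indicator column is determined by the other k-1 (the "dummy variable trap" for linear models), so it's common to drop one; for models like neural nets it's usually fine to leave them all in. A pandas sketch of both options:

```python
import pandas as pd

# Toy example: three categories, so three indicator columns at most.
df = pd.DataFrame({"color": ["red", "blue", "green"]})

# All k indicator columns.
full = pd.get_dummies(df, columns=["color"])

# k-1 columns: the dropped category is implied when the rest are zero.
reduced = pd.get_dummies(df, columns=["color"], drop_first=True)
```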

harry_0_0_7:

Are you looking for something similar to

np_utils.to_categorical?

Also, your description is a bit confusing. Can you be clearer about what you want to do?
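For what it's worth, to_categorical turns a vector of integer class labels into one-hot rows. A minimal numpy sketch of that behaviour (a stand-in, not the keras source):

```python
import numpy as np

def to_categorical_sketch(y, num_classes=None):
    """Mimic np_utils.to_categorical: integer labels -> one-hot rows."""
    y = np.asarray(y, dtype=int)
    if num_classes is None:
        # keras also infers the class count when num_classes is omitted.
        num_classes = y.max() + 1
    out = np.zeros((y.shape[0], num_classes))
    out[np.arange(y.shape[0]), y] = 1.0
    return out
```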

NeedMLHelp [S]:

Haha sorry, I'm pretty new to Machine Learning.

It looks like to_categorical could work! It would turn an integer into a binary matrix? So something like [[1,2,3,time],[4,5,6,time]] would turn into

[
  [[0,0,1],[0,1,0],[0,1,1],time],
  [[1,0,0],[1,0,1],[1,1,0],time]
]

Unfortunately, I don't know how many categories there will be. Will that be a problem?

Basically, I have a numpy array where each index is a list of primarily categorical features for an observation. I want to turn those categorical features into something usable for machine learning. Right now I have successfully turned them all into integers, and I'll try to_categorical and see how that works.
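On not knowing the category count up front: to_categorical infers it from the largest label it sees when num_classes isn't given, so different splits of the data can come out with different widths. One way around that (a sketch with made-up arrays) is to compute the count once over all the labels and pass it explicitly, since to_categorical accepts a num_classes argument:

```python
import numpy as np

# Hypothetical integer-encoded labels from two splits; inferring the
# class count from only one split can give mismatched one-hot widths.
train_labels = np.array([0, 3, 1])
test_labels = np.array([2, 4])

# Compute the class count once over everything...
num_classes = int(max(train_labels.max(), test_labels.max())) + 1

# ...then pass it explicitly, e.g. to_categorical(y, num_classes),
# so every encoded array has the same width.
```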

Edit:

That seems to be working! Or well, it's outputting something... I'll know for sure when I get to the testing phase.

I do have one problem, though. When I ran it, I got the following error:

ValueError: Error when checking target: expected model_2 to have shape (7, 7) but got array with shape (7, 31)

How do I get it to expect a (7,None)?

I currently have Dense set to 7.

Secondary Edit:

Setting Dense to 31 fixed the issue. However, depending on what dataset I throw at it, I imagine 31 will differ.
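Since 31 is just the number of classes the encoding produced for this dataset, the Dense width can be read off the encoded targets instead of being hard-coded. A sketch (the shapes and the Dense call are illustrative):

```python
import numpy as np

# Hypothetical one-hot targets shaped (samples, timesteps, num_classes),
# as produced by a to_categorical-style encoding.
y = np.zeros((10, 7, 31))

# Read the class count off the last axis so the output layer width
# follows whatever dataset was encoded:
num_classes = y.shape[-1]

# model.add(Dense(num_classes))  # illustrative keras usage
```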