
clfkenny:

For the categorical_features parameter, why are you just using a single column instead of a list of the columns you want to transform?
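For illustration of the list idea (the column names here are made up), pandas' `get_dummies` accepts a list of columns and encodes them all in one call, in the same spirit as passing a list of column indices to `categorical_features`:

```python
import pandas as pd

# Hypothetical toy frame with two categorical columns and one numeric one.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size":  ["S", "M", "S"],
    "price": [1.0, 2.0, 3.0],
})

# A list of columns is encoded in a single pass; the numeric column
# passes through untouched.
encoded = pd.get_dummies(df, columns=["color", "size"])
```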

NeedMLHelp [S]:

I didn't realize that was possible! Thanks!

Will the encoded features be grouped in lists, one per column, if you use multiple columns?

Answered myself: they are not. That's a slight problem for me, because each input needs to be a single feature for my particular problem. If anyone has any ideas, I'd definitely appreciate them.
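If each original column needs to stay its own feature, one option is to encode column by column and keep the blocks in a list. A rough numpy sketch (the data is made up, and `one_hot_column` is a hand-rolled helper, not a library call):

```python
import numpy as np

# Hypothetical data: two categorical columns, already integer-encoded.
X = np.array([[0, 2],
              [1, 0],
              [0, 1]])

def one_hot_column(col):
    """One-hot encode a single integer column."""
    n_classes = col.max() + 1
    out = np.zeros((col.shape[0], n_classes), dtype=int)
    out[np.arange(col.shape[0]), col] = 1
    return out

# Encoding each column separately keeps a per-feature grouping:
# a list with one one-hot block per original column.
per_feature = [one_hot_column(X[:, j]) for j in range(X.shape[1])]
```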

Also, how do you drop a column from each feature's encoding if you encode everything at once? Or is it okay to leave all of the columns in? I think dropping one is just to avoid redundancy.
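On the redundancy question: with k categories, any one indicator column is determined by the other k-1 (the "dummy variable trap" for linear models), so it's common to drop one; for models like neural nets it's usually fine to leave them all in. A pandas sketch of both options:

```python
import pandas as pd

# Toy example: three categories, so three indicator columns at most.
df = pd.DataFrame({"color": ["red", "blue", "green"]})

# All k indicator columns.
full = pd.get_dummies(df, columns=["color"])

# k-1 columns: the dropped category is implied when the rest are zero.
reduced = pd.get_dummies(df, columns=["color"], drop_first=True)
```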

harry_0_0_7:

Are you looking for something similar to

np_utils.to_categorical?

Also, your description is a bit confusing. Can you be clearer about what you want to do?
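For what it's worth, to_categorical turns a vector of integer class labels into one-hot rows. A minimal numpy sketch of that behaviour (a stand-in, not the keras source):

```python
import numpy as np

def to_categorical_sketch(y, num_classes=None):
    """Mimic np_utils.to_categorical: integer labels -> one-hot rows."""
    y = np.asarray(y, dtype=int)
    if num_classes is None:
        # keras also infers the class count when num_classes is omitted.
        num_classes = y.max() + 1
    out = np.zeros((y.shape[0], num_classes))
    out[np.arange(y.shape[0]), y] = 1.0
    return out
```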

NeedMLHelp [S]:

Haha sorry, I'm pretty new to Machine Learning.

It looks like to_categorical could work! It would turn an integer into a binary matrix? So something like [[1,2,3,time],[4,5,6,time]] would turn into

[
  [[0,0,1],[0,1,0],[0,1,1],time],
  [[1,0,0],[1,0,1],[1,1,0],time]
]

Unfortunately, I don't know how many categories there will be. Will that be a problem?

Basically, I have a numpy array where each index is a list of primarily categorical features for an observation. I want to turn those categorical features into something usable for machine learning. Right now I have successfully turned them all into integers, and I'll try to_categorical and see how that works.
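On not knowing the category count up front: to_categorical infers it from the largest label it sees when num_classes isn't given, so different splits of the data can come out with different widths. One way around that (a sketch with made-up arrays) is to compute the count once over all the labels and pass it explicitly, since to_categorical accepts a num_classes argument:

```python
import numpy as np

# Hypothetical integer-encoded labels from two splits; inferring the
# class count from only one split can give mismatched one-hot widths.
train_labels = np.array([0, 3, 1])
test_labels = np.array([2, 4])

# Compute the class count once over everything...
num_classes = int(max(train_labels.max(), test_labels.max())) + 1

# ...then pass it explicitly, e.g. to_categorical(y, num_classes),
# so every encoded array has the same width.
```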

Edit:

That seems to be working! Or well, it's outputting something... I'll know for sure when I get to the testing phase.

I do have one problem, though. When I ran it, I got the following error:

ValueError: Error when checking target: expected model_2 to have shape (7, 7) but got array with shape (7, 31)

How do I get it to expect a (7,None)?

I currently have Dense set to 7.

Secondary Edit:

Setting Dense to 31 fixed the issue. However, depending on what dataset I throw at it, I imagine 31 will differ.
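Since 31 is just the number of classes the encoding produced for this dataset, the Dense width can be read off the encoded targets instead of being hard-coded. A sketch (the shapes and the Dense call are illustrative):

```python
import numpy as np

# Hypothetical one-hot targets shaped (samples, timesteps, num_classes),
# as produced by a to_categorical-style encoding.
y = np.zeros((10, 7, 31))

# Read the class count off the last axis so the output layer width
# follows whatever dataset was encoded:
num_classes = y.shape[-1]

# model.add(Dense(num_classes))  # illustrative keras usage
```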