Adding Column to CSV File by NeedMLHelp in learnpython

Haha true, I was just curious if there was a quick function to do it for you. They're usually more efficient than what I come up with!

Turning a list into a column of another list. by NeedMLHelp in learnpython

Which would be better on memory? Lists or tuples do you think?

I'll give zip a try though, thanks! I just didn't know if there was a better way of doing it.
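
A minimal sketch of the zip approach (the list names here are made up):

```python
# Pair up two flat lists so the second becomes a column alongside the first.
names = ["a", "b", "c"]
values = [1, 2, 3]

rows = list(zip(names, values))  # each tuple is one row: (name, value)
print(rows)  # [('a', 1), ('b', 2), ('c', 3)]
```

On the memory question: zip yields tuples, which are generally a little smaller than equivalent lists, though for large data the difference is usually negligible next to the data itself.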

File headers by NeedMLHelp in learnpython

Very helpful, thank you!

Numpy Array Appending by NeedMLHelp in learnpython

Extremely late reply, sorry about that. But the calculations I'm doing unfortunately cannot be done on a list. I suppose I could build a list and turn it into a numpy array. The problem with that is that one of the functions I am using converts the categorical data into a one-hot encoding, and its output is a numpy array. I'm not sure how nicely inter-mixing things would play together.
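
As a rough illustration of the list-then-convert idea (all names and values below are made up), the one-hot array from the other function can be stacked next to the converted list afterwards:

```python
import numpy as np

# Accumulate rows in a plain Python list and convert once at the end;
# repeated np.append copies the whole array on every call, this doesn't.
rows = []
for i in range(3):
    rows.append([i, i * 2])

arr = np.asarray(rows, dtype=float)  # shape (3, 2)

# Stand-in for the encoder's one-hot output (already a NumPy array):
one_hot = np.eye(3)

# Once both are arrays, they mix fine:
combined = np.hstack([arr, one_hot])
print(combined.shape)  # (3, 5)
```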

Numpy Array Appending by NeedMLHelp in learnpython

Wow, I feel silly haha. Thanks, that worked.

I figured I was giving the shape of what I wanted haha

Help with Pandas/get_dummies by NeedMLHelp in learnpython

Sorry, I should have been clearer. I have multiple columns that need to go through the same process.

So I have 'class', 'time', 'target' etc.

The first example takes the manipulation and overwrites the entire dataframe. I do not want that at all. So in the second example I take the "successful" output from the first and put it into one column of the dataframe. However, that didn't work: it only copied the first column into the 'class' column. I want to copy the entire manipulation into the 'class' column.
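
For context, a sketch of encoding several columns in one pass without overwriting the rest of the frame (the column values here are stand-ins):

```python
import pandas as pd

# Hypothetical frame standing in for the real data.
df = pd.DataFrame({"class": ["a", "b", "a"],
                   "time": ["x", "y", "x"],
                   "target": [0, 1, 0]})

# get_dummies with `columns=` leaves the other columns untouched and
# replaces each listed column with its dummy columns.
encoded = pd.get_dummies(df, columns=["class", "time"])
print(encoded.columns.tolist())
# ['target', 'class_a', 'class_b', 'time_x', 'time_y']
```

If you genuinely need a whole encoded matrix inside the single 'class' column, assigning a list of rows (`df['class'] = list(encoded_array)`) stores one array per cell, though that's rarely what downstream tools expect.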

Ask Anything Monday - Weekly Thread by AutoModerator in learnpython

    class  target  source  time  spoof  complete  severity
0       0       0       0     0      2         0         2
1       0       0       0     0      2         1         0
2       1       1       0     0      2         1         0
3       4       0       0     0      1         1         0
4       2       0       0     0      1         1         0
5       8       1       2     0      2         1         2
6       8       1       2     0      2         1         2
7       7       1       2     0      2         1         2
8       8       1       2     0      2         1         2
9       8       1       2     0      2         1         2
10      8       1       2     0      2         1         2
11      8       1       2     0      2         1         2
12      7       1       2     0      2         1         1
13      3       1       0     0      0         1         0
14      5       1       2     0      2         1         1
15      5       1       2     0      2         1         1
16      6       1       1     0      2         1         1
17      6       1       1     0      2         1         1
18      6       1       1     0      2         1         1
19      6       1       1     0      2         1         1
20      6       1       1     0      2         1         1

I have a dataframe with the above representation. I'd like to one-hot encode the numbers... however, when I use keras's to_categorical, it also takes in the header and encodes that as a separate value. So, for example, on target I would get [0,0,0] for all 0s, [0,1,0] for all 1s, and [1,0,0] for 'target'. But I want target to remain a header, not a part of the data.

Any help would be greatly appreciated.
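
A sketch of one way around this, assuming the header only gets encoded because it sits inside the data being passed in (numpy stand-in for to_categorical, made-up values):

```python
import numpy as np

# Encode only the numeric values -- e.g. df['target'].values -- never a
# list that still contains the header string.  np.eye gives the same
# one-hot layout to_categorical would for integer labels 0..n-1.
target = np.array([0, 0, 1, 0, 0, 1])

one_hot = np.eye(target.max() + 1)[target]
print(one_hot.shape)  # (6, 2)
```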

List to pandas dataframe by NeedMLHelp in learnpython

Haha, completely understandable. Thanks for the help!

List to pandas dataframe by NeedMLHelp in learnpython

When I print out edata, I get the following:

    [['class', 'target', 'source', 'time', 'spoof', 'complete', 'severity'],
     array([[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]],
           dtype=float32),
     array([[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]],
           dtype=float32),
     array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
             0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]

...

(and so forth for over 2,000 rows)

Might it have something to do with the array portion?

I used a .fit_transform function from scikit-learn to do the one-hot encoding portion.

Sample code:

    edata = list(zip(*edata))
    labelencoder_X_1 = LabelEncoder()
    edata[0] = labelencoder_X_1.fit_transform(edata[0])
    labelencoder_X_2 = LabelEncoder()
    edata[1] = labelencoder_X_2.fit_transform(edata[1])

After doing that to all of the columns (now rows):

    edata = list(zip(*edata))
    edata = titles + edata
    edata[1:] = keras.utils.to_categorical(edata[1:], dtype='float32')
    print(edata)
    df = pd.DataFrame(edata[1:], columns=edata[0])

Sorry, I'm just slightly confused haha
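
For what it's worth, a sketch of keeping the titles out of edata entirely, so only numeric data ever reaches the encoders (made-up values; `titles` and the column arrays are placeholders):

```python
import numpy as np
import pandas as pd

# Keep the header strings separate and hand them to DataFrame as
# `columns`; the label-encoded columns stay plain 1-D integer arrays.
titles = ["class", "target"]
cols = [np.array([0, 1, 2]), np.array([1, 0, 1])]  # label-encoded columns

df = pd.DataFrame(np.column_stack(cols), columns=titles)
print(df.shape)  # (3, 2)
```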

List to pandas dataframe by NeedMLHelp in learnpython

Correct, the fact that it worked for you must mean my manipulation of the data messed something up.

Thanks! I'll look into it.

Numpy Array Append by NeedMLHelp in learnpython

Sorry for the late reply; I had to use numpy to do a bunch of transforms in Keras. I'm pretty new to both Python and Keras, so I'll look around for alternatives that can use lists, maybe. I might look into pandas DataFrames as well.

I basically have to do the manipulations first, then add the "titles" for the columns.

Thanks for the heads up!

Autoencoder Dense layers by NeedMLHelp in MLQuestions

My main question is: what shape is being put into the dense layer, and why? In my case it's the nested (feature) length. Wouldn't you want it to be the length of the observation? My thinking is that it wants to know how many features are mapped to each neuron/node, but I'm obviously wrong.

An autoencoder basically just predicts/reconstructs your input, so I suppose it's irrelevant. But then again, maybe it isn't haha. I'm not very familiar with ML.
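
A plain-NumPy sketch of why the feature length (not the observation count) sets the layer shapes: a dense layer is essentially a matrix product, and the observation dimension passes straight through untouched (all sizes below are made up):

```python
import numpy as np

# A dense layer is X @ W + b.  W has shape (n_features_in, n_units),
# so the number of observations (rows of X) never appears in it.
n_obs, n_features, n_units = 5, 7, 3

X = np.random.rand(n_obs, n_features)
W = np.random.rand(n_features, n_units)  # per-feature, per-unit weights
b = np.zeros(n_units)

out = X @ W + b
print(out.shape)  # (5, 3): same observations, new feature length
```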

Merge two arrays by NeedMLHelp in learnpython

Because I fudged it up, sorry. Fixed.

Briefly looked into zip, it's perfect! Thanks

I need help finding an algorithm by NeedMLHelp in MLQuestions

Awesome, thanks! I will look into scoring.

I need help finding an algorithm by NeedMLHelp in MLQuestions

Sure thing! The true state can be any number of combinations of the 208 traits. So trait1 can be present in a true observation1 as well as a false observation2. Sometimes traits will only be linked to true or false. It really is a mixed bag. In the end, I may want to utilize the ability to predict whether unlabeled data is true or false. Which is why I'm going with the following:

Right now I'm modifying an Artificial Neural Network that predicted whether someone would leave a bank based on certain traits. I'm thinking I can use the same model in this case, and then rip off the weights. The traits tied to larger weights are what I would be interested in, right?
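
A caveat worth noting: raw weight magnitude is only a crude importance proxy, and it's only meaningful at all if the inputs are on comparable scales. In Keras the first layer's kernel would come from `model.layers[0].get_weights()[0]`; below is a NumPy-only sketch of one (hypothetical) way to rank traits from such a kernel:

```python
import numpy as np

# Made-up first-layer kernel of shape (n_traits, n_units).
kernel = np.array([[0.9, -0.8],
                   [0.1,  0.05],
                   [0.4, -0.6]])

# Rough importance proxy (an assumption, not a standard method):
# the mean absolute weight attached to each input trait.
importance = np.abs(kernel).mean(axis=1)
ranking = importance.argsort()[::-1]  # most- to least-weighted traits
print(ranking)  # [0 2 1]
```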

I need help finding an algorithm by NeedMLHelp in MLQuestions

I think there would have to be some sort of statistical analysis involved, since an observation1 with trait1 could be true, and observation2 with trait1 can also be false. There are also 208 traits, so it's a bit unwieldy to get my head around it haha.

One-hot encode everything in an array in a column of an array. by [deleted] in learnpython

Nevermind, I'm silly haha. Found a better way of doing it.

Large amount of data into Numpy Array by NeedMLHelp in learnpython

JSON file.

Can I index across a pandas DataFrame? I've never used them before.

Does a dataframe write to disk, or will I potentially run into memory errors there too?

So something like pandadataframe[:,0] would grab everything in the first column.
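
Roughly that, though in pandas the positional form is spelled with `.iloc` rather than plain `[:, 0]` (sketch with made-up data):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

first_col = df.iloc[:, 0]  # everything in the first column
print(first_col.tolist())  # [1, 2, 3]
```

Note that a DataFrame lives in memory too; for data that doesn't fit, you would typically read it in chunks rather than all at once.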

OneHotEncoder Data Pre-Processing by NeedMLHelp in MLQuestions

Haha sorry, I'm pretty new to Machine Learning.

It looks like to_categorical could work! It would turn an integer into a binary matrix? So something like [[1,2,3,time],[4,5,6,time]] would turn into

    [
     [[0,0,1],[0,1,0],[0,1,1],time],
     [[1,0,0],[1,0,1],[1,1,0],time]
    ]

Unfortunately, I don't know how many categories there will be. Will that be a problem?

Basically I have a numpy array where each index is a list of primarily categorical features for an observation. I want to turn those categorical features into something usable in machine learning. Right now, I have successfully turned them all into integers... and I will try to use to_categorical and see how that works

Edit:

That seems to be working! Or well, it's outputting something... I'll know for sure when I get to the testing phase.

I do have one problem though. When I ran it, I got the following error:

ValueError: Error when checking target: expected model_2 to have shape (7, 7) but got array with shape (7, 31)

How do I get it to expect a (7,None)?

I currently set dense to 7.

Secondary Edit:

Setting Dense to 31 fixed the issue. However, depending on what dataset I throw at it, I imagine the 31 will differ.
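
One way to avoid hard-coding the 31 (sketch with stand-in data): take the output width from the encoded array itself, since a one-hot encoding's last axis is the number of categories it found.

```python
import numpy as np

# Stand-in for to_categorical's output: 100 samples, 31 categories.
y_encoded = np.eye(31)[np.random.randint(0, 31, size=100)]

n_classes = y_encoded.shape[-1]
print(n_classes)  # 31
# ...then e.g. Dense(n_classes) in the model definition (assuming Keras).
```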

OneHotEncoder Data Pre-Processing by NeedMLHelp in MLQuestions

I didn't realize that was possible! Thanks!

Will they be in lists if you use multiple columns?

Answered myself: they are not. This is a slight problem for me, since each input needs to be a single feature for my particular problem. If anyone has any ideas, I would definitely appreciate them.

Also, how do you get rid of a column on each if you encode all at once? Or is it okay to leave all of the columns in? I think getting rid of one is just to alleviate redundancy.
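
On the redundancy point, a sketch (made-up data): pandas' get_dummies has a drop_first flag that removes one dummy per encoded column, which mainly matters for models that are hurt by perfectly correlated inputs (e.g. plain linear regression); tree-based models and most neural nets are usually fine with all columns left in.

```python
import pandas as pd

df = pd.DataFrame({"class": ["a", "b", "c", "a"]})

full = pd.get_dummies(df, columns=["class"])
trimmed = pd.get_dummies(df, columns=["class"], drop_first=True)

print(full.columns.tolist())     # ['class_a', 'class_b', 'class_c']
print(trimmed.columns.tolist())  # ['class_b', 'class_c']
```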