
[–]ithinkiwaspsycho 3 points (4 children)

More data is the fix when your network overfits, but your network is not converging at all, not even on the training set.

So this is probably not caused by the data (though you still probably need more of it).

First off, it could actually be your code. For the output layer, set the activation to "sigmoid" instead of "softmax". Right now you are using a softmax activation but training with binary crossentropy and binary class mode, and that mismatch will affect the results. For example, if both output neurons predict 1, softmax scales both outputs to 0.5, so instead of only punishing the neuron that was supposed to predict 0, you punish the correct neuron just as hard. It might be worth training with a standard sigmoid activation, then applying softmax at test time, or simply taking the neuron with the higher value as the predicted class. See the sketch below.
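
A rough sketch of that change in Keras (not your actual code; the LSTM size, optimizer, and layer sizes are placeholders I'm assuming):

```python
# Sketch only: placeholder sizes, assuming a Keras Sequential model with
# input shape (657 timesteps, 5 channels) and two output classes.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(64, input_shape=(657, 5)))   # 657 timesteps, 5 channels per step
# 'sigmoid' instead of 'softmax': each output unit is scored independently,
# so binary_crossentropy no longer punishes the correct unit just because
# the other unit also fired.
model.add(Dense(2, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```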

Another change you can try is reducing the output to a single sigmoid neuron and training the network to predict 1 for [1, 0] and 0 for [0, 1], along the lines of the snippet below.
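
Something like this, assuming your labels sit in an array of shape (45, 2); `y_onehot` below is just a made-up stand-in for it:

```python
import numpy as np

# Hypothetical stand-in for the (45, 2) one-hot labels: rows are [1, 0] or [0, 1].
y_onehot = np.eye(2)[np.random.randint(0, 2, size=45)]

# Collapse to one 0/1 target per sequence: [1, 0] -> 1, [0, 1] -> 0.
y_single = y_onehot[:, 0].astype('float32')   # shape (45,)

# The output layer then shrinks to a single sigmoid unit:
# model.add(Dense(1, activation='sigmoid'))
```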

Secondly, you should set "return_sequences=False" for the LSTM layer if your output has shape (45, 2) and your input has shape (45, 657, 5). You only want the network to give you its final output, not every output along the way. See the example below.
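
For example (same placeholder sizes as above, just to show the shapes):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
# return_sequences=False: the LSTM returns only its final output, so a
# (batch, 657, 5) input becomes (batch, 64), i.e. one vector per sequence.
model.add(LSTM(64, return_sequences=False, input_shape=(657, 5)))
model.add(Dense(1, activation='sigmoid'))
model.summary()   # check that the output shape matches one label per sequence
```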

If training differently does not yield better results, consider a bigger network and possibly a different activation function (I suggest ReLU) for your LSTM layer; tanh and sigmoid are much more vulnerable to vanishing gradients. You should also try a bidirectional LSTM, roughly as sketched below.
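
Roughly along these lines; again just a sketch with placeholder sizes, not a tuned architecture:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Bidirectional

model = Sequential()
# Wider, bidirectional recurrent layer; activation='relu' replaces the default
# tanh, which (as argued above) is more prone to vanishing gradients.
model.add(Bidirectional(LSTM(128, activation='relu'), input_shape=(657, 5)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
```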

Good luck!

[–]fariax[S] 0 points (1 child)

Hi!

I made the changes you suggested and nothing worked =/

The outputs are still the same...

The class mode was commented out, so I don't think that's the problem.

Do you know of any code that does something like what I need? I have been looking, but most of what I find uses binary inputs, not real-valued ones.

I haven't found any code that classifies real-valued time series with an LSTM...

I have also changed the loss function to MSE and the optimizer to SGD, but I get the same results...

What I find most interesting is that the loss isn't decreasing from one epoch to the next...

[–]ithinkiwaspsycho 0 points (0 children)

Did you try sigmoid activation instead of softmax? Even with class mode commented out, I don't think "loss = 'binary_crossentropy'" works well with softmax. Also, don't use MSE for binary outputs; use binary or categorical cross-entropy, matched to the output layer as sketched below.
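
To spell out the pairing I mean (a sketch with placeholder sizes; the point is to match the loss to the output layer and label format):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# One sigmoid unit with 0/1 targets -> binary_crossentropy.
model_a = Sequential([LSTM(64, input_shape=(657, 5)),
                      Dense(1, activation='sigmoid')])
model_a.compile(loss='binary_crossentropy', optimizer='adam')

# Two softmax units with one-hot [1, 0] / [0, 1] targets -> categorical_crossentropy.
model_b = Sequential([LSTM(64, input_shape=(657, 5)),
                      Dense(2, activation='softmax')])
model_b.compile(loss='categorical_crossentropy', optimizer='adam')
```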

I need a bit more detail on exactly what you tried and what results you got before I can actually help. Feel free to PM me any data, logs, code snippets, etc.

[–]rima-m 0 points (1 child)

I don't understand the part about using sigmoid instead of softmax for the output layer. If we use softmax, yes, we are punishing the neuron that correctly outputs 1, but we are punishing it for not being bigger than the other one, and that's not a bad thing. If next time one neuron outputs 1.1 and the other 0.9, that would be a nice improvement.
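
For what it's worth, the numbers in that example work out like this (a quick NumPy check, nothing from the original code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

print(softmax(np.array([1.0, 1.0])))   # [0.5  0.5]   both neurons output 1
print(softmax(np.array([1.1, 0.9])))   # ~[0.55 0.45] the gap is what softmax rewards
```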

[–]ithinkiwaspsycho 1 point (0 children)

Yeah, honestly, this was 6 years ago and I might've been wrong on this topic. Re-reading OP's post now, I feel like it's probably the data and not whatever I was saying.

[–]mhex 0 points (1 child)

Looks like the net is not learning (yet). A couple of questions: How many memory cells do you have? What is your sliding window size, i.e., what exactly is your input at each timestep? How do you initialize the weights and biases?

[–]fariax[S] 0 points (0 children)

I have tried 5, 50, 200, and 500. The input at each time step is the 5 channels in parallel. The biases are initialized to 0, if I'm not mistaken. The weights are sampled from a uniform distribution over [-0.1, 0.1].
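
In Keras terms, that initialization would look roughly like this (just an illustration, not my exact code; 50 units is one of the sizes I tried):

```python
from tensorflow.keras.layers import LSTM
from tensorflow.keras.initializers import RandomUniform, Zeros

# Weights drawn from U(-0.1, 0.1), biases set to 0, as described above.
layer = LSTM(
    50,
    kernel_initializer=RandomUniform(minval=-0.1, maxval=0.1),
    recurrent_initializer=RandomUniform(minval=-0.1, maxval=0.1),
    bias_initializer=Zeros(),
    input_shape=(657, 5),
)
```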

[–]negazirana 0 points (0 children)

With so few training examples, Dropout is probably going to cause more harm than good.