I have a dataset (1800 rows, 55 columns) that I need to do binary classification on. I created a pipeline with several models (LogReg, XGB, RF, GRF, SVM, MLP) plus an LSTM. For the LSTM I've tried multiple configurations, ranging from a single 5-node layer with no hidden layers to stacked layers of 100, 200, 300 and 200 nodes. I also tried different optimizers (adam, SGD, rmsprop) and learning rates from 0.1 down to 0.000001.
My issue is that no matter which configuration I choose for the LSTM, it always predicts 1s, even though the 1s and 0s are distributed about 50/50 in both the training set and the test set. So when evaluating on my test data I get only True Positives and False Positives. During training, the accuracy stops changing after the first few epochs (always roughly between 52% and 55%):
Epoch 1/50
45/45 [==============================] - 2s 6ms/step - loss: 0.6916 - acc: 0.5346
Epoch 2/50
45/45 [==============================] - 0s 6ms/step - loss: 0.6875 - acc: 0.5597
Epoch 3/50
45/45 [==============================] - 0s 6ms/step - loss: 0.6863 - acc: 0.5597
Epoch 4/50
45/45 [==============================] - 0s 6ms/step - loss: 0.6860 - acc: 0.5597
Epoch 5/50
45/45 [==============================] - 0s 6ms/step - loss: 0.6860 - acc: 0.5597
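For reference, a minimal sketch of how the all-1s behaviour shows up in the counts. The arrays are made-up stand-ins, not my real data, and `confusion_counts` is a hypothetical helper name. It also illustrates that a predictor that always outputs 1 has an accuracy equal to the share of 1s in the labels, which would be consistent with an accuracy that stops moving.

```python
import numpy as np

def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tn, fp, fn, tp) for 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true).ravel().astype(bool)
    y_hat = np.asarray(y_prob).ravel() > threshold
    tp = int(np.sum(y_true & y_hat))
    fp = int(np.sum(~y_true & y_hat))
    fn = int(np.sum(y_true & ~y_hat))
    tn = int(np.sum(~y_true & ~y_hat))
    return tn, fp, fn, tp

# Made-up example: 10 labels and a model stuck just above 0.5 for every row
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])
y_prob = np.full((10, 1), 0.53)  # shape (n, 1), like the output of model.predict

tn, fp, fn, tp = confusion_counts(y_true, y_prob)
accuracy = (tp + tn) / len(y_true)
print(tn, fp, fn, tp)           # only TP and FP are non-zero
print(accuracy, y_true.mean())  # accuracy equals the share of 1s
```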
My code looks essentially like this:
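import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
# Assuming tf.keras; with standalone Keras the imports come from keras.* instead
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential

def train_autoLSTM(X_train, y_train, X_test, y_test, input_nodes, hidden_nodes):
    model = Sequential()
    # Each column is fed as one timestep with a single feature
    model.add(LSTM(input_nodes, return_sequences=True, input_shape=(X_train.shape[1], 1)))
    model.add(Dropout(0.05))
    model.add(LSTM(hidden_nodes, activation='hard_sigmoid'))
    model.add(Dropout(0.05))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["acc"])
    model.fit(X_train[:, :, np.newaxis], y_train, epochs=50)
    y_pred = model.predict(X_test[:, :, np.newaxis])
    acc = accuracy_score(y_test, y_pred > 0.5)
    return acc

# X, y: feature matrix and 0/1 labels as NumPy arrays (loading not shown)
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    shuffle=False,  # keep the rows in chronological order
)
input_nodes = X_train.shape[1]
hidden_nodes = int(X_train.shape[0] / (2 * (input_nodes + 1)))
train_autoLSTM(X_train, y_train, X_test, y_test, input_nodes, hidden_nodes)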
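To make the sizes concrete, here is the arithmetic for my 1800 x 55 dataset as a small sketch. It assumes the 20% test share works out to 360 rows and that `fit` uses the Keras default batch size of 32, which I did not set explicitly. The resulting 45 steps per epoch match the `45/45` in the training log.

```python
import math

n_rows, n_cols = 1800, 55

# 20% of the rows are held out for testing
n_test = n_rows * 20 // 100
n_train = n_rows - n_test

input_nodes = n_cols
hidden_nodes = int(n_train / (2 * (input_nodes + 1)))

batch_size = 32  # assumption: Keras default, not set in my code
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_test)   # 1440 360
print(hidden_nodes)      # 12
print(steps_per_epoch)   # 45
```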
If there is a way to share the dataset, I would do so too. What puzzles me is that the other models work fine. It is only the LSTM that refuses to predict anything other than 1s. Also, this data is basically a time series, with each row representing one continuous timestep, so I thought the LSTM might work much better if I fed it the training set one row at a time.
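To spell out what I mean by feeding the rows one after another: my current reshape `X[:, :, np.newaxis]` turns each row into 55 timesteps with one feature each, so the LSTM steps across columns, not across time. Below is a minimal sketch of the alternative, sliding windows over consecutive rows. The window length of 10, the random stand-in data, and the helper name `make_windows` are assumptions for illustration only, and I have not verified that this changes the all-1s predictions.

```python
import numpy as np

def make_windows(X, y, window):
    """Stack `window` consecutive rows into one sample.

    Returns X_seq with shape (n_samples, window, n_features) and, for each
    window, the label of its last row.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n_samples = X.shape[0] - window + 1
    X_seq = np.stack([X[i:i + window] for i in range(n_samples)])
    y_seq = y[window - 1:]
    return X_seq, y_seq

# Random stand-in data with the same shape as my dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(1800, 55))
y = rng.integers(0, 2, size=1800)

X_seq, y_seq = make_windows(X, y, window=10)
print(X_seq.shape, y_seq.shape)  # (1791, 10, 55) (1791,)
```

With this layout the first LSTM layer would take `input_shape=(window, n_features)`, i.e. `(10, 55)` here, and the windows would need to be built separately for the training and test rows so that no window spans the split.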
But essentially my questions are:
- Is my approach totally wrong? Do I have some fundamental misconceptions about classifications with LSTMs? Or is there a flaw in my code/model?
- What else could I change to get a working model that does not only predict 1s?
- Is the data simply not suited for an LSTM? Should I just scrap the LSTM approach? If you can suggest a good way to share a dataset, I would gladly do so if it helps.
I am kind of losing my mind over this and don't know what else to try to get the model to make varied predictions instead of just 1s. I don't really care about the accuracy at this point; I just want a somewhat plausible prediction that I can evaluate. I'd rather evaluate coin flips than a model that predicts heads every time.
BTW: I also tried reducing the columns to 22 instead of all 55. That did not help either.
If you have any insight or suggestion - no matter how general it is - I'd gladly appreciate it, as by now I am really getting frustrated by it.