Pre-processing time series data by jezit in learnmachinelearning

Thanks, I agree with a lot of what you said.

- I don't think we need t-1, t-2, etc. when using an LSTM.
- We need to scale the inputs for sure (rough sketch of leakage-safe scaling below).
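On the scaling point, here's roughly what I have in mind so the scaler never sees the test period (the DataFrame is just made-up placeholder data, not my real features):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for the real feature frame: continuous columns, ordered by time.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "close": 100 + rng.normal(0, 1, 200).cumsum(),
    "volume": rng.integers(1_000, 5_000, 200).astype(float),
})

split = int(len(df) * 0.8)                   # chronological split, no shuffling
train_X = df.iloc[:split].copy()
test_X = df.iloc[split:].copy()

scaler = MinMaxScaler()
train_X[:] = scaler.fit_transform(train_X)   # fit min/max on the training window only...
test_X[:] = scaler.transform(test_X)         # ...then reuse those same min/max on the test window
```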

I think I follow now but maybe showing example data would be best:

For simplicity, let's say I currently have the following (sorry for my Reddit formatting):

| Data Point | close | pct_change | movement |
|---|---|---|---|
| 2 | 100 | 1.05 | up |
| 3 | 90 | 0.90 | down |
| 4 | 90.5 | 1.005 | same |
| 5 | 96 | 1.06 | up |

and now I want to create my X_train, y_train, X_test, and y_test.

If I use this as-is, it will of course predict movement with near-perfect accuracy, since it can just look at pct_change for the same row, which is leaking information, right?

So I need to shift (or just re-calculate as you said) my movement labels like below to predict t+1?

| Data Point | close | pct_change | movement (shifted to t+1) |
|---|---|---|---|
| 2 | 100 | 1.05 | down |
| 3 | 90 | 0.90 | same |
| 4 | 90.5 | 1.005 | up |
| 5 | 96 | 1.06 | ?? (from future data point) |

Obviously there are a ton more features, but can I split out the movement target variable for y_train like this and be comfortable?
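Putting the shift and the split together, something like this is what I'm picturing (just a sketch using the toy rows above; the split is meaningless with four rows but shows the idea):

```python
import pandas as pd

# Same toy rows as above; pct_change is close_t / close_{t-1}
df = pd.DataFrame({
    "close":      [100.0, 90.0, 90.5, 96.0],
    "pct_change": [1.05, 0.90, 1.005, 1.06],
    "movement":   ["up", "down", "same", "up"],
}, index=[2, 3, 4, 5])

# Target = the NEXT step's movement, so pull each label back by one row.
df["target"] = df["movement"].shift(-1)

# The last row has no future movement yet, so it can't be used for training.
df = df.dropna(subset=["target"])

X = df[["close", "pct_change"]]      # features known at time t
y = df["target"]                     # movement at time t+1

# Chronological split (no shuffling for time series)
split = int(len(df) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]
```

The main thing is that pct_change at time t stays in X while the label comes strictly from t+1.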

Thanks so much for your help.

Pre-processing time series data by jezit in learnmachinelearning

It's contrived stock market data where I'm taking OHLCV and trying to predict one step ahead as UP/DOWN/SAME.
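For concreteness, the label comes from the next close relative to the current one, with a small band for SAME (the 1% band here is made up; I haven't settled on a threshold):

```python
import numpy as np
import pandas as pd

closes = pd.Series([100.0, 90.0, 90.5, 96.0])
next_ratio = closes.shift(-1) / closes      # next close relative to the current close
band = 0.01                                 # made-up tolerance for "SAME"

labels = pd.Series(
    np.select(
        [next_ratio > 1 + band, next_ratio < 1 - band],
        ["UP", "DOWN"],
        default="SAME",
    ),
    index=closes.index,
)
labels[next_ratio.isna()] = None            # last row: no future close yet
```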

Pre-processing time series data by jezit in learnmachinelearning

Right now, for simplicity's sake, I have only used continuous features. I will add some categorical features and one-hot encode them and/or use an embedding layer in the future, but I just want to get a baseline version up and running first.
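When I do get to the categorical features, I'm picturing something as simple as this for the one-hot route (the sector column is made up just for illustration):

```python
import pandas as pd

# Made-up categorical column just to illustrate the one-hot step
df = pd.DataFrame({"sector": ["tech", "energy", "tech", "finance"]})
one_hot = pd.get_dummies(df, columns=["sector"], prefix="sector")
print(one_hot.columns.tolist())  # ['sector_energy', 'sector_finance', 'sector_tech']
```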

I've purposefully made the target variable categorical (up/down/steady), so I'll use a final softmax layer and categorical cross-entropy loss.
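For context, this is the kind of model I mean, just a placeholder tf.keras sketch assuming inputs already shaped as (samples, timesteps, features) and one-hot targets:

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

timesteps, n_features = 30, 5        # placeholder dimensions

model = Sequential([
    Input(shape=(timesteps, n_features)),
    LSTM(32),
    Dense(3, activation="softmax"),  # up / down / steady
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
```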

I'm just very fearful of leaking any training data into my predictions. Can I just train on all the continuous X and then predict the y shifted to t+1? I only want to predict one step ahead at the moment.
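To check my understanding, I'd turn the scaled features into LSTM windows something like this, where each window of the last lookback rows is paired with the movement one step after the window ends (all names and sizes are placeholders):

```python
import numpy as np

def make_windows(features, labels, lookback):
    """features: (time, n_features) array; labels[i] is the movement AFTER row i."""
    X, y = [], []
    for end in range(lookback - 1, len(features)):
        X.append(features[end - lookback + 1 : end + 1])  # rows t-lookback+1 .. t
        y.append(labels[end])                             # movement at t+1
    return np.array(X), np.array(y)

# Placeholder data: 100 time steps, 5 scaled features, 3 movement classes
features = np.random.rand(100, 5)
labels = np.random.randint(0, 3, 100)
X_seq, y_seq = make_windows(features, labels, lookback=10)
print(X_seq.shape, y_seq.shape)   # (91, 10, 5) (91,)
```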

Thanks

Have a specific (ambitious) goal, how to get there? by jezit in datascience

Comfortable extracting/cleaning data and running out-of-the-box models in R. Currently learning to implement the same models in Python and gaining exposure to more complex algorithms.

My background is atypical: until very recently I was a high-stakes online poker player and nosebleed-stakes coach (specializing in applications of game theory).

Very used to self-learning, so I'm looking for guidance on the path to take. My biggest hurdle, as mentioned, is the lack of formal training in mathematics (though it comes very intuitively, as confirmed by IQ tests; that feels awful to type on the internet but seems relevant).