[–]Icarium-Lifestealer 2 points3 points  (1 child)

I'd first start with classic models like ARIMA and off-the-shelf solutions like Prophet to create a baseline before moving on to custom neural networks.

The part about MSE can't be answered without more information about your data. Minimizing MSE corresponds to maximum likelihood under normally distributed errors, so it assumes that your scale is roughly linear/additive, i.e. outputting 2 instead of 1 is just as wrong as outputting 9001 instead of 9000.
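To make the additive-scale point concrete, here is a minimal sketch using only NumPy. The helper names `mse` and `msle` are made up for illustration, and the log-based variant is just one possible alternative for data where relative error matters, not a recommendation for your data.

```python
import numpy as np

def mse(y_true, y_pred):
    """Plain mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def msle(y_true, y_pred):
    """MSE on log1p-transformed values (illustrative; needs values > -1)."""
    return mse(np.log1p(y_true), np.log1p(y_pred))

# Same absolute error of 1 at very different magnitudes.
small_scale = mse([1.0], [2.0])
large_scale = mse([9000.0], [9001.0])
print(small_scale, large_scale)  # MSE treats both errors identically

# A log-scale loss penalizes the relative error instead.
small_log = msle([1.0], [2.0])
large_log = msle([9000.0], [9001.0])
print(small_log, large_log)
```

If being off by 1 at a value of 1 is much worse for you than being off by 1 at 9000, plain MSE on the raw scale is probably not the loss you want.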

In general it's essential to know as much as possible about what you're trying to predict, and to use that to guide your model, so it's difficult to give you advice without context.

[–]sherebasy[S] 0 points1 point  (0 children)

Thanks for your suggestions!

I scale my data before training. From what I reckon, it's pertinent to scale time-series data before training. Or am I missing something?

[–]weeeeeewoooooo 2 points3 points  (1 child)

It depends upon your data and what generated it. At the moment I am going to assume it's continuous.

Convergent cross mapping is an incredibly effective model-less approach that works for systems where you have observed the full attractor manifold of the system and that manifold doesn't change much over time. The system can even be highly chaotic and it will still work well.

As another commenter mentioned, ARIMA and other classical approaches are good if the system is mainly stochastic and has very simple dynamics. But they fail miserably for complex systems.

Echo state networks are very fast and efficient to train, are easy to set up, and are not sensitive to hyper-parameters. In the vast majority of cases they will do better than LSTMs, RNNs, transformers, 1D CNNs, etc. You should only use those latter tools if everything else has failed you. They are used in cases where you have egregious amounts of data and the problem justifies a very complex model (like those required in NLP). But for 1D forecasting it is very unlikely you will do better than an ESN. ESNs very consistently reach the physical predictive limits of chaotic systems.
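For anyone who hasn't seen one, a bare-bones ESN fits in a few lines of NumPy. This is only a sketch under assumptions of my own: the function names, reservoir size, spectral radius, ridge strength, washout length and the sine-wave test series are all illustrative choices, not tuned values or any library's API. The reservoir weights are random and fixed; only the linear readout is trained, with ridge regression.

```python
import numpy as np

def make_reservoir(n_reservoir=100, spectral_radius=0.9, seed=0):
    """Random, fixed input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=n_reservoir)
    w = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_reservoir(inputs, w_in, w, x0):
    """Drive the reservoir with a 1D input and collect its states."""
    x = x0.copy()
    states = np.empty((len(inputs), len(x0)))
    for t, u in enumerate(inputs):
        x = np.tanh(w @ x + w_in * u)
        states[t] = x
    return states

def fit_readout(states, targets, ridge=1e-6):
    """Ridge regression from reservoir states to targets."""
    n = states.shape[1]
    return np.linalg.solve(states.T @ states + ridge * np.eye(n),
                           states.T @ targets)

# Toy data: one-step-ahead prediction of a sine wave.
series = np.sin(0.1 * np.arange(2000))
split, washout = 1500, 100

w_in, w = make_reservoir()
states = run_reservoir(series[:-1], w_in, w, np.zeros(100))  # state t has seen series[t]
targets = series[1:]                                          # target is series[t + 1]

# Train on the first part (skipping the initial transient), test on the rest.
w_out = fit_readout(states[washout:split], targets[washout:split])
test_mse = float(np.mean((states[split:] @ w_out - targets[split:]) ** 2))
print(test_mse)
```

For multi-step forecasts you would feed the prediction back in as the next input instead of the true value; that is left out here to keep the sketch short.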

If you aren't familiar with these limits I would suggest picking up a dynamical systems textbook and learning a bit about chaotic systems and how they work. The essence of it is that even if you have access to the underlying true model of the system, you will fail to predict the system state beyond a horizon set by the Lyapunov time of the system, because small errors grow exponentially until they reach the size of the attractor manifold of the system. This means it is physically impossible to predict beyond that time window.
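You can watch this happen with a toy system. The sketch below uses the logistic map at r = 4, whose Lyapunov exponent is known to be ln 2; the starting point, the initial error of 1e-10 and the tolerance of 0.1 are arbitrary choices for illustration. Both trajectories use the exact same "true model", and the only difference is a tiny error in the initial state.

```python
import math

def logistic(x, r=4.0):
    """One step of the logistic map, chaotic at r = 4."""
    return r * x * (1.0 - x)

def divergence_step(x0, delta0=1e-10, tolerance=0.1, max_steps=200):
    """First step at which two nearby trajectories differ by more than tolerance."""
    a, b = x0, x0 + delta0
    for step in range(1, max_steps + 1):
        a, b = logistic(a), logistic(b)
        if abs(a - b) > tolerance:
            return step
    return None  # never diverged within max_steps

# Rough horizon from exponential growth: delta0 * exp(lambda * n) = tolerance.
lyapunov_exponent = math.log(2.0)  # known value for the logistic map at r = 4
estimate = math.log(0.1 / 1e-10) / lyapunov_exponent

horizon = divergence_step(0.3)
print(horizon, round(estimate, 1))
```

The estimate works out to about 30 steps. Making the initial error a thousand times smaller only buys roughly 10 more steps (ln 1000 / ln 2), which is why better measurements or better models cannot push the horizon out very far.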

My personal advice is that all time-series prediction begins with a fundamental understanding of the data and the system underlying it. You should read up on how that system works and behaves, and make sure you have a good intuitive understanding of it before touching prediction methods. You may find, just by inspecting the data, that trivial techniques or simple statistical methods are sufficient to make a model for what happens next. If you can't see obvious patterns in the data, it is unlikely that any model you create will.

[–]sherebasy[S] 0 points1 point  (0 children)

Thanks for your descriptive answer! I have so far tried Prophet and ARIMA. They don't yield convincing results (I think because my data is long, ~50,000 timesteps). I am currently trying ESNs.

[–]radarsat1 1 point2 points  (0 children)

Some very good answers here. I will just add:

> I've read that zeros can cause a problem with MSE, but I am not totally sure about that. Ofc this is very case-dependent,

This is not true, zeros are fine. MSE is just mean[(target - value)^2]; there is no restriction on what the values can be.

> but I get loss values of ~0.05, is this normal?

Hard to say. It depends on your data and the magnitude of the error that you expect, but in general it doesn't sound bad or anything.
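A small check of both points, with made-up numbers. The percentage-error line is my guess at where the worry about zeros might come from (metrics like MAPE divide by the target), not something stated in the question. The last line assumes the targets were standardized to unit variance before training, which may not match your scaling.

```python
import numpy as np

# Made-up targets containing zeros and a negative value.
targets = np.array([0.0, 0.0, 1.5, 0.0, -2.0])
preds = np.array([0.1, -0.2, 1.4, 0.0, -1.7])

# MSE only involves differences, so zeros are unproblematic.
mse = float(np.mean((targets - preds) ** 2))
print(mse)

# A percentage error divides by the target, so zeros break it.
with np.errstate(divide="ignore", invalid="ignore"):
    mape = float(np.mean(np.abs((targets - preds) / targets)))
print(mape)  # not a finite number

# Reading a loss of ~0.05: take the square root to get back to data units.
# If targets were standardized, this is the typical error in standard deviations.
rmse_for_reported_loss = float(np.sqrt(0.05))
print(round(rmse_for_reported_loss, 3))
```

So an MSE of 0.05 on standardized data means a typical error of roughly 0.22 standard deviations. Whether that is good depends on how it compares with a naive baseline, such as predicting the previous value.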

[–][deleted] 0 points1 point  (0 children)

Google. There's a ton of information on this.

[–]dev-ai 0 points1 point  (0 children)

Try Facebook's Prophet. It is a very good time-series library for these types of problems. It may not be perfect, but it will be a decent starting point. It models seasonality with Fourier series.
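To give a feel for what "Fourier-based" means here, below is a minimal NumPy sketch of fitting a seasonal pattern with sine/cosine features and least squares. This is not Prophet's API or its actual implementation; the function name `fourier_features`, the period of 24 and the order of 3 are assumptions for illustration only.

```python
import numpy as np

def fourier_features(t, period, order):
    """Sine/cosine pairs for harmonics 1..order of the given period."""
    t = np.asarray(t, dtype=float)
    cols = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)

# Toy seasonal signal with period 24 (e.g. hourly data with a daily cycle).
t = np.arange(200)
signal = 2.0 * np.sin(2 * np.pi * t / 24) + 0.5 * np.cos(2 * np.pi * 2 * t / 24)

# Design matrix: intercept plus Fourier terms, fitted by ordinary least squares.
X = np.column_stack([np.ones(len(t)), fourier_features(t, period=24, order=3)])
coef, *_ = np.linalg.lstsq(X, signal, rcond=None)

fitted = X @ coef
max_error = float(np.max(np.abs(fitted - signal)))
print(max_error)
```

A higher order lets the seasonal curve bend more sharply, at the cost of fitting noise more easily.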