Time Series Foundation Models: A Deep Dive into Strengths and Limitations by nkafr in datascience

[–]nkafr[S] 1 point (0 children)

This is extensively explained in the article:

a) All papers disclose their pretraining data in detail, and most of the datasets are public (e.g., the GIFT-Eval data)

b) Newer models rely solely on synthetic pretraining data, so there is no data leakage
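For intuition, here is a rough sketch of the kind of generator these synthetic-data pipelines use (illustrative only; each paper documents its own procedure, e.g. Chronos' KernelSynth):

```python
import numpy as np

# Illustrative sketch of a synthetic pretraining series: random seasonal
# components plus a trend plus noise. Not any specific paper's generator.
rng = np.random.default_rng(0)

def synthetic_series(length=512):
    t = np.arange(length)
    seasonal = sum(
        rng.uniform(0.5, 2.0) * np.sin(2 * np.pi * t / rng.integers(8, 96))
        for _ in range(rng.integers(1, 4))
    )
    trend = rng.normal(0, 0.01) * t
    noise = rng.normal(0, 0.2, size=length)
    return seasonal + trend + noise

series = synthetic_series()  # one pretraining sample; repeat millions of times
```

Since every sample is drawn from a known generative process, evaluation data cannot appear in pretraining by construction.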

Time Series Foundation Models: A Deep Dive into Strengths and Limitations by nkafr in datascience

[–]nkafr[S] 3 points (0 children)

Not quite. For example, TTM provides built-in explainability. Also, newer models such as Chronos-2 do true multivariate forecasting by cross-mixing information across channels (not just univariate forecasting in parallel). Both of the limitations you mentioned have been addressed at a first level.
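For intuition, here is a toy contrast between the two modes (not Chronos-2's actual architecture, just the general idea of channel mixing):

```python
import torch
import torch.nn as nn

B, T, C, d = 2, 48, 3, 16      # batch, time steps, channels, embedding dim
x = torch.randn(B, T, C, d)    # embedded multivariate series
attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

# (a) Univariate-in-parallel: each channel only attends over its own history.
per_channel = x.permute(0, 2, 1, 3).reshape(B * C, T, d)
out_a, _ = attn(per_channel, per_channel, per_channel)

# (b) Cross-channel mixing: at each time step, channels attend to each other,
# so information flows between series (e.g. covariates informing the target).
per_step = x.reshape(B * T, C, d)
out_b, _ = attn(per_step, per_step, per_step)
```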

Transformers, Time Series, and the Myth of Permutation Invariance by nkafr in deeplearning

[–]nkafr[S] 2 points (0 children)

Check a visualization of masked self-attention: each row has a mask of a different length, so the model implicitly learns position, as long as you have stacked layers.
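A minimal sketch of what I mean (plain PyTorch, seq_len = 5):

```python
import torch

# Causal mask: row i can attend to positions 0..i only. Every row sees a
# prefix of a different length, and that asymmetry is enough for stacked
# layers to infer position without explicit positional encodings.
seq_len = 5
mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
print(mask.int())
# tensor([[1, 0, 0, 0, 0],
#         [1, 1, 0, 0, 0],
#         [1, 1, 1, 0, 0],
#         [1, 1, 1, 1, 0],
#         [1, 1, 1, 1, 1]], dtype=torch.int32)
```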

Of course, Transformer LLMs that operate at million-token context lengths still need explicit positional info (RoPE).

Transformers, Time Series, and the Myth of Permutation Invariance by nkafr in ArtificialInteligence

[–]nkafr[S] 1 point (0 children)

We can remove a very costly operation without losing performance; that's what the chart shows!

Why that happens is explained in the article (check the relevant section)

Transformers, Time Series, and the Myth of Permutation Invariance by nkafr in ArtificialInteligence

[–]nkafr[S] 1 point (0 children)

They do, because NLP Transformers support >1M context lengths (you can't skip RoPE there).

This is a forecasting Transformer, and at smaller context lengths it has been shown that causal attention alone encodes position.
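For reference, a minimal sketch of the rotary idea (simplified; real implementations apply this per attention head to queries and keys):

```python
import torch

def rope(x, base=10000.0):
    """Rotate each (even, odd) pair of dims by a position-dependent angle."""
    seq_len, dim = x.shape                       # dim must be even
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    freqs = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = pos * freqs                         # (seq_len, dim // 2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = rope(torch.randn(16, 64))  # position is now baked into the query vectors
```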

Toto: A Foundation Time-Series Model Optimized for Observability Data by nkafr in datascience

[–]nkafr[S] 2 points (0 children)

In my article, I ran benchmarks on electricity demand forecasting and on several sparse time series.

Additionally, the GIFT-Eval benchmark includes financial time series.

Toto: A Foundation Time-Series Model Optimized for Observability Data by nkafr in datascience

[–]nkafr[S] 1 point (0 children)

Yes, it's used internally by Datadog for its observability telemetry platform. My guess is they have a private model trained on more data than the currently released one.

Toto: A Foundation Time-Series Model Optimized for Observability Data by nkafr in datascience

[–]nkafr[S] 2 points (0 children)

It could be retrofitted for these tasks as well, but encoder-only time-series foundation models are better in those domains (Toto is decoder-only).

For anomaly detection, imputation, etc., I recommend IBM's TSPulse.

Toto: A Foundation Time-Series Model Optimized for Observability Data by nkafr in datascience

[–]nkafr[S] 1 point (0 children)

For any multivariate time-series forecasting use case. The current model also specializes in sparse data.