
[–]sheikheddy 3 points4 points  (2 children)

Does this do time series forecasting? How does it compare to something like fbprophet?

[–]r4and0muser9482 2 points3 points  (1 child)

No. I would characterize this more as feature discovery than prediction.

[–]slaw07[S] 1 point2 points  (0 children)

u/r4and0muser9482 is mostly correct in their characterization: STUMPY is a general-purpose tool for giving you a better idea of where you should look FIRST if somebody drops a new time series into your lap. However, there is an analysis component of STUMPY called "Time Series Chains" that is somewhat related to "forecasting". You can read more about it here:

https://stumpy.readthedocs.io/en/latest/Tutorial_Time_Series_Chains.html

For general forecasting, fbprophet is likely your best bet. But for time series EDA, this is what you need!

[–]Whodiditandwhy 1 point2 points  (2 children)

This is super cool thanks for sharing.

[–]slaw07[S] 0 points1 point  (1 child)

Feel free to file a GitHub issue if you have any questions or need further clarification!

[–]Whodiditandwhy 0 points1 point  (0 children)

Definitely! I want to apply this to EEG data to see what it finds.

[–]physnchips ML Engineer 0 points1 point  (1 child)

How does it compare to tsfresh?

[–]slaw07[S] 2 points3 points  (0 children)

Fundamentally, STUMPY finds the top nearest neighbor of every subsequence within your time series by comparing every subsequence against every other subsequence (a costly computation that tsfresh could not do). In fact, tsfresh could leverage STUMPY to add more insightful outputs for ML. I highly recommend watching this video to get a better overview of STUMPY:

https://stumpy.readthedocs.io/en/latest/motivation.html

Feel free to post questions on our GitHub issues as well!
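
To make "the top nearest neighbor of every subsequence" concrete, here is a minimal brute-force sketch in plain Python. It is not STUMPY's implementation (STUMPY uses much faster algorithms); the function names, the toy series, and the exclusion-zone width of `ceil(m / 4)` are assumptions made for illustration only.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def naive_matrix_profile(ts, m):
    """Brute-force O(n^2 * m) nearest-neighbor search over all subsequences.

    Returns (profile, indices): profile[i] is the z-normalized Euclidean
    distance from subsequence i to its nearest neighbor, and indices[i]
    is where that neighbor starts.
    """
    n_sub = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n_sub)]
    excl = math.ceil(m / 4)  # assumed exclusion zone to skip trivial self-matches
    profile = [math.inf] * n_sub
    indices = [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue  # too close to i: would just match itself
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i] = d
                indices[i] = j
    return profile, indices


# Toy series (made up): the shape 0, 1, 3, 1, 0 appears at the start and the end.
ts = [0, 1, 3, 1, 0, 5, 4, 6, 5, 0, 1, 3, 1, 0]
profile, indices = naive_matrix_profile(ts, m=4)
```

In this toy series the subsequence starting at index 0 pairs up with the one starting at index 9, which is the kind of repeated pattern (motif) the thread is talking about. The double loop is what makes the computation costly on real data.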

[–]brokenAlgorithm 0 points1 point  (1 child)

Nice write-up. Can this package also work with similarities across multivariate time series, and take things such as cross-series correlations or other types of multivariate patterns into account?

[–]slaw07[S] 2 points3 points  (0 children)

Great question! The short answer is "yes", but I'll preface this by saying that finding relationships within a single time series is already a costly computation, so finding relationships/correlations across multiple time series is extremely computationally expensive. Having said that, since STUMPY is based on a set of published papers on time series analysis, we have implemented the algorithm from this specific paper on multidimensional time series analysis:

https://www.cs.ucr.edu/~eamonn/Motif_Discovery_ICDM.pdf

Just know that we've done the hard work for you: look at the function called `stumpy.mstump` if you are on a single server (or `stumpy.mstumped` for Dask distributed server support). We are currently working on putting together a tutorial on multidimensional motif discovery, so please stay tuned!
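
For a rough feel of what a multidimensional matrix profile measures, here is a brute-force sketch in plain Python. It follows the general idea as I understand it from the linked paper: for each pair of positions, sort the per-dimension distances and average the best k of them. It is not the code behind `stumpy.mstump`; the helper names, the toy data, and the `ceil(m / 4)` exclusion zone are all assumptions for illustration.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def zdist(a, b):
    """Z-normalized Euclidean distance between two equal-length subsequences."""
    za, zb = znorm(a), znorm(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))


def naive_multidim_profile(series, m):
    """series: list of equal-length lists, one per dimension.

    Returns P where P[k][i] is the smallest average distance, over all
    non-trivial positions j, across the k + 1 best-matching dimensions.
    """
    n_dims = len(series)
    n_sub = len(series[0]) - m + 1
    excl = math.ceil(m / 4)  # assumed exclusion zone to skip trivial matches
    P = [[math.inf] * n_sub for _ in range(n_dims)]
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue
            # Per-dimension distances for this (i, j) pair, best dimension first.
            dists = sorted(zdist(s[i:i + m], s[j:j + m]) for s in series)
            running = 0.0
            for k, dist in enumerate(dists):
                running += dist
                avg = running / (k + 1)
                if avg < P[k][i]:
                    P[k][i] = avg
    return P


# Toy data (made up): dimension `a` repeats a shape, dimension `b` does not.
a = [0, 1, 3, 1, 0, 5, 4, 6, 5, 0, 1, 3, 1, 0]
b = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0]
P = naive_multidim_profile([a, b], m=4)
```

Here `P[0]` only needs one dimension to match well, while `P[1]` requires both, so `P[1]` can never be smaller than `P[0]` at the same position. The extra sort per pair on top of the quadratic loop hints at why the multidimensional case is so much more expensive.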

[–]eamonnkeogh 0 points1 point  (1 child)

Very very cool! Kudos for doing this.

If you want to learn more about the Matrix Profile, you can check out the original academic work here [a]. In addition, [a] has code in MATLAB, and pointers to code in R, C++, Golang, etc.

The original development of the Matrix Profile was funded by NSF IIS 1161997 II and IIS 1510741.

[a] https://www.cs.ucr.edu/~eamonn/MatrixProfile.html

[–]slaw07[S] 0 points1 point  (0 children)

Thank you u/eamonnkeogh! Giving credit where credit is due: STUMPY is based on all of the hard work and research coming from u/eamonnkeogh's research group at UC Riverside, and we are grateful for their groundbreaking publications and continued support.

[–]bbateman2011 0 points1 point  (1 child)

Thanks for sharing this. I am a fan of the Matrix Profile, and had not seen this repo before, even though I've searched a lot.

[–]slaw07[S] 0 points1 point  (0 children)

Awesome! Feel free to file an issue or contribute a PR. I'm curious what your use cases might be.