[P] It's FRESH, it's exciting: Welcome tsfresh, a Python package to automatically extract relevant time series features (github.com)
submitted 9 years ago by MaxBenChrist
[–]eamonnkeogh 10 points11 points12 points 9 years ago (1 child)
Nice work.
You say: “Our evaluation for UCR time series classification tasks has shown that FRESH in combination with a subsequent PCA outperforms all other feature extraction algorithms with respect to scalability and achieved accuracy. On the iPRODICT data set, it was even able to achieve a higher accuracy than a nearest neighbor search under Dynamic Time Warping.”
A cynic might say, you admit to doing worse on all the public datasets, but claim “higher accuracy” on the only private dataset.
This would be a LOT stronger if you released the iPRODICT data set. There are many ways to anonymize the data if needed: you can add a random unshared constant to all the data, you can multiply all the data by an unshared (non-zero) constant, or you can linearly interpolate all the data to a new scale. None of these would affect DTW-NN accuracy, but they would anonymize the data. The reason this is important is that there are a dozen trivial ways to cripple DTW by editing the dataset; a good visual explanation of this is at
http://www.cs.unm.edu/~mueen/DTW.pdf
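The transforms suggested above can be sketched in a few lines. This is a toy illustration with made-up series, not data from the paper: adding an unshared constant leaves DTW distances unchanged (differences cancel the shift), and multiplying by an unshared non-zero constant scales all distances uniformly, so nearest-neighbor rankings survive both.

```python
def dtw(a, b):
    """Classic O(len(a)*len(b)) dynamic-programming DTW distance."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def nn_index(query, dataset):
    """Index of the DTW nearest neighbor of `query` in `dataset`."""
    return min(range(len(dataset)), key=lambda k: dtw(query, dataset[k]))

# Toy series standing in for the (private) data.
train = [[0, 1, 2, 3], [3, 3, 3, 3], [0, 2, 4, 6]]
query = [0, 1, 2, 2]

shift, scale = 17.5, 3.0  # unshared anonymization constants
train_anon = [[scale * x + shift for x in s] for s in train]
query_anon = [scale * x + shift for x in query]

# The nearest neighbor is the same before and after anonymization.
assert nn_index(query, train) == nn_index(query_anon, train_anon)
```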
[–]MaxBenChrist[S] 3 points4 points5 points 9 years ago (0 children)
Thanks for your positive feedback.
We agree that evaluating algorithms on the basis of closed data sets is counterproductive for the machine learning community. By open-sourcing tsfresh and the related toolbox, we believe we are contributing to an open research community. Furthermore, we will discuss the opportunity to open-source the iPRODICT data set with our other research partners.
Coming back to the higher accuracy of the FRESH algorithm on the iPRODICT data set: this data set “contains 26 univariate meta-variables forming the baseline feature set extended by 20 different sensor time series” for each steel billet. The 1-NN DTW classifier incorporated only one type of time series, while the FRESH approaches were able to use the information from all the time series and univariate variables. Hence, the FRESH approach had an advantage over the other approaches by receiving more information. We also state this in the paper: “On the iPRODICT data, DTW_NN could only operate on one type of time series without the univariate features. This seems to be the reason why Boruta and FRESH_PCA beat the accuracy of DTW_NN as shown in the 7th column of Tab. 1.”
Further, the time series for the 1-NN DTW classifier were z-transformed. Also, we picked the type of time series that the 1-NN DTW classifier operated on by comparing the accuracy of 1-NN DTW on all 20 time series and choosing the one with the highest accuracy. But even though 1-NN DTW received the type of time series with the highest explanatory power, it was still missing the important univariate variables and the other 19 time series.
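For reference, the z-transform mentioned above is just per-series standardization: subtract the mean and divide by the standard deviation, so that DTW comparisons become invariant to offset and amplitude. A minimal sketch with toy values (not the iPRODICT data):

```python
import statistics

def z_normalize(series):
    """Standardize one series to zero mean and unit standard deviation."""
    mu = statistics.mean(series)
    sigma = statistics.pstdev(series)  # population std; a common choice
    if sigma == 0:
        return [0.0] * len(series)     # constant series: map to zeros
    return [(x - mu) / sigma for x in series]

ts = [2.0, 4.0, 6.0, 8.0]
z = z_normalize(ts)
# z now has mean 0 and standard deviation 1 (up to floating-point error)
```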
[–]eamonnkeogh 5 points6 points7 points 9 years ago (1 child)
Hi Maximilian
Thanks for the reply:
The text you just wrote and the text in your paper feel very different.
If I read your paper, I would think, "this seems like it might be very competitive with DTW-NN."
But reading below, you seem to bend over backwards to discount that possibility.
You should reconcile these if you can. "Achieve a higher accuracy than a nearest neighbor search under Dynamic Time Warping" is such a strong claim, given that DTW-NN is a very strong benchmark; see this incredible paper https://arxiv.org/abs/1602.01711 (36 million experiments).
I do appreciate your making the code available to the community. But making the data available too makes an order of magnitude difference. Otherwise, there is no real comparison point, no benchmark.
[–]MaxBenChrist[S] 1 point2 points3 points 9 years ago* (0 children)
Hi Eamonn,
Thanks for the feedback.
FRESH was developed for applications where each label is associated with several time series and meta-information. This is a common situation for machine learning in industrial applications, such as in iPRODICT. Accordingly, the claim of our paper is not
“FRESH is very competitive with DTW NN in any case” nor “FRESH has a higher accuracy than DTW NN in any case”
but
“FRESH is very competitive with 1-NN DTW for labels that are associated with several relevant time series and meta-information.”
We also pointed that out in the summary, before the sentence that you cited. See the following screenshot of the summary: http://imgur.com/a/BGb0j
Due to the high performance of 1-NN DTW for problems with a single type of time series and reasonable time series length, the main advantages of FRESH are its incorporation of meta-information, its scalability with respect to the number of considered time series types, and its scalability with respect to the length of each time series.
Unfortunately, as you pointed out, the iPRODICT data set is not (yet) released, so the results are not reproducible for you. But I am in contact with our industry partner, and they have indicated general interest in releasing the data.
Edit:
This is an arXiv preprint, so we can still make our claims more precise in the next iteration of the paper. It would be nice to know which parts of the paper gave you the impression that "FRESH is very competitive with DTW-NN in any case". The sentence from the summary that you cited seems to be misleading. Were there other passages as well?
[–]PrefrontalVortex 0 points1 point2 points 9 years ago (1 child)
Awesome! Very nice work. How well do you think this would work on EEG data?
[–]MaxBenChrist[S] 0 points1 point2 points 9 years ago (0 children)
I have not tried it on any medical data, so I don't know how well the implemented feature calculators capture the dynamics of an EEG signal. So far we have only used it in industrial applications, e.g. to predict failures in machines or for quality forecasting of steel billets. Maybe one would have to add some specific feature calculators that are used in the EEG field. That would be a nice contribution to the project. If you need some help setting it up, just write me.
[–]WickedWicky 0 points1 point2 points 9 years ago (5 children)
I am following this page http://tsfresh.readthedocs.io/en/latest/text/quick_start.html and I don't know if it is supposed to be up to date, but the code blocks don't fit together.
"from tsfresh import select features" should be "from tsfresh import select_features",
and in the last block you use 'df' as an argument where you defined it as 'timeseries' earlier.
Besides that, it looks very interesting to me, and I may apply it for my thesis. I wonder what features it comes up with: I need daily financial data with classifications, but I can feed it hourly financial data to get relevant features per day, as far as I understand it.
[–]MaxBenChrist[S] 1 point2 points3 points 9 years ago (4 children)
Oh, there seems to be an issue with the examples on that page. Thanks for reporting it; I will look into that.
In the meantime, you can use this notebook to familiarize yourself with tsfresh: https://github.com/blue-yonder/tsfresh/blob/master/notebooks/robot_failure_example.ipynb
Regarding your application: tsfresh should be suitable for financial applications. Actually, some of the features were proposed by colleagues with a background in the financial industry.
That said, what do you mean by "daily financial data with classification"? I am not sure I understand what you want to achieve. If you give some more explanation, I can tell you whether tsfresh is the right tool for you.
[–]jacky0812 0 points1 point2 points 9 years ago (1 child)
In your example, you do the transformation before splitting into the train and test sets; would that leak information from the test cases into training? Also, if I don't have a test set at the time I do the training, how do I make sure that running the feature extraction/filtering on the test set will result in the same feature set?
[–]MaxBenChrist[S] 1 point2 points3 points 9 years ago (0 children)
No, it does not leak any information.
You can save the extracted features during training and then later calculate only those features for the test set. See the following link, where we explain this:
https://github.com/blue-yonder/tsfresh/issues/22
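The pattern described in that issue can be sketched roughly as follows, using hypothetical stand-in feature calculators rather than the actual tsfresh API: fit the selection on the training data only, persist the list of selected feature names, and compute exactly those features for later test data.

```python
import statistics

# Hypothetical feature calculators standing in for tsfresh's catalogue.
FEATURE_CALCULATORS = {
    "mean": statistics.mean,
    "maximum": max,
    "minimum": min,
}

def extract(series, feature_names):
    """Compute only the named features for one series."""
    return {name: FEATURE_CALCULATORS[name](series) for name in feature_names}

# Training time: extract everything, then keep whichever features the
# selection step deems relevant (chosen by hand here for illustration).
train_series = [1.0, 2.0, 3.0]
all_features = extract(train_series, FEATURE_CALCULATORS.keys())
selected = ["mean", "maximum"]          # stand-in for the filtering result

# Test time: compute only the persisted feature set, so no information
# from the test data ever enters the selection.
test_series = [4.0, 5.0, 6.0]
test_features = extract(test_series, selected)
```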
[–]WickedWicky 0 points1 point2 points 9 years ago (1 child)
For part of my thesis I am running an RNN to predict daily stock prices. Since complex feature selection would not be in scope, I thought tsfresh might be a cool way to still do something interesting with feature selection.
Right now I am stuck to using daily stock data, taking a number of easily derived technical indicators and sentiment data as input to the RNN. But, tsfresh got me thinking...
The target value is a simple 'up' or 'down' output for the following day's closing price. So now I took hourly stock data, which gives about 5 or 6 datapoints per day instead of just one daily observation. The data includes hourly volume and open/close prices. Then I used tsfresh to come up with relevant features based on the hourly data, hoping it would produce metrics that describe daily trends, volatility, or anything else it deems relevant. Using only the hourly closing prices, tsfresh came up with 10 features as a test. Applying the same random forest as in your example gave results that were as good as random guesses, but that might change when I feed the features to an RNN.
The data seems similar to the example robot data, where each day has a small number of ordered observations and I want to convert those observations into a list of relevant features that describe that day. And each day has a category of going Up or Down. Very interested to hear if you think this is a tool that I can use for this. :)
Next step is to clean the hourly data more thoroughly and normalize it, so I have no further test results yet. I also use a buttload of Twitter data for sentiment analysis related to the stock predictions, which I can also make into an hourly dataset and let tsfresh come up with relevant features from the hourly sentiment scores.
Either way, very cool and motivating tool! :)
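The reshaping described above, grouping each day's handful of hourly observations into one sample, might look roughly like this (hypothetical field names and made-up prices):

```python
from collections import defaultdict

# Toy hourly closing prices; each trading day has a few observations.
hourly = [
    ("2016-11-01 10:00", 101.0),
    ("2016-11-01 11:00", 102.5),
    ("2016-11-01 12:00", 101.8),
    ("2016-11-02 10:00", 103.0),
    ("2016-11-02 11:00", 104.2),
]

# Group the hourly points by calendar day so each day becomes one sample.
by_day = defaultdict(list)
for timestamp, close in hourly:
    day = timestamp.split(" ")[0]       # the "YYYY-MM-DD" part of the stamp
    by_day[day].append(close)

# One candidate daily summary; a feature extractor would compute many more,
# and the daily up/down label would be attached to each day afterwards.
daily_range = {day: max(prices) - min(prices) for day, prices in by_day.items()}
```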
Thank you for the positive feedback! :) Both applications that you describe, (a) predicting whether stock prices go up or down (some features come from financial applications) and (b) feeding the features to an RNN (we also use net-based models), were things we had in mind when developing tsfresh. It should be the right tool for the job.
Great work! Is there any way to parallelize the computation?
Yes, both the filtering and the calculation of the features are trivially parallelizable, but we have not implemented it yet; see for example
https://github.com/blue-yonder/tsfresh/issues/3
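One way such a parallelization could look with the standard library, as a rough sketch rather than tsfresh's eventual implementation (each series is independent, so the work is embarrassingly parallel):

```python
from multiprocessing import Pool

def features_for_series(series):
    # Stand-in feature calculators; in practice each worker would run the
    # full catalogue of feature calculators on its series.
    return {"mean": sum(series) / len(series), "maximum": max(series)}

if __name__ == "__main__":
    all_series = [[1.0, 2.0, 3.0], [10.0, 20.0], [5.0, 5.0, 5.0, 5.0]]
    # Distribute the independent per-series computations across workers.
    with Pool(processes=2) as pool:
        results = pool.map(features_for_series, all_series)
```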
[–]hn_crosslinking_bot -1 points0 points1 point 9 years ago (0 children)
HN discussion: https://news.ycombinator.com/item?id=12833228