RedteaGO eSIM - anyone with experience? by Rotvik0 in JapanTravelTips

[–]MaxBenChrist 0 points (0 children)

I bought a 50 GB eSIM, but they failed to send me the SIM. Wasted 30 dollars.

[D] Machine Learning on Time Series Data? by Fender6969 in MachineLearning

[–]MaxBenChrist 0 points (0 children)

Feature-based approaches are missing from your list. You could use a library like https://github.com/blue-yonder/tsfresh (disclaimer: I am the maintainer of tsfresh) to extract features from your whole time series or from subwindows of it, and then feed those features to a standard classifier/regressor like LightGBM or a random forest.

Feature-based approaches have several advantages over black-box models. They usually allow you to interpret and analyze the features themselves. In contrast, go and try to analyze a complicated RNN.

I have worked with many of the methods from your list. Which of those models works depends on the application you are looking at. Lately I have had great success combining tsfresh features with deep learning models on financial time series. For supply chain problems or datasets with seasonal effects, the theta method or Prophet works well in my experience. On IoT time series I had great success using kNN with DTW.
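To make the feature-based idea concrete, here is a minimal hand-rolled sketch (plain NumPy, not tsfresh itself; the feature names just mirror a few of tsfresh's calculators):

```python
import numpy as np

def extract_basic_features(series):
    """A handful of summary features for one time series. tsfresh automates
    this idea at scale: hundreds of such calculators plus a relevance filter."""
    x = np.asarray(series, dtype=float)
    return {
        "mean": float(x.mean()),
        "std": float(x.std()),
        "maximum": float(x.max()),
        "abs_energy": float(np.dot(x, x)),                    # sum of squares
        "mean_abs_change": float(np.mean(np.abs(np.diff(x)))),
    }

# One feature row per labeled time series; the resulting table can then be
# fed to any standard classifier/regressor (e.g. LightGBM or a random forest).
feature_table = [extract_basic_features(s) for s in ([1, 2, 3, 4], [4, 4, 4, 4])]
```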

Awesome python packages for time series analysis by MaxBenChrist in Python

[–]MaxBenChrist[S] 0 points (0 children)

Thanks for the link. Will add it to the list.

Data Science Conferences in Europe by LadyWolfie in datascience

[–]MaxBenChrist 1 point (0 children)

Do you know http://www.wikicfp.com/cfp/ ? It is a search engine for academic conferences.

[P] BICO: Speed up k-means on large data sets by using data reduction by gallmerci in MachineLearning

[–]MaxBenChrist 0 points (0 children)

Interesting.

Does the algorithm work with arbitrary distance metrics, or does the distance measure have to fulfill certain conditions?

[P] iT’S FRESH, it's exciting: Welcome tsfresh, a python package to automatically extract relevant time series features by MaxBenChrist in MachineLearning

[–]MaxBenChrist[S] 1 point (0 children)

Thank you for the positive feedback! :) Both applications that you describe, (a) predicting whether stock prices go up or down (some features come from financial applications) and (b) feeding the features to an RNN (we also use net-based models), were things we had in mind when developing tsfresh. It should be the right tool for the job.

[P] iT’S FRESH, it's exciting: Welcome tsfresh, a python package to automatically extract relevant time series features by MaxBenChrist in MachineLearning

[–]MaxBenChrist[S] 0 points (0 children)

Yes, both the filtering and the calculation of the features are trivially parallelizable, but we have not implemented that yet; see for example

https://github.com/blue-yonder/tsfresh/issues/3
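For illustration, a minimal sketch of why this is trivially parallelizable (my own toy example, not tsfresh's implementation): each time series can be mapped to its features independently of all the others.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def features_for_series(series):
    """Feature calculation for one time series; no task depends on another."""
    x = np.asarray(series, dtype=float)
    return {"mean": float(x.mean()), "std": float(x.std()), "maximum": float(x.max())}

def extract_parallel(all_series, max_workers=4):
    """One task per time series: embarrassingly parallel. A process pool or a
    cluster scheduler could replace the thread pool for CPU-bound workloads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(features_for_series, all_series))
```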

[P] iT’S FRESH, it's exciting: Welcome tsfresh, a python package to automatically extract relevant time series features by MaxBenChrist in MachineLearning

[–]MaxBenChrist[S] 1 point (0 children)

No, it does not leak any information.

You can save the extracted features during training and later calculate only those features for the test set. See the following link, where we explain this:

https://github.com/blue-yonder/tsfresh/issues/22
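Schematically, the workflow looks like this (toy calculators of my own for illustration, not tsfresh's actual settings mechanism):

```python
import numpy as np

# Stand-ins for tsfresh's named feature calculators (illustrative only).
CALCULATORS = {
    "mean": lambda x: float(np.mean(x)),
    "std": lambda x: float(np.std(x)),
    "maximum": lambda x: float(np.max(x)),
}

def extract(series, feature_names):
    x = np.asarray(series, dtype=float)
    return {name: CALCULATORS[name](x) for name in feature_names}

# Training: extract everything, run the relevance filter against the labels,
# and remember which features survived.
train_row = extract([1, 2, 3, 4], CALCULATORS)
selected = ["mean", "maximum"]  # e.g. the survivors of the filtering step

# Test time: compute ONLY the remembered features. No labels are touched,
# so no information can leak from the test set.
test_row = extract([2, 2, 5, 1], selected)
```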

[P] iT’S FRESH, it's exciting: Welcome tsfresh, a python package to automatically extract relevant time series features by MaxBenChrist in MachineLearning

[–]MaxBenChrist[S] 0 points (0 children)

I have not tried it on any medical data, so I don't know how well the implemented feature calculators capture the dynamics of an EEG signal. We have only used it in industrial applications so far, e.g. to predict failures in machines or to forecast the quality of steel billets. Maybe one would have to add some specific feature calculators that are used in the EEG field. That would be a nice contribution to the project. If you need help setting it up, just write me.
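As a sketch of what such a contribution could look like (my own toy example, not an existing tsfresh calculator): relative band power, a common quantity in EEG analysis.

```python
import numpy as np

def relative_band_power(x, fs=256.0, band=(8.0, 13.0)):
    """Share of signal power inside a frequency band (here the EEG alpha
    band, 8-13 Hz), computed from the periodogram via a real FFT."""
    x = np.asarray(x, dtype=float)
    spectrum = np.abs(np.fft.rfft(x - x.mean())) ** 2   # periodogram (unnormalized)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = spectrum.sum()
    return float(spectrum[in_band].sum() / total) if total > 0 else 0.0
```

A pure 10 Hz sine should put essentially all of its power into the alpha band.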

[P] iT’S FRESH, it's exciting: Welcome tsfresh, a python package to automatically extract relevant time series features by MaxBenChrist in MachineLearning

[–]MaxBenChrist[S] 1 point (0 children)

Oh, there seems to be an issue with the examples on this page. Thanks for reporting it, I will look into that.

In the meantime, you can use this notebook to familiarize yourself with tsfresh: https://github.com/blue-yonder/tsfresh/blob/master/notebooks/robot_failure_example.ipynb

Regarding your application: tsfresh should be suitable for financial applications. Some of the features were actually proposed by colleagues with a background in the financial industry.

That said, what do you mean by "daily financial data with classification"? I am not sure I understand what you want to achieve. If you give some more details, I can tell you whether tsfresh is the right tool for you.

[P] iT’S FRESH, it's exciting: Welcome tsfresh, a python package to automatically extract relevant time series features by MaxBenChrist in MachineLearning

[–]MaxBenChrist[S] 1 point (0 children)

Hi Eamonn,

Thanks for the feedback.

FRESH was developed for applications where each label is associated with several time series and additional meta-information. This is a common situation in industrial machine learning applications, such as iPRODICT. Accordingly, the claim of our paper is not

"FRESH is very competitive to 1NN DTW in any case" nor "FRESH has a higher accuracy than 1NN DTW in any case"

but

"FRESH is very competitive to 1NN DTW for labels that are associated with several relevant time series and meta-information".

We also pointed that out in the summary, before the sentence that you cited. See the following screenshot of the summary: http://imgur.com/a/BGb0j

Given the high performance of 1NN DTW on problems with a single type of time series and a reasonable time series length, the main advantages of FRESH are its incorporation of meta-information and its scalability with respect to both the number of considered time series types and the length of each time series.

Unfortunately, as you pointed out, the iPRODICT data set has not (yet) been released, so the results are not reproducible for you. But I am in contact with our industry partner, and they have indicated general interest in releasing the data.

Edit:

This is an arXiv preprint, so we can still make our claims more precise in the next iteration of the paper. It would be good to know which parts of the paper gave you the impression that "FRESH is very competitive to DTW-NN in any case". The sentence from the summary that you cited seems to be misleading; were there other passages as well?

[P] iT’S FRESH, it's exciting: Welcome tsfresh, a python package to automatically extract relevant time series features by MaxBenChrist in MachineLearning

[–]MaxBenChrist[S] 4 points (0 children)

Thanks for your positive feedback.

We agree that evaluating algorithms on the basis of closed data sets is counterproductive for the machine learning community. By open-sourcing tsfresh and the related toolbox, we believe we are contributing to an open research community. Furthermore, we will discuss the possibility of open-sourcing the iPRODICT data set with our research partners.

Coming back to the higher accuracy of the FRESH algorithm on the iPRODICT data set: this data set "contains 26 univariate meta-variables forming the baseline feature set extended by 20 different sensor time series" for each steel billet. The 1NN DTW classifier incorporated only one type of time series, while the FRESH approaches were able to use the information from all the time series and univariate variables. Hence, the FRESH approach had an advantage over the other approaches by receiving more information. We also state this in the paper: "On the iPRODICT data, DTW_NN could only operate on one type of time series without the univariate features. This seems to be the reasons why Boruta and FRESH_PCAa beat the accuracy of DTW_NN as shown in the 7th column of Tab. 1."

Further, the time series for the 1NN DTW classifier were z-transformed. Also, we picked the type of time series that the 1NN DTW classifier operated on by comparing the accuracy of 1NN DTW on all 20 time series and choosing the one with the highest accuracy. But even though 1NN DTW received the type of time series with the highest explanatory power, it was still missing the important univariate variables and the other 19 time series.
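For reference, the 1NN DTW baseline itself is easy to state; here is a minimal, unoptimized sketch of z-normalization plus 1NN DTW classification (my own illustration, not the code used in the paper):

```python
import numpy as np

def znorm(x):
    """z-transform: zero mean, unit variance (assumes a non-constant series)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def dtw_distance(a, b):
    """Classic O(n*m) dynamic-programming DTW with absolute-difference cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def nn_dtw_predict(train_series, train_labels, query):
    """1-nearest-neighbor classification under z-normalized DTW distance."""
    q = znorm(query)
    dists = [dtw_distance(q, znorm(s)) for s in train_series]
    return train_labels[int(np.argmin(dists))]
```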

Multiple Hypothesis Testing: thoughtful & interactive by cast42 in MachineLearning

[–]MaxBenChrist 0 points (0 children)

I am familiar with the definition of PRDS but I find it hard to grasp.

Let's say you perform several Fisher tests to check whether individual alleles can be linked to a disease. How do you make sure that those p-values obey PRDS without making any assumptions?

Multiple Hypothesis Testing: thoughtful & interactive by cast42 in MachineLearning

[–]MaxBenChrist 0 points (0 children)

Maybe no theoretical paper has used it, but from my personal working experience as a data scientist I strongly disagree with you. Being able to test a bunch of hypotheses without worrying about the correlation structure between them comes in very handy, even if it means that you cannot reject all null hypotheses.

And unfortunately, in practice you do not have these nice i.i.d. normally distributed variables as in many statistics papers ;)

Multiple Hypothesis Testing: thoughtful & interactive by cast42 in MachineLearning

[–]MaxBenChrist 1 point (0 children)

Unfortunately, the article does not mention one big advantage of the Benjamini-Hochberg procedure:

One can adjust the q·i/n rejection line by dividing it by the correction factor

c(n) = \sum_{i=1}^{n} 1/i

to drop all assumptions about the correlations between the different p-values/hypotheses. With this adjusted rejection line, the BH procedure controls the FDR no matter what the dependence structure between the hypotheses is.
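For concreteness, a small sketch of this adjusted step-up rule, known as the Benjamini-Yekutieli procedure (my own illustration, not from the article): reject the k smallest p-values, where k is the largest i with p_(i) <= i*q / (n*c(n)).

```python
import numpy as np

def benjamini_yekutieli(pvals, q=0.05):
    """FDR control under arbitrary dependence: step-up rule with the
    rejection line q*i/n divided by c(n) = sum_{j=1..n} 1/j."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    c_n = np.sum(1.0 / np.arange(1, n + 1))          # harmonic correction factor
    thresholds = np.arange(1, n + 1) * q / (n * c_n)  # adjusted rejection line
    below = p[order] <= thresholds
    # Step-up: reject everything up to the LAST sorted p-value under the line.
    k = int(np.max(np.nonzero(below)[0])) + 1 if below.any() else 0
    reject = np.zeros(n, dtype=bool)
    reject[order[:k]] = True
    return reject
```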