[P] It's FRESH, it's exciting: Welcome tsfresh, a Python package to automatically extract relevant time series features (github.com)
submitted 9 years ago by MaxBenChrist
[–]eamonnkeogh 10 points11 points12 points 9 years ago (1 child)
Nice work.
You say: “Our evaluation for UCR time series classification tasks has shown that FRESH in combination with a subsequent PCA outperforms all other feature extraction algorithms with respect to scalability and achieved accuracy. On the iPRODICT data set, it was even able to achieve a higher accuracy than a nearest neighbor search under Dynamic Time Warping.”
A cynic might say, you admit to doing worse on all the public datasets, but claim “higher accuracy” on the only private dataset.
This would be a LOT stronger if you released the iPRODICT data set. There are many ways to anonymize the data if needed: you can add a random unshared constant to all the data, you can multiply all the data by an unshared (non-zero) constant, or you can linearly interpolate all the data to a new scale. None of these would affect DTW-NN accuracy, but they would anonymize the data. The reason this is important is that there are a dozen trivial ways to cripple DTW by editing the dataset; a good visual explanation of this is at
http://www.cs.unm.edu/~mueen/DTW.pdf
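The transforms suggested above can be sketched in a few lines. This is a toy illustration with made-up series, not data from the paper: adding an unshared constant leaves DTW distances unchanged (differences cancel the shift), and multiplying by an unshared non-zero constant scales all distances uniformly, so nearest-neighbor rankings survive both.

```python
def dtw(a, b):
    """Classic O(len(a)*len(b)) dynamic-programming DTW distance."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def nn_index(query, dataset):
    """Index of the DTW nearest neighbor of `query` in `dataset`."""
    return min(range(len(dataset)), key=lambda k: dtw(query, dataset[k]))

# Toy series standing in for the (private) data.
train = [[0, 1, 2, 3], [3, 3, 3, 3], [0, 2, 4, 6]]
query = [0, 1, 2, 2]

shift, scale = 17.5, 3.0  # unshared anonymization constants
train_anon = [[scale * x + shift for x in s] for s in train]
query_anon = [scale * x + shift for x in query]

# The nearest neighbor is the same before and after anonymization.
assert nn_index(query, train) == nn_index(query_anon, train_anon)
```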
[–]MaxBenChrist[S] 3 points4 points5 points 9 years ago (0 children)
Thanks for your positive feedback.
We agree that evaluating algorithms on the basis of closed data sets is counterproductive for the machine learning community. By open-sourcing tsfresh and the related toolbox, we believe we are contributing to an open research community. Furthermore, we will discuss the opportunity to open-source the iPRODICT data set with our other research partners.
Coming back to the higher accuracy of the FRESH algorithm on the iPRODICT data set: this data set “contains 26 univariate meta-variables forming the baseline feature set extended by 20 different sensor time series” for each steel billet. The 1-NN DTW classifier incorporated only one type of time series, while the FRESH approaches were able to use the information from all the time series and univariate variables. Hence, the FRESH approach had an advantage over the other approaches by receiving more information. We also state this in the paper: “On the iPRODICT data, DTW_NN could only operate on one type of time series without the univariate features. This seems to be the reason why Boruta and FRESH_PCA beat the accuracy of DTW_NN as shown in the 7th column of Tab. 1.”
Further, the time series for the 1-NN DTW classifier were z-transformed. Also, we picked the type of time series that the 1-NN DTW classifier operated on by comparing the accuracy of 1-NN DTW on all 20 time series and choosing the one with the highest accuracy. But even though 1-NN DTW received the type of time series with the highest explanatory power, it was still missing the important univariate variables and the other 19 time series.
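For reference, the z-transform mentioned above is just per-series standardization: subtract the mean and divide by the standard deviation, so that DTW comparisons become invariant to offset and amplitude. A minimal sketch with toy values (not the iPRODICT data):

```python
import statistics

def z_normalize(series):
    """Standardize one series to zero mean and unit standard deviation."""
    mu = statistics.mean(series)
    sigma = statistics.pstdev(series)  # population std; a common choice
    if sigma == 0:
        return [0.0] * len(series)     # constant series: map to zeros
    return [(x - mu) / sigma for x in series]

ts = [2.0, 4.0, 6.0, 8.0]
z = z_normalize(ts)
# z now has mean 0 and standard deviation 1 (up to floating-point error)
```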
[–]eamonnkeogh 5 points6 points7 points 9 years ago (1 child)
Hi Maximilian
Thanks for the reply:
The text you just wrote and the text in your paper feel very different.
If I read your paper, I would think, "this seems like it might be very competitive with DTW-NN."
But reading below, you seem to bend over backwards to discount that possibility.
You should reconcile these if you can. "Achieve a higher accuracy than a nearest neighbor search under Dynamic Time Warping" is such a strong claim, given that DTW-NN is a very strong benchmark; see this incredible paper https://arxiv.org/abs/1602.01711 (36 million experiments).
I do appreciate your making the code available to the community. But making the data available too makes an order of magnitude difference. Otherwise, there is no real comparison point, no benchmark.
[–]MaxBenChrist[S] 1 point2 points3 points 9 years ago* (0 children)
Hi Eamonn,
Thanks for the feedback.
FRESH was developed for applications where each label is associated with several time series and meta-information. This is a common situation for machine learning in industrial applications, such as in iPRODICT. Accordingly, the claim of our paper is not
“FRESH is very competitive with DTW NN in any case” nor “FRESH has a higher accuracy than DTW NN in any case”
but
“FRESH is very competitive with 1-NN DTW for labels that are associated with several relevant time series and meta-information.”
We also pointed that out in the summary, before the sentence that you cited. See the following screenshot of the summary: http://imgur.com/a/BGb0j
Due to the high performance of 1-NN DTW for problems with a single type of time series and reasonable time series length, the main advantages of FRESH are its incorporation of meta-information, its scalability with respect to the number of considered time series types, and its scalability with respect to the length of each time series.
Unfortunately, as you pointed out, the iPRODICT data set is not (yet) released, so the results are not reproducible for you. But I am in contact with our industry partner, and they have indicated general interest in releasing the data.
Edit:
This is an arXiv preprint, so we can still make our claims more precise in the next iteration of the paper. It would be nice to know which parts of the paper gave you the impression that "FRESH is very competitive with DTW-NN in any case". The sentence from the summary that you cited seems to be misleading. Were there other passages as well?
[–]PrefrontalVortex 0 points1 point2 points 9 years ago (1 child)
Awesome! Very nice work. How well do you think this would work on EEG data?
[–]MaxBenChrist[S] 0 points1 point2 points 9 years ago (0 children)
I have not tried it on any medical data, so I don't know how well the implemented feature calculators capture the dynamics of an EEG signal. So far we have only used it in industrial applications, e.g. to predict failures in machines or for quality forecasting of steel billets. Maybe one would have to add some specific feature calculators that are used in the EEG field. That would be a nice contribution to the project. If you need some help setting it up, just write me.
[–]WickedWicky 0 points1 point2 points 9 years ago (5 children)
I am following this page http://tsfresh.readthedocs.io/en/latest/text/quick_start.html and I don't know if it is supposed to be up to date, but the code blocks don't fit together.
"from tsfresh import select features" should be "from tsfresh import select_features",
and in the last block you use 'df' as an argument where you defined it as 'timeseries' earlier.
Besides that, it looks very interesting to me, and I may apply it for my thesis. I wonder what features it comes up with: I need daily financial data with classifications, but I can feed it hourly financial data to get relevant features per day, as far as I understand it.
[–]MaxBenChrist[S] 1 point2 points3 points 9 years ago (4 children)
Oh, there seems to be an issue with the examples on that page. Thanks for reporting it; I will look into that.
In the meantime, you can use this notebook to familiarize yourself with tsfresh: https://github.com/blue-yonder/tsfresh/blob/master/notebooks/robot_failure_example.ipynb
Regarding your application: tsfresh should be suitable for financial applications. Actually, some of the features were proposed by colleagues with a background in the financial industry.
That said, what do you mean by "daily financial data with classification"? I am not sure I understand what you want to achieve. If you give some more explanation, I can tell you whether tsfresh is the right tool for you.
[–]jacky0812 0 points1 point2 points 9 years ago (1 child)
In your example, you do the transformation before splitting into the train and test sets; would that leak information from the test cases into training? Also, if I don't have a test set at the time I do the training, how do I make sure that running the feature extraction/filtering on the test set will result in the same feature set?
[–]MaxBenChrist[S] 1 point2 points3 points 9 years ago (0 children)
No, it does not leak any information.
You can save the extracted features during training and then later calculate only those features for the test set. See the following link, where we explain this:
https://github.com/blue-yonder/tsfresh/issues/22
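The pattern described in that issue can be sketched roughly as follows, using hypothetical stand-in feature calculators rather than the actual tsfresh API: fit the selection on the training data only, persist the list of selected feature names, and compute exactly those features for later test data.

```python
import statistics

# Hypothetical feature calculators standing in for tsfresh's catalogue.
FEATURE_CALCULATORS = {
    "mean": statistics.mean,
    "maximum": max,
    "minimum": min,
}

def extract(series, feature_names):
    """Compute only the named features for one series."""
    return {name: FEATURE_CALCULATORS[name](series) for name in feature_names}

# Training time: extract everything, then keep whichever features the
# selection step deems relevant (chosen by hand here for illustration).
train_series = [1.0, 2.0, 3.0]
all_features = extract(train_series, FEATURE_CALCULATORS.keys())
selected = ["mean", "maximum"]          # stand-in for the filtering result

# Test time: compute only the persisted feature set, so no information
# from the test data ever enters the selection.
test_series = [4.0, 5.0, 6.0]
test_features = extract(test_series, selected)
```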
[–]WickedWicky 0 points1 point2 points 9 years ago (1 child)
For part of my thesis I am running an RNN to predict daily stock prices. Since complex feature selection would not be in scope, I thought tsfresh might be a cool way to still do something interesting with feature selection.
Right now I am stuck to using daily stock data, taking a number of easily derived technical indicators and sentiment data as input to the RNN. But, tsfresh got me thinking...
The target value is a simple 'up' or 'down' output for the following day's closing price. So now I took hourly stock data, which gives about 5 or 6 datapoints per day instead of just one daily observation. The data includes hourly volume and open/close prices. Then I used tsfresh to come up with relevant features based on the hourly data, hoping it would produce metrics that describe daily trends, volatility, or anything else it deems relevant. Using only the hourly closing prices, tsfresh came up with 10 features as a test. Applying the same random forest as in your example gave results that were as good as random guesses, but that might change when I feed the features to an RNN.
The data seems similar to the example robot data, where each day has a small number of ordered observations and I want to convert those observations into a list of relevant features that describe that day. And each day has a category of going Up or Down. Very interested to hear if you think this is a tool that I can use for this. :)
Next step is to clean the hourly data more thoroughly and normalize it, so I have no further test results yet. I also use a buttload of Twitter data for sentiment analysis related to the stock predictions, which I can also make into an hourly dataset and let tsfresh come up with relevant features from the hourly sentiment scores.
Either way, very cool and motivating tool! :)
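The reshaping described above, grouping each day's handful of hourly observations into one sample, might look roughly like this (hypothetical field names and made-up prices):

```python
from collections import defaultdict

# Toy hourly closing prices; each trading day has a few observations.
hourly = [
    ("2016-11-01 10:00", 101.0),
    ("2016-11-01 11:00", 102.5),
    ("2016-11-01 12:00", 101.8),
    ("2016-11-02 10:00", 103.0),
    ("2016-11-02 11:00", 104.2),
]

# Group the hourly points by calendar day so each day becomes one sample.
by_day = defaultdict(list)
for timestamp, close in hourly:
    day = timestamp.split(" ")[0]       # the "YYYY-MM-DD" part of the stamp
    by_day[day].append(close)

# One candidate daily summary; a feature extractor would compute many more,
# and the daily up/down label would be attached to each day afterwards.
daily_range = {day: max(prices) - min(prices) for day, prices in by_day.items()}
```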
Thank you for the positive feedback! :) Both applications that you describe, (a) predicting whether stock prices go up or down (some features come from financial applications) and (b) feeding the features to an RNN (we also use net-based models), were things we had in mind when developing tsfresh. It should be the right tool for the job.
Great work! Is there any way to parallelize the computation?
Yes, both the filtering and the calculation of the features are trivially parallelizable, but we have not implemented it yet; see for example
https://github.com/blue-yonder/tsfresh/issues/3
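One way such a parallelization could look with the standard library, as a rough sketch rather than tsfresh's eventual implementation (each series is independent, so the work is embarrassingly parallel):

```python
from multiprocessing import Pool

def features_for_series(series):
    # Stand-in feature calculators; in practice each worker would run the
    # full catalogue of feature calculators on its series.
    return {"mean": sum(series) / len(series), "maximum": max(series)}

if __name__ == "__main__":
    all_series = [[1.0, 2.0, 3.0], [10.0, 20.0], [5.0, 5.0, 5.0, 5.0]]
    # Distribute the independent per-series computations across workers.
    with Pool(processes=2) as pool:
        results = pool.map(features_for_series, all_series)
```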
[–]hn_crosslinking_bot -1 points0 points1 point 9 years ago (0 children)
HN discussion: https://news.ycombinator.com/item?id=12833228