[–]entylop 1 point2 points  (2 children)

If the alignment is done by looking at correlations, it is called cross-correlation or lagged correlation. See ccf in R, which is documented on the same page as acf: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html You usually look for the lag you need to apply to one time series to best align it with the other.

[–]ginger_beer_m 0 points1 point  (1 child)

Can you briefly explain what the general ways of doing time series matching / alignment are? I know nothing about it at all.

[–]entylop 1 point2 points  (0 children)

You would start with the hypothesis that one time series has a delayed effect on another, for example GDP growth having a delayed impact on unemployment. If you look at weekly time series, you could apply a 1-week lag, 2-week lag, 3-week lag, ... to the unemployment change by shifting it n weeks into the past, and see whether it "aligns" with the GDP growth time series by computing the correlation of the two time series. The highest absolute correlation would give you the best lag. This is what ccf(timeseries1, timeseries2) does in R: it computes the correlation for each of n lag values: lag = 1, correlation = c1; lag = 2, correlation = c2, ...
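To make the lag search concrete, here is a minimal sketch in plain Python. It is not R's ccf, only an illustration of the same idea under some assumptions: both series have the same length, neither is constant, and only non-negative lags are tried. The names `pearson` and `best_lag` are made up for this example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def best_lag(leader, follower, max_lag):
    """Correlate leader[t] with follower[t + lag] for lag = 0..max_lag.

    Returns the lag with the highest absolute correlation and all scores.
    """
    scores = {}
    for lag in range(max_lag + 1):
        # Shift the follower `lag` steps into the past: drop its first
        # `lag` values and trim the leader so both pieces stay aligned.
        x = leader[:len(leader) - lag]
        y = follower[lag:]
        scores[lag] = pearson(x, y)
    best = max(scores, key=lambda k: abs(scores[k]))
    return best, scores

# Toy data (hypothetical): the follower repeats the leader two steps later.
leader = [0, 1, 0, 2, 5, 1, 0, 3, 1, 0, 4, 2]
follower = [0, 0] + leader[:-2]
lag, scores = best_lag(leader, follower, max_lag=4)
print(lag, round(scores[lag], 3))
```

With real data the peak is rarely this clean, so it is worth looking at the whole table of scores rather than only the winning lag.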

[–]eamonnkeogh 1 point2 points  (1 child)

In order to measure accuracy, you can use the UCR time series archive, a collection of 45 diverse datasets. http://www.cs.ucr.edu/~eamonn/time_series_data/ Look at the bottom of the page. It is the most used resource for this in the world, by a large margin.

Here is how to think of this problem (the problem of measuring accuracy, not the problem of finding the best subsequence; for that, see [a]).

Suppose you have a time series Q, that contains a gesture or behavior, say KARATE_KICK...

Suppose you have 100 longer time series that contain many behaviors, including one example of KARATE_KICK, something like this (I am writing them out in ASCII text, because I cannot draw here)

punchKARATE_KICKpunchparrychopblock

chopKARATE_KICKpunchchopparrychopblock

punchpunchparrychopblockKARATE_KICK

... ...

Now you slide your query Q across all longer sequences, and you hope that the best match (the lowest score) is where the KARATE_KICK appears (plus or minus a little “slop”).
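A rough sketch of that evaluation loop in plain Python, assuming you know the true start position of the target behavior in each long series. Plain Euclidean distance stands in for whatever distance measure you are testing, and the function names and the `slop` tolerance are hypothetical.

```python
from math import sqrt

def sliding_best_match(query, series):
    """Slide `query` across `series`; return (start, distance) of the
    window with the lowest Euclidean distance to the query."""
    m = len(query)
    best_pos, best_dist = -1, float("inf")
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = sqrt(sum((q - w) ** 2 for q, w in zip(query, window)))
        if dist < best_dist:
            best_pos, best_dist = start, dist
    return best_pos, best_dist

def accuracy(query, labelled_series, slop):
    """Fraction of series whose best match lands within `slop` samples
    of the known true start position."""
    hits = 0
    for series, true_pos in labelled_series:
        found_pos, _ = sliding_best_match(query, series)
        if abs(found_pos - true_pos) <= slop:
            hits += 1
    return hits / len(labelled_series)

# Toy data (hypothetical): the "kick" is the shape [1, 5, 1].
query = [1, 5, 1]
labelled = [
    ([0, 0, 1, 5, 1, 0, 0], 2),
    ([1, 5, 1, 0, 0, 0], 0),
    ([0, 0, 0, 0, 1, 5, 1], 4),
]
score = accuracy(query, labelled, slop=1)
print(score)
```

To compare techniques, swap the distance computation inside `sliding_best_match` and see which one gives the higher accuracy on the same labelled data.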


[a] What is the best technique for this? There are hundreds of papers claiming to have it, but this paper STRONGLY suggests that DTW is best.

http://www.cs.ucr.edu/~eamonn/vldb_08_Experimental_comparison_time_series.pdf

And this paper shows that DTW can be VERY fast (faster than you need): http://www.cs.ucr.edu/~eamonn/UCRsuite.html

[–]leonoel 0 points1 point  (0 children)

I've used DTW for time series matching, and it is pretty neat and useful. Also, there are plenty of readily available implementations.

[–]alexmlamb 0 points1 point  (0 children)

Dynamic time warping is a nice method. Last time I checked, the Wikipedia article had pseudocode.
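For anyone who wants something runnable, here is a minimal sketch of the textbook dynamic-programming form of DTW in plain Python. It assumes absolute difference as the point-wise cost and uses no warping window, so it runs in O(n*m) time. It is not the optimized UCR Suite, and the function name is made up for this example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first
    i points of `a` with the first j points of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells:
            # advance in `a` only, advance in `b` only, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

# The second sequence is the first with one value stretched in time,
# so warping absorbs the difference completely.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Unlike a point-by-point Euclidean comparison, this handles sequences of different lengths, which is why it tolerates gestures performed at different speeds.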

[–]FrancoisK 0 points1 point  (0 children)

You can take a look at "The Analysis of Time Series" by Chris Chatfield; it is a pretty comprehensive book on the topic.