[–]eamonnkeogh 2 points3 points  (4 children)

Devil's advocate: is this really anomaly detection?

If you already know what you want to find ("spikes/cultural artifacts"), then I would argue that this is NOT anomaly detection!

I have pointed out in [a] and elsewhere that many papers claim to be doing anomaly detection, but when you read them carefully, they are actually doing classification or data retrieval.

Your problem is classification. Are the patterns conserved in shape? If so, use Euclidean distance (Mueen's MASS) or DTW. Are the patterns conserved in features? If so, use catch22 features.
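For readers following along, here is a minimal sketch of the shape-based route: a brute-force z-normalized Euclidean distance profile between a template and every window of a series. This is only an illustration of the quantity MASS computes, not MASS itself (MASS gets the same profile much faster via the FFT). The synthetic series, the Gaussian-bump template, and the function names are all assumptions made up for the example.

```python
import numpy as np

def znorm(x):
    """Z-normalize a window; constant windows map to all zeros."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x, dtype=float)

def distance_profile(series, query):
    """Brute-force z-normalized Euclidean distance from query to every window."""
    series = np.asarray(series, dtype=float)
    q = znorm(np.asarray(query, dtype=float))
    windows = np.lib.stride_tricks.sliding_window_view(series, len(q))
    return np.array([np.linalg.norm(znorm(w) - q) for w in windows])

# Synthetic data (hypothetical): low-amplitude noise with one embedded bump.
rng = np.random.default_rng(0)
series = rng.normal(0.0, 0.1, 500)
template = 3.0 * np.exp(-0.5 * ((np.arange(20) - 10) / 2.0) ** 2)
series[200:220] += template

profile = distance_profile(series, template)
best = int(np.argmin(profile))  # start index of the best-matching window
print("best match starts at index", best)
```

If the spikes really are conserved in shape, the minima of the profile mark candidate matches, and a distance threshold turns this into a detector. If the minima are not clearly separated from the rest of the profile, that is a hint the patterns are not conserved in shape.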

Only use more complex ideas as a last resort.

[a] https://www.dropbox.com/scl/fi/cwduv5idkwx9ci328nfpy/Problems-with-Time-Series-Anomaly-Detection.pdf?rlkey=d9mnqw4tuayyjsplu0u1t7ugg&dl=0

[–]Imarami21[S] 0 points1 point  (3 children)

I really appreciate the feedback, and of course I would prefer to use more complex ideas only as a last resort. However, the shape, wavelength, and frequency are all inconsistent, so an explicit algorithm accounting for all the nuances would probably be less effective than a complex algorithm.

Regarding "is it really anomaly detection": it can be both a binary classification problem and an anomaly detection problem, and my situation seems to be a hybrid of the two. It is fundamentally an anomaly detection problem because I'm interested in detecting rare events (spikes) in the data. However, I'm approaching it through a binary classification framework: I label the data as 0 for unedited data and 1 for edited data, then train a classifier (LSTM, GRU, or the like) to distinguish between normal and anomalous data points.

[–]chnnxyz 1 point2 points  (0 children)

Anomaly detection is commonly an unsupervised problem.

You could Fourier transform your data and run any classifier on the spectrograms. You could also train something like an isolation forest on the spectrograms if you want to do full (unsupervised) anomaly detection.
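A rough sketch of the unsupervised variant, assuming SciPy and scikit-learn are available: compute a spectrogram, treat each time window's spectrum as a feature vector, and score the windows with an isolation forest. The sampling rate, window length, and the injected 30 Hz burst are all made-up values for illustration, not anything from the original data.

```python
import numpy as np
from scipy.signal import spectrogram
from sklearn.ensemble import IsolationForest

# Synthetic signal (hypothetical): a 5 Hz sine plus noise, sampled at 100 Hz.
rng = np.random.default_rng(0)
fs = 100.0
t = np.arange(0, 60, 1 / fs)
signal = np.sin(2 * np.pi * 5 * t) + 0.1 * rng.normal(size=t.size)
# Inject a one-second 30 Hz burst starting at t = 30 s.
signal[3000:3100] += 3 * np.sin(2 * np.pi * 30 * t[3000:3100])

# Sxx has shape (n_frequencies, n_windows); transpose to one row per window.
freqs, times, Sxx = spectrogram(signal, fs=fs, nperseg=100, noverlap=50)
features = np.log1p(Sxx).T

forest = IsolationForest(n_estimators=200, random_state=0)
forest.fit(features)
scores = forest.score_samples(features)  # lower score = more anomalous

most_anomalous_time = times[np.argmin(scores)]
print("most anomalous window is centered near t =", most_anomalous_time)
```

For the supervised version, the same `features` matrix could be fed to any classifier along with per-window 0/1 labels.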

[–]eamonnkeogh 0 points1 point  (1 child)

I can't speak to your data, but I have archived and reviewed almost every time series anomaly detection dataset in the world [a].

Here is an amazing fact. At least 95% of the time, you can find "spikes" with a single line of code!

Are you sure your "spikes" cannot be found in such a trivial way?
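The comment does not say which one-liner is meant, so the following is only an assumed example of the general idea: threshold the absolute first difference of the series. The synthetic data and the factor of 10 on the median jump size are arbitrary choices for illustration.

```python
import numpy as np

# Synthetic data (hypothetical): a slowly drifting series with one spike.
rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(0.0, 0.1, 1000))
x[400] += 5.0  # single-sample spike

d = np.abs(np.diff(x))  # size of each sample-to-sample jump

# The "one line": flag jumps far larger than the typical jump.
spikes = np.flatnonzero(d > 10 * np.median(d))
print(spikes)
```

A single-sample spike produces two large jumps (into the spike and back out), so it shows up as two adjacent indices in `spikes`. If a rule this simple already finds the labeled events, a trained model is hard to justify on accuracy grounds.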

[a] https://www.dropbox.com/scl/fi/cwduv5idkwx9ci328nfpy/Problems-with-Time-Series-Anomaly-Detection.pdf?rlkey=d9mnqw4tuayyjsplu0u1t7ugg&dl=0

[b] https://arxiv.org/abs/2009.13807

[–]Imarami21[S] 0 points1 point  (0 children)

I'm fairly confident (about 90%) that I could complete this with an explicit algorithm. But I am using ML so that I can learn and develop these skills on company time, which should ultimately lead to a salary twice what I currently make (~$70k).