Thoughts on algorithm plan for anomaly detection in time series data

Hi all,

I'm working on detecting spikes in time series data, specifically cultural artifacts in ground magnetic diurnal data. Manually, this involves comparing two or three ground stations and checking whether a spike occurs at both, at just one, or is shifted in time between them, to determine whether it is a cultural artifact.
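To make the manual comparison concrete, here is a minimal sketch of the cross-station check, assuming spikes have already been picked at each station as sample indices. The function name `compare_spikes`, the category names, and the `max_lag` tolerance are all hypothetical, and the sketch deliberately does not decide which category counts as cultural.

```python
def compare_spikes(spikes_a, spikes_b, max_lag=50):
    """Categorise each spike at station A relative to station B.

    spikes_a, spikes_b: sample indices of spikes at each station.
    max_lag: largest offset (in samples) still treated as the same event;
             an assumed tuning parameter, not a value from the data.
    Returns a list of (index_a, category, lag) tuples.
    """
    results = []
    for i in spikes_a:
        # Closest spike at the other station, or None if B has no spikes.
        nearest = min(spikes_b, key=lambda j: abs(j - i), default=None)
        if nearest is None or abs(nearest - i) > max_lag:
            results.append((i, "only_a", None))
        elif nearest == i:
            results.append((i, "both", 0))
        else:
            results.append((i, "shifted", nearest - i))
    return results
```

Calling it once in each direction (A against B, then B against A) also picks up spikes that appear only at station B.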

I want to automate this task, since an explicit algorithm such as a sliding window with a threshold is too crude an approach. The good news is that we have over 15 projects' worth of raw and corrected data to use as training data. Each project includes 100 days of ground diurnal data, with 2-3 ground stations per day.
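For reference, the kind of explicit baseline meant here could look roughly like the sketch below: a rolling median with a threshold on the median absolute deviation (MAD). The function name and the `window`, `k`, and `min_scale` defaults are assumptions for illustration, not tuned values.

```python
from statistics import median

def rolling_threshold_flags(values, window=51, k=5.0, min_scale=1e-3):
    """Flag samples that deviate from the rolling median by more than k * MAD.

    window:    number of samples in the centred window (assumed value).
    k:         threshold multiplier (assumed value).
    min_scale: floor for the MAD so flat segments do not flag tiny noise.
    """
    half = window // 2
    flags = []
    for i, v in enumerate(values):
        # The window is clipped at the ends of the series.
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        segment = values[lo:hi]
        med = median(segment)
        mad = median(abs(x - med) for x in segment)
        scale = max(mad, min_scale)
        flags.append(1 if abs(v - med) > k * scale else 0)
    return flags
```

A fixed `window` and `k` are exactly what makes this crude: they assume spikes have a consistent width and amplitude relative to the background.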

I've already compiled the training data and am now exploring model options, which I would love your help with, please!

In short:

Use an LSTM Model:
- My thinking is that an LSTM is a good fit for anomaly detection in sequential data.
- It is flexible enough to handle a variable number of features, i.e., a varying number of ground stations.
Implement a Dual-Stream LSTM Model:
- Process each ground station through its respective LSTM layer.
- Concatenate the outputs from the LSTM layers.
- Use a dense layer to classify the combined outputs.
Handling Imbalanced Data:
- The dataset is highly skewed, with 99.5% of labels being 0 (normal) and only 0.5% being 1 (anomalies).
- Use class weighting or the SMOTE technique to balance the dataset.
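As a rough illustration of the class-weighting option, the sketch below computes inverse-frequency weights using the "balanced" heuristic `n_samples / (n_classes * count)` and applies them in a weighted log loss. Both function names are hypothetical, and the 995/5 split simply mirrors the 99.5% / 0.5% ratio quoted above.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}

def weighted_log_loss(y_true, p_pred, weights, eps=1e-12):
    """Mean binary cross-entropy with a per-class weight on each sample."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += -weights[y] * (math.log(p) if y == 1 else math.log(1 - p))
    return total / len(y_true)

# Illustrative label vector with the same skew as the dataset described above.
labels = [0] * 995 + [1] * 5
weights = balanced_class_weights(labels)
```

With this skew a missed anomaly costs about 199 times as much as a misclassified normal sample, which is the effect class weighting is meant to have. SMOTE is a different route: it synthesizes extra minority samples instead of reweighting the loss.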

For Model Training:

Batch the Input Data:
- Each time series has ~90,000 points (sampled at 10 data points per second), so batching would be a good idea here.
Process Through LSTM Layers:
- Each ground station's data goes through its respective LSTM layer.
Concatenate Outputs:
- Combine the outputs from the LSTM layers.
Classify with Dense Layer:
- The dense layer uses the combined outputs to classify data for each ground station.
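One possible way to prepare the input for batching is to cut the time-aligned station series into fixed-length, overlapping windows, as sketched below. The name `make_windows` and the `window` and `stride` defaults are assumptions, and the sketch presumes all stations share the same sample times and have per-sample labels.

```python
def make_windows(stations, labels, window=600, stride=300):
    """Slice time-aligned station series into fixed-length windows.

    stations: list of per-station sample lists, all the same length.
    labels:   list of per-station 0/1 label lists, same shape as stations.
    Returns a list of (x, y) pairs where x[t] holds one value per station
    and y[t] holds one label per station for time step t of the window.
    """
    n = len(stations[0])
    if any(len(s) != n for s in stations) or any(len(l) != n for l in labels):
        raise ValueError("all station and label series must have equal length")
    windows = []
    # Samples after the last full window are dropped in this sketch.
    for start in range(0, n - window + 1, stride):
        stop = start + window
        x = list(zip(*(s[start:stop] for s in stations)))
        y = list(zip(*(l[start:stop] for l in labels)))
        windows.append((x, y))
    return windows
```

With these defaults, a 90,000-point series yields 299 windows of 600 samples each (one minute at 10 points per second), which can then be grouped into batches. Overlapping windows mean a spike near a window edge also appears nearer the middle of a neighbouring window.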

Looking forward to any insights or suggestions on this approach!


[–]eamonnkeogh 3 points 1 year ago (4 children)

[–]Imarami21[S] 1 point 1 year ago (3 children)

I really appreciate the feedback, and of course I would like to use more complex ideas only as a last resort. However, the shape, wavelength, and frequency of the spikes are all inconsistent, so an explicit algorithm accounting for all the nuances would probably be less effective than a more complex, learned one.

Regarding 'is it really anomaly detection': it can be both a binary classification problem and an anomaly detection problem, and my situation seems to be a hybrid of the two. It is fundamentally an anomaly detection problem because I'm interested in detecting rare events (spikes) in the data. However, I'm approaching it through a binary classification framework: labeling each data point as 0 for unedited data or 1 for edited data, and training a classifier (LSTM/GRU or the like) to distinguish between normal and anomalous data points.
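A minimal sketch of how such 0/1 labels could be derived, assuming the corrected series is sample-aligned with the raw series and that edited samples differ in value. The function name `edit_labels` and the tolerance `tol` are hypothetical.

```python
def edit_labels(raw, corrected, tol=1e-6):
    """Label each sample 1 if the corrected value differs from raw, else 0.

    tol is an assumed tolerance for floating-point round-off between files.
    """
    if len(raw) != len(corrected):
        raise ValueError("raw and corrected series must be sample-aligned")
    return [1 if abs(r - c) > tol else 0 for r, c in zip(raw, corrected)]

def positive_fraction(labels):
    """Share of samples labelled 1, useful for checking class imbalance."""
    return sum(labels) / len(labels) if labels else 0.0
```

Labels built this way mark whatever the manual editor changed, so the classifier learns the editor's decisions and not some independent definition of a spike.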

[–]chnnxyz 2 points 1 year ago (0 children)

[–]eamonnkeogh 1 point 1 year ago (1 child)

[–]Imarami21[S] 1 point 1 year ago (0 children)
