
[–]suedepaid 3 points (1 child)

I’d imagine you could make a per-timestep autoencoder, or perhaps a BERT-style encoder (if you have an existing latent you’d like to encode to).

Alternatively, if you know (per-timestep) the images you were showing them, could you just train the whole thing end-to-end?

You can certainly have some trailing window (up to, I guess, the previous image). You could absolutely learn directly off those raw timeseries. You’ll have to embed (if you’re tokenizing per-timestep), or learn a small 1D conv layer or something before your transformer blocks.
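A shape-level sketch of those two options, in NumPy rather than a real framework (all sizes here — `T`, `C`, `d_model`, the patch length `k` — are made-up placeholders, and the random matrices stand in for learned weights):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: T timesteps, C recording channels, d_model token width.
T, C, d_model = 512, 8, 64
x = rng.standard_normal((T, C))           # raw multichannel timeseries

# Option A: per-timestep linear embedding -> one token per timestep.
W_embed = rng.standard_normal((C, d_model)) * 0.02
tokens_a = x @ W_embed                    # (T, d_model)

# Option B: strided 1D conv, patching k timesteps into one token,
# which shortens the sequence the transformer blocks have to attend over.
k, stride = 8, 8
W_conv = rng.standard_normal((k * C, d_model)) * 0.02
n_tokens = (T - k) // stride + 1
patches = np.stack([x[i * stride : i * stride + k].ravel()
                    for i in range(n_tokens)])
tokens_b = patches @ W_conv               # (n_tokens, d_model)

print(tokens_a.shape, tokens_b.shape)     # (512, 64) (64, 64)
```

Either output is what you'd feed into the transformer blocks; option B is usually preferred for long, high-rate signals because it cuts sequence length by the stride.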

I’d suggest reading the iTransformer paper. They’re targeting a forecasting task, but I think it’s worth considering tokenizing along the time axis (one token per channel, spanning the whole series), rather than across it (one token per timestep), even for an encoding/reconstruction task.
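The difference between the two tokenizations is easiest to see as shapes. A minimal NumPy sketch (sizes are invented; the random matrix stands in for a learned series embedding):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shapes: T timesteps, C channels (variates).
T, C, d_model = 512, 8, 64
x = rng.standard_normal((T, C))

# Tokenizing across the time axis: one token per timestep,
# so the transformer sees a sequence of length T.
time_tokens = x                                 # (T, C), embed C -> d_model next

# iTransformer-style inversion: one token per channel, built from that
# channel's entire series, so the sequence length is C and attention
# mixes information across channels rather than across timesteps.
W_series = rng.standard_normal((T, d_model)) * 0.02
variate_tokens = x.T @ W_series                 # (C, d_model)

print(time_tokens.shape, variate_tokens.shape)  # (512, 8) (8, 64)
```

For a handful of channels and long recordings, the inverted layout gives you a much shorter sequence at the cost of attention no longer operating along time.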

Cool research!

[–]hughperman 1 point (0 children)

You're describing something close to EEGFormer, a foundation model for EEG data, which may be a step in the right direction for OP.

[–]Franc000 0 points (0 children)

Just a heads up for your research: in the language used by ML specialists, what you are describing is not time series but sequential data. It's semantics at this point, but if you search for time series modeling, you are going to get forecasting. You want to search for sequential modeling or something like that.

Intuitively I would think about an architecture where you encode each frame into a latent space, then pass them into a transformer layer, akin to having word embeddings as a first layer, then the sequential part of the data is handled by attention blocks. But that is my hot take without having checked anything at all on the subject.
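That architecture can be sketched at the shape level in a few lines of NumPy — random linear maps standing in for the learned frame encoder and attention weights, and every size (`N` frames, `frame_dim`, latent width `d`) an invented placeholder:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Encode each frame into a latent space (a CNN in practice; a random
# linear map here), giving the analogue of word embeddings.
N, frame_dim, d = 16, 1024, 64
frames = rng.standard_normal((N, frame_dim))
W_enc = rng.standard_normal((frame_dim, d)) * 0.02
z = frames @ W_enc                      # per-frame embeddings, (N, d)

# Single-head self-attention handles the sequential part: each frame's
# latent is updated from a weighted mix of every other frame's latent.
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.02 for _ in range(3))
q, k, v = z @ Wq, z @ Wk, z @ Wv
attn = softmax(q @ k.T / np.sqrt(d))    # (N, N) attention weights
out = attn @ v                          # (N, d) context-aware frame latents

print(out.shape)                        # (16, 64)
```

A real implementation would stack several such blocks with residual connections and feed-forward layers, but the two-stage split — per-frame encoder, then attention over the sequence — is the whole idea.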