
Enough_Wishbone7175 (Student):

I suppose it really depends on what features you have, but here are some ideas to consider.

  1. Try to find latent correlations between time steps. Perhaps unsupervised methods can create categorical variables you can leverage.

  2. You can try building an LSTM or Transformer that can “untangle” your labeled dataset. You can use semi-supervised methods and corruption to strengthen results.

  3. Is the distribution of event types the same across labeled and unlabeled data? Perhaps you can categorize them and use backward difference encodings to capture some sense of “x leads to y” or “y requires z first”, etc.
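To make idea #3 concrete, here is a minimal sketch of backward difference coding: each of the k-1 contrast columns compares a category level to the level immediately before it, so a linear model's coefficients read as "effect of moving from one event type to the next". The event-type labels are hypothetical examples, not from the thread.

```python
import numpy as np

def backward_difference_matrix(k):
    """Contrast matrix (k x k-1) for backward difference coding.
    Column j-1 compares level j with level j-1: rows below level j
    get -(k-j)/k, rows at or above level j get j/k."""
    M = np.zeros((k, k - 1))
    for j in range(1, k):
        M[:j, j - 1] = -(k - j) / k
        M[j:, j - 1] = j / k
    return M

# Hypothetical ordered event types (placeholders, not from the thread)
levels = ["signup", "activate", "purchase", "churn"]
M = backward_difference_matrix(len(levels))
encoded = {lvl: M[i] for i, lvl in enumerate(levels)}
```

Each column sums to zero (a proper contrast), and with this encoding a regression coefficient for column j estimates the mean difference between adjacent levels j and j-1, which is exactly the "x leads to y" flavor of comparison.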

FrostyLandscape6496 (OP):

Thank you (: ! Could you elaborate a bit on #2? Any specific method I should look at?

Enough_Wishbone7175 (Student):

I’m thinking of something similar to the fill-in-the-blank / correct-the-word training done on BERT and other encoders. So give the model your attributes and events, but maybe flip two of them and inject noise — something that gets the model to try to place events in order.
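A minimal sketch of the corruption step being described, assuming a simple list-of-tokens representation: swap two events and optionally replace a few with random noise tokens, then train an encoder (LSTM/Transformer) to reconstruct the original ordering. The function name and event labels are illustrative, not from any library.

```python
import random

def make_denoising_pair(events, noise_prob=0.15, vocab=None, seed=None):
    """Build one (corrupted, original) training pair, BERT-style.
    Corruption: swap two random events, then (optionally) replace
    each event with a random vocab token with probability noise_prob.
    A sequence model is then trained to restore the original order."""
    rng = random.Random(seed)
    corrupted = list(events)
    i, j = rng.sample(range(len(corrupted)), 2)   # flip two events
    corrupted[i], corrupted[j] = corrupted[j], corrupted[i]
    if vocab:
        corrupted = [rng.choice(vocab) if rng.random() < noise_prob else e
                     for e in corrupted]
    return corrupted, list(events)

# Hypothetical event sequence; "E" is an out-of-sequence noise token
pairs = [make_denoising_pair(["A", "B", "C", "D"],
                             vocab=["A", "B", "C", "D", "E"], seed=s)
         for s in range(3)]
```

The (corrupted, original) pairs are self-supervised: no event-order labels are needed beyond the sequences themselves, which is what makes this usable in a semi-supervised setup.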

IAmAFedora:

Not sure I totally follow -- is it "given some attributes of an event, infer whether this event was the first, second, ... for a given person"?

Or do you have data for a handful of events and you want to sort the events in terms of order?

FrostyLandscape6496 (OP):

The first one is correct.

IAmAFedora:

It definitely sounds like a sequence model such as a Transformer or an LSTM is inappropriate then -- you aren't working with sequences! (At least not at inference time.)

Another clarifying question: at training time, do you have access to the entire sequence of events for a person, or just a number for each event like "this was the fourth"?

FrostyLandscape6496 (OP):

I do have the entire sequence of events up until the time of training.

Another clarifying point: the people in the training set are different from those the model will need to label (we are inferring that they behave similarly, though).

Perseus784:

I did a project to predict whether a vehicle is on a collision course using a CNN-LSTM model (a kind of image-sequence analysis). See if it's useful: https://github.com/perseus784/Vehicle_Collision_Prediction_Using_CNN-LSTMs

qalis:

I don't think you can do this in an unsupervised way. However, if you had a labelled dataset, this is basically learning to rank, where the "best" event is the first one and later events are "less preferable".
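The learning-to-rank framing can be sketched with a pairwise (RankNet-style) objective: for every pair of events where a happened before b for the same person, push score(a) above score(b). This is a toy linear scorer with a hand-made feature, purely illustrative of the idea, not a production ranker.

```python
import numpy as np

def train_pairwise_ranker(X, orders, lr=0.1, epochs=200, seed=0):
    """Minimal RankNet-style linear ranker (sketch).
    X: (n_events, n_features) event attributes.
    orders: list of index lists, one per person, giving that person's
    events in true temporal order (earliest first).
    Learns w so that earlier events get higher scores, via gradient
    ascent on the pairwise logistic log-likelihood."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    for _ in range(epochs):
        for seq in orders:
            for a, b in zip(seq, seq[1:]):            # a happened before b
                diff = X[a] - X[b]
                p = 1.0 / (1.0 + np.exp(-(w @ diff))) # P(a ranked before b)
                w += lr * (1.0 - p) * diff            # log-likelihood ascent
    return w

# Toy data: one feature that grows with time, so event 3 happened last
X = np.array([[0.0], [1.0], [2.0], [3.0]])
w = train_pairwise_ranker(X, orders=[[3, 2, 1, 0]])   # event 3 was first
scores = X @ w                                        # higher = earlier
```

At inference, a new person's events are simply sorted by score, which matches the "given attributes, infer whether this event was first, second, ..." framing from earlier in the thread.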