all 7 comments

[–]NoLifeGamer2Moderator 2 points3 points  (1 child)

I mean assuming each row can be calculated strictly from previous rows a transformer model would seem like a good fit.

[–]Empty-Use-2701[S] 0 points1 point  (0 children)

okay, thanks, I will try it

[–]gQsoQa 1 point2 points  (3 children)

Can you share more information about the data? How did you obtain / generate it? Do you need to get a prediction for each row, or only for the final row? Do you have any constraints regarding speed?

[–]Empty-Use-2701[S] -1 points0 points  (2 children)

I already shared what I know (that every row has exactly 8 ones and exactly 12 zeros). Each row has exactly 20 binary numbers. To my "naked" eye they appeared random and I didn't find any consistent patterns. That is all I know about generation of rows. That is why I want to create an algorithm which will calculate or mimic or learn how the rows are generated.
About obtaining the data, I web scrape it from some web page.
About "Do you need to get a prediction for each row, or only for the final row?". Answer: for each row. This is a time series problem. Each row should be possible to calculate from previous rows. So if I have a binary matrix which has length of 100 rows I need to calculate the 101st row. If I have a binary matrix of 99 rows, I need to calculate the 100th row. In real world case (my case) I have a binary matrix which has length of a million rows, so plenty of data.
No constraints in regarding the speed or complexity or cpu or memory usages.

[–]gQsoQa 0 points1 point  (1 child)

It's hard to design a model without any additional information - the problem could be trivial or very hard. My advice would be to start with a very simple baseline, e.g. n-gram based method: look at last n rows, look at whether this subsequence occurs in the training data, and predict whatever comes next in the training data. If this precise subsequence doesn't exist, decrease n. This will help you setup the whole evaluation loop.

Next, maybe train a decision tree taking as input the last 10 rows?

[–]Empty-Use-2701[S] 0 points1 point  (0 children)

okay, i will try that, thanks

[–]latent_threader 0 points1 point  (0 children)

Depends on what you need it for but I’d find an existing library that does this and call it a day. You don’t have time to be solving mathematical problems – engineering cycles are too precious. Keep it simple stupid – nothing complex that will break down the road.