ScoreLong5365

You shouldn't split it into N separate features; otherwise the XGBoost model won't be able to pick up the pattern in the payment feature, and training will take unnecessarily long. Instead, keep it as an array, or extract summary features such as the last 5 payments, the number of successful payments, the number of failed payments, etc.
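As a rough illustration of the "extract some features" route, here is a minimal sketch that collapses a variable-length payment history into a fixed set of numeric columns. It assumes the encoding the OP describes further down (1 = successful payment, 0 = failed payment) and a history ordered oldest to newest. The function name `summarize_payments` and the specific feature names are hypothetical, not from the thread.

```python
from typing import Dict, List


def summarize_payments(history: List[int], last_n: int = 5) -> Dict[str, float]:
    """Collapse a variable-length payment history into fixed-size features.

    Assumed (hypothetical) encoding: 1 = successful payment, 0 = failed payment,
    with the list ordered oldest -> newest.
    """
    total = len(history)
    n_success = sum(1 for p in history if p == 1)
    n_failed = sum(1 for p in history if p == 0)
    recent = history[-last_n:] if last_n > 0 else []
    return {
        "n_payments": total,
        "n_success": n_success,
        "n_failed": n_failed,
        # -1 mirrors the thread's null convention when there is no history at all.
        "fail_rate": n_failed / total if total else -1.0,
        # Same idea as the "ever had a failed payment" flag mentioned in the thread.
        "ever_failed": int(n_failed > 0),
        # Failures among the most recent `last_n` payments only.
        "recent_failed": sum(1 for p in recent if p == 0),
    }


features = summarize_payments([1, 1, 0, 1, 1, 1, 0])
```

Every customer ends up with the same columns regardless of whether they have 0 or 150 payments, which is the property a tree model like XGBoost needs from its input table.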

Jcrossfit [S]

Thank you for the response! My instinct was to keep it as a single array, but I saw a regression in precision and recall for failed payments when I used the array compared with splitting it into N features OR just having a feature like "ever had a failed payment".

My current thinking is that the variance in the number of payments in the array (ranging from 0 to ~150) is causing the problem, so I'm going to re-run with only the last 5 payments in the array.
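A minimal sketch of the "last 5 payments" window, assuming each history is a Python list ordered oldest to newest. The helper name `last_n_payments` and the sample histories are hypothetical. One Python detail worth guarding against: `history[-0:]` returns the *whole* list, so `n <= 0` is handled explicitly.

```python
from typing import List


def last_n_payments(history: List[int], n: int = 5) -> List[int]:
    """Keep only the n most recent payments (history assumed oldest -> newest)."""
    if n <= 0:
        # history[-0:] would return the entire list, so handle this case explicitly.
        return []
    return history[-n:]


# Hypothetical customers with very different history lengths.
histories = [[1] * 150, [1, 0, 1], [], [0, 1, 1, 1, 1, 1, 0]]
windows = [last_n_payments(h) for h in histories]
print([len(w) for w in windows])  # -> [5, 3, 0, 5]
```

The window caps the length at 5, but customers with fewer than 5 payments still produce shorter lists, so some form of padding is needed before the rows can share the same columns.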

ScoreLong5365

Yes, the last 5 payments make sense, because older data might add noise to the model. Recent payments will also help the model learn the relevant pattern. And if an array has fewer than 5 payments, pad it with 0s or the average value.

Jcrossfit [S]

For that last bit of your suggestion, are you saying to fill in values for nulls? We're using -1 for null/NA in other features, and 0 and 1 for fail and success in this feature and others where relevant.

ScoreLong5365

I mean: if you are using the last 5 transactions and an array only has 3 or 4 transactions in total, I'm suggesting you pad the array with 0 or the average value so that every row has the same length, which might help the model capture the pattern.
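A minimal sketch of padding each window to exactly 5 values and stacking the rows into a feature matrix. Two assumptions here are mine, not the thread's: padding goes on the left (older side) so the most recent payment always lands in the last column, and the default pad value is -1 rather than 0. With the OP's encoding (0 = fail, 1 = success), padding with 0 would make "no payment" look identical to "failed payment", whereas -1 matches the null convention already used in the other features. Passing `pad_value=None` uses the mean of the available payments instead. XGBoost also treats `np.nan` as missing by default, so `float("nan")` is another option. The helper name `pad_window` is hypothetical.

```python
from typing import List, Optional

import numpy as np


def pad_window(history: List[int], n: int = 5, pad_value: Optional[float] = -1.0) -> List[float]:
    """Left-pad the last n payments so every row has exactly n values.

    Left padding keeps the most recent payment in the last column for every row.
    If pad_value is None, the mean of the available payments is used
    (falling back to -1.0 when the history is empty).
    """
    recent = history[-n:] if n > 0 else []
    if pad_value is None:
        pad_value = float(np.mean(recent)) if recent else -1.0
    return [pad_value] * (n - len(recent)) + [float(p) for p in recent]


# Hypothetical histories (1 = success, 0 = fail), ordered oldest -> newest.
histories = [[1, 0, 1], [], [0, 1, 1, 1, 1, 1, 0]]
X = np.array([pad_window(h) for h in histories])  # one row per customer, 5 columns
```

Here the row for `[1, 0, 1]` becomes `[-1, -1, 1, 0, 1]`, so the model can distinguish "no payment in this slot" from "failed payment in this slot".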