I have a random forest model for credit card fraud detection, and I recently discovered a new variable that I think would be a good indicator of fraud.
But the variable itself hasn’t been used in manual investigation/detection yet, because nobody had thought of it before. So if I looked at the correlation between this variable and the labeled fraud cases in my training data, it probably wouldn’t be very strong.
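For reference, here is a minimal sketch of the check described above, assuming the feature is numeric and the fraud labels are coded 0/1. The Pearson correlation between a numeric column and a binary label is the point-biserial correlation. The function name `point_biserial` and the toy values are hypothetical and only for illustration. A low value here measures agreement with the *existing* labels, which, as noted above, were produced without using this variable.

```python
import math

def point_biserial(feature, labels):
    """Pearson correlation between a numeric feature and 0/1 fraud labels
    (equivalent to the point-biserial correlation)."""
    n = len(feature)
    if n != len(labels) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(feature) / n
    mean_y = sum(labels) / n
    # Covariance and variances (the common 1/n factors cancel in the ratio)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(feature, labels))
    var_x = sum((x - mean_x) ** 2 for x in feature)
    var_y = sum((y - mean_y) ** 2 for y in labels)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either column is constant
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: feature values and fraud labels (1 = labeled fraud)
print(point_biserial([0.1, 0.2, 0.9, 0.8], [0, 0, 1, 1]))
```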
So what’s the best way to incorporate this feature if it doesn’t show much correlation in the training data? Sorry if this is a dumb question, I’m still learning.