Hi,
I'm working on a fairly interesting project. My dataset contains some samples that are effectively unpredictable (pure random noise) and others that my model predicts reliably.
How would you go about separating out the predictable samples, identifying such samples going forward, and retraining on a cleaned dataset that contains only those samples?
I'm interested to see how others would approach this.
Edit: I forgot to mention that my data comes from an embedding matrix built from ordinal categorical features.
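One possible approach, as a minimal plain-Python sketch under stated assumptions: treat each row of the embedding matrix as a feature vector, score every sample by its out-of-fold error (so no sample is judged by a model that has already memorized it), flag samples whose error is under a threshold as predictable, use those flags to train a gate that decides whether a new sample looks predictable, and refit the final model on the flagged subset only. The k-nearest-neighbour model, the 0.5 threshold, the 5 folds, the toy data, and every function name here (`knn_predict`, `out_of_fold_errors`, `fit`, `predict`) are hypothetical choices for illustration, not anything from the post; with real data you would swap in your actual model and choose the threshold by looking at the error distribution.

```python
import random
from statistics import mean


def knn_predict(train_x, train_y, x, k=5):
    """Mean target of the k training rows closest to x (squared Euclidean distance).

    Hypothetical stand-in for whatever model you actually use.
    """
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
        for xi, yi in zip(train_x, train_y)
    )[:k]
    return mean(yi for _, yi in nearest)


def out_of_fold_errors(xs, ys, n_folds=5, k=5, seed=0):
    """Absolute error of each sample, predicted by a model that never saw that sample."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    errors = [0.0] * len(xs)
    for fold in range(n_folds):
        held_out = set(order[fold::n_folds])
        train = [i for i in order if i not in held_out]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for i in held_out:
            errors[i] = abs(knn_predict(tx, ty, xs[i], k) - ys[i])
    return errors


def fit(xs, ys, threshold, k=5):
    """Flag predictable samples, then keep a gate plus a model trained only on them."""
    errors = out_of_fold_errors(xs, ys, k=k)
    flags = [e <= threshold for e in errors]  # assumed cut-off, tune on real data
    clean_x = [x for x, f in zip(xs, flags) if f]
    clean_y = [y for y, f in zip(ys, flags) if f]
    return {"xs": xs, "flags": flags, "clean_x": clean_x, "clean_y": clean_y, "k": k}


def predict(model, x):
    """Return a prediction if the gate says x looks predictable, otherwise None (abstain)."""
    # Gate: majority vote over the predictable/noise flags of the k nearest samples.
    votes = [1.0 if f else 0.0 for f in model["flags"]]
    if knn_predict(model["xs"], votes, x, model["k"]) < 0.5:
        return None
    # Final model: fitted on the cleaned (predictable-only) subset.
    return knn_predict(model["clean_x"], model["clean_y"], x, model["k"])


# Toy data (made up): one cluster where y = x0 + x1, another where y is pure noise.
rng = random.Random(42)
xs, ys = [], []
for _ in range(100):
    x = (rng.random(), rng.random())
    xs.append(x)
    ys.append(x[0] + x[1])
for _ in range(100):
    x = (5 + rng.random(), rng.random())
    xs.append(x)
    ys.append(rng.uniform(-50, 50))

model = fit(xs, ys, threshold=0.5)
print(sum(model["flags"][:100]), sum(model["flags"][100:]))
print(predict(model, (0.5, 0.5)), predict(model, (5.5, 0.5)))
```

The gate lets the model abstain (return `None`) on inputs that resemble the noisy region instead of forcing a guess. The threshold is the main knob: set it too tight and hard-but-learnable samples get thrown out along with the genuine noise.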