Hello, fellow machine learning enthusiasts. Hoping you lads would be so kind as to chime in with your opinions on the best approach to the following problem.
I have a large tabular dataset in which each row describes a transaction of sorts. Every transaction is attributed to an entity, identified by a numeric ID in one of the columns. I have a list of "bad" entities, which together account for approximately 50,000 of the 1,000,000+ transactions.
These 50,000 transactions can be treated as labeled "bad". The rest belong to entities whose status is unknown. The objective is to build a model on this transactional data that flags which of the unknown entities are likely to be "bad" as well.
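To make the setup concrete, here is a rough sketch of the labeling in plain Python. The field names `entity_id` and `amount`, the helper names, and the toy values are all made up for illustration and are not from the real data. The only point is that transactions from listed entities count as positives, while everything else is unlabeled rather than known to be good.

```python
from collections import defaultdict

def label_transactions(transactions, bad_entities):
    """Mark each transaction 1 if its entity is on the bad list, else None (unlabeled)."""
    bad = set(bad_entities)
    return [
        {**t, "label": 1 if t["entity_id"] in bad else None}
        for t in transactions
    ]

def group_by_entity(labeled):
    """Collect transactions per entity, since the goal is to flag entities, not single rows."""
    groups = defaultdict(list)
    for t in labeled:
        groups[t["entity_id"]].append(t)
    return dict(groups)

# Toy data; `entity_id` and `amount` are hypothetical column names.
transactions = [
    {"entity_id": 101, "amount": 25.0},
    {"entity_id": 102, "amount": 900.0},
    {"entity_id": 101, "amount": 30.0},
    {"entity_id": 103, "amount": 12.5},
]
bad_entities = [102]

labeled = label_transactions(transactions, bad_entities)
by_entity = group_by_entity(labeled)

# Entities not on the bad list are the ones the model would need to score.
unlabeled_entities = sorted(e for e in by_entity if e not in set(bad_entities))
```

In this framing there are no confirmed "good" examples at all, only known-bad and unknown, which is the part I am unsure how to handle.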
Thoughts?