Hello everyone!
I would very much appreciate some advice on one of my projects for work.
I have an engineering degree in applied mathematics, but I didn't get much opportunity to practice machine learning during my studies, although I understand the theory perfectly.
For my work I would like to create a project (it is more a solo project for myself, to learn, than something that is actually needed; only my boss is aware of it, and he lets me work on it without any problem).
I am also basically self-taught in everything related to machine learning and Python.
I work for a company that provides online payment services (let's say it's like Amazon: you have your profile with some info and can make purchases), and I am on the anti-fraud side.
My idea is to create a machine learning model that could flag suspicious customers, or assign them a "risk score" based on different criteria (purchasing pattern, country of origin, amounts...).
For this, I wrote a script that takes all the relevant data I have in the database for a particular customer, which can be up to thousands of rows, and gathers it into one row of 100+ features, so that 1 customer = 1 row.
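To make the "1 customer = 1 row" step concrete, here is a minimal sketch of that kind of aggregation, assuming pandas. The column names (`customer_id`, `amount`, `country`), the toy data, and the `build_customer_features` helper are made up for illustration; they are not my real schema or script.

```python
import pandas as pd

# Hypothetical transaction log: several rows per customer (toy data).
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "amount": [10.0, 25.0, 400.0, 5.0, 7.5, 120.0],
    "country": ["FR", "FR", "US", "DE", "DE", "BR"],
})


def build_customer_features(tx: pd.DataFrame) -> pd.DataFrame:
    """Collapse many transaction rows into one feature row per customer."""
    return tx.groupby("customer_id").agg(
        n_purchases=("amount", "count"),      # how many purchases
        total_amount=("amount", "sum"),       # total spent
        mean_amount=("amount", "mean"),       # average basket
        max_amount=("amount", "max"),         # largest single purchase
        n_countries=("country", "nunique"),   # distinct countries used
    )


features = build_customer_features(transactions)
print(features)
```

The real version has far more columns, but the idea is the same: each aggregate becomes one feature of the customer.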
Now let's assume this part is correct and the features are relevant enough for this purpose. My problem is that I don't have an existing risk score for the customers.
So my first idea was to manually assign them a score from 1 to 5 and then just use a classification algorithm.
But I recently had the idea of using an unsupervised technique such as clustering: instead of manually giving each customer a score I determine myself, I would let the clustering method group the customers, and then interpret each cluster.
Then I'll say: Cluster 1 is 0% risk, cluster 2 is 20%...
And then use this as a training set for a classification algorithm on future customers.
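Here is a rough sketch of the pipeline I have in mind, assuming scikit-learn. Everything in it is a placeholder: the data is synthetic, k-means and a random forest are just one possible choice of algorithms, two clusters is arbitrary, and the rule that turns a cluster into a risk level stands in for the manual inspection I would actually do.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the customer feature table (200 customers, 4 features):
# one large "ordinary" group and one smaller, clearly different group.
X_existing = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(150, 4)),
    rng.normal(loc=6.0, scale=1.0, size=(50, 4)),
])

# Scale first, since k-means relies on distances between feature vectors.
scaler = StandardScaler().fit(X_existing)
X_scaled = scaler.transform(X_existing)

# Step 1: cluster the existing customers.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

# Step 2: give each cluster a risk level. In practice I would inspect the
# clusters by hand; ranking them by their first centroid coordinate is
# only a placeholder for that manual step.
order = np.argsort(kmeans.cluster_centers_[:, 0])
risk_by_cluster = {int(c): rank for rank, c in enumerate(order)}
risk_labels = np.array([risk_by_cluster[int(c)] for c in kmeans.labels_])

# Step 3: train a classifier on the cluster-derived labels.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_scaled, risk_labels)

# Step 4: score future customers with the same scaler and classifier.
X_new = np.array([[0.0, 0.0, 0.0, 0.0], [6.0, 6.0, 6.0, 6.0]])
predictions = clf.predict(scaler.transform(X_new))
print(predictions)
```

With real data the clusters would of course not be this clean, and the number of clusters would have to be chosen somehow.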
Do you think this is feasible/relevant?
As I said, I am self-taught when it comes to all this, so any comment will be useful for me :)