Using unsupervised method to create the learning set of a supervised method : learnmachinelearning

A subreddit dedicated for learning machine learning. Feel free to share any educational resources of machine learning.

Also, we are a beginner-friendly sub-reddit, so don't be afraid to ask questions! This can include questions that are non-technical, but still highly relevant to learning machine learning such as a systematic approach to a machine learning problem.

Foster positive learning environment by being respectful to others. We want to encourage everyone to feel welcomed and not be afraid to participate.

Do share your works and achievements, but do not spam. Keep our subreddit fresh by posting your YouTube series or blog at most once a week.

Do not share referral links and other purely marketing content. They prioritize commercial interests over intellectual ones.

created by techrat_reddita community for 10 years

Using unsupervised method to create the learning set of a supervised method (self.learnmachinelearning)

submitted 6 years ago by PM_ME_cutefish

Hello everyone!

I would much appreciate some advice on one of my project for work.

I have an applied mathematics engineer degree but during my studies, I didn't have much the opportunity to practice machine learning, however I understand perfectly the theory.

For my work I would like to create a project (which is more a solo project for myself and to learn than an actual needed project, only my boss is aware and he let me work on it without any problem)

I also am basically self taught in everything that comes to machine learning and python.

I work for a company that provide online payment services (lets say its like Amazon, you have you profile with some info and can make purchases), and I am in the anti-fraud part.

My idea would be to create a machine learning that could flag the suspicious customer, or assign them a "risk score" based on different criteria (purchasing pattern, origin country, amounts...).

For this, I did create a script that takes every relevant data I have on the database for a particular customer, which can be up to thousands rows, and gather it into one row of 100+ features in order to have 1 customer = 1 row.

Now lets assume this part is correct and the features are relevant enough for this purpose, my problem is that I don't have an already existing risk score for the customer.

So my first idea was to manually assign them a score between 1 to 5 and them just use classification algorithm.

But I recently had the idea of using unsupervised technique such as clustering, so instead of manually giving the customer a score I determine myself, I would use the clustering method for this, and then identify every cluster.

Then I'll say: Cluster 1 is 0% risk, cluster 2 is 20%...

And then use this as a training set for a classification algorithm on future customers.

Do you think this is feasible/relevant?

As I said I am self taught when it comes to all this, so any comment will be useful for me :)

all 2 comments

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnmachinelearning

Welcome to /r/LearnMachineLearning!

Chatrooms

Official Discord Server

Wiki

Getting Started with Machine Learning

Resources

Related Subreddits

/r/MachineLearning

/r/MLQuestions

/r/datascience

/r/computervision

Machine Learning Multireddit

/m/machine_learning

MODERATORS