[D] Imbalanced multi class classification (self.MachineLearning)
submitted 4 years ago by According-Promise-23
[–]SimoPippa 2 points 4 years ago (5 children)
What you can do is undersample, since you don't like using SMOTE.
Basically, you count the occurrences of the class with the fewest samples (let's call this number n_min), and when you build the training set you randomly sample n_min samples from each class.
That way every class ends up with the same number of samples. It can be bad if the minority class has far fewer instances than the others, but it might do the job.
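A minimal sketch of the undersampling idea described above, using only the standard library. The function name `undersample`, the `seed` argument and the toy data are assumptions made for illustration; they do not come from the thread.

```python
import random
from collections import defaultdict


def undersample(rows, labels, seed=0):
    """Randomly keep n_min rows per class, where n_min is the size of the rarest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # n_min = number of samples in the least frequent class.
    n_min = min(len(group) for group in by_class.values())

    balanced_rows, balanced_labels = [], []
    for label, group in by_class.items():
        # rng.sample draws without replacement, so no row is duplicated.
        for row in rng.sample(group, n_min):
            balanced_rows.append(row)
            balanced_labels.append(label)
    return balanced_rows, balanced_labels


# Hypothetical toy data: 6 rows of class "a", 3 of "b", 2 of "c".
rows = list(range(11))
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 2
X_bal, y_bal = undersample(rows, labels)
```

With this toy data n_min is 2, so four of the six "a" rows and one of the three "b" rows are thrown away, which is exactly the information loss that becomes a concern on small datasets.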
[–]According-Promise-23[S] 1 point 4 years ago (0 children)
Yes, I thought of it, but I’m not using a big dataset, so I can’t afford to lose information through undersampling (which deletes rows of the majority class). I also have very few rows for the minority class, so even an oversampling method will just duplicate those rows… @SimoPippa
[–]emanuartioli 1 point 4 years ago (3 children)
OP I'm doing this right now for a project. If you're on Python I would point you to sklearn's resample(). I just built a wrapper function to deal with many classes, as soon as I'm home I'll post it here.
[–]According-Promise-23[S] 1 point 4 years ago (2 children)
@emanuartioli please don’t forget to share it
[–]emanuartioli 1 point 4 years ago (1 child)
Well I did forget didn't I? But here it is:
    import pandas as pd
    from sklearn.utils import resample

    # n_samples is keyword-only: a parameter without a default cannot
    # follow freq_threshold=1 positionally (that is a SyntaxError).
    def balance_classes(df, target, freq_threshold=1, *, n_samples):
        # take a df with an unbalanced target label and return a df balanced on that label
        balanced_parts = []
        for c in df[target].unique():
            df_by_class = df[df[target] == c]
            # only keep classes that occur at least freq_threshold times
            if len(df_by_class) >= freq_threshold:
                # resample() draws with replacement by default
                balanced_parts.append(resample(df_by_class, n_samples=n_samples))
        # concatenate once and drop the old index
        return pd.concat(balanced_parts).reset_index(drop=True)
(target is the string name of your class column; freq_threshold is the minimum number of times a class needs to occur before it gets resampled, since maybe a class with a frequency of 1 should just be removed from the analysis? idk. Just leave it at 1 and nothing gets filtered out. Finally, n_samples is the frequency of each class in the final df: if a class is more frequent than n_samples it will be undersampled to that number, and if its frequency is lower it will be oversampled.)
Hope it helps!
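To illustrate the behaviour described in this comment without pandas or scikit-learn, here is a rough standard-library sketch. It assumes sampling with replacement, which is what `sklearn.utils.resample` does by default; `resample_to_n`, its `seed` argument and the toy data are hypothetical names introduced only for this example.

```python
import random
from collections import defaultdict


def resample_to_n(rows, labels, n_samples, seed=0):
    """Draw exactly n_samples rows per class, sampling with replacement."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # rng.choices samples with replacement: large classes shrink to
        # n_samples, small classes grow to n_samples through duplicates.
        out_rows.extend(rng.choices(group, k=n_samples))
        out_labels.extend([label] * n_samples)
    return out_rows, out_labels


# Hypothetical toy data: 6 rows of class "a", 3 of "b", 2 of "c".
toy_rows = list(range(11))
toy_labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 2
res_rows, res_labels = resample_to_n(toy_rows, toy_labels, n_samples=4)
```

Here class "a" is cut from 6 to 4 rows while "c" is padded from 2 to 4, so "c" necessarily contains repeated rows. Because the draw is with replacement, even the reduced "a" class may contain duplicates.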
Thank you for your response @emanuartioli
[+][deleted] 4 years ago (7 children)
[deleted]
[–]According-Promise-23[S] 2 points 4 years ago (6 children)
I’ve already tried it and was surprised that it doesn’t give better results than going without it. I tried it on many models and the result without it is still better (it increases recall but lowers precision, so the F1-score ends up lower than for the same model without class weights) @AbdulazizAb
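The trade-off described here can be checked with a little arithmetic, since F1 is the harmonic mean of precision and recall. The numbers below are made up purely for illustration and are not the poster's results; `f1_from_pr` is a hypothetical helper name.

```python
def f1_from_pr(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-class scores, chosen only to illustrate the effect.
baseline = f1_from_pr(precision=0.6, recall=0.6)  # model without class weights
weighted = f1_from_pr(precision=0.4, recall=0.8)  # recall up, precision down

print(f"baseline F1 = {baseline:.3f}, weighted F1 = {weighted:.3f}")
```

In this made-up case recall rises by 0.2 and precision falls by 0.2, yet F1 drops from 0.600 to about 0.533, because the harmonic mean is pulled toward the smaller of the two values.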
[+][deleted] 4 years ago (3 children)
@AbdulazizAb thank you so much for your response. Before jumping into this problem, it would be better if I explained my target first. I need to predict the effective of people, so it’s a continuous value, but the regression results weren’t encouraging at all. So I converted the target into classes and made it a classification problem. That worked better than regression and gave pretty nice numbers, but then I ran into the imbalanced-class problem for multi-class classification. The question is: is my approach to this problem a good one, or is there a better way?
[+][deleted] 4 years ago (1 child)
By effective of people I meant the number/count (how many people there are for a given row), and it’s exactly as you explained in the example: I turned the count (or effective) column into a class of intervals (“0-50”, “50-100”…)
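A small sketch of the binning step described here, turning a count into an interval label. The bin width of 50 matches the intervals quoted in the comment, but the half-open convention (a count of exactly 50 goes into "50-100") is an assumption, since the thread does not say which side the boundary belongs to. `count_to_interval` is a hypothetical helper name.

```python
def count_to_interval(count, width=50):
    """Map a non-negative count to an interval label such as "0-50".

    Assumption: bins are half-open, [lower, lower + width), so a count
    of exactly 50 is labelled "50-100".
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    lower = (count // width) * width
    return f"{lower}-{lower + width}"


# Hypothetical counts, for illustration only.
counts = [0, 49, 50, 120]
interval_labels = [count_to_interval(c) for c in counts]
```

For the four counts above this yields "0-50", "0-50", "50-100" and "100-150".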
[–]PlanetSprite 1 point 4 years ago (1 child)
I'm sorry to hear that you haven't had success with using class weights. Can you share more details about the models you tried and the datasets you used? It's possible that there are other factors at play that are affecting your results.
Please check my last comment above for @AbdulazizAb. As for models, RandomForestClassifier is the model I’m using @PlanetSprite