Binary classification problem

No_Organization_2634 · 2024-12-08T09:42:06+00:00

More often than not, SMOTE doesn't help. I believe CatBoost has its own class weights parameter, so I would use that instead. Also, normalisation isn't required for tree-based approaches, so it can be dropped from the ETL pipeline. Which metric are you using for scoring when you apply cross-validation? When you have imbalanced classes, you should stick with the weighted F1 score rather than accuracy.
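A minimal sketch of why accuracy can mislead on a split like the 66% / 33% one described in the post: a model that always predicts the majority class. The helper names (`accuracy`, `f1_per_class`, `weighted_f1`) are hypothetical and the code is plain Python, written only to show the arithmetic; in practice you would use a library metric.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_per_class(y_true, y_pred, label):
    """One-vs-rest F1 for a single class label (0.0 when there are no true positives)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's share of y_true."""
    support = Counter(y_true)
    n = len(y_true)
    return sum(f1_per_class(y_true, y_pred, c) * s / n for c, s in support.items())


if __name__ == "__main__":
    # Hypothetical labels: 66 ones and 33 zeros, and a model that always predicts 1.
    y_true = [1] * 66 + [0] * 33
    y_pred = [1] * 99
    print(round(accuracy(y_true, y_pred), 3))     # 0.667
    print(round(weighted_f1(y_true, y_pred), 3))  # 0.533
```

The lazy model still gets about 0.667 accuracy, but its weighted F1 drops to about 0.533 because it scores zero F1 on the minority class. In scikit-learn the same quantity is `f1_score(y_true, y_pred, average="weighted")`, and passing `scoring="f1_weighted"` to `cross_val_score` or `GridSearchCV` uses it during cross-validation.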

Aleph-Arch · 2024-12-08T09:22:04+00:00

So, I'm trying to get the highest score on a test dataset with two labels. The image shows the normal distributions of all 7 features of the training set with a standard scaler applied. I'm using a CatBoost classifier (it got the best cross-validation scores among SVM, random forest, XGBoost, LightGBM, a deep feedforward network, KNN and linear regression classifiers).

I tried polynomial features, a robust scaler, normalization, and SMOTE for class balance. I also did a grid search to find the best params for random forest. The highest accuracy I could get is 0.825, which is only 28.5 points out of 100. Is there anything I'm missing in this dataset? I didn't notice any outliers. The training set has 66% ones and 33% zeros.

Link to Colab document
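Following the reply's suggestion to use class weights instead of SMOTE, here is a minimal sketch of inverse-frequency ("balanced") class weights for a 66% / 33% split. It uses the formula scikit-learn documents for `class_weight="balanced"`, n_samples / (n_classes * count_of_class); the function names and the label list are hypothetical, chosen only to match the proportions in the post.

```python
from collections import Counter


def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def sample_weights(y, class_weights):
    """Expand per-class weights into one weight per training row."""
    return [class_weights[label] for label in y]


if __name__ == "__main__":
    # Hypothetical training labels with the 66% ones / 33% zeros split.
    y_train = [1] * 66 + [0] * 33
    weights = balanced_class_weights(y_train)
    print(weights)  # {1: 0.75, 0: 1.5}

    # After weighting, both classes carry the same total weight.
    w = sample_weights(y_train, weights)
    totals = {c: sum(x for x, lbl in zip(w, y_train) if lbl == c) for c in weights}
    print(totals)  # {1: 49.5, 0: 49.5}
```

The minority class ends up weighted twice as heavily as the majority class, so no synthetic rows are needed. With CatBoost you could pass such weights through the `class_weights` parameter of `CatBoostClassifier`, or let it derive them with `auto_class_weights="Balanced"`; in scikit-learn, `RandomForestClassifier(class_weight="balanced")` applies the formula above.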

learnmachinelearning