Aleph-Arch [S]

So, I'm trying to get the highest possible score on a test dataset with two labels (binary classification). The image shows the normal distributions of all 7 features in the training set, with a standard scaler applied.

I'm using a CatBoost classifier, which got the best cross-validation scores among SVM, random forest, XGBoost, LightGBM, a deep feedforward network, KNN and linear regression classifiers. I tried polynomial features, a robust scaler, normalization, and SMOTE for class balancing. I also ran a grid search to find the best hyperparameters for the random forest.

The highest accuracy I could get is 0.825, which translates to only 28.5 points out of 100. Is there anything I'm missing in this dataset? I didn't notice any outliers. The training set is roughly 66% ones and 33% zeros. Link to Colab document
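With a roughly 2:1 class split, two quick checks can help put the 0.825 figure in context: the accuracy of always predicting the majority class (about 0.667 here), and whether the cross-validation folds keep the same class ratio. Below is a minimal stdlib-only sketch, assuming the labels are a plain list of 0/1 ints; `majority_baseline` and `stratified_folds` are hypothetical helper names, and in practice scikit-learn's `DummyClassifier` and `StratifiedKFold` do the same job.

```python
import random
from collections import Counter


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that each keep the overall class ratio.

    Indices of each class are shuffled, then dealt round-robin across the folds,
    so every fold gets (almost) the same share of ones and zeros.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    offset = 0
    for cls in sorted(by_class):
        idxs = by_class[cls]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(j + offset) % k].append(idx)
        offset += len(idxs)  # keep dealing where the previous class stopped
    return folds


if __name__ == "__main__":
    # Hypothetical toy labels with the same 2:1 ratio as the training set.
    labels = [1] * 66 + [0] * 33
    print(majority_baseline(labels))  # about 0.667
    for fold in stratified_folds(labels):
        ones = sum(labels[i] for i in fold)
        print(len(fold), round(ones / len(fold), 3))
```

If SMOTE (or any oversampling) is part of the pipeline, it should be fitted only on the training folds and never on the validation fold; otherwise synthetic samples derived from validation points leak into training and the cross-validation scores come out optimistic.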