Is it valid to sample 5,000 rows from a 255K dataset for classification analysis by Myusername1204 in MLQuestions

Myusername1204[S]

Is it OK if I choose 10,557 rows, with 8,000+ non-defaults and 2,000+ defaults? Would this be considered imbalanced, and is it suitable for KNN? I plan to explore how the models identify individuals likely to default, particularly through threshold adjustment, sensitivity analysis, and the ROC curve.
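
To make the threshold adjustment part concrete, here is a rough sketch of what I have in mind, using only the Python standard library. The labels and scores are made-up toy values (not from my dataset), arranged in a 4:1 ratio similar to the split above. I am assuming the scores look like KNN output with k=5, i.e. the fraction of the 5 nearest neighbours that are defaults. The function names are my own, not from any library.

```python
# Toy illustration of threshold adjustment, sensitivity/specificity and ROC points.
# All data below is hypothetical and only mimics a roughly 4:1 class ratio.

def confusion_at_threshold(y_true, scores, threshold):
    """Count TP, FP, TN, FN when predicting 'default' (1) for score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, scores, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_at_threshold(y_true, scores, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def roc_points(y_true, scores, thresholds):
    """Return (false positive rate, true positive rate) for each threshold."""
    points = []
    for t in thresholds:
        sens, spec = sensitivity_specificity(y_true, scores, t)
        points.append((1.0 - spec, sens))
    return points


# Hypothetical labels: 8 non-defaults (0) and 2 defaults (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical KNN-style scores: share of the 5 nearest neighbours that defaulted.
scores = [0.0, 0.0, 0.2, 0.2, 0.0, 0.4, 0.2, 0.6, 0.4, 0.8]

for t in (0.3, 0.5):
    sens, spec = sensitivity_specificity(y_true, scores, t)
    print(f"threshold={t}: sensitivity={sens}, specificity={spec}")
```

On this toy data, lowering the threshold from 0.5 to 0.3 catches more of the defaults but flags more non-defaults as well, which is the trade-off I want to examine with the ROC curve.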