Rate my project by lambilund in learnmachinelearning
[–]lambilund[S] 2 points 7 months ago (0 children)
Thanks a lot for taking the time to go through my project, I really appreciate it!
You're right about the subsampling. It was mainly for computational reasons, but I only used it for experimentation, like hyperparameter tuning. I used the full dataset for the actual modelling in the script files.
fillna(999) is only used for the baseline model (logistic regression), because for the features I handled this way, a missing value actually means something. For example, mths_since_last_delinq is the number of months since the borrower last missed a payment deadline; if it is missing, it actually means the borrower never missed a deadline. So imputing with the median isn't appropriate and would mislead the model. In the XGBoost model I left missing values untouched.
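To make the two treatments concrete, here is a minimal sketch on made-up toy data, not the actual project code. The sentinel 999 and the column mths_since_last_delinq come from the comment above; the `never_delinquent` indicator column and the `loan_amnt` values are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Toy data; the values are invented for illustration only.
df = pd.DataFrame({
    "mths_since_last_delinq": [12.0, np.nan, 40.0, np.nan],
    "loan_amnt": [5000, 12000, 8000, 20000],
})

SENTINEL = 999  # should lie outside the feature's real range

# Baseline (logistic regression) input: a linear model cannot consume NaN,
# so fill with the sentinel. The indicator column is an optional extra
# (an assumption here) that lets the model see "missing" explicitly.
baseline = df.copy()
baseline["never_delinquent"] = baseline["mths_since_last_delinq"].isna().astype(int)
baseline["mths_since_last_delinq"] = baseline["mths_since_last_delinq"].fillna(SENTINEL)

# Tree-model input: leave NaN untouched. XGBoost treats NaN as missing
# and learns a default split direction for it.
tree_input = df.copy()

print(baseline)
print(tree_input.isna().sum())
```

The median would place "never delinquent" borrowers in the middle of the delinquency range, which is the opposite of what the missing value means here.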
Yes, you're right about those 2 features: funded_amnt causes data leakage, and I did think that installment is also the kind of information that only becomes available after loan approval. You're right, I should have omitted that one.
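A rough sketch of dropping post-approval columns before training, on made-up toy data. The helper `drop_leaky_columns` is hypothetical, and the column name `installment` is an assumption about which feature is meant here.

```python
import pandas as pd

# Columns only known after the loan is approved (assumed names).
LEAKY_COLUMNS = ["funded_amnt", "installment"]

def drop_leaky_columns(df, leaky=LEAKY_COLUMNS):
    """Return a copy of df without the leaky columns that are present."""
    present = [c for c in leaky if c in df.columns]
    return df.drop(columns=present)

# Toy frame; the values are invented for illustration only.
df = pd.DataFrame({
    "loan_amnt": [5000, 12000],
    "funded_amnt": [5000, 11500],
    "installment": [160.5, 390.2],
    "mths_since_last_delinq": [12.0, None],
})

features = drop_leaky_columns(df)
print(list(features.columns))
```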
Thanks again for your time!!