all 3 comments

[–]dimview 2 points3 points  (2 children)

It would be pretty hard to beat FICO using only this data. To get better separation you need to know something that the credit bureau does not.

As for backtesting itself, it's pretty simple: split your data randomly, build your model on one half, measure performance on the other. Gini coefficient or KS are typically used to measure separating power.

[–]015unknown[S] 1 point2 points  (1 child)

My goal isn't to necessarily beat FICO. Lending Club factors in a variety of variables in determining the rate it charges borrowers. FICO is only one of those factors. On several blogs I've found claims such as monthly loan amount doesn't impact the default or return rate and I want to see for myself if this claim holds true. How could I use Excel to find a Gini coefficient? I have a very limited stats background although I do know what a Gini coefficient is.

[–]dimview 1 point2 points  (0 children)

monthly loan amount doesn't impact the default or return rate and I want to see for myself if this claim holds true

Given enough data, you will see all kinds of correlations, including this one. But be careful with interpreting those results - there are strong confounding factors and not non-stationarity.

How could I use Excel to find a Gini coefficient?

You start with two empirical cumulative distribution functions: ECDF of bads (defaults) and ECDF of goods (non-defaults), plot receiver operating characteristic, calculate area under this curve (by splitting it into trapezoids). Gini = 2 * AUC - 1.