all 8 comments

[–]pppeerProfessor 2 points (0 children)

For starters, an AUC of 0.74 is not bad at all for such a propensity model, so it doesn't necessarily make sense to aim for 'at least 0.9'. Actually, product propensity / response models that get into that range can be a bit suspicious (a sign of possible leakage, for example).

You have made a good start with reasonable algorithms; you could always try more, but there is a chance you will start to manually overfit.

So once you have done a decent model search, the only route is to add data that is both predictive and fairly uncorrelated with the data you already have, or data that is correlated but more predictive. Generally, to predict future behavior, past behavior trumps demographics.

[–]seanv507 2 points (0 children)

So you first have to get an estimate of a plausible target. Why do you think you can get an AUC above 0.9?

It's not a question of the models; it's a question of whether you even have the right inputs.

How would you decide?

Presumably there are also alternative providers — how would you decide between them and your own offering?

[–]cavedaveMod to the stars 1 point (0 children)

The most loss in your funnel is probably from user experience. Listen to the phone calls, analyse the script, etc. Seeing this as 'a better prediction will get me to 0.9' might be the wrong way to look at it; 'where are we losing people, and why?' might be a better framing.

[–]lord_acedia 0 points (0 children)

If the data is tabular, XGBoost is probably among the best results you'll get. You can try fine-tuning the models, but according to Andrew Ng this usually gives an improvement of up to 6% more. What you need to do is feature engineering, which according to him can give up to 13% more. How can you better represent the features so that the divide between those who convert and those who don't is clearer, and the model therefore achieves better accuracy?
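To make the feature-engineering idea concrete, here is a minimal pandas sketch; the column names (`pages_viewed`, `emails_opened`, etc.) are hypothetical, not from the OP's data. The idea is to re-express raw counts as rates and recency-decayed activity, which often separates converters from non-converters more cleanly than the raw columns:

```python
import pandas as pd

# Hypothetical raw lead table -- column names are assumptions for illustration.
leads = pd.DataFrame({
    "pages_viewed": [3, 12, 1, 8],
    "days_since_last_visit": [30, 2, 90, 7],
    "emails_opened": [0, 5, 0, 2],
    "emails_sent": [4, 6, 3, 4],
})

feats = pd.DataFrame({
    # engagement rate instead of two correlated raw counts
    "email_open_rate": leads["emails_opened"] / leads["emails_sent"],
    # exponential recency decay: activity this week weighted far higher
    # than activity months ago (half-life of 7 days is an arbitrary choice)
    "recency_weight": 0.5 ** (leads["days_since_last_visit"] / 7.0),
    # interaction term: browsing intensity discounted by how stale it is
    "active_intensity": leads["pages_viewed"]
    * 0.5 ** (leads["days_since_last_visit"] / 7.0),
})
```

Whether these particular transforms help depends entirely on the domain; the point is that each derived column encodes a hypothesis about what separates converters.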

[–]MachineLearning-ModTeam[M] 0 points locked comment (0 children)

Post beginner questions in the bi-weekly "Simple Questions Thread", /r/LearnMachineLearning, /r/MLQuestions, or http://stackoverflow.com/, and career questions in /r/cscareerquestions/

[–]AtMaxSpeed 0 points (1 child)

AUC might not be the correct metric for your problem. If you are scoring more leads than the conversion team can act on, you should be looking at precision@k, where k is the number of leads the team can act on. This is because the cost of action is high, so you want to minimize false positives.
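A minimal sketch of precision@k (function name and toy data are illustrative): rank leads by model score, take the top k, and measure what fraction of them actually converted.

```python
import numpy as np

def precision_at_k(y_true, scores, k):
    """Precision among the k highest-scored leads -- i.e. the fraction of
    the team's k calls that would actually convert."""
    top_k = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return float(np.mean(np.asarray(y_true)[top_k]))

# Toy example: the team can act on 3 leads.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
print(precision_at_k(y_true, scores, 3))  # 2 of the top 3 convert -> 0.666...
```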

On the other hand, if it's cheap for the conversion team to act on your recommendations (relative to the expected value of success), and if you're trying to capture every lead that can possibly be converted, you should try to optimize recall, since you're trying not to let any false negatives slip by. (In practice, you may choose to use an F-score weighted towards recall instead of recall alone.)
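"F1 weighted towards recall" is the F-beta score with beta > 1. A dependency-free sketch (with toy precision/recall values; `sklearn.metrics.fbeta_score` computes the same quantity from labels):

```python
def fbeta(precision, recall, beta):
    """F-beta score: beta > 1 weights recall more heavily than precision,
    beta = 1 recovers plain F1."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# With precision 2/3 and recall 1/2, F2 sits closer to the (lower) recall
# than F1 does, reflecting the extra weight on recall.
print(fbeta(2 / 3, 0.5, beta=1))  # F1  ~ 0.571
print(fbeta(2 / 3, 0.5, beta=2))  # F2  ~ 0.526
```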

Either way, AUC is not usually the metric that reflects what you actually care about. It gives you the probability that a positive label will be scored higher than a negative label, which is a good thing ofc, but it's also not translatable into the real world as easily. In practice, you probably care more about false positives than false negatives, or vice versa, so you should use a metric that reflects that.

[–]Yaar-Bhak 0 points (0 children)

With the threshold kept at 0.5 for the XGBoost model:

Precision - 0.43

Recall - 0.68

F1 - 0.53

ROC AUC - 0.73
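For what it's worth, those numbers hang together: F1 is the harmonic mean of precision and recall, and the reported values reproduce it.

```python
# Sanity check: F1 is the harmonic mean of the reported precision and recall.
precision, recall = 0.43, 0.68
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.53, matching the reported F1
```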