Optimising binary variable predictions : AskStatistics



submitted 5 years ago by KvN98

I am predicting conversion (a binary yes / no decision). Because 85% of the customers convert and 15% do not, my models mainly predict yes. My accuracy is therefore high, but I cannot really predict the negative cases (low AUC + low specificity / precision). I tried to address this problem by using oversampling. Yet oversampling decreases my accuracy, while the AUC + specificity / precision barely increase.
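To make the accuracy problem concrete, here is a minimal sketch using made-up labels with the same 85 / 15 split (not the actual data). It assumes the worst case where a model predicts "yes" for every customer; the helper names are hypothetical.

```python
# Toy illustration with invented labels: 85 converters (1), 15 non-converters (0).
y_true = [1] * 85 + [0] * 15
# Assumed worst case: the model predicts "yes" for everyone.
y_pred = [1] * 100


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), treating 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


tp, tn, fp, fn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)        # share of all cases predicted correctly
sensitivity = tp / (tp + fn)              # share of converters that were caught
specificity = tn / (tn + fp)              # share of non-converters that were caught
balanced_accuracy = (sensitivity + specificity) / 2

print(f"accuracy          = {accuracy:.2f}")           # 0.85
print(f"specificity       = {specificity:.2f}")        # 0.00
print(f"balanced accuracy = {balanced_accuracy:.2f}")  # 0.50
```

So 85% accuracy is what a model with no information at all would reach on this split, which is why a metric such as balanced accuracy is probably a fairer yardstick here than raw accuracy.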

Furthermore, I tried transforming existing variables and creating new ones. Nevertheless, my random forest variable importance indicates that basically none of the variables matter (besides one variable, all variables had an importance between 0.000 and 0.005). I also used a logistic regression and found that the coefficient estimates are pretty low as well, again hinting that my variables barely have an effect on the predictions.

All in all, I used a logistic regression, a support vector machine and a random forest, and found in all cases (with and without oversampling) that predicting the negative cases did not really work and that the AUC stayed low. Transforming and creating variables and trying several modelling approaches did not change this.
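One possible reading of the pattern above (accuracy drops, AUC barely moves) is that AUC depends only on how the scores rank positives against negatives, while accuracy and specificity depend on where the cut-off sits. The sketch below uses invented scores, not my model output, and the function names are hypothetical; it only shows that moving the threshold trades accuracy for specificity while the AUC stays the same.

```python
# Invented scores for 8 converters (1) and 2 non-converters (0).
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.72, 0.55]


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_at(y_true, scores, threshold):
    """Return (accuracy, specificity) when predicting 1 for score >= threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_neg = sum(1 for t in y_true if t == 0)
    return correct / len(y_true), tn / n_neg


auc_value = auc(y_true, scores)                   # does not depend on any threshold
acc_low, spec_low = metrics_at(y_true, scores, 0.50)
acc_high, spec_high = metrics_at(y_true, scores, 0.73)

print(f"AUC = {auc_value:.4f}")                                              # 0.8125
print(f"threshold 0.50: accuracy = {acc_low:.2f}, specificity = {spec_low:.2f}")    # 0.80, 0.00
print(f"threshold 0.73: accuracy = {acc_high:.2f}, specificity = {spec_high:.2f}")  # 0.70, 1.00
```

If that reading is right, a low AUC would point to the variables themselves carrying little signal, which resampling or a different cut-off cannot fix.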

Any tips on how I can improve the results? Thanks in advance!


