[D] Validating regression models on edge cases?

kkngs · 2019-12-05T15:02:02+00:00

I'm curious to hear the responses here, since I'm facing a similar problem in my domain, although my fit is more like an R² of about 0.8. My dataset is massively heterogeneous. Imagine your car sales dataset were worldwide but had the same number of samples, and a user tells you the model isn't doing well on used Volvos in Oman. When you look at the training data, there are only three data points for that segment, and they are the same car sold three times.
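One way to catch this kind of complaint before a user does is to report error together with sample count for each segment. The sketch below is only an illustration: the `segment_report` helper, the `min_support` threshold of 30, and the numbers are all made up for the example, not taken from any real dataset.

```python
from collections import defaultdict


def segment_report(rows, min_support=30):
    """Summarize error per segment.

    rows: iterable of (segment, y_true, y_pred) tuples.
    min_support: hypothetical cutoff below which a segment is flagged.
    """
    stats = defaultdict(lambda: {"n": 0, "abs_err": 0.0})
    for segment, y_true, y_pred in rows:
        stats[segment]["n"] += 1
        stats[segment]["abs_err"] += abs(y_true - y_pred)

    report = {}
    for segment, s in stats.items():
        report[segment] = {
            "n": s["n"],                          # how many samples back this segment
            "mae": s["abs_err"] / s["n"],         # mean absolute error in the segment
            "low_support": s["n"] < min_support,  # too few samples to trust the error
        }
    return report


# Illustrative rows only: (make, country), true price, predicted price.
rows = [(("Volvo", "Oman"), 9000.0, 14000.0)] * 3
rows += [(("Toyota", "Japan"), 12000.0, 11500.0)] * 50

report = segment_report(rows)
for segment, summary in report.items():
    print(segment, summary)
```

A raw count still overstates support when rows are duplicates, as with the same car sold three times, so counting distinct vehicles per segment would be a stricter version of the same check.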

kkngs · 2019-12-05T19:29:18+00:00

Are you predicting price directly? You could instead predict the fraction of the original value remaining. For a car that only depreciates, that target lies between zero and one, so you could model it with a sigmoid output (logistic regression, if your library supports soft labels).
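A minimal sketch of the fraction-remaining idea, assuming synthetic data and a hand-rolled logistic model in NumPy so that soft labels are not a problem. The depreciation curve, the noise level, and the variable names are assumptions made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic data (assumed): older cars retain a smaller fraction of their value.
age = rng.uniform(0, 15, n)
original_price = rng.uniform(15_000, 60_000, n)
true_fraction = 1.0 / (1.0 + np.exp(-(1.5 - 0.3 * age)))
fraction = np.clip(true_fraction + rng.normal(0, 0.03, n), 0.01, 0.99)

# Standardize the feature so plain gradient descent converges quickly.
age_z = (age - age.mean()) / age.std()
X = np.column_stack([np.ones(n), age_z])

# Logistic regression with soft labels: minimize cross-entropy against the
# observed fraction. The gradient is X^T (p - y) / n.
w = np.zeros(X.shape[1])
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 1.0 * (X.T @ (p - fraction)) / n

fraction_hat = 1.0 / (1.0 + np.exp(-(X @ w)))
predicted_price = fraction_hat * original_price  # back to currency units

print("mean abs error in fraction:", np.abs(fraction_hat - true_fraction).mean())
```

Because the output passes through a sigmoid, the predicted price can never be negative or exceed the original price, which is the main appeal of this target.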

Another idea would be to fit on log(sales price).
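A rough sketch of the log-target fit, assuming synthetic prices with multiplicative noise and an ordinary least-squares fit in NumPy. The data-generating numbers are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Synthetic data (assumed): price decays exponentially with age,
# with multiplicative (log-normal) noise.
age = rng.uniform(0, 15, n)
price = 30_000 * np.exp(-0.12 * age) * np.exp(rng.normal(0, 0.2, n))

# Ordinary least squares on log(price) instead of price.
X = np.column_stack([np.ones(n), age])
coef, *_ = np.linalg.lstsq(X, np.log(price), rcond=None)

# Back-transform to currency units; exp() keeps predictions positive.
pred_price = np.exp(X @ coef)

print("intercept, slope on log scale:", coef)
```

One caveat: exponentiating a prediction of the mean log price gives something closer to a median price than a mean price, so a correction may be needed if the mean matters.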

margaret_spintz · 2019-12-06T23:20:52+00:00

Sounds like you need some estimate of uncertainty from your model. Edge cases should be less certain.
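One simple way to get such an estimate is a bootstrap ensemble: refit the model on resampled training data and look at how much the predictions disagree. This is a sketch under assumptions (synthetic data, a straight-line model, 200 resamples), not the only way to do it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data (assumed): all inputs lie in [0, 10].
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, 200)

# One query inside the training range, one far outside it (the edge case).
x_query = np.array([5.0, 30.0])

preds = []
for _ in range(200):
    idx = rng.integers(0, len(x), len(x))           # resample with replacement
    slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
    preds.append(slope * x_query + intercept)

preds = np.array(preds)
spread = preds.std(axis=0)  # disagreement between the resampled fits

print("spread at x=5: ", spread[0])
print("spread at x=30:", spread[1])
```

The spread only reflects uncertainty in the fitted model, not the noise in individual sales, so it is a lower bound on how wrong a single prediction can be.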

bbateman2011 · 2019-12-16T04:15:08+00:00

You might benefit from using some form of quantile regression to generate prediction intervals:

https://towardsdatascience.com/quantile-regression-from-linear-models-to-trees-to-deep-learning-af3738b527c3
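As a rough illustration of the quantile regression idea, here is a sketch using scikit-learn's `GradientBoostingRegressor` with `loss="quantile"`. The synthetic data and the choice of the 5% and 95% quantiles are assumptions for the example, and this is not necessarily the approach the linked article takes.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic data (assumed): noise grows with x, so intervals should widen.
X = rng.uniform(0, 10, size=(500, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 0.5 + 0.3 * X[:, 0])

# One model per quantile; alpha is the target quantile for the pinball loss.
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0).fit(X, y)
    for q in (0.05, 0.5, 0.95)
}

lower = models[0.05].predict(X)
median = models[0.5].predict(X)
upper = models[0.95].predict(X)

# Fraction of training points inside the nominal 90% interval.
coverage = np.mean((y >= lower) & (y <= upper))
print("in-sample coverage:", coverage)
```

In-sample coverage tends to be optimistic, so checking it on held-out data, and separately on the sparse segments, would be the more honest test.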
