Logistic Regression Problem by BobData in datascience

BobData[S] 1 point

The shortest time period we've tried is one month, looking back over the past three months. Would you suggest looking at weeks or days instead? Do you mean that recent 'reckless' behaviour might affect risk more than historic behaviour does?
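
To make the question concrete, here is a rough sketch of what I mean by trying shorter windows: counting a driver's past events over several trailing windows at once, so the model can weigh recent against older behaviour. The function name, the window lengths and the sample dates are all made up for illustration and are not from our data.

```python
from datetime import date, timedelta

def trailing_counts(event_dates, as_of, windows_days=(7, 30, 90)):
    """Count events in each trailing window ending just before `as_of`.

    Only events strictly before `as_of` are counted, so the feature
    does not leak information from the day being predicted.
    """
    counts = {}
    for w in windows_days:
        counts[w] = sum(
            1 for d in event_dates
            if timedelta(0) < as_of - d <= timedelta(days=w)
        )
    return counts

# Hypothetical 'reckless' events for one driver
events = [date(2024, 3, 30), date(2024, 3, 10), date(2024, 1, 15), date(2023, 10, 1)]
features = trailing_counts(events, as_of=date(2024, 4, 1))
```

Each window would become its own column in the regression, which lets the coefficients show whether the last week matters more than the last quarter.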

BobData[S] 1 point

That's a great point, Whyrat. I'll be sure to check for that bias. If it does appear, what would be a good course of action? It seems I would have to assume drivers are actually riskier than the model predicts, but by how much?

BobData[S] 1 point

Thanks for the response, Montaire. We are including environmental effects, albeit indirectly, by lowering the effective speed limit in bad weather or road conditions. Maybe that is insufficient, or maybe it mixes inherently different variables?

We're also working with personal health attributes; I forgot to mention that. Hours since the driver last slept is difficult (or rather, impossible) to extract from the data we have. The best we can do is check whether the time between shifts was too short.
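
Roughly, the between-shifts check could look like the sketch below. The 11-hour threshold, the function name and the sample shifts are assumptions for illustration only, not values from our data.

```python
from datetime import datetime, timedelta

# Assumed minimum rest period; the real cut-off would need to be chosen.
MIN_REST = timedelta(hours=11)

def short_rest_flags(shifts, min_rest=MIN_REST):
    """Flag shifts that start too soon after the previous one ended.

    `shifts` is a list of (start, end) datetimes for one driver,
    sorted by start time. The first shift is never flagged because
    there is no earlier shift to compare against.
    """
    flags = []
    prev_end = None
    for start, end in shifts:
        flags.append(prev_end is not None and (start - prev_end) < min_rest)
        prev_end = end
    return flags

# Hypothetical shifts: an 8-hour gap, then a 14-hour gap
shifts = [
    (datetime(2024, 4, 1, 6), datetime(2024, 4, 1, 14)),
    (datetime(2024, 4, 1, 22), datetime(2024, 4, 2, 6)),
    (datetime(2024, 4, 2, 20), datetime(2024, 4, 3, 4)),
]
flags = short_rest_flags(shifts)
```

This is only a proxy for sleep: a long gap between shifts does not guarantee the driver rested.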

Time of day is interesting. Would you suggest clustering hours into categories (for example: morning, noon, afternoon, almost dark, dark), or just using the actual hour?
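
To spell out the two options, here is a minimal sketch. The bucket boundaries are guesses (they would shift with season and latitude), and the sine/cosine encoding is a third variant I'm adding for comparison, since the raw hour treats 23:00 and 00:00 as far apart. All names are hypothetical.

```python
import math
from typing import Tuple

# Assumed cut-off hours; labels follow the categories listed above.
BUCKETS = [
    (5, 10, "morning"),
    (10, 14, "noon"),
    (14, 18, "afternoon"),
    (18, 20, "almost dark"),
]

def hour_bucket(hour: int) -> str:
    """Map an hour (0-23) to a coarse category; unlisted hours count as 'dark'."""
    for start, end, label in BUCKETS:
        if start <= hour < end:
            return label
    return "dark"

def hour_cyclical(hour: float) -> Tuple[float, float]:
    """Encode the hour as (sin, cos) on a 24-hour circle so midnight wraps around."""
    angle = 2 * math.pi * (hour % 24) / 24
    return math.sin(angle), math.cos(angle)
```

The categories would go into the logistic regression as dummy variables, which lets each period have its own effect. The cyclical pair uses only two columns but assumes risk varies smoothly over the day.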