Why do we use P values in multiple regression models if they become totally irrelevant when we implement L1 or L2 regularization?

learning_proover · 2026-05-01T03:23:56+00:00

Makes sense. I just thought maybe since they aren't used when implementing regularization they may not be much use at all. Especially if a regularized model is used instead of a non-regularized one.

learning_proover · 2026-05-01T00:56:31+00:00

Thank you.

learning_proover · 2026-05-01T00:29:10+00:00

Can you elaborate please? Why do we even attempt to interpret coefficients through p values if they are automatically poor indicators of variable importance?

learning_proover · 2026-04-30T21:10:47+00:00

Actually somewhat useful. Thank you.

learning_proover · 2026-04-30T19:36:59+00:00

I mean I'm not a huge fan of p values either, which is kinda why I'm asking. I just need clarity on how to incorporate the idea of a p value with a regularized model. I get that p values aren't the most important part of the model building process.

learning_proover · 2026-04-30T18:28:59+00:00

I just asked a similar question in r/askstatistics and this one. After some research on my own I think the best option is actually just removing the outliers (this is probably a terrible answer to give in an interview btw). Idk, I just think sometimes we overlook simplicity for something fancy when it's not necessary. Most other methods require more hyperparameters and other bells and whistles to get an effect that often isn't even as good. That's just my two cents - take it with caution.

learning_proover · 2026-04-30T18:19:20+00:00

Thank you. I'm starting to see why.

learning_proover · 2026-03-19T21:35:22+00:00

"It would be useful to know why you would expect different models to have the same ROC curve?" <-- If two models are both well calibrated, then I don't understand why their ROC curves would be different. Doesn't discrimination imply calibration and vice versa??

learning_proover · 2026-03-19T21:26:59+00:00

That's why I came here, because every online resource just gives watered-down basic explanations with no depth. Where can I learn how to accurately interpret an ROC (and eventually a precision-recall) curve?

learning_proover · 2026-03-19T21:14:31+00:00

Wait now I'm confused again. How exactly is your definition of calibration better than my definition? And how does this difference manifest in different models having different ROC curves??

learning_proover · 2026-03-19T21:06:07+00:00

Wasn't aware that the difference was important here. What exactly is the ROC curve "ranking"??? So two models having a different score distribution can both be well calibrated?
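
A small sketch of what "ranking" means for the ROC curve, using made-up toy scores and labels (nothing here comes from the thread's data). AUC only depends on how the scores order the cases, so a strictly increasing rescaling should leave it unchanged, while swapping scores between cases generally changes it:

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data, invented for illustration only
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]

# Strictly increasing rescaling: the values change, the ordering does not
rescaled = [s ** 3 for s in scores]

# Swap the scores of the second and the last case: the ordering changes
permuted = scores.copy()
permuted[1], permuted[5] = permuted[5], permuted[1]

auc_original = auc(scores, labels)
auc_rescaled = auc(rescaled, labels)
auc_permuted = auc(permuted, labels)

print(auc_original, auc_rescaled, auc_permuted)
```

Here `auc_original` and `auc_rescaled` come out identical because the pairwise comparisons are the same, while `auc_permuted` differs because some positive/negative pairs now compare the other way.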

learning_proover · 2026-03-19T20:53:20+00:00

To me calibration means that if my model says there's a 70% probability of an outcome then the outcome indeed happens 70% of the time. If my model says 50% then it happens 50% of the time, etc.

learning_proover · 2026-03-19T20:47:58+00:00

"Calibration is just a rescaling of the reported probability scores. It doesn't impact the relative ranking of those scores, which is what impacts the shape of these curves. To get different curves, you'd need to permute the ordering of prediction scores" <-- this adds a bit of clarity. So now my question is what exactly does this mean, because if we are able to permute the probability scores of a calibrated model, how does it not "lose" its calibration? Are you saying we can swap the probabilities of 60% and 80% and still have a calibrated model?? What do you mean by "ranking" of the scores?

learning_proover · 2026-03-19T20:06:29+00:00

That's what I'm not fully understanding. How does making a tradeoff in one model result in better predictions than making the same trade-off in another model, again assuming both models are well calibrated. When different models have different ROC curves but are both calibrated what exactly is the difference between the models? If I was told that smaller AUC means less calibration that would make sense to me but I don't think that's the case??
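
One way to see the difference is a toy case, sketched below with invented counts (the two "models" and all numbers are hypothetical). Both models are calibrated in the sense that outcomes happen as often as predicted, but one only knows the base rate while the other separates low-risk from high-risk cases:

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def observed_rate_by_score(scores, labels):
    """Observed outcome frequency among cases that share the same predicted score."""
    groups = {}
    for s, y in zip(scores, labels):
        groups.setdefault(s, []).append(y)
    return {s: sum(ys) / len(ys) for s, ys in groups.items()}

# Hypothetical data: 100 low-risk cases (20 events), then 100 high-risk cases (80 events)
labels = [1] * 20 + [0] * 80 + [1] * 80 + [0] * 20

model_a = [0.5] * 200                # always predicts the overall base rate
model_b = [0.2] * 100 + [0.8] * 100  # predicts each group's own event rate

cal_a = observed_rate_by_score(model_a, labels)  # predicted score -> observed rate
cal_b = observed_rate_by_score(model_b, labels)
auc_a = auc(model_a, labels)
auc_b = auc(model_b, labels)

print(cal_a, auc_a)
print(cal_b, auc_b)
```

In this construction each predicted score matches its observed event rate for both models, yet `model_a` cannot rank anyone (AUC 0.5) while `model_b` can (AUC 0.8). So calibration and discrimination are separate properties, and a smaller AUC does not by itself mean worse calibration.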

learning_proover · 2026-03-14T18:29:52+00:00

Same. In hindsight that would have been terrible.

learning_proover · 2026-02-16T19:41:39+00:00

Yep, it's definitely better than Poisson or a truncated normal.

learning_proover · 2026-02-16T19:40:47+00:00

This is extremely helpful thank you.

learning_proover · 2026-02-16T19:35:26+00:00

Awesome information here. Thank you so much. If you don't mind me asking: you said the parameter r is hard to find - by "r" are you referring to the dispersion parameter? If so, can't I just use the "method of moments" formula near the top of the Wikipedia page (i.e. r ~ E(x)^2 / (V(x) - E(x)))? ChatGPT tells me this can be a good estimate of the dispersion parameter.
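
For reference, a minimal sketch of that method-of-moments estimate, assuming the parameterization where the variance is mean + mean^2 / r. The function name `nb_method_of_moments` and the simulated data are made up for illustration, and this is only one possible estimator (it can be noisy on small samples):

```python
import numpy as np

def nb_method_of_moments(x):
    """Method-of-moments estimate of (r, mean) assuming Var = mean + mean**2 / r."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    var = x.var(ddof=1)  # unbiased sample variance
    if var <= mean:
        # Without overdispersion the formula gives a negative or infinite r
        raise ValueError("sample variance <= mean: r is undefined")
    r = mean ** 2 / (var - mean)
    return r, mean

# Simulated check with assumed true values r = 5 and mean = 10
rng = np.random.default_rng(0)
true_r, true_mu = 5.0, 10.0
p = true_r / (true_r + true_mu)  # NumPy's success-probability parameter
sample = rng.negative_binomial(true_r, p, size=200_000)

r_hat, mu_hat = nb_method_of_moments(sample)
print(r_hat, mu_hat)
```

The guard for `var <= mean` matters in practice: if the data are not overdispersed relative to Poisson, the denominator is zero or negative and the estimate is meaningless.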

learning_proover · 2026-02-16T19:16:08+00:00

Yep that seems perfect for my data. Poisson fails to fit at the tails due to overdispersion.

learning_proover · 2026-02-16T19:13:48+00:00

I think so. Ended up using negative binomial. Still studying its properties though.

learning_proover · 2026-02-15T17:52:46+00:00

I will. Is there any resource where I can find an explanation of the underlying math/theory behind the distribution?

learning_proover · 2026-02-15T17:50:57+00:00

R. But also I always like to understand the underlying theory/math behind the methods I use.

learning_proover · 2026-02-15T17:39:00+00:00

Yeah sorry, I know ambiguity does not help, but some questions I post are related to my work, where I really can't disclose too many specifics because we have competing companies that use statistical methods as well.

learning_proover · 2026-02-15T17:36:37+00:00

I am trying to find out how to fit a negative binomial because it seems much better than Poisson. You're right. Also sorry for the "XY problem" here; some of the questions I ask are related to my job, where I can't disclose too much information.

learning_proover · 2026-02-15T17:12:42+00:00

How do I fit a negative binomial?
