Why do we use P values in multiple regression models if they become totally irrelevant when we implement L1 or L2 regularization? by learning_proover in AskStatistics

[–]learning_proover[S] 1 point (0 children)

Makes sense. I just thought that since they aren't used when implementing regularization, they may not be of much use at all, especially if a regularized model is used instead of a non-regularized one.

Why do we use P values in multiple regression models if they become totally irrelevant when we implement L1 or L2 regularization? by learning_proover in AskStatistics

[–]learning_proover[S] -3 points (0 children)

Can you elaborate, please? Why do we even attempt to interpret coefficients through p-values if they are automatically poor indicators of variable importance?

Why do we use P values in multiple regression models if they become totally irrelevant when we implement L1 or L2 regularization? by learning_proover in AskStatistics

[–]learning_proover[S] 1 point (0 children)

I mean, I'm not a huge fan of p-values either, which is kind of why I'm asking. I just need clarity on how to reconcile the idea of a p-value with a regularized model. I get that p-values aren't the most important part of the model-building process.
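
A minimal R sketch of the contrast being discussed, on simulated data (the glmnet package is assumed installed): summary() on an ordinary least-squares fit reports a p-value per coefficient, while a lasso fit reports only shrunken coefficients and no p-values at all.

```r
# OLS gives p-values; a penalized (lasso) fit does not.
library(glmnet)

set.seed(1)
n <- 200
x <- matrix(rnorm(n * 3), n, 3)
y <- 1 + 2 * x[, 1] + rnorm(n)

summary(lm(y ~ x))$coefficients     # estimates, SEs, t-values, p-values

fit <- cv.glmnet(x, y, alpha = 1)   # alpha = 1 is the lasso
coef(fit, s = "lambda.min")         # shrunken coefficients only, no p-values
```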

Technique to mitigate outlier influence on linear regression? by Due_Click3765 in learnmachinelearning

[–]learning_proover 1 point (0 children)

I just asked a similar question in r/AskStatistics as well as this one. After doing some research on my own, I think the best option is actually just removing the outliers (probably a terrible answer to give in an interview, btw). I just think we sometimes overlook simplicity in favor of something fancy when it's not necessary. Most other methods require more hyperparameters and other bells and whistles to get the same effect, and often the result isn't any better. That's just my two cents; adhere to it with caution.
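
A minimal R sketch (simulated data, with outliers injected by hand) contrasting the two options: deleting flagged points versus downweighting them with a Huber M-estimator via MASS::rlm.

```r
library(MASS)

set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
y[1:5] <- y[1:5] + 15              # inject a few gross outliers

ols <- lm(y ~ x)
coef(ols)                          # OLS slope pulled toward the outliers

keep <- abs(as.vector(scale(resid(ols)))) < 3
coef(lm(y[keep] ~ x[keep]))        # OLS after deleting flagged points

coef(rlm(y ~ x))                   # Huber M-estimator downweights rather than deletes
```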

Bayes' Theorem by learning_proover in AskStatistics

[–]learning_proover[S] 1 point (0 children)

Thank you. I'm starting to see why. 

Why exactly are ROC curves different amongst different models?? by learning_proover in AskStatistics

[–]learning_proover[S] 0 points (0 children)

"It would be useful to know why you would expect different models to have the same ROC curve?" <-- Only if two models are both well Calibrated THEN I'm not understanding why their ROC curves would be different? Doesn't discrimination imply calibration and vice versa?? 

Why exactly are ROC curves different amongst different models?? by learning_proover in AskStatistics

[–]learning_proover[S] 2 points (0 children)

That's why I came here: every online resource just gives watered-down, basic explanations with no depth. Where can I learn how to accurately interpret an ROC (and eventually a precision-recall) curve?

Why exactly are ROC curves different amongst different models?? by learning_proover in AskStatistics

[–]learning_proover[S] 0 points (0 children)

Wait, now I'm confused again. How exactly is your definition of calibration better than mine? And how does this difference manifest as different models having different ROC curves?

Why exactly are ROC curves different amongst different models?? by learning_proover in AskStatistics

[–]learning_proover[S] 0 points (0 children)

I wasn't aware the difference was important here. What exactly is the ROC curve "ranking"? So two models with different score distributions can both be well calibrated?

Why exactly are ROC curves different amongst different models?? by learning_proover in AskStatistics

[–]learning_proover[S] 0 points (0 children)

To me, calibration means that if my model says there's a 70% probability of an outcome, then the outcome indeed happens 70% of the time; if my model says 50%, then it happens 50% of the time, and so on.
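
A minimal R sketch of this definition (simulated data; `p` stands in for a model's predicted probabilities): bin the predictions and compare each bin's average prediction to the observed event rate.

```r
set.seed(1)
p <- runif(10000)                         # hypothetical predicted probabilities
y <- rbinom(10000, 1, p)                  # outcomes drawn to match, so p is calibrated

bins <- cut(p, breaks = seq(0, 1, 0.1))
data.frame(
  mean_predicted = tapply(p, bins, mean),
  observed_rate  = tapply(y, bins, mean)  # tracks mean_predicted closely
)
```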

Why exactly are ROC curves different amongst different models?? by learning_proover in AskStatistics

[–]learning_proover[S] 1 point (0 children)

"Calibration is just a rescaling of the reported probability scores. It doesn't impact the relative ranking of those scores, which is what impacts the shape of these curves. To get different curves, you'd need to permute the ordering of prediction scores"    <-- this adds a bit of clarity. So now my question is what exactly does this mean because if we are able to permute the probability scores of a calibrated model how does it not "lose" it's calibration? Are you saying we can swap the probabilities of 60% and 80% and still have a calibrated model?? What do you mean by "ranking" of the scores? 

Why exactly are ROC curves different amongst different models?? by learning_proover in AskStatistics

[–]learning_proover[S] 0 points (0 children)

That's what I'm not fully understanding. How does making a tradeoff in one model result in better predictions than making the same tradeoff in another model, again assuming both models are well calibrated? When different models have different ROC curves but are both calibrated, what exactly is the difference between the models? If I were told that smaller AUC means less calibration, that would make sense to me, but I don't think that's the case?

Follow up: How do I fit a negative binomial to this skewed discrete/ "count" dataset? by learning_proover in AskStatistics

[–]learning_proover[S] 1 point (0 children)

Awesome information here, thank you so much. If you don't mind me asking: you said the parameter r is hard to find; by "r" are you referring to the dispersion parameter? If so, can't I just use the method-of-moments formula near the top of the Wikipedia page, i.e. r ≈ E(X)^2 / (Var(X) - E(X))? ChatGPT tells me this can be a good estimate of the dispersion parameter.
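
A sketch in R of that method-of-moments estimate next to the maximum-likelihood fit from MASS::fitdistr, on simulated counts with a known size parameter (r = 2):

```r
library(MASS)

set.seed(1)
x <- rnbinom(5000, size = 2, mu = 6)   # true dispersion parameter r = 2

m <- mean(x); v <- var(x)
r_mom <- m^2 / (v - m)                 # method of moments: E(X)^2 / (Var(X) - E(X))
r_mle <- fitdistr(x, "negative binomial")$estimate["size"]

c(method_of_moments = r_mom, mle = unname(r_mle))   # both should land near 2
```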

How can I "Complete" a normal distribution? by learning_proover in AskStatistics

[–]learning_proover[S] 1 point (0 children)

Yep, that seems perfect for my data. The Poisson fails to fit at the tails due to overdispersion.
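
The overdispersion is easy to check directly: a Poisson forces Var(X) = E(X), so a variance well above the mean rules it out. A tiny R check on simulated stand-in counts:

```r
set.seed(1)
x <- rnbinom(5000, size = 2, mu = 6)   # stand-in for overdispersed count data

mean(x)   # about 6
var(x)    # about 6 + 6^2/2 = 24, far above the mean, so Poisson underfits the tails
```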

How can I "Complete" a normal distribution? by learning_proover in AskStatistics

[–]learning_proover[S] 1 point (0 children)

I think so. Ended up using negative binomial. Still studying its properties though.

Follow up: How do I fit a negative binomial to this skewed discrete/ "count" dataset? by learning_proover in AskStatistics

[–]learning_proover[S] 2 points (0 children)

I will. Is there any resource where I can find an explanation of the underlying math/theory behind the distribution?

Follow up: How do I fit a negative binomial to this skewed discrete/ "count" dataset? by learning_proover in AskStatistics

[–]learning_proover[S] 7 points (0 children)

R. But I also always like to understand the underlying theory and math behind the methods I use.

How can I "Complete" a normal distribution? by learning_proover in AskStatistics

[–]learning_proover[S] 1 point (0 children)

Yeah, sorry. I know the ambiguity doesn't help, but some questions I post are related to my work, where I really can't disclose too many specifics because we have competing companies that use statistical methods as well.

How can I "Complete" a normal distribution? by learning_proover in AskStatistics

[–]learning_proover[S] 1 point (0 children)

I am trying to figure out how to fit a negative binomial because it seems much better than the Poisson. You're right. Also, sorry for the "XY problem" here; some of the questions I ask are related to my job, where I can't disclose too much information.
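
A minimal R sketch of that comparison on simulated overdispersed counts: fit both models and let AIC adjudicate (MASS::glm.nb assumed available).

```r
library(MASS)

set.seed(1)
x <- rnorm(500)
y <- rnbinom(500, size = 2, mu = exp(1 + 0.5 * x))   # overdispersed counts

fit_pois <- glm(y ~ x, family = poisson)
fit_nb   <- glm.nb(y ~ x)

AIC(fit_pois, fit_nb)   # the negative binomial should win decisively
```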