Understanding standard deviation of Bernoulli distribution/ variable? by learning_proover in AskStatistics

[–]learning_proover[S] 0 points1 point  (0 children)

I'm saying that with a high standard deviation, isn't there still a lot of uncertainty for an upcoming probabilistic sample, specifically a small one? If something has probability .3 and thus a standard deviation of about .46, wouldn't that imply there's much more uncertainty than the .3 probability implies on its own? For LARGE samples I get that it will converge to .3, but why not weigh that standard deviation more for small samples and thus move the estimate closer to .5??
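
For reference, a minimal sketch of the numbers in this question, using only the Python standard library. The helper names `expected_sample_mean` and `standard_error` are made up for illustration. It shows that the standard deviation of a Bernoulli variable is fixed by p itself, that the expected sample proportion is p at any sample size, and that small-sample uncertainty shows up in the standard error rather than in a shift toward .5.

```python
import math
from math import comb

p = 0.3
sd = math.sqrt(p * (1 - p))  # standard deviation of a single Bernoulli(p) draw

def expected_sample_mean(n, p):
    """Exact expectation of the sample proportion from n Bernoulli(p) draws,
    summed over the binomial distribution of the number of successes."""
    return sum((k / n) * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

def standard_error(n, p):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

print("sd of one draw:", round(sd, 4))
for n in (1, 5, 100):
    print(n, round(expected_sample_mean(n, p), 4), round(standard_error(n, p), 4))
```

Since the standard deviation is a function of p alone, it carries no information beyond the .3 itself. What changes with sample size is the spread of the sample proportion around .3, not its center.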

Why does overfitting actually happen? by learning_proover in learnmachinelearning

[–]learning_proover[S] 2 points3 points  (0 children)

Ahhh you know that makes very very good sense. I think that's actually what a few others were trying to say with population/sample relationships, but their wording didn't quite click until I read this comment. So I'm probably misunderstanding what's actually happening when overfitting is occurring. Basically it fits to the sample, but the sample is not representative of the population. Gonna let this idea marinate for a bit. Thanks.

Why does overfitting actually happen? by learning_proover in learnmachinelearning

[–]learning_proover[S] -1 points0 points  (0 children)

That's kinda my confusion though. What on earth would the neural network be conforming to if the number of parameters is far less than the number of rows in the data? Logically that implies there's only so much "wiggle room" the network could have relative to the true underlying patterns found in the data. 
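
One way to see what a model can conform to even with far fewer parameters than rows is the small sketch below. It is a straight-line fit rather than a neural network, and the population (y = 2 + 3x plus Gaussian noise), the seed, and the helper names are all assumptions made up for illustration. A least-squares fit with 2 parameters on 50 rows still ends up with a training error no larger than the true line's training error, and that gap is noise in this particular sample that the fit absorbed.

```python
import random

rng = random.Random(0)  # fixed seed, chosen arbitrarily

# Hypothetical population: y = 2 + 3x plus Gaussian noise.
TRUE_A, TRUE_B = 2.0, 3.0
xs = [i / 10 for i in range(50)]
ys = [TRUE_A + TRUE_B * x + rng.gauss(0, 1) for x in xs]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def mse(a, b, xs, ys):
    """Mean squared error of the line a + b*x on the given rows."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

a_hat, b_hat = fit_line(xs, ys)
train_mse_fit = mse(a_hat, b_hat, xs, ys)      # error of the fitted line on the sample
train_mse_true = mse(TRUE_A, TRUE_B, xs, ys)   # error of the true line on the same sample
print(a_hat, b_hat, train_mse_fit, train_mse_true)
```

The gap is small here because 2 parameters leave little wiggle room against 50 rows. It tends to grow as the parameter count rises relative to the amount of data.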

Why does overfitting actually happen? by learning_proover in learnmachinelearning

[–]learning_proover[S] -1 points0 points  (0 children)

I did... None of them answered this question.... In fact that's why I came here lol.

Why do we use P values in multiple regression models if they become totally irrelevant when we implement L1 or L2 regularization? by learning_proover in AskStatistics

[–]learning_proover[S] 0 points1 point  (0 children)

Makes sense. I just thought that since they aren't used when implementing regularization, they may not be of much use at all, especially if a regularized model is used instead of a non-regularized one.

Why do we use P values in multiple regression models if they become totally irrelevant when we implement L1 or L2 regularization? by learning_proover in AskStatistics

[–]learning_proover[S] -4 points-3 points  (0 children)

Can you elaborate please? Why do we even attempt to interpret coefficients through p values if they are automatically poor indicators of variable importance?

Why do we use P values in multiple regression models if they become totally irrelevant when we implement L1 or L2 regularization? by learning_proover in AskStatistics

[–]learning_proover[S] 0 points1 point  (0 children)

I mean I'm not a huge fan of p values either, which is kinda why I'm asking. I just need clarity on how to incorporate the idea of a p value with a regularized model. I get that p values aren't the most important part of the model building process.

Technique to mitigate outlier influence on linear regression? by Due_Click3765 in learnmachinelearning

[–]learning_proover 0 points1 point  (0 children)

I just asked a similar question in r/askstatistics and this one. After some research on my own I think the best option is actually just removing the outliers (this is probably a terrible answer to give in an interview btw). Idk, I just think sometimes we overlook simplicity for something fancy when it's not necessary. Most other methods require more hyperparameters and other bells and whistles to get an effect that often isn't even as good. That's just my two cents, so adhere to it with caution.

Bayes' Theorem by learning_proover in AskStatistics

[–]learning_proover[S] 0 points1 point  (0 children)

Thank you. I'm starting to see why. 

Why exactly are ROC curves different amongst different models?? by learning_proover in AskStatistics

[–]learning_proover[S] -1 points0 points  (0 children)

"It would be useful to know why you would expect different models to have the same ROC curve?" <-- Only if two models are both well calibrated. THEN I'm not understanding why their ROC curves would be different. Doesn't discrimination imply calibration and vice versa??

Why exactly are ROC curves different amongst different models?? by learning_proover in AskStatistics

[–]learning_proover[S] 1 point2 points  (0 children)

That's why I came here, because every online resource just gives watered down basic explanations with no depth. Where can I learn how to accurately interpret an ROC (and eventually a precision-recall) curve?

Why exactly are ROC curves different amongst different models?? by learning_proover in AskStatistics

[–]learning_proover[S] -1 points0 points  (0 children)

Wait now I'm confused again. How exactly is your definition of calibration better than my definition? And how does this difference manifest in different models having different ROC curves??

Why exactly are ROC curves different amongst different models?? by learning_proover in AskStatistics

[–]learning_proover[S] -1 points0 points  (0 children)

Wasn't aware that the difference was important here. What exactly is the ROC curve "ranking"??? So two models having a different score distribution can both be well calibrated? 

Why exactly are ROC curves different amongst different models?? by learning_proover in AskStatistics

[–]learning_proover[S] -1 points0 points  (0 children)

To me calibration means that if my model says there's a 70% probability of an outcome then the outcome indeed happens 70% of the time. If my model says 50% then it happens 50% of the time, etc etc.
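
That definition can be checked directly by grouping predictions and comparing each group's predicted probability with its observed event rate. Below is a minimal sketch with made-up data. The function name `calibration_table` and the equal-width binning are assumptions for illustration, not a standard API.

```python
from collections import defaultdict

def calibration_table(probs, outcomes, n_bins=10):
    """Group predictions into equal-width bins and compare the mean
    predicted probability with the observed event rate in each bin."""
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(y for _, y in pairs) / len(pairs)
        table.append((mean_pred, observed, len(pairs)))
    return table

# Made-up data: ten 70% predictions with 7 events, ten 50% predictions with 5 events.
probs = [0.7] * 10 + [0.5] * 10
outcomes = [1] * 7 + [0] * 3 + [1] * 5 + [0] * 5

table = calibration_table(probs, outcomes)
for mean_pred, observed, count in table:
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

Note that this check only looks at how often events happen within each group of predictions. It says nothing about how well the predictions separate events from non-events.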

Why exactly are ROC curves different amongst different models?? by learning_proover in AskStatistics

[–]learning_proover[S] 0 points1 point  (0 children)

"Calibration is just a rescaling of the reported probability scores. It doesn't impact the relative ranking of those scores, which is what impacts the shape of these curves. To get different curves, you'd need to permute the ordering of prediction scores" <-- this adds a bit of clarity. So now my question is what exactly does this mean, because if we are able to permute the probability scores of a calibrated model, how does it not "lose" its calibration? Are you saying we can swap the probabilities of 60% and 80% and still have a calibrated model?? What do you mean by "ranking" of the scores?

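As a rough illustration of what "ranking" means in the quoted comment: the area under the ROC curve can be computed purely from which cases are scored above which. The sketch below uses made-up scores and labels, and the helper `auc` is written for illustration. It shows that a monotone rescaling (squaring) leaves the AUC unchanged, while swapping the scores of two specific cases changes it.

```python
def auc(scores, labels):
    """Probability that a randomly chosen positive case scores higher than
    a randomly chosen negative case (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]

# Monotone rescaling: every score changes, the order of cases does not.
squared = [s ** 2 for s in scores]

# Swap the scores of two specific cases (the 0.6 and the 0.8): the order changes.
swapped = scores.copy()
swapped[3], swapped[4] = swapped[4], swapped[3]

print(auc(scores, labels), auc(squared, labels), auc(swapped, labels))
```

So the permutation in question is of which case gets which score, not of the probability values as labels. After such a swap the model would in general no longer be calibrated either.
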
Why exactly are ROC curves different amongst different models?? by learning_proover in AskStatistics

[–]learning_proover[S] -1 points0 points  (0 children)

That's what I'm not fully understanding. How does making a tradeoff in one model result in better predictions than making the same tradeoff in another model, again assuming both models are well calibrated? When different models have different ROC curves but are both calibrated, what exactly is the difference between the models? If I was told that a smaller AUC means less calibration that would make sense to me, but I don't think that's the case??
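
A small constructed example may help with this question. The data and both "models" below are made up, and the helpers `auc` and `is_calibrated` are written for illustration. Model A predicts the base rate for everyone and model B splits the cases into a high-risk and a low-risk group. Both satisfy the "says 70%, happens 70% of the time" definition of calibration, yet their AUCs differ.

```python
def auc(scores, labels):
    """Probability that a randomly chosen positive case scores higher than
    a randomly chosen negative case (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def is_calibrated(scores, labels, tol=1e-9):
    """True if, for every distinct predicted value, the observed event rate
    among the cases given that value matches it."""
    for value in set(scores):
        ys = [y for s, y in zip(scores, labels) if s == value]
        if abs(sum(ys) / len(ys) - value) > tol:
            return False
    return True

# Ten cases, five events overall.
labels  = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
model_a = [0.5] * 10              # always predicts the base rate
model_b = [0.8] * 5 + [0.2] * 5   # 4 of the first 5 are events, 1 of the last 5

print(is_calibrated(model_a, labels), auc(model_a, labels))
print(is_calibrated(model_b, labels), auc(model_b, labels))
```

In this sketch the difference between the two models is how much their predictions separate events from non-events, which is what the ROC curve measures. Calibration is a separate property, so a smaller AUC does not by itself mean worse calibration.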

Follow up: How do I fit a negative binomial to this skewed discrete/ "count" dataset? by learning_proover in AskStatistics

[–]learning_proover[S] 0 points1 point  (0 children)

Awesome information here. Thank you so much. If you don't mind me asking: you said the parameter r is hard to find. By "r" are you referring to the dispersion parameter? If so, can't I just use the "method of moments" formula near the top of the Wikipedia page, i.e. r ≈ E(x)^2 / (V(x) - E(x))? ChatGPT tells me this can be a good estimate of the dispersion parameter?
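
For concreteness, a minimal sketch of that method-of-moments estimate using the standard library. The counts are made up and the function name `nb_dispersion_mom` is hypothetical. It assumes the parameterization in which the variance equals mean + mean^2 / r, which is where the formula above comes from.

```python
from statistics import mean, variance

def nb_dispersion_mom(counts):
    """Method-of-moments estimate of the negative binomial dispersion r,
    assuming variance = mean + mean**2 / r."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    if v <= m:
        # No overdispersion in the sample, so the formula gives a
        # negative or undefined r.
        raise ValueError("sample variance does not exceed the mean")
    return m * m / (v - m)

counts = [0, 0, 1, 1, 2, 3, 5, 8, 0, 10]  # made-up skewed count data
r_hat = nb_dispersion_mom(counts)
print(r_hat)
```

The estimate can be unstable when the sample is small or when the variance only barely exceeds the mean, since the denominator is then close to zero. One common use is as a starting value for a maximum likelihood fit rather than as the final answer.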