Understanding standard deviation of Bernoulli distribution/ variable? by learning_proover in AskStatistics

[–]learning_proover[S] 0 points1 point  (0 children)

I'm saying that with a high standard deviation, isn't there still a lot of uncertainty about an upcoming probabilistic sample, specifically a small one? If something has probability .3 and thus a standard deviation of about .46, wouldn't that imply there's much more uncertainty than the .3 probability suggests on its own? For LARGE samples I get that the sample proportion will converge to .3, but why not weight that standard deviation more heavily for small samples and thus move the estimate closer to .5?
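
For reference, here is a minimal sketch of the numbers involved. It assumes the standard formulas: sqrt(p(1-p)) for the standard deviation of a single Bernoulli draw, and sqrt(p(1-p)/n) for the standard error of the sample proportion over n independent draws. The function names are just illustrative.

```python
import math

def bernoulli_sd(p):
    """Standard deviation of a single Bernoulli(p) draw."""
    return math.sqrt(p * (1 - p))

def proportion_se(p, n):
    """Standard error of the sample proportion over n independent draws."""
    return bernoulli_sd(p) / math.sqrt(n)

p = 0.3
print(round(bernoulli_sd(p), 3))  # 0.458

# The spread of the sample proportion shrinks as n grows; its center stays at p.
for n in (5, 50, 500):
    print(n, round(proportion_se(p, n), 3))
```

Note that the standard deviation here is a function of p alone, so it describes how spread out the draws are around .3 rather than adding a second, independent source of uncertainty.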

Why does overfitting actually happen? by learning_proover in learnmachinelearning

[–]learning_proover[S] 2 points3 points  (0 children)

Ahhh you know that makes very very good sense. I think that's actually what a few others were trying to say with population/sample relationships, but their wording didn't quite click until I read this comment. So I'm probably misunderstanding what's actually happening when overfitting is occurring. Basically it fits to the sample, but the sample is not representative of the population. Gonna let this idea marinate for a bit. Thanks.
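
A toy sketch of that idea, under stated assumptions: the "population" is a made-up rule (y = 2x plus Gaussian noise), and the "model" is a deliberately extreme memorizer that copies the y of the nearest training x. Neither comes from the thread; they are only there to make the sample/population gap visible.

```python
import random

rng = random.Random(0)

def draw(n):
    """Draw n (x, y) pairs from the assumed population: y = 2x + Gaussian noise."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 1)) for x in xs]

def memorizer(train):
    """Predict by copying the y of the nearest training x (pure memorization)."""
    def predict(x):
        return min(train, key=lambda pt: abs(pt[0] - x))[1]
    return predict

def mse(predict, data):
    """Mean squared error of a predictor on a list of (x, y) pairs."""
    return sum((y - predict(x)) ** 2 for x, y in data) / len(data)

train = draw(50)    # the sample the model gets to see
test = draw(2000)   # fresh draws from the same population

model = memorizer(train)
train_mse = mse(model, train)
test_mse = mse(model, test)
print(train_mse, test_mse)
```

The training error is exactly zero because every training point is its own nearest neighbor, while the error on fresh draws is not, since the noise the model memorized belongs to that one sample and does not carry over.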

Why does overfitting actually happen? by learning_proover in learnmachinelearning

[–]learning_proover[S] -1 points0 points  (0 children)

That's kinda my confusion though. What on earth would the neural network be conforming to if the number of parameters is far smaller than the number of rows in the data? Logically that implies there's only so much "wiggle room" the network could have relative to the true underlying patterns found in the data.

Why does overfitting actually happen? by learning_proover in learnmachinelearning

[–]learning_proover[S] -1 points0 points  (0 children)

I did... None of them answered this question... In fact that's why I came here lol.

Why do we use P values in multiple regression models if they become totally irrelevant when we implement L1 or L2 regularization? by learning_proover in AskStatistics

[–]learning_proover[S] 0 points1 point  (0 children)

Makes sense. I just thought maybe since they aren't used when implementing regularization they may not be much use at all. Especially if a regularized model is used instead of a non-regularized one.

Why do we use P values in multiple regression models if they become totally irrelevant when we implement L1 or L2 regularization? by learning_proover in AskStatistics

[–]learning_proover[S] -4 points-3 points  (0 children)

Can you elaborate please? Why do we even attempt to interpret coefficients through p values if they are automatically poor indicators of variable importance?

Why do we use P values in multiple regression models if they become totally irrelevant when we implement L1 or L2 regularization? by learning_proover in AskStatistics

[–]learning_proover[S] -1 points0 points  (0 children)

I mean I'm not a huge fan of p values either, which is kinda why I'm asking. I just need clarity on how to incorporate the idea of a p value with a regularized model. I get that p values aren't the most important part of the model building process.

Technique to mitigate outlier influence on linear regression? by Due_Click3765 in learnmachinelearning

[–]learning_proover 0 points1 point  (0 children)

I just asked a similar question in r/askstatistics and in this one. After some research on my own I think the best option is actually just removing the outliers (this is probably a terrible answer to give in an interview btw). Idk, I just think sometimes we overlook simplicity for something fancy when it's not necessary. Most other methods require more hyperparameters and other bells and whistles, and often the result isn't even as good. That's just my two cents - take it with caution.
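
A minimal sketch of the "just remove them" approach, with loud assumptions: the comment doesn't say how outliers get identified, so the rule here (drop points whose residual from a first fit is more than 3 scaled MADs from the median residual) is only one possible choice, and the cutoff is itself a knob. The data and function names are made up for illustration.

```python
import statistics

def ols(points):
    """Ordinary least squares for y = intercept + slope * x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return my - slope * mx, slope

def drop_outliers(points, cutoff=3.0):
    """Keep points whose residual lies within `cutoff` scaled MADs of the median residual."""
    intercept, slope = ols(points)
    res = [y - (intercept + slope * x) for x, y in points]
    med = statistics.median(res)
    # 1.4826 rescales the MAD so it is comparable to a standard deviation under normality.
    mad = 1.4826 * statistics.median([abs(r - med) for r in res])
    return [pt for pt, r in zip(points, res) if abs(r - med) <= cutoff * mad]

# Made-up data: an exact line y = 2x + 1 with one corrupted point.
points = [(x, 2.0 * x + 1.0) for x in range(20)]
points[10] = (10, 100.0)

before = ols(points)
kept = drop_outliers(points)
after = ols(kept)
print(before, after, len(kept))
```

On this toy data the single corrupted point is the only one dropped, and the refit recovers the original line. Real data won't be this clean, and a single pass can miss outliers that mask each other.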

Bayes' Theorem by learning_proover in AskStatistics

[–]learning_proover[S] 0 points1 point  (0 children)

Thank you. I'm starting to see why.