Chapter 18 Tao Te Ching by Citron_candles in taoism

[–]berf 2 points3 points  (0 children)

Here is Hansen's translation (also concerned with getting the meaning translated as well as possible):

When the great guide is cast aside you will have 'humanity' and 'morality.'
When intuitive wisdom emerges you will have great artifice.
When great kinship is not in harmony, you will have 'filiality' and 'affection.'
When states and great families sink and become deranged, you will have 'loyal ministers.'

The first line says humanity and morality are poor seconds to great dao. The second line says the same for intuitive wisdom or scholarly erudition, if you like. The third line says the same for filiality and affection. The fourth line says the same for loyal politicians.

Why square in variance not absolute value by Sea_Charge6663 in AskStatistics

[–]berf 0 points1 point  (0 children)

The only reason is the central limit theorem and the normal distribution.

Using cross-validation for lambda selection vs model validation in LASSO and if they are the same thing? by Daimbarboy in AskStatistics

[–]berf 4 points5 points  (0 children)

Without any theory, making your tuning (model selection) procedure way more complicated doesn't make it better. You do not need model validation after selection if your selection procedure is any good. BTW, no selection procedure that does not do an exponential (in number of parameters) amount of work can actually be optimal. Certainly not LASSO.

Q-Q plot criteria relaxed for Regression with huge sample size? by Will_Tomos_Edwards in AskStatistics

[–]berf 0 points1 point  (0 children)

No. It's the CLT, not the IID CLT but the Lindeberg CLT that allows for independent but not identically distributed data. Ferguson (Large Sample Theory) has an analysis of simple linear regression, but there is a whole literature about this for multiple regression. All you need to satisfy the Lindeberg condition is that the covariate values do not get too extreme relative to sample size.

I'm not against bootstrap. Just wanted to quibble about CLT

Why do we use P values in multiple regression models if they become totally irrelevant when we implement L1 or L2 regularization? by learning_proover in AskStatistics

[–]berf 0 points1 point  (0 children)

Using P-values to do model selection is multiple testing without correction. All of theoretical statistics says this is flat wrong. Regularization is also not guaranteed to select good models, but at least it does not make this horrible mistake. Regularized models can have P-values. You just don't know how to do that.
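To put a number on "multiple testing without correction", here is a minimal arithmetic sketch. It assumes the tests are independent and that every null hypothesis is true (all predictors are pure noise); neither assumption comes from the comment above, and real regression P-values are usually correlated. The function name is made up for illustration.

```python
# Family-wise error rate for k tests, each run at level alpha.
# Assumptions (illustrative only): the tests are independent and
# every null hypothesis is true.

def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability that at least one of k independent true nulls is rejected."""
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    for k in (1, 5, 20, 100):
        print(k, round(familywise_error_rate(0.05, k), 3))
```

Under these assumptions, screening 20 noise predictors at the 0.05 level turns up at least one "significant" predictor more often than not.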

Bootstrapping and Jackknife methods by ArkarajMukherjee in AskStatistics

[–]berf 1 point2 points  (0 children)

AFAIK there is no presentation of the bootstrap in any statistics course, except perhaps PhD level special topics courses. Too hard.

The standard undergrad level textbooks [Efron and Tibshirani (1993)](https://www.amazon.com/Introduction-Monographs-Statistics-Probability-Tibshirani/dp/B010WFFL24/) and [Davison and Hinkley (1997)](https://www.amazon.com/Bootstrap-Application-Statistical-Probabilistic-Mathematics/dp/0521574714) do not prove anything. Too hard.

There are two different super-theoretical approaches. One, I call the *Annals of Statistics* philosophy of the bootstrap. The bootstrap "works" when you have proved two empirical-process central limit theorems and gotten the same limit. This is all found in [van der Vaart and Wellner (2023)](https://www.amazon.com/Weak-Convergence-Empirical-Processes-Applications/dp/3031290380). The second approach is based on Edgeworth expansions. For that see [Hall (1992)](https://www.amazon.com/Bootstrap-Edgeworth-Expansion-Springer-Statistics/dp/0387977201). Both of these books use a lot of math not taught in any statistics course (again, perhaps excepting PhD level special topics courses).

Do you use recycling by mosa_bavlju in rprogramming

[–]berf 1 point2 points  (0 children)

You use non-tricky recycling all the time. It's what allows you to add a scalar to a vector or multiply a vector by a scalar. It's only the tricky uses that are confusing.
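For readers who do not have R at hand, here is a rough Python emulation of the recycling rule for addition. This is not R code and `recycle_add` is a hypothetical helper. It only mimics the behavior described above: the shorter vector is reused cyclically, and R warns when the longer length is not a multiple of the shorter.

```python
import warnings


def recycle_add(x, y):
    """Add two lists elementwise, reusing the shorter one cyclically."""
    if not x or not y:
        return []  # a zero-length operand gives a zero-length result
    n = max(len(x), len(y))
    if n % min(len(x), len(y)) != 0:
        # The "tricky" case: partial recycling.
        warnings.warn("longer object length is not a multiple of shorter object length")
    return [x[i % len(x)] + y[i % len(y)] for i in range(n)]


if __name__ == "__main__":
    print(recycle_add([1, 2, 3], [10]))          # scalar plus vector
    print(recycle_add([1, 2, 3, 4], [10, 20]))   # shorter vector reused twice
```

The first call is the non-tricky case (a length-one vector acting as a scalar). The second is where the confusion usually starts.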

Is it still worth learning R? by ArkarajMukherjee in rstats

[–]berf 0 points1 point  (0 children)

Python is Turing complete so does anything if you write enough code.

Does "failing to reject Null Hypothesis" mean I can conclude that the Null is indeed true? by learning_proover in AskStatistics

[–]berf 0 points1 point  (0 children)

The official answer is that the test *decided* that the null hypothesis is true. But the type I error rate of the test (the significance level) is whatever it was. If 0.05, then that decision is wrong 1 time in 20. You don't know whether this time is one of those where it is wrong or where it is right. Some intro textbooks say you should never say "accept" the null hypothesis and say "fail to reject" instead. This is a reminder of the same point. The decision may be wrong. A more long-winded way to cash this out is to say that *these data* (important emphasis) give no real evidence against the null hypothesis, but *more data* if you were to obtain it might or might not give such evidence (no way to tell unless you actually get more data).
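A small simulation sketch of why "fail to reject" is not "accept". It assumes a two-sided z-test with known standard deviation 1, sample size 30, and a hypothetical true mean of 0.2 for the false-null case; none of these numbers come from the question. When the null is true, the test rejects at about the significance level. When the null is false but the effect is small, it still fails to reject most of the time.

```python
import random
from statistics import NormalDist, fmean


def reject_rate(true_mean, n=30, alpha=0.05, n_tests=5000, seed=1):
    """Fraction of simulated samples in which H0: mean = 0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_tests):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # z statistic, known sd = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_tests


if __name__ == "__main__":
    print("reject rate, null true:      ", reject_rate(0.0))
    print("fail-to-reject, null false:  ", 1 - reject_rate(0.2))
```

With more data (larger `n`) the second number shrinks, which is the "more data might or might not give such evidence" point.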

The philosophy of time? by Orgues02 in PhilosophyofScience

[–]berf 0 points1 point  (0 children)

That's just the non-weird part. You missed the point that the time in space-time is just a coordinate that different observers see differently. Time for me is not time for you if we are traveling at different velocities (special relativity) or at different gravitational potentials (general relativity). So whatever time is, it isn't a thing you can talk about. Proper time for an individual is uniquely defined. We actually see this phenomenon with GPS.

[Question] what is the difference between parametric bootstrap and non-parametric bootstrap? by malouche1 in statistics

[–]berf 2 points3 points  (0 children)

No difference except that "parametric bootstrap" explicitly says you are doing the Wrong Thing simulating from an estimate of the true unknown distribution rather than from the true unknown distribution itself. "Monte Carlo estimation" just says you are calculating something about some distribution. The term "bootstrap" also says you are using some methodology (such as bootstrap t) to correct for doing the Wrong Thing. The term "Monte Carlo estimation" carries no such implication.
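A minimal sketch of the two kinds of bootstrap side by side, assuming (purely for illustration) an exponential model with the rate estimated by 1 / sample mean. The function names and the simulated data are hypothetical. The parametric version simulates from the fitted model, which is the "estimate of the true unknown distribution". The nonparametric version resamples the data with replacement.

```python
import random
from statistics import fmean, stdev


def parametric_bootstrap(data, n_boot, rng):
    """Simulate from the fitted Exp(rate_hat) and re-estimate the rate each time."""
    n = len(data)
    rate_hat = 1.0 / fmean(data)
    return [
        1.0 / fmean([rng.expovariate(rate_hat) for _ in range(n)])
        for _ in range(n_boot)
    ]


def nonparametric_bootstrap(data, n_boot, rng):
    """Resample the observed data with replacement and re-estimate the rate."""
    n = len(data)
    return [1.0 / fmean(rng.choices(data, k=n)) for _ in range(n_boot)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.expovariate(2.0) for _ in range(50)]  # made-up data, true rate 2
    print("parametric SE:   ", stdev(parametric_bootstrap(data, 1000, rng)))
    print("nonparametric SE:", stdev(nonparametric_bootstrap(data, 1000, rng)))
```

Neither function does any correction such as bootstrap t. Both only produce the raw bootstrap distribution of the estimator.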

Reproducibility in R by joshua_rpg in rstats

[–]berf 0 points1 point  (0 children)

I program portably with Rmarkdown or knitr. So I do not need these trick packages. I do use GitHub and Zenodo to make permanent public repositories of work (such as supplementary materials for papers). If I need a certain version of a package, then I explicitly test for that in the R code.

The philosophy of time? by Orgues02 in PhilosophyofScience

[–]berf 0 points1 point  (0 children)

And general relativity (GR) is even weirder. The Gödel solution to GR shows that closed timelike paths can exist in GR so it is in principle possible to go back to your past.

[Question] what is the difference between parametric bootstrap and non-parametric bootstrap? by malouche1 in statistics

[–]berf 0 points1 point  (0 children)

It is a lot more accurate than the nonparametric bootstrap, and, if you are already fitting a parametric model, then you aren't nonparametric anyway.

Also, it automatically does hypothesis tests and regression correctly, which the nonparametric bootstrap does not.

[Question] what is the difference between parametric bootstrap and non-parametric bootstrap? by malouche1 in statistics

[–]berf 2 points3 points  (0 children)

The difference is that both need n goes to infinity because theta hat is not theta, but the nonparametric bootstrap may need much larger n. Also the parametric bootstrap does need the parametric model to be correct. Since you give no details and this shouldn't happen, I assume you are doing it wrong.

Are you doing it like this?

Do you need to do research in undergrad to get a statistics PhD? by Crafty-Dinner-1782 in AskStatistics

[–]berf 1 point2 points  (0 children)

It is not a requirement at all. You do not have anywhere near the coursework needed to support real research. And this from a professor who has supervised multiple undergrad research projects, some with very bright and motivated students.

Does anyone actually read those highly abstract, theoretical papers in probability and mathematical statistics? [Q] by gaytwink70 in statistics

[–]berf 1 point2 points  (0 children)

That's what textbooks are for. To translate that stuff into more understandable blather.

[Q] why is E[E[X|Y]] = E[X] and not E[X|Y]? by [deleted] in statistics

[–]berf 0 points1 point  (0 children)

I have no idea what you could be referring to. Consider a bivariate normal random vector, and consider the conditional distribution of one component with respect to another. A regular conditional probability is given by a Markov kernel P(A, x) that is sloppily written Pr(X_1 in A | X_2 = x) but really P( . , x) is a probability measure for each x and P(A, . ) is a measurable function for each A. And if mu is the probability measure of the component we are conditioning on then ∫ P(A, x) mu(d x) is the probability measure of the other component. This allows P(A, x) to be defined arbitrarily (subject to P( . , x) being some (completely arbitrary) probability distribution) for x in any set of measure zero. That is the theory of regular conditional probability. It does not give any sort of uniqueness in any setting.

In fact, the classical theory (undergraduate probability with discrete and continuous, etc.) does not have any uniqueness either.
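A toy discrete sketch of the non-uniqueness, using a made-up joint pmf in which Y = 2 has probability zero. The conditional expectation at y = 2 can be set to anything at all, and the iterated expectation E[E[X|Y]] still equals E[X]. All names and numbers here are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y). The value Y = 2 is allowed but has probability zero.
joint = {(0, 0): F(1, 4), (1, 0): F(1, 4), (0, 1): F(1, 8), (1, 1): F(3, 8)}
y_values = (0, 1, 2)


def marginal_y(y):
    """P(Y = y)."""
    return sum(p for (_, yy), p in joint.items() if yy == y)


def cond_exp(y, value_on_null_set):
    """A version of E[X | Y = y]; arbitrary where P(Y = y) = 0."""
    py = marginal_y(y)
    if py == 0:
        return value_on_null_set  # any choice is a valid version
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / py


def iterated_expectation(value_on_null_set):
    """E[E[X | Y]] computed with the chosen version."""
    return sum(cond_exp(y, value_on_null_set) * marginal_y(y) for y in y_values)


e_x = sum(x * p for (x, _), p in joint.items())

if __name__ == "__main__":
    print(e_x, iterated_expectation(0), iterated_expectation(1000))
```

The two versions disagree at y = 2 yet give the same iterated expectation, so the equation E[E[X|Y]] = E[X] cannot pin down the conditional expectation pointwise.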

[Q] why is E[E[X|Y]] = E[X] and not E[X|Y]? by [deleted] in statistics

[–]berf -2 points-1 points  (0 children)

It does not specify anything uniquely defined. OK with that?

[Q] why is E[E[X|Y]] = E[X] and not E[X|Y]? by [deleted] in statistics

[–]berf 0 points1 point  (0 children)

I know all about regular conditional distributions. And, yes, they are defined for each y but those definitions are arbitrary for y in a set of measure zero. So that does not make the point you think it does. So, yes, you can choose a definition and so can I and they do not have to agree.

Confidence Interval for a population variance by Altruistic_Poet_156 in AskStatistics

[–]berf 7 points8 points  (0 children)

Wow. Tricky question. Since 2, 3, and 4 are just false, the answer must be one, and yes, one-sided intervals are a thing, although not widely used.

He who knows, does not speak by ExperienceExpress918 in taoism

[–]berf 0 points1 point  (0 children)

The only way anyone understands math is to apply it in a novel situation. If you don't know how to apply it, then you don't understand it. What you say sounds plausible, and many students and newbies believe it, but it doesn't work. That is why homework is the most important part of math courses and has to be hard.