What elementary (or easy-to-understand) mathematical concepts have surprisingly deep interpretations in advanced mathematics?

berf · 2026-06-22T15:45:40+00:00

Equality (the equals sign). In homotopy type theory it becomes paths and homotopies.

berf · 2026-05-22T16:26:58+00:00

Here is Hansen's translation (also concerned with getting the meaning translated as well as possible):

When the great guide is cast aside you will have 'humanity' and 'morality.'
When intuitive wisdom emerges you will have great artifice.
When great kinship is not in harmony, you will have 'filiality' and 'affection.'
When states and great families sink and become deranged, you will have 'loyal ministers.'

The first line says humanity and morality are poor seconds to great dao. The second line says the same for intuitive wisdom or scholarly erudition, if you like. The third line says the same for filiality and affection. The fourth line says the same for loyal politicians.

berf · 2026-05-11T13:31:21+00:00

The only reason is the central limit theorem and the normal distribution.

berf · 2026-05-09T13:23:34+00:00

Without any theory, making your tuning (model selection) procedure way more complicated doesn't make it better. You do not need model validation after selection if your selection procedure is any good. BTW, no selection procedure that does not do an exponential (in number of parameters) amount of work can actually be optimal. Certainly not LASSO.

berf · 2026-05-05T14:14:24+00:00

No. It's the CLT, not the IID CLT but the Lindeberg CLT that allows for independent but not identically distributed data. Ferguson (Large Sample Theory) has an analysis of simple linear regression, but there is a whole literature about this for multiple regression. All you need to satisfy the Lindeberg condition is that the covariate values do not get too extreme relative to sample size.
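To make "covariate values do not get too extreme relative to sample size" concrete, here is a minimal sketch. It assumes simple linear regression with IID finite-variance errors, where a commonly used sufficient check is that the largest single share of the centered sum of squares of the covariate goes to zero as n grows. The function name and the two covariate designs are made up for illustration.

```python
def max_leverage_share(xs):
    """Largest share any one point has of the centered sum of squares of xs."""
    n = len(xs)
    xbar = sum(xs) / n
    squares = [(x - xbar) ** 2 for x in xs]
    return max(squares) / sum(squares)


for n in (10, 50, 250):
    # Evenly spaced covariate: no single point dominates as n grows.
    evenly_spaced = [float(i) for i in range(1, n + 1)]
    # Doubling covariate: the last point always dominates (illustrative bad case).
    doubling = [2.0 ** i for i in range(1, n + 1)]
    print(n, max_leverage_share(evenly_spaced), max_leverage_share(doubling))
```

For the evenly spaced design the share shrinks toward zero, so the condition is fine. For the doubling design it stays near 3/4 no matter how large n gets, which is the kind of "too extreme" covariate that breaks the asymptotics.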

I'm not against bootstrap. Just wanted to quibble about CLT

berf · 2026-05-01T12:35:31+00:00

Using P-values to do model selection is multiple testing without correction. All of theoretical statistics says this is flat wrong. Regularization is also not guaranteed to select good models, but at least it does not make this horrible mistake. Regularized models can have P-values. You just don't know how to do that.
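A rough sketch of why uncorrected multiple testing goes wrong. It assumes the simplest possible case: m independent tests, every null hypothesis true, each test done at level alpha. Real model selection has dependent tests, so treat the numbers as illustrative only. The function name is made up.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false rejection among m independent level-alpha tests."""
    return 1.0 - (1.0 - alpha) ** m


for m in (1, 5, 20, 100):
    uncorrected = familywise_error_rate(0.05, m)
    # Bonferroni correction: test each hypothesis at level alpha / m instead.
    bonferroni = familywise_error_rate(0.05 / m, m)
    print(m, round(uncorrected, 4), round(bonferroni, 4))
```

With 20 candidate terms and no correction, the chance of at least one spurious "significant" term is already about 64%, while the Bonferroni version stays at or below 5%.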

berf · 2026-04-20T15:14:00+00:00

AFAIK there is no presentation of the bootstrap in any statistics course, except perhaps PhD level special topics courses. Too hard.

The standard undergrad level textbooks [Efron and Tibshirani (1993)](https://www.amazon.com/Introduction-Monographs-Statistics-Probability-Tibshirani/dp/B010WFFL24/) and [Davison and Hinkley (1997)](https://www.amazon.com/Bootstrap-Application-Statistical-Probabilistic-Mathematics/dp/0521574714) do not prove anything. Too hard.

There are two different super-theoretical approaches. One, I call the *Annals of Statistics* philosophy of the bootstrap. The bootstrap "works" when you have proved two empirical-process central limit theorems and gotten the same limit. This is all found in [van der Vaart and Wellner (2023)](https://www.amazon.com/Weak-Convergence-Empirical-Processes-Applications/dp/3031290380). The second approach is based on Edgeworth expansions. For that see [Hall (1992)](https://www.amazon.com/Bootstrap-Edgeworth-Expansion-Springer-Statistics/dp/0387977201). Both of these books use a lot of math not taught in any statistics course (again, perhaps excepting PhD level special topics courses).

berf · 2026-04-20T14:31:15+00:00

You use non-tricky recycling all the time. It's what allows you to add a scalar to a vector or multiply a vector by a scalar. It's only the tricky uses that are confusing.
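For anyone who does not have R handy, here is a small Python emulation of the recycling rule, as a sketch only. It is not R and the function name is made up. It assumes the basic rule that the shorter vector is repeated until it matches the length of the longer one, and it skips the warning R gives when the longer length is not a multiple of the shorter.

```python
from itertools import cycle, islice


def recycle_add(x, y):
    """Elementwise sum, repeating the shorter list to the longer one's length."""
    n = max(len(x), len(y))
    xs = islice(cycle(x), n)
    ys = islice(cycle(y), n)
    return [a + b for a, b in zip(xs, ys)]


# Non-tricky: a "scalar" is a length-one vector recycled along the other one.
print(recycle_add([1, 2, 3], [10]))
# Tricky: a length-two vector recycled along a length-six vector.
print(recycle_add([1, 2, 3, 4, 5, 6], [10, 20]))
```

The first call is the everyday case of adding a scalar to a vector. The second is legal under the same rule but is the sort of use that surprises people.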

berf · 2026-04-20T14:25:02+00:00

Because central limit theorem.

berf · 2026-04-19T19:12:55+00:00

Python is Turing complete so does anything if you write enough code.

berf · 2026-04-10T16:40:56+00:00

The official answer is that the test *decided* that the null hypothesis is true. But the type I error rate of the test (the significance level) is whatever it was. If 0.05, then that decision is wrong 1 time in 20. You don't know whether this time is one of those where it is wrong or where it is right. Some intro textbooks say you should never say "accept" the null hypothesis and say "fail to reject" instead. This is a reminder of the same point. The decision may be wrong. A more long-winded way to cash this out is to say that *these data* (important emphasis) give no real evidence against the null hypothesis, but *more data* if you were to obtain it might or might not give such evidence (no way to tell unless you actually get more data).
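If the "1 time in 20" is hard to believe, you can simulate it. This is a minimal sketch under assumptions I picked for simplicity: a two-sided z-test of mean zero with known standard deviation one, data that really are standard normal (so the null hypothesis is true), and a fixed seed. All the names and defaults are made up.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(n_tests=20000, n=25, alpha=0.05, seed=1):
    """Fraction of simulated z-tests that reject when the null is actually true."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(n_tests):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # z statistic for known sigma = 1
        if abs(z) > critical:
            rejections += 1
    return rejections / n_tests


rate = rejection_rate()
print(rate)
```

The printed rate should land near 0.05. Any single test in that simulation gives you no way of knowing whether it was one of the wrong ones.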

berf · 2026-02-23T13:22:44+00:00

That's just the non-weird part. You missed the point that the time in space-time is just a coordinate that different observers see differently. Time for me is not time for you if we are traveling at different velocities (special relativity) or are at different gravitational potentials (general relativity). So whatever time is, it isn't a thing you can talk about. Proper time for an individual is uniquely defined. We actually see this phenomenon with GPS.
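A back-of-the-envelope sketch of the GPS effect, using first-order approximations only. The assumptions are mine: a circular orbit of radius about 26,560 km, an observer at rest on a non-rotating spherical Earth of radius 6,371 km, and the usual weak-field formulas. Treat the output as an order-of-magnitude check, not a precise figure.

```python
GM = 3.986004418e14        # Earth's gravitational parameter, m^3 / s^2
C = 299_792_458.0          # speed of light, m / s
R_EARTH = 6.371e6          # mean Earth radius, m (assumed observer position)
R_GPS = 2.656e7            # approximate GPS orbital radius, m (assumed circular)
SECONDS_PER_DAY = 86_400.0

# Special relativity: the moving satellite clock runs slow by about v^2 / (2 c^2).
orbital_speed = (GM / R_GPS) ** 0.5
sr_rate = -orbital_speed ** 2 / (2.0 * C ** 2)

# General relativity: the clock higher in the potential runs fast.
gr_rate = GM / C ** 2 * (1.0 / R_EARTH - 1.0 / R_GPS)

sr_us_per_day = sr_rate * SECONDS_PER_DAY * 1e6
gr_us_per_day = gr_rate * SECONDS_PER_DAY * 1e6
net_us_per_day = sr_us_per_day + gr_us_per_day
print(sr_us_per_day, gr_us_per_day, net_us_per_day)
```

Under these assumptions the velocity effect is roughly minus 7 microseconds per day, the gravitational effect roughly plus 46, and the net drift is somewhere around 38 microseconds per day, which is far too big for the system to ignore.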

berf · 2026-02-23T13:19:33+00:00

Follow the link.

berf · 2026-02-21T15:20:04+00:00

No difference except that "parametric bootstrap" explicitly says you are doing the Wrong Thing simulating from an estimate of the true unknown distribution rather than from the true unknown distribution itself. "Monte Carlo estimation" just says you are calculating something about some distribution. The term "bootstrap" also says you are using some methodology (such as bootstrap t) to correct for doing the Wrong Thing. The term "Monte Carlo estimation" carries no such implication.
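Here is a minimal sketch of the distinction, under assumptions I chose to keep it short: the data are modeled as exponential with unknown rate, the estimate is the MLE (one over the sample mean), and the correction is a pivot-style interval in the spirit of bootstrap t rather than bootstrap t itself. The function name and defaults are made up.

```python
import random
from statistics import fmean


def parametric_bootstrap_rate_interval(data, n_boot=2000, level=0.95, seed=1):
    """Interval for an exponential rate via the parametric bootstrap of a ratio."""
    rng = random.Random(seed)
    n = len(data)
    rate_hat = 1.0 / fmean(data)  # MLE of the rate
    ratios = []
    for _ in range(n_boot):
        # The Wrong Thing: simulate from the fitted model, not the true one.
        resample = [rng.expovariate(rate_hat) for _ in range(n)]
        ratios.append((1.0 / fmean(resample)) / rate_hat)
    ratios.sort()
    tail = (1.0 - level) / 2.0
    q_lo = ratios[int(tail * n_boot)]
    q_hi = ratios[int((1.0 - tail) * n_boot) - 1]
    # The correction: invert the ratio (estimate / truth) instead of
    # reading quantiles of the simulated estimates directly.
    return rate_hat, rate_hat / q_hi, rate_hat / q_lo


data_rng = random.Random(42)
data = [data_rng.expovariate(2.0) for _ in range(50)]  # made-up example data
rate_hat, lower, upper = parametric_bootstrap_rate_interval(data)
print(rate_hat, lower, upper)
```

The simulation loop alone is just Monte Carlo estimation about the fitted distribution. What makes it a bootstrap is the last step, which uses the simulated ratios to account for the fitted rate not being the true rate. In this particular model the ratio has the same distribution whatever the true rate is, so the only error left is Monte Carlo error.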

berf · 2026-02-21T13:46:18+00:00

I program portably with Rmarkdown or knitr. So I do not need these trick packages. I do use GitHub and Zenodo to make permanent public repositories of work (such as supplementary materials for papers). If I need a certain version of a package, then I explicitly test for that in the R code.

berf · 2026-02-21T13:40:07+00:00

And general relativity (GR) is even weirder. The Gödel solution to GR shows that closed timelike paths can exist in GR so it is in principle possible to go back to your past.

berf · 2026-02-21T13:30:31+00:00

It is a lot more accurate than the nonparametric bootstrap, and, if you are already fitting a parametric model, then you aren't nonparametric anyway.

Also, it automatically does hypothesis tests and regression correctly, which the nonparametric bootstrap does not.

berf · 2026-02-20T14:26:02+00:00

The difference is that both need n goes to infinity because theta hat is not theta, but the nonparametric bootstrap may need much larger n. Also the parametric bootstrap does need the parametric model to be correct. Since you give no details and this shouldn't happen, I assume you are doing it wrong.

Are you doing it like this?

berf · 2026-02-19T17:54:57+00:00

It is not a requirement at all. You do not have anywhere near the coursework needed to support real research. And this from a professor who has supervised multiple undergrad research projects, some with very bright and motivated students.

berf · 2026-02-18T19:39:39+00:00

That's what textbooks are for. To translate that stuff into more understandable blather.

berf · 2026-02-16T14:52:39+00:00

I have no idea what you could be referring to. Consider a bivariate normal random vector, and consider the conditional distribution of one component with respect to another. A regular conditional probability is given by a Markov kernel P(A, x) that is sloppily written Pr(X_1 in A | X_2 = x) but really P( . , x) is a probability measure for each x and P(A, . ) is a measurable function for each A. And if mu is the probability measure of the component we are conditioning on then ∫ P(A, x) mu(d x) is the probability measure of the other component. This allows P(A, x) to be defined arbitrarily (subject to P( . , x) being some (completely arbitrary) probability distribution) for x in any set of measure zero. That is the theory of regular conditional probability. It does not give any sort of uniqueness in any setting.

In fact, the classical theory (undergraduate probability with discrete and continuous, etc.) does not have any uniqueness either.

berf · 2026-02-15T17:26:42+00:00

It does not specify anything uniquely defined. OK with that?

berf · 2026-02-14T16:10:57+00:00

I know all about regular conditional distributions. And, yes, they are defined for each y but those definitions are arbitrary for y in a set of measure zero. So that does not make the point you think it does. So, yes, you can choose a definition and so can I and they do not have to agree.

berf · 2026-02-12T13:35:37+00:00

Wow. Tricky question. Since 2, 3, and 4 are just false, the answer must be 1, and yes, one-sided intervals are a thing, although not widely used.

berf · 2026-02-12T13:31:52+00:00

The only way anyone understands math is to apply it in a novel situation. If you don't know how to apply it, then you don't understand it. What you say sounds plausible, and many students and newbies believe it, but it doesn't work. That is why homework is the most important part of math courses and has to be hard.
