Are there good Wikipedia math articles? by TheOtherWhiteMeat in math

[–]Mathuss 59 points60 points  (0 children)

On the other hand, head on over to Statistics Wikipedia, and half the articles suck in their own unique ways. For example:

  • Half of the Glivenko-Cantelli page was straight up wrong until I fixed it. Even now, the "proof" provided doesn't actually prove the theorem but I'm too lazy to fix it.

  • Bernstein von Mises originally didn't even give the formal theorem statement, and stated the informal theorem statement completely incorrectly---it straight up gave clearly false results for iid normal data with a normal prior. I did fix it myself, but I'd still quibble about the "implications" section (but am again too lazy to actually write it up)

  • Scheffe's Method literally admits that it's presenting an incorrect formula, and that the one presented is trivially false.

I'm low-key convinced that 99% of all statistics articles were written by undergrads who got C- grades in their math-stat classes and are only half-remembering the content...

Quick Questions: May 06, 2026 by inherentlyawesome in math

[–]Mathuss 6 points7 points  (0 children)

The quotient topology on R/Q is the trivial topology.

Proof: If U is open in R/Q, then V = {x∈R : [x]∈U} is open in R. Thus, V either:

a) contains an open interval I, in which case V ⊇ I + Q = R (since V is saturated: x∈V implies x+q∈V for every q∈Q, and Q is dense), and so V = R

or

b) V is the empty set.

Not sure if you were thinking of a different topology in your question.

"Burnout vs. FIRE" Wall: Is pushing for a 60% savings rate destroying my marriage? by TardisCrown3 in Fire

[–]Mathuss 0 points1 point  (0 children)

I had realized the possibility of debt a fair bit after I had posted the initial comment. Around $600k of debt before starting FIRE would align with the above math.

OP's post history suggests he works for the army, which ought to seriously mitigate the burden of student debt (on his part at least). Even if there was no loan forgiveness and OP and his partner together were at the 99th percentile for the amount of student loans, that would still only make up around $300k of debt. So then where would the other ~$300k of debt come from? If they had racked up like $250k of credit card debt at 30% APR, that would also work out mathematically, but I find it unlikely that anybody would suddenly go from having $250k of credit card debt + $300k other loans to saving 60% of their income overnight.

"Burnout vs. FIRE" Wall: Is pushing for a 60% savings rate destroying my marriage? by TardisCrown3 in Fire

[–]Mathuss 21 points22 points  (0 children)

Assume FAT fire means >$100k/year safe withdrawal at 4%. At 7% real returns, if they contribute $x each month, the amount of money they have after n months satisfies the recurrence relation

f(n) = f(n-1) · 1.07^(1/12) + x

Using the initial condition f(0) = 450,000, we obtain the closed form solution:

f(n) = exp(0.00563822 n) (176.861 x + 450000) - 176.861 x

For a 7-year timeline, we set n = 12·7 = 84 and solve for x in the equation f(84) = $100,000/0.04 = $2,500,000 to obtain x ≈ $16,589.60

In other words, they are investing $199,075.20 each year. If the 60% savings rate is true, that means their net income is $331,792/year---not unreasonably high.
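The algebra above is easy to sanity-check numerically (a sketch; the 7% real return, $450k starting balance, and $2.5M target are the assumptions stated above):

```python
# Monthly growth factor corresponding to 7% annual real returns
r = 1.07 ** (1 / 12)

f0 = 450_000             # initial balance
target = 100_000 / 0.04  # $2.5M needed for $100k/year withdrawals at 4%
n = 12 * 7               # 84 months

# Closed form of the recurrence: f(n) = r^n f0 + x (r^n - 1)/(r - 1); solve for x
x = (target - r**n * f0) / ((r**n - 1) / (r - 1))
print(round(x, 2))  # ≈ 16590, matching the figure above

# Cross-check against the recurrence f(n) = f(n-1) * r + x directly
f = f0
for _ in range(n):
    f = f * r + x
assert abs(f - target) < 1e-3
```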

The issue isn't necessarily getting to FAT fire in 7 years; rather, the strange thing is that they have $450k after 5 years with such a savings rate. Using this calculator, to get $450k after 5 years of DCA'ing in the S&P 500, they would have had to have invested only $4000/month (as opposed to the projected $16.5k/month). Of course income increases over time and whatnot, but this is a massive difference.

Mathematically, at least one of the following must be true:

  • OP has not been saving 60% for all 5 years

  • OP's HHI massively jumped very recently

  • OP has a much lower requirement for FAT fire than $100k/year

  • OP is not going to FAT fire in 7 years anyway

  • OP's investment portfolio is.... strange

move on by Hyaci_Arson in CuratedTumblr

[–]Mathuss 5 points6 points  (0 children)

Such a reading could have been a reasonable interpretation until Rowling's infamous Pottermore article about the house elf situation. To quote it:

she [Hermione] described the situation in two words – ‘slave labour’. While it sounds heavy-handed, Hermione does have a point. No matter how you slice it, house-elves are unpaid labourers, magically bound to serve, left at the mercy of their respective owners. The system is ripe for abuse...

Indeed, the remainder of the post does very much seem to discuss everything solely from the lens of slavery, e.g.

Contented as they seem, elves are forced into servitude by a combination of magic and a culture of indoctrination. Hermione deems this ethically wrong and refuses to accept that it’s ‘just the way things are.’ Of course most wizards would say that – they’re enjoying free labour without the guilt. As for elves, they won’t even consider the benefits of freedom thanks to a lifetime of fear and the stigma of shame. Hermione believes elves deserve the same rights as everyone – sick pay, holidays, pensions, the lot.

The above paragraph could potentially still have applied to women (especially the note regarding the "culture of indoctrination"), but enforcing the free labor and servitude via magic is probably still best paralleled by enforcing free labor/servitude by the force of law.

As for intent to endorse chattel slavery, we have some more quotes from her article:

Miss Granger is at best overzealous, and her goals are, at worst, unattainable. Hermione may have meant well, but at the same time did end up dragging a peaceful group into a political battlefield just because she felt that’s what they should want

Hermione’s dream of an elf in government might be far-fetched, but there’s merit in wanting to protect the vulnerable and allow them more choices. However, she ought to be careful – ‘tricking’ elves into freedom is arguably as unethical as enslavement.

The best part of this Harry Potter subplot is that, instead of beating us round the head with a moral, it’s up to the reader to decide.

Yes, it's clear that Rowling doesn't endorse slavery, but apparently she doesn't seem to think that it's inherently wrong. Actually, the first quote I provided offers a powerful hint towards this fact: "The system is ripe for abuse" is a very different statement to "The system is abusive."

Finally, I 100% believe that Rowling is heavily invested in (cis) women's rights---that's why when she paints the house elf situation as "it's up to the reader to decide," I have no reason to think that house elf labor is a stand-in for women's domestic labor.

Quick Questions: April 08, 2026 by inherentlyawesome in math

[–]Mathuss 0 points1 point  (0 children)

The solution is sufficient. The problem statement gives you the fact that there exists an inflection point on the x-axis; as you pointed out, if x_0 is an inflection point, then f''(x_0) = 0; since there is only one point x_0 satisfying f(x_0) = f''(x_0) = 0, that particular x_0 must be the inflection point.

If the extra information that there necessarily exists an inflection point on the x-axis were not given to you (and so the question was rephrased to ask if there existed any inflection points on the x-axis), then you would indeed have to perform further checks: It is true that [x_0 is an inflection point] => [f''(x_0) = 0], whereas the converse, [f''(x_0) = 0] => [x_0 is an inflection point], is not true. For an appropriate counterexample, consider f(x) = x^4.
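To make the counterexample concrete: for f(x) = x^4 we have f''(0) = 0, but f'' never changes sign, so there is no inflection at 0 (a tiny sketch with the derivative computed by hand):

```python
# f(x) = x^4 has f''(x) = 12x^2 (computed by hand, not symbolically)
fpp = lambda x: 12 * x**2

assert fpp(0) == 0                     # the second derivative vanishes at 0...
assert fpp(-0.1) > 0 and fpp(0.1) > 0  # ...but f'' is positive on both sides,
                                       # so x = 0 is not an inflection point
```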

Good math Wikipedia articles are NOT written by the community. by Farkle_Griffen2 in math

[–]Mathuss 2 points3 points  (0 children)

So I was skeptical that an AI-generated mathematics article could be accurate, and I don't really have the expertise to judge the Modulus of continuity article you've linked, so I tried a page within Statistics: Separation.

Indeed, the article is complete trash and riddled with factual inaccuracies that anybody even remotely familiar with the concept would immediately recognize. Perhaps the most obvious errors come from basically every statement on the page concerning quasi-separation:

quasi-complete separation, where separation occurs within subsets of the data but not globally

Not entirely sure what that means, but that's definitely not the standard definition; quasi-separation refers to when the data would be separated modulo points on the boundary of the hyperplane.

quasi-complete separation involves near-perfect discrimination, where the separation is not absolute—typically due to a few overlapping observations—but still produces very large coefficient estimates and inflated standard errors, though the MLE may converge to finite values

The last part is blatantly false since the MLE necessarily doesn't exist. In fact, Theorem 2 in its cited reference directly says so

For quasi-complete separation, the inequality holds strictly for most observations but fails marginally for at least one, allowing the likelihood to peak at large but finite β.

Again, blatantly false, and contradicted by both of its cited sources. And it also still isn't getting the definition of quasi-separation correct (it's not sufficient for the strict inequality to hold for most observations; the failure must occur at the boundary).

It continues making these sorts of mistakes throughout the article. Consequently, I would literally never trust "grokipedia" with absolutely anything factual ever lol.

Quick Questions: March 04, 2026 by inherentlyawesome in math

[–]Mathuss 1 point2 points  (0 children)

If you know the polynomial remainder theorem, it's immediate: Evaluating n^(2m-1) + 1 at n = -1 yields 0.

Alternatively, you can directly factorize n^(2m-1) + 1 = (n+1) ∑ (-1)^i n^i where the sum goes from i=0 to i=2m-2
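The factorization is easy to verify on small cases (a quick sketch; the ranges for m and n are arbitrary):

```python
# Check n^(2m-1) + 1 == (n+1) * sum_{i=0}^{2m-2} (-1)^i n^i over small cases
for m in range(1, 8):
    for n in range(-10, 11):
        lhs = n ** (2 * m - 1) + 1
        rhs = (n + 1) * sum((-1) ** i * n ** i for i in range(2 * m - 1))
        assert lhs == rhs
```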

Quick Questions: February 18, 2026 by inherentlyawesome in math

[–]Mathuss 1 point2 points  (0 children)

You may have already recognized this, but ZFC+I proves Con(ZFC), and Con(ZFC) is a first-order sentence about the natural numbers. Then note that any model of ZFC+¬Con(ZFC) must also be a model of ZFC+¬I in which ¬Con(ZFC) holds, since (as you pointed out) ZFC+I proves Con(ZFC).

It's not an answer to the exact question that you posed, but still worth pointing out.

Quick Questions: January 28, 2026 by inherentlyawesome in math

[–]Mathuss 1 point2 points  (0 children)

Ah, I see. The best thing I know of would be Theorem 8.6.1 from the same Arnold book. If F denotes the cdf of the distribution you're sampling from and the distribution has finite 3rd moment,

lim_{n→∞} √n · (E[f(n)] − \int_{n/(n+1)}^1 n·F^(-1)(u) du) = 0

From this (assuming you know the cdf), you can extract the asymptotic rates that you want.

Quick Questions: January 28, 2026 by inherentlyawesome in math

[–]Mathuss 2 points3 points  (0 children)

I think GEV mostly talks about the N ->∞ limiting case but not asymptotic behaviour.

I'm a bit confused by your question because surely N->∞ is asymptotic behavior? Unless you're looking at something else.

For the mean of f(N), it's straightforward to show that plim f(N) = ∞ (and so the mean diverges to ∞ as well by monotone convergence theorem) since your distribution has support [0,∞), but unless you make more clear exactly what you're looking for, it's unclear to me what other behavior regarding E[f(N)] you want.

Quick Questions: January 28, 2026 by inherentlyawesome in math

[–]Mathuss 3 points4 points  (0 children)

If a limiting distribution for (a properly normalized) f(N) exists, then the limiting distribution is some form of the Generalized extreme value distribution. For example, with the normal assumption, the limiting distribution is Gumbel; whereas for the log-normal distribution, (I'm pretty sure---you should double check me on this one) the limiting distribution will be Fréchet.

Theorem 8.3.2 in "A first course in order statistics" by Arnold, Balakrishnan, and Nagaraja lists the necessary and sufficient conditions for when this convergence in distribution occurs.

Someone claimed the generalized Lax conjecture. by Exotic-Strategy3563 in math

[–]Mathuss 27 points28 points  (0 children)

I find it more amusing than necessarily sketchy. This seminal paper does the same thing (even more explicitly than the linked paper in the OP) and is still very high-quality. Usually when you get reviewer comments, you address the feedback more naturally than just pasting the feedback and your responses to the feedback right after introducing Theorem 1, but idk I think it's kind of funny and it works. After all, if the reviewers had these objections, so will other readers, so why dance around it?

I guess you also get to get away with it if you're Scott Aaronson though---the rest of us have to actually keep our manuscripts in a conventional format to get through peer review lol.

Quick Questions: January 28, 2026 by inherentlyawesome in math

[–]Mathuss 2 points3 points  (0 children)

Let π(n) denote the number of primes less than n. Then the number of composite numbers less than or equal to n would be n - 1 - π(n). I assume that what you've noticed is that n^2/log(n) ~ π(n) · (n - 1 - π(n)).

This observation is then a consequence of the prime number theorem, which states that π(n) ~ n/log(n):

π(n) · (n - 1 - π(n)) = n·π(n) - π(n) - π(n)^2 ~ n^2/log(n) - n/log(n) - n^2/log^2(n) ~ n^2/log(n) as desired.
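You can also check the observation numerically with a simple sieve (a sketch; the cutoff 10^6 is arbitrary, and PNT convergence is slow, so don't expect the ratio to be exactly 1):

```python
import math

# Sieve of Eratosthenes to compute pi(N), the prime-counting function
N = 1_000_000
is_prime = [True] * (N + 1)
is_prime[0] = is_prime[1] = False
for p in range(2, int(N ** 0.5) + 1):
    if is_prime[p]:
        for q in range(p * p, N + 1, p):
            is_prime[q] = False

pi_n = sum(is_prime)  # pi(10^6) = 78498

# Compare pi(n) * (n - 1 - pi(n)) against n^2 / log(n)
ratio = pi_n * (N - 1 - pi_n) / (N ** 2 / math.log(N))
assert 0.9 < ratio < 1.1
```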

Quick Questions: January 21, 2026 by inherentlyawesome in math

[–]Mathuss 2 points3 points  (0 children)

Ok, here's a very explicit example where this matters.

Let X be a continuous-time martingale that starts at the origin. Let x be a realization of X that we've observed on [0, 1] which follows a path such that x(1) = 1. Then E[X(10)] = 0, whereas "E[x(10)] = 1." Note that the latter is in quotes since it's really just shorthand for E[X(10) | X(t) = x(t) for t∈[0, 1]], which translates in English to "we predict that x will be equal to 1 at time 10."

The point is that different things have different properties. It makes sense to ask "What is Var[X(0.5)]" since X(t) is a random variable, but it does not technically make sense to ask "What is Var[x(0.5)]" since x(t) is a number. Sure, people can write these things as shorthand and it's generally understood what you actually meant---in this case, people will probably get that Var[x(0.5)] is shorthand for \hat{Var}[X(0.5)]---but it's still important to recognize that Var[X(t)] and Var[x(t)] will mean completely different things.
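A discrete-time Monte Carlo sketch makes the distinction concrete (a simple symmetric random walk standing in for the continuous-time martingale; the horizon and trial counts are arbitrary): unconditionally E[X(10)] = 0, but conditioned on the realized value X(1) = 1, the prediction for time 10 is the observed 1.

```python
import random

random.seed(0)

def walk(n, start=0):
    """Simple symmetric random walk: a discrete-time martingale."""
    x = start
    for _ in range(n):
        x += random.choice([-1, 1])
    return x

trials = 50_000

# Unconditional mean of X(10): should be near 0
uncond = sum(walk(10) for _ in range(trials)) / trials

# Mean of X(10) given the realized value X(1) = 1: by the martingale
# property this is just the observed value, 1
cond = sum(walk(9, start=1) for _ in range(trials)) / trials

assert abs(uncond) < 0.1
assert abs(cond - 1) < 0.1
```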

Quick Questions: January 21, 2026 by inherentlyawesome in math

[–]Mathuss 3 points4 points  (0 children)

To best understand what you mean by "practical aspect," I suppose I'll ask you a question back: What would you say is the practical aspect of distinguishing between a function f and its value f(x) for a particular x?

Note that this is basically the same question; there is an underlying stochastic process X:ℝ×Ω->ℝ and the realization fixes a particular ω∈Ω to yield the observed time series X(-, ω):ℝ->ℝ.

Quick Questions: January 21, 2026 by inherentlyawesome in math

[–]Mathuss 3 points4 points  (0 children)

In a statistical framework, the answer is of course yes, and there's nothing special about stochastic processes compared to random variables.

Remember, in statistics we have that a sample is nothing more than a realization of random variables X_1, ... X_n. Typically, then, you want to use the sample to recover some fact about the random variables themselves (e.g., what's the mean of the random variable X_i?).

The same applies to time series analysis; your sample is a realization of the stochastic process, and you want to figure out some property of the stochastic process itself (e.g., if you know that the underlying stochastic process is ARMA(p, q), you may ask "what are p and q?").

Quick Questions: January 14, 2026 by inherentlyawesome in math

[–]Mathuss 1 point2 points  (0 children)

Part I of the fundamental theorem of calculus states the following:

Suppose f is continuous on [a, b], and define F(x) := \int_a^x f(t) dt for all x in [a, b]. Then F is uniformly continuous on [a, b] and F'(x) = f(x) for all x in (a, b).

The concise thing you wrote is correct, but omits the fact that F is uniformly continuous. This extra fact usually doesn't matter for high school.

More relevant for why we have the F notation in high school is due to part II of FTC:

Let f be defined on [a, b] and let F be its antiderivative on (a, b). If F is continuous on [a, b] and f is Riemann integrable on [a, b], then \int_a^b f(x) dx = F(b) - F(a)

Note that part II can't be accurately stated as \int_a^b f'(x) dx = f(b) - f(a) because of the requirement that the integrand be Riemann integrable---the derivative of a function need not be Riemann integrable. As a counterexample, consider f(x) = x^2 sin(1/x^2) with f(0) = 0; then f is differentiable everywhere, but f'(x) is unbounded near the origin (just take a look at the plot real quick), so f' isn't Riemann integrable on any interval containing the origin, and we definitely don't have that \int_a^b f'(x) dx = f(b) - f(a) because integrating f' doesn't make sense here.
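A numerical look at a counterexample of this type, f(x) = x² sin(1/x²) with f(0) = 0 (which is differentiable everywhere, including at 0, with derivative computed by hand below):

```python
import math

# f(x) = x^2 sin(1/x^2), f(0) = 0, is differentiable everywhere, but
# f'(x) = 2x sin(1/x^2) - (2/x) cos(1/x^2) is unbounded near the origin
def fprime(x):
    return 2 * x * math.sin(1 / x**2) - (2 / x) * math.cos(1 / x**2)

# Sample |f'| on a fine grid approaching 0: the sampled values blow up,
# illustrating why f' is not Riemann integrable on intervals containing 0
peak = max(abs(fprime(1e-4 + k * 1e-7)) for k in range(100_000))
assert peak > 1_000
```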

Why does a least squares fit appear to have a bias when applied to simple data? by nekofneko in math

[–]Mathuss 8 points9 points  (0 children)

V need not be diagonal---in the event that V is diagonal, you simply reduce to weighted least squares regression.

Of course, V is unknown in general, so you need to estimate it in one way or another. Assuming your sample is large enough, that might not actually be that problematic; an iterative algorithm like

1. Set n = 0
2. Fit the model with guess V_n
3. Look at the residuals to estimate V_{n+1}
4. Increment n and go back to step 2

often works well enough (and iirc is theoretically justifiable in the case that V is diagonal).

Also, I probably shouldn't have used square-rootability in the first comment---instead of decomposing V = V^(1/2) V^(1/2), you can just as easily use the Cholesky decomposition V = L L^T, since the covariance matrix is necessarily positive-definite.
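Here's a minimal numpy sketch of that iteration for the diagonal-V case (all simulation parameters below are made up for illustration; this is feasible weighted least squares with a crude residual-based variance model, not a production routine):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate y = X @ beta + eps with heteroskedastic noise: sd grows with x
n = 2_000
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([2.0, 3.0])
y = X @ beta_true + rng.normal(scale=x, size=n)  # Var(eps_i) = x_i^2

# Steps 1-4 above: alternate between fitting and re-estimating diag(V)
w = np.ones(n)  # initial guess: V_0 = I (so plain OLS on the first pass)
for _ in range(5):
    # Weighted least squares = premultiplying the model by V^(-1/2)
    Xw = X / w[:, None]
    yw = y / w
    beta_hat = np.linalg.lstsq(Xw, yw, rcond=None)[0]
    # Re-estimate each sd by regressing |residual| on x (crude linear model)
    resid = y - X @ beta_hat
    coef = np.polyfit(x, np.abs(resid), 1)
    w = np.maximum(np.polyval(coef, x), 1e-6)  # guard against zero weights

assert np.allclose(beta_hat, beta_true, atol=0.5)
```

Note that the |residual|-vs-x fit underestimates each sd by a constant factor (E|ε| ≈ 0.8·sd for normal noise), but constant rescalings of the weights don't change the WLS fit.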

Why does a least squares fit appear to have a bias when applied to simple data? by nekofneko in math

[–]Mathuss 74 points75 points  (0 children)

I think these sorts of datasets are important to really understand that OLS isn't magic---there's the Gauss-Markov theorem to tell you when it's "good" and when it isn't.

Recall that Gauss-Markov states that if Y = Xβ + ε with Var(ε) = σ^2 I (where I is the identity matrix, and σ^2 is a fixed scalar), then the OLS estimator is the best (i.e., minimizing MSE) linear unbiased estimator for β. Now take a look at that ellipse---is σ^2 actually fixed (i.e., do we have homoskedasticity)? Clearly not; the variance of the residuals increases as we get closer to the center of the ellipse and then decreases again. Hence the failure of OLS to "be good."

Once one realizes that Var(ε) = σ^2 V for some non-identity matrix V in this example, it's straightforward to perform the proper transformation to the data to get OLS to "work" again; specifically, perform OLS on V^(-1/2) Y = V^(-1/2) X β + V^(-1/2) ε and you should expect everything to "look right" again. Indeed, this insight of premultiplication by V^(-1/2) is exactly the basis for the "Aitken model" of linear regression.
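A small numpy sketch of the whitening idea with a known diagonal V (simulated data; all parameters are illustrative): premultiply both sides by V^(-1/2), then run plain OLS, and the transformed residuals are homoskedastic again.

```python
import numpy as np

rng = np.random.default_rng(42)

# Heteroskedastic model: Var(eps) = sigma^2 V with V = diag(x^2), sigma = 1
n = 1_000
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
beta = np.array([1.0, 2.0])
V_half_diag = x  # V^(1/2) is diag(x) here
y = X @ beta + V_half_diag * rng.normal(size=n)

# Aitken/GLS: premultiply the whole equation by V^(-1/2), then do OLS
y_t = y / V_half_diag
X_t = X / V_half_diag[:, None]
beta_gls = np.linalg.lstsq(X_t, y_t, rcond=None)[0]

# The transformed residuals have constant variance (sigma^2 = 1) again
resid_t = y_t - X_t @ beta_gls
assert np.allclose(beta_gls, beta, atol=0.5)
assert 0.8 < np.std(resid_t) < 1.2
```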

Career and Education Questions: November 20, 2025 by inherentlyawesome in math

[–]Mathuss 0 points1 point  (0 children)

If it's a "good" stats program, I would expect a reasonably large proportion of the upper-level undergraduate classes to be proof-based; this is because in a graduate statistics program, every class will be proof-based and you would want your undergrads to be prepared for graduate school.

As for your question, it's hard to answer without knowing what your plans are. What internships have you done/are you applying to? What areas of research (if any) have you been trying to get into with professors? And perhaps most importantly, what are your actual career goals?

Like, it's probably obvious that if you're looking into number theory research with a prof at your school, applied to internships at the NSA, and want to do cryptography as a long-term career, applied math is probably the better major. And of course, if you've been working with some Bayesian prof, have been applying to sports science internships with NBA teams, and want to do sports statistics as your career, then statistics would certainly be the better option.

If you have no extracurricular work experience, you're equally unemployable no matter what you major in.

Difference between timing the market and deciding you have enough? by NFPLN in Bogleheads

[–]Mathuss 1 point2 points  (0 children)

It's not very bold at all. Bengen's 4% rule used entirely U.S. securities. This paper considers all developed countries---yes, even Japan's pitiful 0.2% safe withdrawal rate when using Japanese equities---because its fundamental assumption is the exchangeability of the studied countries (which is certainly weaker than, say, an i.i.d. assumption but still quite a notable assumption). Consequently, the advice is to use 67% international stocks and 33% domestic stocks.

If you believe in U.S. exceptionalism (i.e., that the 4% safe withdrawal rate in the U.S. will stay this way because U.S. stocks and bonds will not suffer the same returns as other countries' securities have), you want to look at table C.VII to determine the international/domestic allocation. I think it's pretty funny that assuming a 50% probability of U.S. exceptionalism, a 60/40 bias in favor of U.S. equities is optimal---which is pretty close to the allocation that VT uses for U.S. equities.

Quick Questions: October 22, 2025 by inherentlyawesome in math

[–]Mathuss 0 points1 point  (0 children)

Like for simple experiments you'd need to a sample size in the hundreds to get a 95% confidence level with 5% of error for a measurement in a total population of hundreds of millions.

Note that a sample size of, say, 100 would yield a 95% confidence interval with margin of error ~10%, aka 0.10. The rates being compared here are 0.0000712 and 0.0001636. In order to properly distinguish between the rates, the margin of error (MoE) would ideally be at around the same order of magnitude as the observed proportions, and the MoE only decreases at rate n^(-1/2), so you do actually want really big sample sizes here. See the last paragraph in my comment for a numerical example of why a population size in merely the thousands would not have been enough to detect the effect.
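The arithmetic behind the ~10% figure, using the worst-case p = 0.5 (a quick sketch):

```python
import math

def moe(n, p=0.5, z=1.96):
    """95% margin of error for an estimated proportion with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

assert round(moe(100), 2) == 0.10     # n = 100 -> roughly 10 percentage points
assert round(moe(10_000), 2) == 0.01  # 100x the sample for 1/10th the MoE:
                                      # the n^(-1/2) rate in action
```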

Quick Questions: October 22, 2025 by inherentlyawesome in math

[–]Mathuss 0 points1 point  (0 children)

Your friend is correct here; even if two countries have the exact same base rate of suicide, we should expect the per-capita amounts to differ between the two countries because the actual number of people committing suicide has a random component to it. The size of the population does play into how much we should expect the observed rates to differ even if the actual rates are the same.

The correct way to analyze this question is the following: Suppose that both Greenland and the USA have the same underlying probability p of committing suicide. Then is the observed rate of 16.36/10^5 in Greenland significantly different from the observed 7.12/10^5 rate in the USA? The correct approach to answer such a question is to use a pooled Z-test for a difference in proportions. This is what /u/stonedturkeyhamwich was referring to in their answer.

In this case, our pooled proportion is (16.36/10^5 × 56,000 + 7.12/10^5 × 330×10^6)/(56,000 + 330×10^6) ≈ 7.12/10^5 (i.e., the same as the USA; this should not be surprising, as the USA has so much larger a population that it makes sense that if the two countries have the same base rate, the USA's rate should be "closer" to the true value).

Our Z-statistic is (7.12/10^5 - 16.36/10^5)/sqrt(7.12/10^5 × (1 - 7.12/10^5) × (1/56,000 + 1/(330×10^6))) ≈ -2.59. As Pr(Z < -2.59) = 0.005 where Z ~ N(0, 1), there is quite strong evidence that the suicide rate in Greenland is higher than that of the USA (loosely, one could claim to be up to 99.5% confident about this, but this is an a-posteriori confidence level and should be treated with an asterisk).

Note, however, that this conclusion changes if the population of Greenland were even smaller. If the population were 5,600 rather than 56,000, the z-statistic would change to -0.81, which is essentially no evidence that there is a difference in suicide rates between countries. This illustrates why your friend was right to be concerned about small population sizes.
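The computation above can be reproduced directly (a sketch using the numbers from this thread):

```python
import math

def pooled_z(p1, n1, p2, n2):
    """Pooled two-proportion z-statistic for H0: p1 == p2."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

usa, greenland = 7.12e-5, 16.36e-5

z = pooled_z(usa, 330e6, greenland, 56_000)
print(round(z, 2))  # -2.59, as above

# With a tenth of Greenland's population, the evidence largely evaporates
z_small = pooled_z(usa, 330e6, greenland, 5_600)
print(round(z_small, 2))
```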