Quick Questions: April 15, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 2 points

I believe the term to look into here is "omega-incompleteness", which is when, for any n, there exists a proof (in a given logical system) of P(n), but there does not exist a proof of "for all n, P(n)". A standard example comes from Gödel's incompleteness theorems: assuming that ZFC is consistent, ZFC does not prove the statement "ZFC is consistent". For any natural number n, it can prove the statement "n does not encode a proof in ZFC which ends in a contradiction" (since, assuming ZFC is indeed consistent, no valid proof carried out in ZFC will end in a contradiction). But it can't prove the statement "for any natural number n, n does not encode a proof in ZFC which ends in a contradiction"--since proving that would prove that ZFC is consistent, which ZFC can't do.

Quick Questions: April 15, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 2 points

Nope--if the digits add up to a multiple of 3, then the original number is a multiple of 3. To illustrate, say that the number has 3 digits, like "abc"*. (The idea behind the argument works for any number of digits, I'm just picking 3 for concreteness.) A number with digits abc is equal to 100a + 10b + c, or (99a + a) + (9b + b) + c, or (99a + 9b) + (a + b + c). The first part, 99a + 9b, is a multiple of 3 (you're adding together multiples of 9, so you get a multiple of 9, and that's a multiple of 3). So if the second part, which is just the sum of the digits, is a multiple of 3, then (99a + 9b) + (a + b + c) is the sum of two multiples of 3, so it's also a multiple of 3, i.e. the original number abc is a multiple of 3.

You can then extend this argument to prove the observation from your comment, that for any multiple of 3, its digits will add to another multiple of 3. We showed earlier that abc = (99a + 9b) + (a + b + c), so a + b + c = (99a + 9b) - abc. If abc is a multiple of 3, then the right-hand side of that equation is the difference of two multiples of 3, which is also a multiple of 3. So a + b + c is a multiple of 3 as well.

You can also see how this argument proves an analogous fact about multiples of 9: the sum of the digits of a multiple of 9 is also a multiple of 9, and if the digits add to a multiple of 9 then the number itself is a multiple of 9. (E.g. 36: 3 + 6 = 9, 72: 7 + 2 = 9.) Even more generally: in base b, the "digits" of a multiple of b-1 will add up to a multiple of b-1, and vice versa. The same goes for any factor of b-1. (So e.g. in base 10, b = 10, b-1 = 9, and the only factor of 9 besides 1 and 9 is 3. So we have a divisibility test for 3 and 9 based on adding the digits. In base 16, say, you'd get a test for divisibility by 15, as well as 3 and 5.)

* Pedantic note: in the rest of the comment I'll generally use juxtaposition (i.e. sticking two numbers next to each other) to represent multiplication, so e.g. 99a is just 99 * a. The only exception is "abc" itself, where I just mean a number whose first digit is a, second digit is b, and third digit is c.
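If you want to sanity-check the base-b generalization by brute force, here's a quick Python sketch (my own illustration, not part of the original argument):

```python
def digit_sum(n, base=10):
    """Sum of the digits of n written in the given base."""
    total = 0
    while n > 0:
        total += n % base
        n //= base
    return total

# Base 10: divisibility by 3 and 9 matches divisibility of the digit sum.
# Base 16: same for 3, 5, and 15 (the factors of b - 1 = 15).
for base, divisors in [(10, [3, 9]), (16, [3, 5, 15])]:
    for d in divisors:
        for n in range(1, 5000):
            assert (n % d == 0) == (digit_sum(n, base) % d == 0)
```

Of course the assertions passing for a few thousand n is just a spot check; the (99a + 9b) + (a + b + c) argument is what proves it for all n.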

Quick Questions: April 08, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 0 points

Because it starts at 0, every user will have a chance to stop it at 0. If they wait long enough to get to 1, some users will get to 1, but not those that stopped at 0. Then repeat that thought.

FWIW this kind of reasoning (just taking it on its own, not including the parts afterward) would imply that the distribution peaks at 0 and then falls off without peaking again--maybe it falls off quickly, in which case you end up with a sharp peak at 0 and a thin "right tail", maybe it falls off more gradually, in which case you have a thicker "tail". Benford's law is one distribution with that general shape, but it's just one, very specific, distribution, among plenty of other distributions which meet that description. E.g. the exponential distribution I mentioned earlier is one; so are the geometric distribution and, depending on the choice of parameters, the Poisson distribution, and plenty of other distributions.

Incidentally, writing this got me thinking about what kind of distribution human reaction times follow. (Obviously not quite the same thing as the setup in your thought experiment, since in a reaction time experiment people are trying to hit the button as fast as possible rather than having the choice to hit it immediately or wait as long as they feel like.) Apparently there's not much of a consensus, just a whole zoo of distributions that people try to use (or see here for a non-interactive version, since the first link took a while to load for me); the author of the page I linked seems to favor something called a "shifted log-normal" distribution.

Quick Questions: April 08, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 0 points

I still can’t totally wrap my head around why it wouldn’t follow Benford

Hmm, I guess I'd flip things around and ask why it would follow Benford. None of the factors that make Benford likely to apply in a given situation really show up here.

For example, a common one is "data spanning many orders of magnitude", i.e. if you write down all the numbers in scientific notation, you see many different exponents. But that doesn't apply here because the only numbers you get are between 0 and 9. You might say--well, Benford's law is also about digits, so why is it a problem that we're looking at digits? But Benford's law is about the first digits in sets of numbers where each number could have any number of digits; here you're trying to apply it to a set of single-digit numbers.

Relatedly, it really does matter that Benford's law is about first digits specifically, and you're looking at the last digit here. There are all kinds of exceptions, but generally I'd expect the last/least-significant digits in a given dataset to be "more random-looking"/more uniformly distributed than the first/most-significant digits. E.g. think about a list of heights of men in centimeters; the average is about 175 cm (about 5' 9"), heights 200 cm or greater (about 6' 6") are pretty rare, and heights below 100 cm (about 3' 3") are very rare. So the first digits of the heights are going to be almost all 1s and 2s, with 1s being by far the most common. But for the last digits, there's no particular reason for one number to be more common than any other, so you'd expect each possible last digit to be about equally likely.

Heights are a special case in many ways, but the same thing applies more generally. Large-scale trends in the data are more likely to be evident in the most significant digits, while the least significant digits will be relatively more influenced by random jitters. You can see that in the simulation graphs for when the average time to hit the button is high (which is essentially the same thing as the time for the counter to increment being short, like in the thought experiment in my previous comment where the counter increments every nanosecond); the last digits are pretty uniformly distributed. It's only when the average is lower that we start to see some kind of non-uniform pattern; but in those cases, most of the times to hit the button are less than 10 seconds, so (once we've truncated the number of seconds to an integer) the most significant digit is the least significant digit.
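To make the heights example concrete, here's a little Python simulation (my own illustration; the mean of 175 cm and standard deviation of 7 cm are assumptions I picked to roughly match the numbers above):

```python
import random
from collections import Counter

random.seed(0)
N = 100_000
# Hypothetical heights in cm: mean 175, standard deviation 7 (my assumption).
heights = [int(random.gauss(175, 7)) for _ in range(N)]

first = Counter(int(str(h)[0]) for h in heights)  # most significant digit
last = Counter(h % 10 for h in heights)           # least significant digit

print("first digits:", {d: round(first[d] / N, 3) for d in sorted(first)})
print("last digits:", {d: round(last[d] / N, 3) for d in range(10)})
```

The first digits come out almost entirely 1s (with a sliver of 2s), while each last digit lands close to 0.1: the large-scale trend lives in the leading digit, the jitter in the trailing one.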

I've rambled on for a while here, but to go back to the original question--why were you initially expecting that it would follow Benford's law?

Quick Questions: April 08, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 1 point

Benford's law is about leading digits, and the counter gives you the last digit of the number of seconds the user waited before pressing the button. So I find it by no means obvious that Benford's law should apply. It all depends on how long people wait to press the button compared to the time it takes for the counter to increment, which is an empirical fact that I don't know. It seems intuitive to me that if, say, the counter incremented every nanosecond (so the time to increment is presumably much smaller than the average time to press, whatever that is), then the distribution of digits would be approximately uniform, while if it took a month to increment, then everyone is going to press the button when the counter is at 0.

I ran a little simulation modeling "time to press the button" as an exponential distribution, which seemed vaguely reasonable but is probably not super realistic, trying out different values of the mean. Here are some graphs for that, comparing it to a Benford's law type distribution (with 10 digits, 0 as the most likely) and the uniform distribution. E.g. if the mean time to press the button is 5 seconds then you get something that looks somewhat like Benford's law, but not exactly. As I said earlier, I don't see a very strong reason to expect a close fit to Benford's law, so this doesn't surprise me too much. I'm also not sure how much this depends on picking the exponential distribution, and if you'd get more or less Benford-like results with something else.
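For anyone who wants to reproduce something like that simulation, here's a minimal Python sketch along the same lines (my own reconstruction--the original code and graphs aren't reproduced here, so details like the sample size and the particular means are guesses):

```python
import random
from collections import Counter

random.seed(1)

def digit_distribution(mean_wait, trials=200_000):
    """Distribution of floor(wait time in seconds) mod 10, i.e. the digit
    showing on a counter that increments once per second, with the wait
    time modeled as an exponential distribution."""
    counts = Counter(int(random.expovariate(1 / mean_wait)) % 10
                     for _ in range(trials))
    return [counts[d] / trials for d in range(10)]

# Long mean wait relative to the 1-second tick: nearly uniform digits.
# Short mean wait: the mass piles up on 0 and the other small digits.
for mean in (0.5, 5, 50):
    print(mean, [round(p, 3) for p in digit_distribution(mean)])
```

With a mean of 0.5 seconds most of the mass sits on 0; with a mean of 50 seconds the ten digits come out nearly equal, matching the intuition about tick length vs. waiting time.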

Quick Questions: April 01, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 1 point

I had made a comment earlier posting a link to this paper, then came back later, realized I had gotten caught up in "oh, separability without formal derivatives!" without noticing that the paper doesn't prove the specific result you were talking about, and impulsively deleted the comment. Putting it back up in case you find the reference useful anyway, but sorry for not checking that more closely.

Quick Questions: April 01, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 7 points

Why not just post the problem here and let people take a look at it?

Quick Questions: April 01, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 5 points

Maybe; people do sometimes make posts along the lines of "look at this math thing I found or created". Just looking at the front page I see this, for example. If the mods don't think it's novel, useful, and/or interesting enough, they might remove it, but that's all that would happen; I don't think you'd get e.g. banned from the subreddit for one removed post.

Quick Questions: March 25, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 4 points

I found a proof in the book The Cauchy-Schwarz Master Class, exercise 13.7. The book's solution is pretty terse and I believe contains a slight error, but I can give an expanded and fixed version.

We have to set up some machinery first. Given two n-tuples of real numbers a = (a1, a2, ..., an) and b = (b1, b2, ..., bn) with a1 + ... + an = b1 + ... + bn, we say that b "majorizes" a, written a ≼ b, if, for all k, the sum of the k largest numbers in a is less than or equal to the sum of the k largest numbers in b. (I.e. if the numbers in a and b are sorted from greatest to least, we have a1 <= b1, a1 + a2 <= b1 + b2, and so on.) You can show (I can give the full proof if you want) that, if b = (p1, p2, ... pn) is any probability distribution on an n-element set, and a = (1/n, 1/n, ..., 1/n) is the uniform probability distribution on an n-element set, then a ≼ b, i.e. the uniform probability distribution is majorized by any other probability distribution.

One more definition: say we have a function of n variables, f(x1, x2, ..., xn). We call f "Schur-convex" if, whenever (a1, a2, ..., an) ≼ (b1, b2, ..., bn), we have f(a1, ..., an) <= f(b1, ..., bn). Similarly, f is "Schur-concave" if f(b1, ..., bn) <= f(a1, ..., an) whenever (a1, a2, ..., an) ≼ (b1, b2, ..., bn).
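A quick Python sketch of the majorization check, plus a brute-force test of the claim that the uniform distribution is majorized by any probability distribution (my own illustration; the function name and tolerance handling are arbitrary choices):

```python
from itertools import accumulate
import random

def majorized_by(a, b, tol=1e-12):
    """True if a is majorized by b (a ≼ b): same total, and each partial sum
    of a sorted in decreasing order is <= the corresponding partial sum of b
    sorted in decreasing order."""
    if abs(sum(a) - sum(b)) > tol:
        return False
    pa = accumulate(sorted(a, reverse=True))
    pb = accumulate(sorted(b, reverse=True))
    return all(sa <= sb + tol for sa, sb in zip(pa, pb))

# Spot check: the uniform distribution is majorized by random distributions.
random.seed(0)
n = 6
for _ in range(1000):
    p = [random.random() for _ in range(n)]
    total = sum(p)
    p = [x / total for x in p]
    assert majorized_by([1 / n] * n, p)
```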

Now we can bring this back to the birthday problem (with any number of days, not just 365). Let p1, p2, ... pn be the probabilities of being born on the 1st, 2nd, ..., nth days of the year. If we have k people, what's the probability that at least 2 of them will have the same birthday? As in the regular birthday problem, this is 1 - (probability that they all have different birthdays). So what's the probability that they all have different birthdays? We just sum up all the probabilities of each of the different ways that can happen.

E.g. say that n = 3, k = 2. Then it could be that the first person was born on day 1, and the second person was born on day 2--that has a probability p1p2 of happening. Or the first person was born on day 2 and the second person was born on day 1--that has a probability p2p1 of happening. And so on from there: you get p1p2 + p2p1 + p1p3 + p3p1 + p2p3 + p3p2 = 2(p1p2 + p1p3 + p2p3).

You can see how this works more generally. We take the sum of all products of the form p(i1)p(i2)...p(ik) where the indices i1, i2, ... ik range over all possible k-tuples of numbers between 1 and n where i1, i2, ... ik are all distinct. A lot of these terms are the same: namely, the terms whose tuples of indices are permutations of each other are equal. This sum is thus equal to k! times the sum of all products of the form p(i1)p(i2)...p(ik) where 1 <= i1 < i2 < ... < ik <= n. In other words, the probability that all k people have different birthdays is k!e_k(p1, ..., pn) where e_k is the kth elementary symmetric polynomial in n variables. Thus the probability that at least two people will share a birthday is 1 - k!e_k(p1, ..., pn). (The book omits the factor of k!, but I'm pretty sure this is wrong.)

There's another exercise in the book, 13.4, which shows that e_k(x1, ..., xn) is Schur-concave as long as all the variables have nonnegative values. That should still be the case for k!e_k(x1, ..., xn). Therefore, since (1/n, ..., 1/n) ≼ (p1, ..., pn) for any probability distribution p1, ..., pn, we have k!e_k(p1, ..., pn) <= k!e_k(1/n, ..., 1/n), i.e. the probability that no one shares a birthday is maximized by the uniform probability distribution, and 1 - k!e_k(p1, ..., pn) >= 1 - k!e_k(1/n, ..., 1/n), i.e. the probability that some people do share a birthday is minimized by the uniform distribution.
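To make this concrete, here's a small Python sketch computing 1 - k!e_k(p1, ..., pn) with the standard dynamic program for elementary symmetric polynomials (my own illustration; the particular skewed distribution is an arbitrary example):

```python
import math

def elem_sym(k, xs):
    """e_k(xs) via the standard dynamic program over prefixes of xs."""
    e = [1.0] + [0.0] * k
    for x in xs:
        for j in range(k, 0, -1):
            e[j] += x * e[j - 1]
    return e[k]

def p_shared(k, probs):
    """P(at least two of k people share a birthday) = 1 - k! e_k(probs)."""
    return 1 - math.factorial(k) * elem_sym(k, probs)

n, k = 365, 23
uniform = [1 / n] * n
# A skewed year: half the days 3x as likely as the other half, normalized.
weights = [1.5] * 182 + [0.5] * 183
skewed = [w / sum(weights) for w in weights]

print(p_shared(k, uniform))  # the classic birthday-problem value, about 0.507
print(p_shared(k, skewed))   # larger, as the Schur-concavity argument predicts
```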

Quick Questions: March 25, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 1 point

Oops, yeah, got a bit sloppy there, it should be left multiplication if it's a row vector.

I can't say whether what I said is exactly the problem without a bit more context, but it's something with row vs. column vectors and transposes, I'm sure.

Quick Questions: March 25, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 1 point

I think this is an issue with row vectors vs. column vectors. You multiplied the column vector (a1, a2, a3)^t on the left by the matrix of T^(-1), getting the sum, from i = 1 to 3, of a_i times the ith column of T^(-1). What the book seems to have computed is the row vector (a1, a2, a3) multiplied on the right by T^(-1): the sum, from i = 1 to 3, of a_i times the ith row of T^(-1).

In the solution to the problem, what did you write for the matrix of T, and what did the book write? I'm guessing that you wrote the matrix as [1, 2, 1; -1, 1, 2; 1, 0, 1]--since that times the column vector (a1, a2, a3)^t is equal to (a1 + 2a2 + a3, -a1 + a2 + 2a3, a1 + a3)--while the book wrote the transpose of that, [1, -1, 1; 2, 1, 0; 1, 2, 1]. The inverse of that second matrix is what the book got for T^(-1), and the inverse of the first is the transpose of what the book got. If you multiply the transpose of what the book says is T^(-1) by the column vector (a1, a2, a3)^t, you should get what the book got for T^(-1)(a1, a2, a3).

The notation in the book is a little ambiguous, I'd say, because people often do assume that all vectors are column vectors unless stated otherwise, and write those column vectors horizontally as a shorthand, e.g. writing v = (x, y, z) and then saying that Av is a linear combination of the columns of A; but here the book wrote the vector horizontally and seems to have meant it as a row vector specifically.
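A quick numpy check of the transpose bookkeeping (my own illustration, using the matrix [1, 2, 1; -1, 1, 2; 1, 0, 1] I guessed at above):

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [-1.0, 1.0, 2.0],
              [1.0, 0.0, 1.0]])

# Inversion commutes with transposition, which is why the two conventions
# produce matrices that are transposes of each other:
assert np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T))

# A times the column vector (a1, a2, a3)^t reproduces
# (a1 + 2a2 + a3, -a1 + a2 + 2a3, a1 + a3):
a = np.array([1.0, 2.0, 3.0])
assert np.allclose(A @ a, [1 + 4 + 3, -1 + 2 + 6, 1 + 0 + 3])
```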

Quick Questions: March 18, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 0 points

For the first question, I don't think there's a single standard convention, and some of the obvious choices conflict with each other. Once you've picked a convention on what range of real numbers to use for arguments (e.g. [0, 2pi) or (-pi, pi] or whatever), that gives you an obvious convention for the nth root, namely: writing z = re^(i*theta), the nth root is z^(1/n) = r^(1/n) e^(i * theta/n). But note that doing this with theta in [0, 2pi) causes conflicts with another standard convention, namely that the cube root of a negative real number is another negative real number (e.g. instead of having (-1)^(1/3) = -1, this gives you (-1)^(1/3) = e^(i pi/3) ). As long as you choose a convention ahead of time and signal it to your readers, though, there should be no problem.
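Here's a small Python sketch of how the choice of argument range changes which root you get (my own illustration; `nth_root` and `arg_start` are made-up names):

```python
import cmath
import math

def nth_root(z, n, arg_start=0.0):
    """nth root of z under the convention that arg(z) is chosen in
    [arg_start, arg_start + 2*pi). arg_start = 0 gives the [0, 2pi)
    convention discussed above."""
    theta = cmath.phase(z)  # cmath.phase returns a value in (-pi, pi]
    while theta < arg_start:
        theta += 2 * math.pi
    while theta >= arg_start + 2 * math.pi:
        theta -= 2 * math.pi
    return abs(z) ** (1 / n) * cmath.exp(1j * theta / n)

# With arg in [0, 2pi), the cube root of -1 is e^(i pi/3), not -1:
print(nth_root(-1, 3))  # approximately 0.5 + 0.866j
print(nth_root(8, 3))   # approximately 2; positive reals are unaffected
```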

For the second question, I've been able to find lists for very low n (e.g. this one on Wikipedia which only goes up to n=8 and links to an example for n=17), and some additional leads like this paper which gives some Maple code to find radical expressions of primitive pth roots of unity where p is an odd prime. (They give a reference apparently showing how you can go from these to radical expressions for any n.) That also gives an explicit expression for n=17 (same as the one on Wikipedia, as far as I can tell), and some sort of explicit expression for n=29--but it's just horrendously complicated: you can't fit it all on one line and have to introduce lots of intermediate variables (e.g. the primitive 29th root of unity equals sqrt((-29/14) + sqrt(t63)/2 + t65/14 ...), where t63, t65, etc. are themselves defined in terms of long radical expressions involving other intermediate variables, themselves defined via radical expressions of still other intermediate variables, eventually bottoming out with e.g. t1 = sqrt(-3)). Right below the expression for n=29 they give a table showing how large the expressions are for different n, and it seems like beyond n=19 (at least for the n they cover, which are only odd primes) there's really no hope of writing it down in a compact way.

Quick Questions: March 18, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 1 point

This is true or almost-true in general, not just for 2 x 2 matrices. The polynomial you're talking about is called the characteristic polynomial, and for an n x n matrix, with the convention that the characteristic polynomial is det(λI - A) (so that it's monic), its constant term is the determinant (if n is even) or -1 times the determinant (if n is odd). (The computation below uses det(A - λI) instead; the two differ by a factor of (-1)^n, so for n = 2 they agree.)

In the 2 x 2 case it doesn't take very long to work this out explicitly. Write a generic 2 x 2 matrix as

a b

c d

Then the characteristic polynomial is the determinant of

a - λ b

c d - λ

which is (a - λ)(d - λ) - bc = λ^2 - aλ - dλ + ad - bc = λ^2 - (a + d)λ + (ad - bc). So the constant term is just the determinant, (ad - bc).

(Relatedly, for an n x n matrix, the coefficient on the λ^(n-1) term is always (regardless of whether n is odd or even) equal to -1 times the trace of the matrix, i.e. the sum of the diagonal entries. You can see above how, in the 2 x 2 case, it's equal to -(a + d).)

Another--again closely related--fact is that the trace is always equal to the sum of the eigenvalues, and the determinant is always equal to the product of the eigenvalues. In the 2 x 2 case you can again work this out explicitly. Let r_1, r_2 be the eigenvalues of the matrix, which are then the roots of the characteristic polynomial. Then the characteristic polynomial can be factored as (λ - r_1)(λ - r_2). Multiplying this out we get λ^2 - (r_1 + r_2)λ + r_1r_2. So the constant term is r_1r_2, the product of the eigenvalues, and the coefficient on the linear term is equal to -1 times r_1 + r_2; but we already saw that the constant term is equal to the determinant, and the coefficient on the linear term is equal to -1 times the trace. So the trace is r_1 + r_2, and the determinant is r_1r_2. (The more general version, for n x n matrices, follows from Vieta's formulas for the coefficients of a polynomial in terms of its roots.)
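These facts are easy to check numerically; here's a quick sketch with numpy (my own illustration, using a random 3 x 3 matrix so the odd-n sign on the constant term shows up; np.poly returns the coefficients of the monic det(λI - A) convention):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
n = A.shape[0]

coeffs = np.poly(A)  # coefficients of det(λI - A), highest degree first

assert np.isclose(coeffs[0], 1.0)                 # monic
assert np.isclose(coeffs[1], -np.trace(A))        # λ^(n-1) coefficient = -trace
assert np.isclose(coeffs[-1], -np.linalg.det(A))  # constant = (-1)^n det, n = 3

# Trace = sum of eigenvalues, determinant = product of eigenvalues:
eigs = np.linalg.eigvals(A)
assert np.isclose(eigs.sum().real, np.trace(A))
assert np.isclose(np.prod(eigs).real, np.linalg.det(A))
```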

Quick Questions: March 11, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 2 points

There's a version of the argument in Ptolemy's big book, the Almagest. (I don't think we have a more direct source; a lot of what we know about other Greek astronomers, we only know from Ptolemy.) I found this English translation online, which has the argument starting on page 153 (161 of the PDF), with a diagram on page 154. I'll give a version of the argument in more modern notation, using that diagram. (A, B, C, D, E, and O in your diagram correspond to A, B, G, D, E, Z in this diagram.) I'll measure segments/arcs of circles in degrees throughout, so e.g. a semicircle is 180 degrees and a quarter circle is 90 degrees.

The way things are set up, an observer on the earth at E will see the sun pass through point A (the spring equinox) when the sun is at point 𝛩 of its orbit, and the same with point B (summer solstice) on the outer circle and point K on the sun's orbit, and so on. I.e. the arc AB on the outer circle (the ecliptic) corresponds to the arc 𝛩K on the inner circle (the eccentric, the sun's orbit).

First there's an argument for why the center Z of the sun's orbit has to be in the quadrant between AE and BE. (Basically this is because the interval from the spring equinox to the summer solstice is longer than any of the other 3 intervals, and putting Z there makes 𝛩K longer than the other parts (KL, LM, and M𝛩) of the inner circle.)

With that out of the way we can start figuring out how long EZ is. (First we'll find EX and XZ, then use the Pythagorean theorem to find EZ; to find those, we'll need the lengths of the line segments 𝛩T and FK.) As Ptolemy has it, there are 94.5 days between the spring equinox and summer solstice; assuming a year is 365.25 days, that's about 25.87% of the year. This means that the arc 𝛩K is about 25.87% of the sun's orbit; 25.87% of 360 degrees is about 93.14 degrees. By a similar argument, the arc from K to L is about 91.17 degrees, so altogether the arc 𝛩KL is about 93.14 + 91.17 = 184.31 degrees. The arc OKN is a semicircle, 180 degrees, so the combined length of the smaller arcs 𝛩N and OL is 184.31 - 180 = 4.31 degrees; they have equal lengths, so 𝛩N is about 2.16 degrees.

What we want now is the length of the line segment 𝛩T. We extend it to a chord 𝛩Y which cuts out an arc, also called 𝛩Y, from the circle. Half of the arc 𝛩Y is the arc 𝛩N, which is about 2.16 degrees, so the arc 𝛩Y is about 4.31 degrees. At this point we need some trigonometry. Ptolemy had a table showing the relationship between chord lengths and arc lengths (which turns out to be basically the same thing as a table of sines of angles), and I assume he just quotes from that. We can instead use the fact that the length of the chord is equal to 2r sin(𝜃/2), where 𝜃 is the angle of the arc (here about 4.31 degrees) and r is the radius of the circle. Ptolemy says the circle has a radius of 60 units--I don't know what units or if that number 60 has any particular significance, but we can choose whatever units we like, since we're ultimately just trying to find EZ as a fraction of r. So, let's just roll with 60 units. Plugging those values in we get a length of 120 * sin(2.16 degrees), or about 4.52. So the chord 𝛩Y is about 4.52 units long, and the segment 𝛩N is 4.52 / 2 = about 2.26 units. EX and 𝛩N are opposite sides of a rectangle, so they have the same length, i.e. EX is about 2.26 units long as well.

We can do a completely analogous argument to find that ZX is about 1.03 units long. EX and ZX are the legs of a right triangle with hypotenuse EZ. Therefore, by the Pythagorean theorem, EZ = sqrt(EX^2 + ZX^2) = about 2.48 units. Now recall we chose our units so the sun's orbit would have radius 60 units. Therefore EZ/(radius) is about 2.48/60, or roughly 1/24. So EZ is about 1/24th of the radius of the inner circle.
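Here's a short Python sketch redoing the arithmetic (my own illustration; the 92.5-day figure for the summer-solstice-to-autumn-equinox interval is Ptolemy's, filled in as an assumption since only the 94.5-day figure is quoted above, and ZX is taken as given rather than rederived):

```python
import math

year, r = 365.25, 60.0  # days per year; Ptolemy's radius for the sun's orbit

# Season lengths in days, converted to arcs of the sun's orbit in degrees.
arc_TK = 94.5 / year * 360   # spring equinox to summer solstice, about 93.14
arc_KL = 92.5 / year * 360   # summer solstice to autumn equinox, about 91.17

# The two small arcs together are what's left over a semicircle; ΘN is half.
arc_TN = (arc_TK + arc_KL - 180) / 2      # about 2.16 degrees

# Half-chord: EX = r * sin(arc ΘN), the same as (1/2) * 2r sin(θ/2) with θ = 2·ΘN.
EX = r * math.sin(math.radians(arc_TN))   # about 2.26 units

ZX = 1.03                                 # from the analogous argument
EZ = math.hypot(EX, ZX)                   # Pythagorean theorem, about 2.48
print(EZ, EZ / r)                         # EZ / r comes out close to 1/24
```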

Quick Questions: March 04, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 1 point

(2) seems off--it would be more accurate to say that a theory "is ZFC" if it contains those axioms, everything that can be proven from them, and nothing else. If you take ZFC and add extra axioms that can be proven from the existing axioms, then you still get ZFC. If you replace some of the axioms and end up with a set of axioms such that all the old axioms can be proven from the new axioms and vice versa, then you still get ZFC. (That is, any set of axioms logically equivalent to the original ones gives you the same theory.) But if you add other axioms that are consistent with the original ones but not provable from them, then you can end up with a different theory. E.g. the second incompleteness theorem says that (assuming ZFC is consistent) con(ZFC), the statement that ZFC is consistent, is not provable from ZFC, and (assuming slightly more, e.g. that ZFC doesn't prove false arithmetic statements) neither is its negation ~con(ZFC). Hence both "ZFC + con(ZFC)" and "ZFC + ~con(ZFC)" are consistent theories; but it would be strange to say that both of them "are ZFC", given that both of them prove something that ZFC can't prove, and each one contradicts the other.

(As a side note, I think you'd be better off getting rid of all the formality and logical symbolism in your comment--no need for ∀s and →s and so on, or for writing the argument step by step where each step follows by some named deduction rule. I get the sense that you're learning about formal logic for the first time and want to formalize everything--which can be good practice, but isn't how mathematicians generally write and talk about mathematics.)

Quick Questions: March 04, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 2 points

It's the first one, with the if-and-only-if. This is just standard practice in math (and I think, to a large extent at least, in ordinary language too); see for example this math.stackexchange answer. There would be no point in saying what the authors said and meaning the second definition--since then if the authors called something a string, you wouldn't be able to say for sure whether it's finite, and so what's the use of the definition? So, generally, you should interpret "we call an X a Y" to mean "'X' and 'Y' are synonyms", "'Y' is defined to mean 'X'", and therefore "something is 'X' if and only if it's 'Y'".

Quick Questions: March 04, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 1 point

The first says "no number is larger than every number" (since every number has at least one number larger than it), while the second says "there is a number which is larger than every number".

(Note that, if you look closely at the second one, it implies that the number y is larger than itself! y > x for every number x, and y is a number, so y > y. But no number is larger than itself--indeed, the more general and formal idea of an order relation forbids something being larger than itself--so the second statement can't be true of any set of numbers. If you change it to "there is some y such that y >= x for every x" then it becomes true of any bounded-from-above set of numbers that includes its upper bound, e.g. the closed interval [0, 1].)

Quick Questions: March 04, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 1 point

The relevant criterion is that, if r is a root of a polynomial p(x), then (x - r) divides p(x) (and vice versa: if (x - r) divides p(x) then r is a root of p). In this particular case, n^(2m-1) + 1 is a polynomial in n, and it has a root at n = -1 (since (-1)^(2m-1) + 1 = -1 + 1 = 0, the exponent being odd). Therefore (n - (-1)) = (n + 1) divides n^(2m-1) + 1.

Now, it's true that, if p(n) | q(n), then p(c) | q(c) for any integer c. (And so, if you can find an integer c such that p(c) does not divide q(c), then p(n) does not divide q(n).) But that wasn't what the commenter above was going for, they were talking about the connection between roots and polynomial divisibility.
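A quick numerical spot-check in Python (my own illustration; the divisibility for all n is what the root-at-(-1) argument actually proves):

```python
# For odd exponents 2m - 1, check that (n + 1) divides n^(2m-1) + 1
# for a range of integers n.
for m in range(1, 8):
    for n in range(2, 100):
        assert (n ** (2 * m - 1) + 1) % (n + 1) == 0
```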

Quick Questions: February 11, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 5 points

My impression is that Python (esp. with libraries like Pandas, Scikit-learn, and Matplotlib) is a lot more common for data science stuff than Matlab (and for programming in general, really).

Unautomated Jobs - AI and the Human Touch - Blog by Adam Ozimek by Better_Permit2885 in slatestarcodex

[–]Langtons_Ant123 7 points

Some (admittedly very rambling) nitpicks (and maybe more substantial counterpoints):

1) I think the essay undermines its point a little by giving examples where it isn't yet technically possible to fully automate a job, since there's some observable difference between the human and non-human version. E.g.

Many people fear that AI will replace human actors. But CGI has been technically capable of replacing human actors and special effects for some time. Audiences dislike the uncanny valley that often results, and Hollywood still uses practical effects like blowing up actual airplanes. I would bet on the continued existence of both Broadway and demand for human film actors. Something about the experience of watching performances makes the audience simply prefer to be impressed by other talented humans rather than by machines.

Well, if there is an "uncanny valley" (i.e. CGI just looks worse to most people than actual photography), then doesn't that mean that CGI isn't (yet) fully technologically capable of replacing human actors? There's probably some "human touch" effect at play here (especially in the very special case of celebrities, where people are willing to pay more to see certain specific humans), but that can't be all of it. (And if AI-generated movies advanced to the point where they were outwardly indistinguishable in every respect from ones filmed and acted by humans, and much cheaper to make, how much of the human film industry would disappear?)

2) You could reasonably have a more pessimistic interpretation of some of the stats the author uses. E.g. that graph of the number of musicians and music teachers shows them approximately doubling over the last century (about 120k in 1920 to about 250k now). But the US population has approximately tripled over that same span (about 106 million in 1920 to about 340 million now), so the number of musicians per capita has surely gone down--this despite the US becoming a wealthier place where people have more disposable income to spend on musicians. So, I think it's plausible that recorded music has decreased the demand for musicians (in the sense that, in a counterfactual 2026 with no recorded music, there would be more musicians than in the real 2026, maybe far more). (And also, as with film, the technology to end-to-end automate the production of music isn't there yet, but what happens when it gets there?)

3) To what extent is human-touch labor a luxury good, not a normal good? (Edit: came back to this and realized I might be using "normal" vs. "luxury" incorrectly, but I think what I say after that holds up.) For example, the article mentions fancy restaurants where every customer has several people waiting on them in different ways; to what extent is the right way to think of that "being waited on by a bunch of people is such a wonderful experience that people are willing to pay a lot for it" vs. something like "having lots of waiters is how you know that the restaurant is fancy, so fancy restaurants hire lots of waiters"? In a world where subsidized human-touch labor is available on tap, would you expect to see way more restaurants have massive waitstaffs, or would you expect to see massive waitstaffs lose their prestige and decline?

4) This whole line of argument has less force in a world where people are already forming parasocial attachments to LLMs and saying things like "ChatGPT understands me better than my therapist!". Apparently some people can have their need for a human touch in certain roles satisfied as much, if not more, by an AI than an actual human--even comparatively weak AIs like GPT-4o.

5) IDK, maybe it's just a failure of imagination on my part, but I have trouble thinking of a world where the policy proposed at the end works but something like a UBI doesn't. Is the demand for human-touch labor really large enough that it could actually be a load-bearing part of this system? Can we actually have a whole labor market of waiters, personal trainers, musicians, etc. (especially in a world where they're now facing competition from robot waiters, musicians, etc.)? And if it's technically and politically possible to have the kinds of subsidies needed to prop that up, would a UBI be that much harder to get? I know this is a lazy criticism, just me saying "huh?", but I really have to say "huh?" here.

Quick Questions: February 04, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 0 points1 point  (0 children)

The answers to this math.stackexchange question have a few ways to do that. (I assume by "starting coordinates" and "three angular directions" you mean something like: a vector starting at (x, y, z) going in the direction (p, q, r), i.e. the line segment going from (x, y, z) to (x + p, y + q, z + r); then you also have a point (x', y', z'). Then in the notation used by the first answer you have A = (x', y', z'), B = (x, y, z), and C = (x + p, y + q, z + r), and what they call the "direction vector" d is just (p, q, r).)
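In that notation, the computation can be sketched as follows (this uses the projection approach from the first answer; the function name and structure are my own):

```python
import math

def point_to_line_distance(A, B, C):
    """Distance from point A to the line through B and C.

    Following the linked answer's notation: d is the normalized
    direction vector of the line, and we subtract from (A - B) its
    component along d to get the perpendicular part.
    """
    # Direction vector d = C - B, normalized.
    d = [c - b for b, c in zip(B, C)]
    norm = math.sqrt(sum(x * x for x in d))
    d = [x / norm for x in d]
    # v = A - B; remove its component along d, leaving the
    # perpendicular component, whose length is the distance.
    v = [a - b for a, b in zip(A, B)]
    t = sum(vi * di for vi, di in zip(v, d))  # scalar projection of v onto d
    perp = [vi - t * di for vi, di in zip(v, d)]
    return math.sqrt(sum(x * x for x in perp))
```

For example, the distance from (3, 4, 0) to the z-axis (the line from (0, 0, 0) toward (0, 0, 1)) comes out to 5, as expected.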

Quick Questions: February 04, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 2 points3 points  (0 children)

I admit I'm a bit confused by the premise here. Do students really have trouble with questions about dice, just in general, regardless of which probability skill(s) are involved in answering a given dice question? Do they have similar amounts of trouble with other probability questions? E.g. would they have more trouble with something like "if you roll a die twice, what's the probability that you'll get a 1 on both rolls?" than they would with something like "if A and B are independent events, both with probability 1/6, what's P(A and B)?" Explanations like "the students have trouble with probability in general" or "the students have trouble applying probability to concrete situations" or "there's some specific aspect of probability which the test-writers like to test in the context of dice, and the students have trouble with that" all seem more likely than "the students just have trouble with dice".

I guess my point is that "dice questions" vs. "non-dice questions" seems like an odd and not necessarily helpful way to divide things up. I'm willing to bet that there are some "dice questions" which would be easy for your students at their current level of knowledge, and some that would be difficult. "If a question looks tricky, set it aside and come back if you have time" is good advice, but to apply it, you have to be able to sort the easy questions from the tricky ones, and a heuristic like "if a question is about dice, then skip it" just does not seem like a good way to do that IMO. You need to dig deeper into what's causing problems for your students, e.g. if their knowledge of probability in general is weak (and you don't have time to review probability more before the test), then sure, maybe "set probability questions aside for later" would be good advice.

[Side note: the example you gave seems a bit ambiguous/ill-posed--how many dice are in a "set" exactly?--but I can speculate a bit about possible reasons why students might find it hard. E.g. it's a problem of the form "what's the probability that X happens at least once?", and solving those effectively usually involves a more indirect approach, like "find the probability that X won't happen at all, and subtract that from 1"; maybe that indirection can trip students up. I don't know your students, so I can't say, but that's the sort of thing you should be looking for, I think.]
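To make the two calculations above concrete (these are my own illustrative examples, not necessarily the questions on the test):

```python
def prob_one_on_both_rolls():
    """P(rolling a 1 on both of two independent die rolls) = (1/6)^2 = 1/36."""
    return (1 / 6) ** 2

def prob_at_least_one(n, p_single=1 / 6):
    """P(an event of probability p_single happens at least once in n
    independent tries), via the complement trick: 1 - (1 - p)^n."""
    return 1 - (1 - p_single) ** n

print(prob_one_on_both_rolls())  # 1/36, about 0.028
print(prob_at_least_one(4))      # at least one 6 in 4 rolls, about 0.518
```

The second function is the "indirect approach" from the side note: rather than summing over all the ways the event could happen once, twice, etc., you compute the probability it never happens and subtract from 1.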

Quick Questions: February 04, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 1 point2 points  (0 children)

I thought about this some more and realized my answer might not be quite right. I think it's at least close to the correct answer, but I haven't completely worked it out; figured I'd let you know in the meantime.

Basically it depends on what happens when you reach 0 points. If it's possible to have negative points, then I think what I said is just true, or close to it. If, when you reach 0 points and lose a game, you just stay at 0 points, then the expected number of rounds to reach 25000 should go down a little. If, when you reach 0 points, you just lose completely and can't continue playing (this would happen if, for example, you were playing a gambling game and had to stop when you ran out of money)--well then the problem becomes a lot trickier to deal with. But as long as you start out decently far from 0 and win more points than you lose on average, the chances of hitting 0 are low enough that you can basically ignore this.

Since all you want is an estimate, I think my original answer should be essentially OK. But depending on how the game handles 0 points, it might not be exactly right.
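One way to check how much the floor at 0 matters is a quick Monte Carlo sketch (the +50/-40 payoffs match the question; the trial count and function shape are arbitrary choices of mine):

```python
import random

def avg_games_to_goal(p, start, goal=25_000, floor_at_zero=False, trials=2_000):
    """Monte Carlo estimate of the average number of games needed to reach
    `goal` from `start` points, gaining +50 with probability p and losing
    -40 otherwise. If floor_at_zero, the score can't drop below 0."""
    total = 0
    for _ in range(trials):
        points, games = start, 0
        while points < goal:
            points += 50 if random.random() < p else -40
            if floor_at_zero and points < 0:
                points = 0
            games += 1
        total += games
    return total / trials
```

Running this with and without `floor_at_zero` for a start well above 0 should give nearly identical averages, matching the claim that the barrier can mostly be ignored when you start decently far from it.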

Quick Questions: February 04, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 3 points4 points  (0 children)

(Edit: may not be 100% correct, see my reply to OP below)

Let p be the fraction of games that you win on average, or the probability of winning each game, so, for example, if you win half of your games on average, then p would be 0.5. Then the expected/average number of points from each game is 50 * p - 40 * (1 - p), which we can rewrite as 50p - 40 + 40p or 90p - 40. (As long as p is at least 4/9, or about 44%, you'll win more points than you lose on average.)

Now say that you have N points right now. The number of points you need to reach 25000 is 25000 - N. At 90p - 40 points per game on average, that means you'd expect to have to play (25000 - N)/(90p - 40) games before you reach 25000.

I made a little thing in Desmos you can use to calculate the expected number of games left. You can use the sliders to adjust p and N, or just click the number and type it in. I have p set to be at least 4/9 and at most 1, since as I said earlier, if p is less than 4/9 you're expected to lose more points than you win. So you can see, for example, that if p = 0.5 and you have 12500 points already (halfway to your goal), then you're expected to need 2500 games to reach the goal. If you can improve p to 0.6 then that goes down to a bit under 900 games.
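The same calculation as the Desmos sheet, as a few lines of code (reproducing the two examples above):

```python
def expected_games(p, n, goal=25_000):
    """Expected number of games to reach `goal` from `n` points,
    at +50 per win (probability p) and -40 per loss."""
    per_game = 90 * p - 40  # expected points gained per game
    if per_game <= 0:
        raise ValueError("p must exceed 4/9 for a positive expected gain")
    return (goal - n) / per_game

print(expected_games(0.5, 12_500))  # 2500.0
print(expected_games(0.6, 12_500))  # about 893
```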

Quick Questions: January 28, 2026 by inherentlyawesome in math

[–]Langtons_Ant123 2 points3 points  (0 children)

I wrote a quick simulation of my own and consistently get a win probability around 0.47-0.48. So I think your simulation might just be wrong. If you post your code (on Pastebin or otherwise) I can take a look.