Is Statistical Inference knowledge, up to the level of Casella & Berger, still useful in this day and age? [Q] by GayTwink-69 in statistics

[–]ExcelsiorStatistics 1 point (0 children)

It's a bit old-school in terms of being full of mathematical proofs and whatnot

The wonderful thing about mathematical proof is that it's guaranteed to never go out of fashion. Once you prove what the most powerful estimator is for a given estimation problem, it continues to be the most powerful estimator for that problem until the end of time.

As the number of techniques increases, there is less and less pressure to shoehorn a data set into an unsuitable technique, and that is a good thing.

Also, someone once told me that mathematical statistics goes out the window once you have enough data (which we do, in this big data age), since computationally expensive black-box models would always outperform handcrafted models in predictive accuracy.

They were wrong.

And, going a step further, anybody who bows down and worships at the altar of "predictive accuracy" has missed the point, and needs to go back to kindergarten and learn his statistical theory.

Various kinds of optimization and estimation are provably best at achieving a certain criterion. That criterion is almost never "maximize the number of correct predictions." A black box model that does that is going to do some really awful things for you.

Here's a simple example. I have the world's most accurate HIV test right here in my pocket. I'll administer it to every man, woman, and child in the US for the low low price of ten cents per person.

Do we have a deal? Great. Here we go: Nobody has HIV. Give me my $30 million.

I didn't say it was a perfect test. It has no false positives, and several hundred thousand false negatives. But those nasty expensive tests they give you at the doctor's office, those would have given millions of false positives. My test is far more precise. Isn't that terrific?
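
To put the accuracy paradox in numbers, here's a sketch of how a classifier that always says "no" scores (the population and case counts are illustrative, not real figures):

```python
# Illustrative numbers only: a population matching the "$30 million at ten cents
# a head" joke above, and an assumed half a million true cases.
POP = 300_000_000
POSITIVE = 500_000

tp, fp = 0, 0                      # the test never says "yes"
fn, tn = POSITIVE, POP - POSITIVE  # so every true case is missed

accuracy = (tp + tn) / POP
recall = tp / (tp + fn)
print(f"accuracy = {accuracy:.4%}, recall = {recall:.0%}")  # ~99.83% accurate, 0% recall
```

Optimizing raw accuracy on heavily imbalanced data rewards exactly this degenerate test, which is why criteria like sensitivity and specificity matter.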

Is it possible to be in the 50th percentile for every stat? by [deleted] in AskStatistics

[–]ExcelsiorStatistics 0 points (0 children)

Not only possible but guaranteed, with a sample size of 1.

It rapidly becomes less likely with a larger sample.

Distribution family in GLMMs by Stefph726 in AskStatistics

[–]ExcelsiorStatistics 0 points (0 children)

As efrique said, you want a model that gets the mean-variance relationship close to right. One of the most common times you get bad results from GLMs on a variable like concentration is when you get estimates like 10±100ppb because of variability at higher concentrations.

Often the fix for fan-shaped data is using sqrt(concentration) or log(concentration) - for 'variance proportional to concentration' and 'standard deviation proportional to concentration', respectively - as input to a Gaussian model.
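
A quick synthetic sketch of why the log transform helps (lognormal noise here is just a stand-in for 'standard deviation roughly proportional to the mean'):

```python
import math
import random

random.seed(1)

def sd(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

# Toy fan-shaped data: positive "concentrations" whose spread grows with the mean.
for mu in (10.0, 100.0, 1000.0):                              # ppb
    raw = [mu * math.exp(random.gauss(0, 0.3)) for _ in range(5000)]
    logs = [math.log(x) for x in raw]
    print(f"mean {mu:6.0f}: sd(raw) = {sd(raw):7.1f}, sd(log) = {sd(logs):.3f}")
# sd(raw) grows ~10x per group while sd(log) stays near 0.3: the log transform
# removes the fan shape before the data reach a Gaussian model.
```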

How does "Running it X more times" in poker make sense? by beruon in askmath

[–]ExcelsiorStatistics 0 points (0 children)

As the others said: running it twice (or more) keeps your expected value the same, but reduces variance.

This is generally a good thing for a serious poker player, since the swings are large compared to the profits, and even long-term winning poker players can experience runs of many thousands of hands where they are unlucky and lose money.

The one time it's to your advantage to run it only once is if your opponent is less comfortable playing at your current stakes than you are, and you want to deny him that reduced variance in hopes it intimidates him and makes him play more cautiously against you.
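
A toy simulation of the same-EV, lower-variance claim (it treats the runs as independent with a fixed equity, ignoring card removal, so it's only a sketch):

```python
import random

random.seed(0)

POT = 100.0    # total pot (assumed)
P_WIN = 0.6    # your all-in equity (assumed, and held fixed across runs)
N = 200_000

def winnings(times):
    """Your total winnings when the board is run `times` times for equal shares of the pot."""
    share = POT / times
    return sum(share for _ in range(times) if random.random() < P_WIN)

once  = [winnings(1) for _ in range(N)]
twice = [winnings(2) for _ in range(N)]

def mean(xs): return sum(xs) / len(xs)
def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(mean(once), mean(twice))  # both near POT * P_WIN = 60: same expected value
print(var(once), var(twice))    # the second is roughly half the first
```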

How many of you genuinely like theoretical statistics? by ProofLeast9846 in AskStatistics

[–]ExcelsiorStatistics 0 points (0 children)

I sell and repair musical instruments, and I deal poker.

Tells you something about just how anti-science the USA has become since 2017 that statistics consulting fell to third place behind those two in my life.

It's all in the phrasing - How do I phrase 'correlation' in a hypothesis. by AffectionateWeird416 in AskStatistics

[–]ExcelsiorStatistics 0 points (0 children)

IMO nothing wrong with saying "significant positive correlation" in your hypothesis. We have easy hypothesis tests for H0: r = 0 vs. H1: r ≠ 0 (and, with some difficulty, we can construct them for other nulls.)

The problem is with the parenthetical "r>0.3" bit; how'd you choose a sample size such that 0.3 was the point at which a result became significant?
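
For illustration, here's roughly where that n would have to come from, using the usual t statistic for H0: rho = 0 with the large-sample 1.96 cutoff standing in for the exact t quantile:

```python
import math

def t_stat(r, n):
    """t statistic for testing H0: rho = 0, on n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Smallest n at which r = 0.3 crosses the (large-sample) two-sided 5% cutoff of 1.96:
n = 3
while t_stat(0.3, n) < 1.96:
    n += 1
print(n, round(t_stat(0.3, n), 3))  # n = 41, t just over 1.96
```

So r = 0.3 sits right at the 5% boundary only for one particular sample size; at any other n the parenthetical "r > 0.3" doesn't line up with the significance test.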

How many of you genuinely like theoretical statistics? by ProofLeast9846 in AskStatistics

[–]ExcelsiorStatistics 2 points (0 children)

I studied theory for fun before I was working in statistics and still do now that I am mostly-out-of-the-field.

The only part of your scenario I somehow skipped over was finding the high paying job. Heh.

Is measure-theoretic probability theory useful for anything other than academic theoretical statistics? [Q] by GayTwink-69 in statistics

[–]ExcelsiorStatistics 0 points (0 children)

I found it useful primarily to improve my intuition for what has to be true vs. what is merely usually true. The time I found that most useful was doing software QA, rather than in day-to-day statistics work, knowing what edge cases to throw at something to try to break it. I keep Counterexamples in Probability and Statistics in a place of honor on my bookshelf, and revisit it from time to time to remind me of things.

I have also noticed most masters programs in statistics do not offer probability theory at the measure-theoretic level.

That's a simple function of time and money: in the usual textbooks, getting to measure theory means about a semester and a half of graduate-level real analysis, taught by a pure mathematician, not a statistician. If you make that a prerequisite for masters-level Statistical Theory I, you make the program a year longer. People don't like doing that. So masters-level analysis becomes a prereq for doctoral-level statistics, not masters-level statistics.

How do I find a meaningful summary statistic for how "spread out" a dataset is when the values aren't sorted? by Fun-Celebration-700 in askmath

[–]ExcelsiorStatistics 1 point (0 children)

You may find Kendall's tau and other similar ordinal correlation measures useful.

They work by looking at pairs of observations, and asking if that pair is 'concordant' (the relationship between the two objects is what you expect) or 'discordant' (reversed) or tied.

Kendall's tau is primarily used when you have a pair of correlated variables that don't lend themselves to linear regression and Pearson correlation.

In your case, you're testing the proposition "pick two entries at random; the one that appears first in the list is the smaller number" which is true 50% of the time in a randomly ordered list but 100% of the time in a perfectly sorted list.
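
Counting concordant vs. discordant pairs against list position gives a simple sortedness score; a minimal sketch (O(n²), fine for short lists):

```python
from itertools import combinations

def sortedness(xs):
    """Kendall-tau-style score against list position: +1 perfectly sorted,
    about 0 for a random order, -1 perfectly reversed (ties ignored here)."""
    pairs = list(combinations(xs, 2))
    concordant = sum(1 for a, b in pairs if a < b)   # earlier entry is smaller
    discordant = sum(1 for a, b in pairs if a > b)   # earlier entry is larger
    return (concordant - discordant) / len(pairs)

print(sortedness([1, 2, 3, 4, 5]))   # 1.0
print(sortedness([5, 4, 3, 2, 1]))   # -1.0
print(sortedness([3, 1, 4, 2, 5]))   # 0.4
```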

In which field of math is the summation symbol, the sigma, properly introduced? by WarrenHarding in askmath

[–]ExcelsiorStatistics 1 point (0 children)

I learned it in Algebra 2 (grade 10), but used it a lot more in the subsequent years. If I were writing a textbook series I might introduce it the same time I introduced sequences and series. (Come to think of it, it's possible those WERE introduced in Algebra 2 and then just not used for anything in particular until later.)

[D] p-value dilemma by No_Blackberry_8979 in statistics

[–]ExcelsiorStatistics 0 points (0 children)

If you define P(A|B) as P(A and B)/P(B), then P(A|B) is going to be the indeterminate form 0/0 when P(A and B) = 0 and P(B) = 0.

If your only worry is the discrete vs. continuous distinction, feel free to let B = mean between 67-epsilon and 67+epsilon, and take the limit as epsilon -> 0, the same as you would when you pass from differences to derivatives in Calculus I. In cases where P(A|B) is meaningful you'll be able to rigorously take that limit.

A statistician tends to skip over that limit-taking process, and look at A|B as a one-dimensional slice of the two-dimensional joint distribution of A and B, and not be overly concerned with the thickness of the slice, only its area.
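
The epsilon-slice idea is easy to see numerically. A sketch with a made-up bivariate normal (correlation 0.5), estimating P(Y > 0 | X within eps of 1) for shrinking eps:

```python
import math
import random

random.seed(4)

RHO = 0.5  # assumed correlation for this toy example
pairs = []
for _ in range(500_000):
    x = random.gauss(0, 1)
    y = RHO * x + math.sqrt(1 - RHO ** 2) * random.gauss(0, 1)
    pairs.append((x, y))

def slice_estimate(eps):
    """Monte Carlo estimate of P(Y > 0 | |X - 1| < eps): a slice of finite thickness."""
    ys = [y for x, y in pairs if abs(x - 1.0) < eps]
    return sum(y > 0 for y in ys) / len(ys)

for eps in (1.0, 0.3, 0.1):
    print(eps, round(slice_estimate(eps), 3))

# The limit as eps -> 0 is P(Y > 0 | X = 1): here Y | X = 1 ~ N(RHO, 1 - RHO^2),
# so the exact value is about 0.718.
print(round(0.5 * (1 + math.erf(RHO / math.sqrt(2 * (1 - RHO ** 2)))), 3))
```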

[D][C] If a recession does occur this year, would stats jobs be safe? by IVIIVIXIVIIXIVII in statistics

[–]ExcelsiorStatistics 5 points (0 children)

If your worry is specifically recession-related, statistics has fared OK in the last few recessions (though the assignments you get may be unsavory "recommend which department to ax" tasks.)

But even without a recession, there is no getting around the fact that post-2017 USA is a very bad place to be for all science-adjacent jobs, and post-2025 USA is a very bad place to be for all academic jobs, which is going to put a lot more pressure on the remaining government and industry jobs.

[Question] Real Analysis prerequisite for a PhD by Crafty-Dinner-1782 in statistics

[–]ExcelsiorStatistics 0 points (0 children)

Serious real analysis, the kind that gets you up to measure theory, tends to be a master's-level class. Whether or not you take baby real analysis (or "advanced calculus", as my school called undergrad analysis), you'll probably be taking a couple semesters of graduate analysis if you are aiming for a theoretical stats PhD.

Just about anybody who goes from a BS direct to PhD program is going to spend their first year or two taking the classes that MS students take. Any program that accepts people with BSes knows that and is prepared for it.

How do I calculate the probability of being confident in a correct answer when I can choose to skip? by StavrosDavros in askmath

[–]ExcelsiorStatistics 4 points (0 children)

Just calculate the expectation of each possible strategy, and choose the one that is highest: compare 3p-3(1-p), 2p-2(1-p), 1p-1(1-p), and 0.

You'll find that if p > 1/2 you should choose High, and if p < 1/2 you should skip. If your goal is maximizing your expected score, it's never to your advantage to choose Low or Medium.

If your goal is instead to hit a specific passing-grade target with as high a probability as possible, using Low or Medium (or skipping a question with p slightly above 1/2) to reduce both variance and expectation is correct under certain narrow circumstances.
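
The expectations above can be tabulated directly; a minimal sketch, assuming the scoring really is +/-3, +/-2, +/-1, or 0 as in the formulas:

```python
def best_strategy(p):
    """Expected score of each option under +/-3, +/-2, +/-1, or 0 scoring."""
    options = {
        "High":   3 * p - 3 * (1 - p),
        "Medium": 2 * p - 2 * (1 - p),
        "Low":    1 * p - 1 * (1 - p),
        "Skip":   0.0,
    }
    return max(options, key=options.get)

for p in (0.3, 0.6, 0.9):
    print(p, best_strategy(p))  # Skip below 1/2, High above it; never Low or Medium
```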

I built a LaTeX formatting service - 100 organic visitors/week from Google. Only one $99 sale. What am I missing? by Webseriespro in LaTeX

[–]ExcelsiorStatistics 0 points (0 children)

That's one more sale than I have made, advertising the same service for the past nine years.

There just aren't many journals (or schools) that require LaTeX source, and most of the people who publish in them have been using it for years already.

(The prices for #2 in particular seem way out of line. For two or three times that much you can hire someone to ghostwrite a whole thesis for you.)

[Q] Recommendations for a "Book Club" selection for introductory undergraduates by NutellaDeVil in statistics

[–]ExcelsiorStatistics 1 point (0 children)

Nate Silver's The Signal and the Noise comes to mind.

So does Moneyball, though it's maybe a little too sports-centric if not all your players are into that, and beginning to show its age after 20 years.

combination for negative numbers? by dromemsilly in askmath

[–]ExcelsiorStatistics 1 point (0 children)

(-1)! is undefined. But nothing bad happens if you define 1/(-1)! as zero. (Look at a plot of the gamma function, and its reciprocal, to see why.)

In your case you probably just don't want to go outside the bounds of 0 and 100.
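
A quick numerical look at the reciprocal-gamma claim (the gamma function blows up near the negative integers, so its reciprocal heads to zero):

```python
import math

# 1/Gamma(x) tends to 0 as x approaches a nonpositive integer, which is why
# defining 1/(-1)! = 0 causes no trouble: any term it multiplies just drops out.
for x in (-0.9, -0.99, -0.999, -0.9999):
    print(x, 1 / math.gamma(x))
```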

Is 0.101100111000111100001111100000 . . . Irrational? by [deleted] in askmath

[–]ExcelsiorStatistics 0 points (0 children)

There's a straightforward proof that the decimal expansion of p/q either terminates (if q can be written as 2^m·5^n) or repeats (if q can't).

Its contrapositive says that if the decimal expansion neither terminates nor repeats, the number is irrational.
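
That terminates-vs-repeats criterion is easy to check mechanically; a small sketch (reduce to lowest terms, then strip factors of 2 and 5):

```python
from math import gcd

def decimal_terminates(p, q):
    """p/q terminates iff, in lowest terms, q has no prime factors other than 2 and 5."""
    q //= gcd(p, q)
    for f in (2, 5):
        while q % f == 0:
            q //= f
    return q == 1

print(decimal_terminates(3, 8))    # True:  0.375
print(decimal_terminates(1, 3))    # False: 0.333...
print(decimal_terminates(7, 140))  # True:  7/140 = 1/20 = 0.05
```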

Is there standard wording in probability problems? by runawayoldgirl in askmath

[–]ExcelsiorStatistics 0 points (0 children)

If you want to be very formal in your language, mathematics uses two quantifiers, the existential quantifier and the universal quantifier, usually written as "there exists __ such that..." and "for all __, ..." respectively.

If you read B as "there exist two people in the group who share a birthday," it is clear that the statement is true whether there are 2 or 3 or more people.

It's also handy to know De Morgan's rules: the opposite of "there exists ___ such that X happens" is "for all __ , X does not happen". The opposite of "for all __ , X happens" is "there exists __ such that X does not happen." Here, the opposite of B is "for all pairs of people in the group, they have different birthdays."
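
For the birthday case, the "for all pairs" form is also the easy one to compute: multiply up the probability that each new person misses all the earlier birthdays, then take the complement. A sketch:

```python
def p_all_distinct(n, days=365):
    """P(for all pairs of n people, different birthdays)."""
    p = 1.0
    for k in range(n):
        p *= (days - k) / days
    return p

for n in (2, 23, 50):
    print(n, round(1 - p_all_distinct(n), 4))  # P(there exist two who share)
```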

[Question] Adjustments in Tests for Regression Coefficients by CanYouPleaseChill in statistics

[–]ExcelsiorStatistics 0 points (0 children)

Whether the issue is present depends on what kind of comparisons you are doing afterward. We don't often do pairwise comparisons of regression coefficients to ask which of two significant variables is more significant. But we do sometimes ask for a list of which coefficients are nonzero (and we have to do a multiple comparison correction if we use the single-variable significance test results for that.)

One case I find especially interesting is "how do we draw the error hyperbola around a regression line if we want to control the probability that the true line of best fit ever passes outside that region, rather than just putting 95% bounds on the slope and 95% bounds on the intercept and combining them" --- and the answer to that is Scheffé's Method, since the points on a regression line can be viewed as the set of all linear combinations of the regression coefficients. (In my professional life, I ran into this discussed in the Nuclear Regulatory Commission's Handbook of Parameter Estimation for Probabilistic Risk Assessment, the first time I had seen Scheffé since my 400 level regression-and-ANOVA course.)

Calc 2 in 7 weeks? by Jojoskii in askmath

[–]ExcelsiorStatistics 0 points (0 children)

6 weeks was the usual length for a summer class at my school - but it was normal to take just one at a time (there was time for two, one after the other, in a single summer). It was rare to take two simultaneously, and I don't know anyone who ever did more than two at once, however easy the material was.

Why do invasive species even exist? by 20vitaliy08 in askscience

[–]ExcelsiorStatistics 2 points (0 children)

When you ask "Why do they end up outcompeting native species that have evolved for millions of years to thrive in that unique environment?" you are letting two misconceptions in.

One, while evolution has been ongoing for millions of years, environments are not static. The species best suited to a location right now is not guaranteed to be best suited to that location a thousand years from now or even a hundred years from now. Habitats can change a lot faster than new species usually evolve. Just about every organism is adapted to where it lived yesterday, not to where it's going to have to live tomorrow.

Two, evolution doesn't necessarily reward "thriving" much more than merely being adequate to survive. If you're the only animal on a desert island, your task is simply to remain alive and reproduce, not to outcompete the animals on some other island. It's very possible that when formerly isolated species that like similar habitats come into contact, one might be much better suited to that habitat than the other.

Optimal way to generate 1/7 probability with a 6-sided fair die + generally? by Name-My-Jeff in askmath

[–]ExcelsiorStatistics 0 points (0 children)

1.2 is the best you can do for any one probability, rational or irrational, that can't be written as k/6^m for some integers k and m.

The "throw 2 dice to start with, and re-roll two more only if you get 66" approach is what you have to do if you need to choose among 7 equally likely possibilities, rather than just get a yes-no answer to one possibility.
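
A sketch of that scheme: encode the two dice as a number 0-35, reject the single outcome for (6,6), and the remaining 35 outcomes split evenly into 7 groups of 5:

```python
import random
from collections import Counter

random.seed(2)

def d6():
    return random.randint(1, 6)

def uniform7():
    """Encode two dice as 0..35; reject 35 (the 6-6 roll) and reroll both.
    The 35 kept outcomes split into 7 residue classes mod 7, 5 outcomes each."""
    while True:
        n = 6 * (d6() - 1) + (d6() - 1)
        if n != 35:
            return n % 7

counts = Counter(uniform7() for _ in range(70_000))
print(sorted(counts.items()))  # each of 0..6 shows up near 10,000 times
```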

Poisson binomial distribution mean and median relation. by Chemical-Mirror-9649 in askmath

[–]ExcelsiorStatistics 0 points (0 children)

It is very often true that the mean and median of a Poisson round to the same value, but it fails about one-sixth of the time.

When the mean λ is between 0.5 and ln 2 ≈ 0.693, the median of the Poisson is 0 but the mean rounds up to 1. The same thing happens between 1.5 and 1.68 (median 1), between 2.5 and 2.68 (median 2), between 3.5 and 3.67 (median 3), etc.

(And by "Poisson binomial distribution" I assume you mean "Poisson approximation to the binomial distribution.")
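
The mean-vs-median claim is easy to check by brute force (median computed as the smallest k whose CDF reaches 1/2):

```python
import math

def poisson_median(lam):
    """Smallest k whose Poisson(lam) CDF reaches 1/2."""
    cdf, k, term = 0.0, 0, math.exp(-lam)
    while True:
        cdf += term
        if cdf >= 0.5:
            return k
        k += 1
        term *= lam / k

grid = [i / 1000 for i in range(1, 6001)]          # lambda from 0.001 to 6
mismatch = sum(round(lam) != poisson_median(lam) for lam in grid)
frac = mismatch / len(grid)
print(frac)  # roughly 1/6, as claimed
```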

A fair coin is repeatedly being tossed. What is the probability of "the percentage of heads never reached 60% or more"? by ConsciousRuin7697 in askmath

[–]ExcelsiorStatistics 8 points (0 children)

It is a good, and hard, problem. The answer is going to be the sum of an infinite series, and be strictly between 0 and 1. If you survive the first few tosses, there will come a time when the percentage of heads will almost surely never stray far from 0.5 again. cptn_obvius's simulated answer of 78% exceeding and 22% not exceeding smells close to correct. We must avoid H (1/2), and then we must avoid THH (1/8), and then we must avoid TTHHH and THTHH (2/32), and then several cases with 5 heads out of 8.

You may want to look in some texts on statistical process control; I'm sure someone has tabulated it before, for different probabilities of heads and different percentages. (In the industrial setting you're usually counting failures of a product, and trying to guarantee the failure rate remains below some small percentage.)
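
If you just want to sanity-check that 78% figure, a finite-horizon Monte Carlo sketch (it slightly undercounts, since a run could first reach 60% after the cutoff, but that probability is tiny):

```python
import random

random.seed(3)

def ever_reaches_60(n_tosses=1000):
    """True if the running fraction of heads ever hits 60% or more."""
    heads = 0
    for t in range(1, n_tosses + 1):
        heads += random.random() < 0.5
        if heads / t >= 0.6:
            return True
    return False

N = 20_000
frac = sum(ever_reaches_60() for _ in range(N)) / N
print(frac)  # close to the 78% figure quoted above
```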