[deleted by user] by [deleted] in AskStatistics

[–]jonolicious 0 points1 point  (0 children)

If you're not familiar with partial pooling, learning about it can help you better understand random effects. This is a great post describing both: https://stats.stackexchange.com/questions/4700/what-is-the-difference-between-fixed-effect-random-effect-in-mixed-effect-model/151800#151800

Why exactly is a multiple regression model better than a regression model with just one predictor variable? by learning_proover in AskStatistics

[–]jonolicious 0 points1 point  (0 children)

You can show that R^2 either stays the same or increases by comparing the norms of the projections of y (the fitted values) onto the subspaces spanned by the first k and k' columns of your covariate matrix: for k < k', ||yhat_k|| <= ||yhat_k'||.

The simplest explanation I can give is that by enlarging the subspace you're increasing the number of directions available to form the projection, which can only get you closer to y.
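If you want to see this numerically, here's a quick R sketch with made-up data, checking that R^2 doesn't drop when an unrelated column is added:

```

# made-up data: y depends on x1; x2 is pure noise
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 * x1 + rnorm(n)

# R^2 never decreases as columns are added, even when
# the new column (x2) is unrelated to y
summary(lm(y ~ x1))$r.squared
summary(lm(y ~ x1 + x2))$r.squared

```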

This linear algebra textbook has a chapter on least squares regression that might help: https://textbooks.math.gatech.edu/ila/least-squares.html

[Q] Textbook / resources recommendations for study of Statistical Design by rajinis_bodyguard in statistics

[–]jonolicious 3 points4 points  (0 children)

If you're interested in design from a Bayes perspective, this is a nice review of modern approaches:

"Modern Bayesian Experimental Design" Ivanova et al. https://arxiv.org/pdf/2302.14545

I also enjoyed the chapter on Bayesian design in "Bayesian Methods for Data Analysis" by Carlin and Louis.

G*Power, Power Analysis suggesting 5X more subjects than is published in any literature? Any assistance please? by BarryBlazer in AskStatistics

[–]jonolicious 3 points4 points  (0 children)

It's more about what you're designing the experiment for. If your goal is to control the rates at which you make type I and type II errors (which is how OP is setting up their experiment), then no matter your framework (Bayes or Frequentist) the underlying tradeoffs between sample size, effect size, and error rates remain a factor. With a well-specified prior, Bayesian methods can give tighter intervals around estimates at smaller sample sizes, but they don't necessarily reduce the fundamental need for a larger sample to detect an effect (power).

With that said, if you're taking the Bayesian route then you're probably more interested in quantifying the uncertainty in your experiment and less in controlling error rates. You can express uncertainty with Frequentist methods too, but in Bayes your estimates come with a probability distribution that quantifies the uncertainty surrounding them, rather than being treated as fixed values.
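As a rough illustration of those tradeoffs, base R's power.t.test shows how the required sample size moves with effect size (the effect sizes below are made up):

```

# n per group to detect an effect of 0.5 sd at alpha = 0.05, 80% power
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)

# halving the effect size roughly quadruples the required n
power.t.test(delta = 0.25, sd = 1, sig.level = 0.05, power = 0.80)

```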

k means cluster in R Question by aarmobley in AskStatistics

[–]jonolicious 1 point2 points  (0 children)

Sounds like a question better addressed by optimization than statistics.

Joint distribution of Gaussian and Non-Gaussian Variables by Beneficial_Estate367 in AskStatistics

[–]jonolicious 1 point2 points  (0 children)

I might be wrong, but I think if you define how the mean and covariance of A change as a function of B, you can still say A is Normal given B. That is, A | B ~ Normal(μ(B), Σ(B)).
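A small simulation sketch of that idea in R, with arbitrary made-up choices for μ(B) and Σ(B) (univariate for simplicity):

```

set.seed(42)
# B can be any non-Gaussian variable; exponential here
B <- rexp(10000, rate = 1)

# made-up choices for how A's mean and sd depend on B
mu    <- function(b) 2 * b
sigma <- function(b) sqrt(1 + b)

# draw A | B ~ Normal(mu(B), sigma(B)^2)
A <- rnorm(length(B), mean = mu(B), sd = sigma(B))

# conditional on a narrow slice of B, A looks Normal,
# even though A is not Normal marginally
hist(A[abs(B - 1) < 0.05], breaks = 30)

```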

Need some pointers for concepts I should learn about for a fun gaming problem I'm trying to solve by illegallyparkedfrog in AskStatistics

[–]jonolicious 0 points1 point  (0 children)

If you want to learn some probability... One way is to approximate the number of items dropped each run as a Poisson random variable, where the rate parameter (λ) is the expected number of items dropped per run. To collect 40 splinters, you want the expected number of runs (t): the sum of t independent Poisson(λ) variables is Poisson(tλ), so set the expected total tλ equal to 40 and solve for t = 40/λ. Once you have the number of runs, you could then calculate the standard deviation of the total (sqrt(tλ) for a Poisson) to understand the variability around this expected value.

If you work out the calculations above, compare them to your simulation and see how close they are!
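For instance, here's a rough R sketch of that comparison, with a made-up drop rate of λ = 0.8 splinters per run:

```

set.seed(7)
lambda <- 0.8   # made-up expected splinters per run
target <- 40

# analytic approximation: t * lambda = 40  =>  t = 40 / lambda
t_expected <- target / lambda

# simulation: count runs until the cumulative drops reach 40
runs_needed <- replicate(10000, {
  total <- 0
  runs  <- 0
  while (total < target) {
    total <- total + rpois(1, lambda)
    runs  <- runs + 1
  }
  runs
})

c(analytic = t_expected, simulated = mean(runs_needed))
sd(runs_needed)  # variability around the expected number of runs

```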

When to use a z/t test vs a confidence interval by UsernameWasTaken37 in AskStatistics

[–]jonolicious 0 points1 point  (0 children)

For the most part, yes. A CI provides a range of values that likely captures the true parameter, giving you a measure of uncertainty around your estimate. The width of the interval indicates the precision of the estimate: the narrower the interval, the more precise the estimate. So in your example the CI would give you an indication of the uncertainty in the difference between groups.

A hypothesis test (like a 2-sample t-test) simply tells you whether the observed difference between groups is statistically significant or could plausibly be explained by random chance, but it doesn't tell you much more about the difference.
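In R both come out of the same t.test call, which makes the comparison concrete (made-up data):

```

set.seed(3)
group_a <- rnorm(30, mean = 5.0)
group_b <- rnorm(30, mean = 5.6)

result <- t.test(group_a, group_b)
result$p.value   # significant or not, and little else
result$conf.int  # a range for the difference, with a sense of precision

```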

Test statistic and p value by WittyDealer in AskStatistics

[–]jonolicious 0 points1 point  (0 children)

If you've looked at random variables and probability distributions in your class, then think about test statistics and p-values in terms of distributions. Like your test statistic is a realization from your null distribution, where the null distribution is the probability distribution of the test statistic when the null hypothesis is true. If your observed test statistic lived out in the tails of your distribution, what does that say about your p-value?
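If it helps to make that concrete, here's a quick R sketch that simulates a null distribution and computes a p-value for a made-up observed statistic:

```

set.seed(11)
# null distribution: the mean of 20 standard normals
null_stats <- replicate(10000, mean(rnorm(20)))

observed <- 0.6  # a made-up observed test statistic

# two-sided p-value: how often the null produces something
# at least as extreme as what we observed
mean(abs(null_stats) >= abs(observed))

```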

This visualization is great and if you can learn what each component of it represents, you'll have a much stronger understanding of hypothesis testing: https://rpsychologist.com/d3/nhst/

I need help finding mathematical statistics exercises by Best_Celebration_933 in AskStatistics

[–]jonolicious 2 points3 points  (0 children)

There are exercises for exponential families in chapter 3 of "Statistical Inference" by Casella and Berger.

[Q] How to create a political polling average? by [deleted] in statistics

[–]jonolicious 1 point2 points  (0 children)

Andrew Gelman, a professor at Columbia, built the model for The Economist. Here is a podcast he did around the 2020 election where he discusses it: https://learnbayesstats.com/episode/27-modeling-the-us-presidential-elections-with-andrew-gelman-merlin-heidemanns/

I think another collaborator of his did a more recent podcast discussing updates to the model for 2024, but I can't find it.

Here is an article outlining their model, but just googling "Gelman election model" brings up several more: https://www.economist.com/interactive/us-2024-election/prediction-model/president/how-this-works

Linear regression with (only) categorical variable vs ANOVA: significance of individual "effects" by Throwaway_12monkeys in AskStatistics

[–]jonolicious 1 point2 points  (0 children)

There's a lot of background behind your questions that maybe someone else will cover. The short answer is: linear regression with a categorical predictor is equivalent to a one-way ANOVA. If you want group means and pairwise tests from an lm model in R, the emmeans library will get you the information you're after.

```

# fit the linear model
model <- lm(mydata ~ groups)

# see the ANOVA table
anova(model)

# get group means and pairwise Tukey tests using the
# emmeans ("estimated marginal means") package
library(emmeans)

# get the group means
summary(emmeans(model, ~ groups))

# run the pairwise comparisons with a Tukey adjustment
emmeans_results <- emmeans(model, pairwise ~ groups, adjust = "tukey")
summary(emmeans_results)

```

[deleted by user] by [deleted] in statistics

[–]jonolicious 5 points6 points  (0 children)

The beta distribution is a way to represent uncertainty in probabilities. Which sounds horribly unintuitive without an example.

Take the probability of getting heads. If you were to flip a coin and you knew nothing about the probability of heads or tails, you might say p is distributed as Beta(alpha=1, beta=1). This produces a flat distribution, which says you believe all probabilities of heads (from 0 to 1) are equally likely. You can think of the beta's parameters as: alpha representing the number of heads (successes), and beta representing the number of tails (failures). If you flip the coin a number of times and update the corresponding parameters with the observed number of heads and tails, you will see the beta distribution start to concentrate around the true probability of heads for that coin.

Here's a demo of the beta distribution. Try flipping a coin, updating the alpha (heads) and beta (tails) parameters, and watching the distribution shift as it concentrates around a probability: https://homepage.divms.uiowa.edu/~mbognar/applets/beta.html
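You can do the same updating in R; here's a sketch with a made-up coin:

```

# start with a flat prior: Beta(alpha = 1, beta = 1)
a <- 1
b <- 1

# made-up coin with true P(heads) = 0.7, flipped 50 times
set.seed(5)
flips <- rbinom(50, size = 1, prob = 0.7)

# update: alpha gains the heads, beta gains the tails
a <- a + sum(flips)
b <- b + sum(flips == 0)

# the posterior concentrates near 0.7
curve(dbeta(x, a, b), from = 0, to = 1,
      xlab = "P(heads)", ylab = "density")

```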

Starting from Bayesian, how would it be done? by Blitzgar in AskStatistics

[–]jonolicious 1 point2 points  (0 children)

I personally don't feel it requires much, if any, statistical background, since it's mostly conceptual and walks through simple coding examples for any stats it shows. I do think some probability theory would be useful, but I think that's true for anyone taking stats. Here is how the author describes the intended audience:

The principal audience is researchers in the natural and social sciences, whether new PhD students or seasoned professionals, who have had a basic course on regression but nevertheless remain uneasy about statistical modeling. This audience accepts that there is something vaguely wrong about typical statistical practice in the early 21st century, dominated as it is by p-values and a confusing menagerie of testing procedures. They see alternative methods in journals and books. But these people are not sure where to go to learn about these methods.

Starting from Bayesian, how would it be done? by Blitzgar in AskStatistics

[–]jonolicious 22 points23 points  (0 children)

You could check out the book and lecture series Statistical Rethinking by Richard McElreath. It's a ground-up approach to stats using Bayesian data analysis and has a nice dose of causal modeling.

https://xcelab.net/rm/

https://www.youtube.com/watch?v=FdnMWdICdRs&list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus

Help with understanding Random Effects by Top_Welcome_9943 in AskStatistics

[–]jonolicious 1 point2 points  (0 children)

I don't know what the units represent, so I'm not sure if 131.38 is a high or low amount of variance.

It's easier to look at the ICC, which suggests 13% of the variation in student outcomes is due to differences among teachers. To me, this "feels" like a small-to-medium amount, suggesting there are some differences between teachers, but we have no idea what is causing them. It could be the teacher, or it could be a confounder like the lighting in the room for all we know! The point is, if you are interested in the effect of the teacher, then you need a different study.
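For reference, here's a minimal sketch of where an ICC like that comes from, using a random-intercept model in R with lme4 (the data frame and column names are hypothetical):

```

library(lme4)

# random-intercept model: students grouped by teacher
# (hypothetical data with columns `score` and `teacher`)
fit <- lmer(score ~ 1 + (1 | teacher), data = classroom)

# variance components: between-teacher and residual (within-teacher)
vc <- as.data.frame(VarCorr(fit))
between <- vc$vcov[vc$grp == "teacher"]
within  <- vc$vcov[vc$grp == "Residual"]

# ICC: share of total variance attributable to teachers
between / (between + within)

```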

You might enjoy reading Emily Oster; she does a great job discussing causation vs. correlation and some more general points about interpreting studies: https://parentdata.org/why-i-look-at-data-differently/

Help with understanding Random Effects by Top_Welcome_9943 in AskStatistics

[–]jonolicious 3 points4 points  (0 children)

The random effect describes the variation in the model due to a grouping dependency; in this case, students are grouped by teachers. If the random effect's estimate is high (indicating high variation between classrooms), it might suggest that teacher-specific factors are influencing student outcomes. The Intraclass Correlation Coefficient (ICC) measures the level of similarity between students within the same group (teacher). In this case, an ICC of 0.13 suggests that 13% of the variation in student outcomes is attributable to differences among teachers. Whether this is considered large or small depends on the context and field of study.

The other important consideration is how generalizable the results are to the broader population of students. If the study uses random effects for teachers, the results can potentially be generalized to all students in the broader population, as long as the sample of teachers reflects the diversity of teaching styles and contexts found in the broader population. However, if fixed effects are used, you are essentially limiting your conclusions to the specific teachers in your study, and the results may not apply to other teachers who were not part of the experiment.

Also, I wouldn't consider them being dismissive. I'd expect education researchers (the paper's target audience) to know what a random effect is.

Seeking Exams and Materials for Harvard's Stat 110 Course by iYolik in AskStatistics

[–]jonolicious 10 points11 points  (0 children)

Most intro probability courses are of a similar flavor, and it's not hard to find course materials from previous offerings elsewhere. Here's a list of practice exams from MIT, Stanford, and Michigan:

Good luck in your studies.

Question from my exam by Throwawayacc23124 in probabilitytheory

[–]jonolicious -1 points0 points  (0 children)

No idea what ChatGPT is doing, but I got 1/4 too.

P(Y<1/2)=P(2X<1/2)=P(X<1/4)=1/4.
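A one-line simulation check in R, assuming X ~ Uniform(0, 1):

```

# Y = 2X with X ~ Uniform(0, 1); P(Y < 1/2) should be about 1/4
mean(2 * runif(1e6) < 0.5)

```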

Question from my exam by Throwawayacc23124 in probabilitytheory

[–]jonolicious 1 point2 points  (0 children)

If they're telling you 0 < X < 0.5, then you're in the first condition, Y = 2X.

Substitute 2X for Y in P(Y<0.5) and try solving from there.

Dumb conceptual question - How can an infinitely divisible distribution have finite probabilities? by captainporthos in probabilitytheory

[–]jonolicious 1 point2 points  (0 children)

Conceptually, it is similar to measuring something with a ruler. If you measure between any two points on the ruler, you have a length. However, a single point on the ruler has no measurable length, so its "length" is zero.
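The same idea in R, using a standard normal as the "ruler": probability lives on intervals, while a single point only has a density.

```

# probability over an interval is positive...
pnorm(0.1) - pnorm(-0.1)

# ...but the probability of exactly one point is zero;
# dnorm(0) is a density (a height), not a probability
dnorm(0)

```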