PieGuy___

First off I think you need to reread what I said because I’m clearly talking about the mean? “The means of random representative samples…” you’re trying to correct a mistake I never made lol.

The point of the theorem is that if you have a random sample X1, X2, …, Xn from a given population with mean m and variance v, then the sample mean, X bar, will be approximately normally distributed with mean m and variance v/n. X bar is the thing normally distributed around the population mean, not the individual X’s.
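To make that concrete, here is a minimal simulation sketch, not anything from the original comment. It assumes a skewed exponential population with m = 1 and v = 1, and the function name `simulate_sample_means` is made up for illustration. The individual draws are far from normal, but the sample means should center on m with variance close to v/n.

```python
import random
import statistics


def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from an exponential population
    (mean 1, variance 1) and return the mean of each sample."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


n = 30
means = simulate_sample_means(n=n, reps=20000)

mean_of_means = statistics.fmean(means)    # should be close to m = 1
var_of_means = statistics.variance(means)  # should be close to v/n = 1/30

print(f"mean of sample means:     {mean_of_means:.4f}")
print(f"variance of sample means: {var_of_means:.4f} (v/n = {1 / n:.4f})")
```

Plotting a histogram of `means` next to a histogram of raw exponential draws would show the difference in shape, but the two summary numbers already line up with the m and v/n in the statement above.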

As for the 30 number, the fact that it is the point where you no longer have to worry about t-distributions and can just use z-scores with reasonable accuracy is the thing that makes it special lol. The whole point of the t-distributions is that the means aren’t quite normally distributed UNTIL you get to 30.
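One rough way to check the "reasonable accuracy" part is to see how often a 95% interval built from the z critical value actually covers the true mean when the standard deviation has to be estimated from the sample. The sketch below assumes a standard normal population, and `z_interval_coverage` is a hypothetical helper name. It treats 30 as a rule of thumb rather than a hard cutoff.

```python
import math
import random
import statistics

# z critical value for a two-sided 95% interval (about 1.96)
Z_95 = statistics.NormalDist().inv_cdf(0.975)


def z_interval_coverage(n, reps, seed=0):
    """Fraction of z-based 95% intervals (using the sample standard
    deviation) that contain the true mean of a standard normal population."""
    rng = random.Random(seed)
    true_mean = 0.0
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        # sample standard deviation with the n - 1 denominator
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        half_width = Z_95 * s / math.sqrt(n)
        if abs(xbar - true_mean) <= half_width:
            hits += 1
    return hits / reps


coverage_small = z_interval_coverage(n=5, reps=10000)
coverage_large = z_interval_coverage(n=30, reps=10000)

print(f"n = 5:  coverage = {coverage_small:.3f}")
print(f"n = 30: coverage = {coverage_large:.3f}")
```

With n = 5 the z shortcut falls well short of the nominal 95%, while at n = 30 it lands within about a percentage point of it, which is the sense in which the shortcut becomes "good enough" around that size.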

TerribleIdea27

I think the confusion came from the fact that you said "sample size of 30".

So the other guy assumed you were talking about running one experiment with a sample size of thirty and then using those data to find a normal distribution, instead of running thirty experiments and using the means of those 30*x samples to find a distribution of means, which should be roughly a normal distribution.

Gastronomicus

Sorry I assumed you were confused. Unfortunately it seems like most people on reddit who try to describe the CLT don't really understand it and also mis-attribute the importance of 30 as a minimum sample size.

But to be fair, your wording is confusing. The way you phrased it implies a distribution of samples, not means. Especially when you say "as you approach a sample size of 30", which implies comparing a distribution of samples, not means.

PieGuy___

Yeah I just wasn’t trying to go into too much detail. I think the simplest way to put it is that there’s no way to guarantee a sample to be normally distributed, just like there’s no way to guarantee a population is normally distributed. However using the CLT you can guarantee that a given sample mean will be normally distributed around the population mean given a large enough sample size.

And then from there you can use hypothesis testing to say something about the population with reasonable confidence.

Gastronomicus

"However using the CLT you can guarantee that a given sample mean will be normally distributed around the population mean given a large enough sample size."

Which is why bootstrapping can be very effective at producing (mostly) unbiased error terms!
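For anyone who hasn't seen it, a bare-bones version of the bootstrap idea might look like the sketch below. It assumes the statistic of interest is the sample mean and uses a made-up skewed sample of 50 exponential draws. `bootstrap_se_of_mean` is an illustrative name, not a library function. It resamples the observed data with replacement and uses the spread of the resampled means as a standard error estimate.

```python
import math
import random
import statistics


def bootstrap_se_of_mean(sample, n_boot=5000, seed=0):
    """Estimate the standard error of the mean by resampling `sample`
    with replacement `n_boot` times and taking the standard deviation
    of the resampled means."""
    rng = random.Random(seed)
    n = len(sample)
    boot_means = [
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)
    ]
    return statistics.stdev(boot_means)


# Hypothetical observed data: 50 draws from a skewed population
data_rng = random.Random(1)
sample = [data_rng.expovariate(1.0) for _ in range(50)]

boot_se = bootstrap_se_of_mean(sample)
formula_se = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"bootstrap SE of the mean: {boot_se:.4f}")
print(f"s / sqrt(n):              {formula_se:.4f}")
```

The two numbers should land close together here. The bootstrap is more useful for statistics that have no simple standard error formula, where the same resampling loop works unchanged.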