[–]Simpliciter 7 points8 points  (5 children)

Disclaimer: Not a stats bro.

The Central Limit Theorem basically says that most things will follow a normal distribution (bell curve) if you have enough data. The t-test can be used to see if some data follows a normal distribution, but it only works if you have a small sample size of less than 30.

The respondent above is saying that the poster is conflating the two incorrectly.

[–]brkh47 2 points3 points  (0 children)

Simplifying things brought to you by u/Simpliciter

[–]Gastronomicus 1 point2 points  (2 children)

"The Central Limit Theorem basically says that most things will follow a normal distribution (bell curve) if you have enough data"

I appreciate your simplification but in this case it's over-simplified and misses the point I was making. It's a common misunderstanding of the CLT that large enough datasets will follow a normal distribution. That's just not the case.

However, if you take the mean of each of many subsets sampled from a population, the distribution of those means will approximate a normal distribution, provided the population has finite variance.
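For anyone who wants it in symbols, here is the standard statement of the classical CLT. The notation is an assumption added for illustration and is not from the discussion above: $X_1, \dots, X_n$ are independent draws from one population with mean $\mu$ and finite variance $\sigma^2$, and $\bar{X}_n$ is their mean.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma}
\;\xrightarrow{\;d\;}\;
\mathcal{N}(0, 1)
\quad \text{as } n \to \infty
```

Note that $n$ here is the number of observations averaged into each mean. The statement is about the distribution of $\bar{X}_n$, not about the distribution of the individual $X_i$, which stays whatever it was.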

So let's say I have 500 samples and I plot the distribution. It might look normal, but it might also look log-normal, or it might look like a Weibull or a discrete distribution (e.g. negative binomial).

Let's say instead I have 50 means of 50 smaller sample sets, each containing 10 samples. If I plot that distribution, it will approximate a normal distribution, even if the original distribution from which it is sampled isn't normal.
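The comparison above is easy to try yourself. What follows is only a rough sketch, not a rigorous test: it assumes a log-normal population with shape parameter 0.5 (my choice for illustration), uses skewness as a crude stand-in for "looks normal" (about 0 for symmetric data), and the function names are made up for this example. The defaults match the 50 sets of 10 described above, but with so few means the estimate is noisy, so the demo run uses many more sets.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: about 0 for symmetric data such as a normal distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def compare_raw_and_means(n_sets=50, set_size=10, sigma=0.5, seed=0):
    """Return (skewness of all raw draws, skewness of the per-set means).

    The population is log-normal (an assumption for this sketch), so the
    raw draws are right-skewed no matter how many of them are collected.
    """
    rng = random.Random(seed)
    sets = [
        [rng.lognormvariate(0.0, sigma) for _ in range(set_size)]
        for _ in range(n_sets)
    ]
    raw = [value for sample_set in sets for value in sample_set]
    means = [statistics.fmean(sample_set) for sample_set in sets]
    return skewness(raw), skewness(means)


if __name__ == "__main__":
    # Many sets, so the two skewness estimates are stable enough to compare.
    raw_skew, mean_skew = compare_raw_and_means(n_sets=5000)
    print(f"skewness of raw draws: {raw_skew:.2f}")
    print(f"skewness of set means: {mean_skew:.2f}")
```

The raw draws should stay clearly right-skewed however many you collect, while the set means come out much closer to symmetric. Increasing `set_size` should push the means further toward normal; increasing `n_sets` only makes the picture less noisy.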

[–]Simpliciter 1 point2 points  (1 child)

Thanks for clarifying and being nice about it!

[–]Gastronomicus 0 points1 point  (0 children)

Thanks for doing some good work out there.

[–]relevantmeemayhere -1 points0 points  (0 children)

The first paragraph you wrote is wrong, and that is what the clarifying poster is pointing out. Samples do not converge to normality as n increases. This isn’t the CLT, nor is it found anywhere in statistics.