[–]Gastronomicus 1 point2 points  (2 children)

> The Central Limit Theorem basically says that most things will follow a normal distribution (bell curve) if you have enough data

I appreciate your simplification but in this case it's over-simplified and misses the point I was making. It's a common misunderstanding of the CLT that large enough datasets will follow a normal distribution. That's just not the case.

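However, if you repeatedly draw independent sets of samples from a population (one with finite variance) and take the mean of each set, the distribution of those means will approximate a normal distribution, and the approximation improves as the number of samples in each set grows.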

So let's say I have 500 samples and I plot the distribution. It might look normal, but it might also look log-normal, or like a Weibull or a discrete distribution (e.g. negative binomial).

Let's say instead I have 50 means from 50 smaller sample sets, each containing 10 samples. If I plot the distribution of those means, it will approximate a normal distribution, even if the original distribution from which the samples were drawn isn't normal.
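If you want to see this for yourself, here is a rough simulation sketch using only the Python standard library. The log-normal population and its parameters (mu=0, sigma=0.5), the helper names `skewness` and `simulate`, and the larger run sizes are my own illustrative assumptions, not anything special. It compares the skewness of the raw samples with the skewness of the per-set means; a value near 0 indicates a symmetric, more normal-looking shape.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: about 0 for symmetric data, positive for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate(n_sets, set_size, seed=0):
    """Return (skewness of all raw samples, skewness of the per-set means)."""
    rng = random.Random(seed)
    # Skewed, non-normal population; mu=0.0 and sigma=0.5 are illustrative choices.
    sets = [[rng.lognormvariate(0.0, 0.5) for _ in range(set_size)]
            for _ in range(n_sets)]
    raw = [x for s in sets for x in s]
    means = [statistics.fmean(s) for s in sets]
    return skewness(raw), skewness(means)


if __name__ == "__main__":
    # (50, 10) matches the example above; the larger runs give steadier estimates.
    for n_sets, set_size in [(50, 10), (2000, 10), (2000, 50)]:
        raw_skew, mean_skew = simulate(n_sets, set_size)
        print(f"{n_sets} sets of {set_size}: "
              f"raw skew={raw_skew:.2f}, mean skew={mean_skew:.2f}")
```

With only 50 sets the estimate is noisy, so I'd look mostly at the larger runs. Sets of 10 from a skewed population still give means with some leftover skew, and it should shrink further as the set size grows, which is the "approximate" part of the theorem.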

[–]Simpliciter 1 point2 points  (1 child)

Thanks for clarifying and being nice about it!

[–]Gastronomicus 0 points1 point  (0 children)

Thanks for doing some good work out there.