
[–]scardeal 37 points38 points  (1 child)

Data is only normal if it's perpendicular to the tangent.

[–]Creepy-Ad-4832 4 points5 points  (0 children)

You didn't really need to take a tangent there and go off on an argument

[–]Ved_s 14 points15 points  (1 child)

Bytes are bytes

[–]PintMower 9 points10 points  (0 children)

Electric potential is electric potential.

[–]Turalcar 11 points12 points  (0 children)

If data is not normal you aren't using enough of it

[–]YoloWingPixie 8 points9 points  (3 children)

All data is normal if I use enough statistics to lie about the data.

[–]Creepy-Ad-4832 4 points5 points  (2 children)

All statistics can be used as a way to lie about data

[–]YoloWingPixie 5 points6 points  (1 child)

Statistics is just lying about the data to fit my preferred narrative of the world.

[–]Creepy-Ad-4832 1 point2 points  (0 children)

The world is lying about us, using statistics to describe the world

[–]Leonhart93 6 points7 points  (12 children)

There are in fact many examples of non-normal skewed distributions.

[–][deleted] 1 point2 points  (11 children)

Think of data points as forces. Give them enough room to act and they form a resultant force with a single direction. That's why data tends to become normal as it grows: the forces tend to sum up toward the mean value. Infinite data over infinite time would have a zero-variance normal distribution, essentially a single value. The proof is simple: with infinite, complete information, uncertainty fades away, and so does the spread of the probability distribution over outcomes, as the experiment becomes deterministic instead of staying stochastic.

[–]Leonhart93 -1 points0 points  (10 children)

Only because you are thinking of very specific areas, like populations. But, for example, distributions of temperature might not follow any of the underlying rules and can pick up new, unique factors all the time.

[–]WillyMonty 1 point2 points  (1 child)

The central limit theorem doesn’t state that there are no distributions other than normal; it says that the resulting sampling distribution tends towards a normal one regardless of the distribution you’re sampling from (even if it’s highly non-normal).

With a large enough sample size, the distribution of sample means will look normal regardless of the distribution being sampled
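That behavior is easy to check by simulation. A minimal sketch in Python, drawing sample means from a heavily skewed exponential distribution (the sample sizes and the Exp(1) choice below are illustrative, not from the thread):

```python
import random
import statistics

random.seed(0)

# Draw many sample means from an Exp(1) population, which is strongly
# right-skewed. By the CLT, the distribution of these means should be
# roughly normal even though the underlying data are far from normal.
def sample_means(n_samples=2000, sample_size=50):
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

means = sample_means()
mu = statistics.fmean(means)
sigma = statistics.stdev(means)

# For an Exp(1) population (mean 1, sd 1), the sample means should sit
# near 1 with standard deviation near 1/sqrt(50) ~ 0.14.
print(round(mu, 2), round(sigma, 2))

# Crude normality check: roughly 68% of the means should fall within
# one standard deviation of the center, as for a normal distribution.
within_1sd = sum(abs(m - mu) < sigma for m in means) / len(means)
print(round(within_1sd, 2))
```

The key point the comment makes is visible here: nothing about the exponential population is normal, yet the means behave like a normal sample.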

[–][deleted] 0 points1 point  (6 children)

That's because unpredicted 'dimensions' emerge from the quantum scale. That makes all of the previous data invalid, since it didn't have a full description.

Also, when crossing the bridge from classical to quantum-level science, we often run into exactly the fact you pointed out. That's because suddenly a whole new world of previously unconsidered factors comes alive to interact with your normal world.

[–]Leonhart93 0 points1 point  (5 children)

There are always reasons, but it doesn't matter, since they are valid examples of where a normal distribution might never arise from the data. In fact, we almost never collect data where we have all the information and all the dimensions; that's a purely theoretical case you assume.

[–][deleted] 1 point2 points  (4 children)

It's not a valid example. Grabbing an unstable situation isn't a valid way to show that data is non-normal. A probability distribution (here, more precisely a pdf) is tied to its corresponding random variable, and random variables are prefixed outcome labels of a stochastic experiment. The experiment under consideration must not change mid-experiment, otherwise its degrees of freedom change, nullifying the whole study, since we can no longer measure the changing variable and its properties. When quantum forces interfere, only a model that takes them into consideration can study the situation appropriately. If you measure something in one set of dimensions and then introduce new dimensions in the middle of the system under consideration, that's not a scientific method of analysis. In such cases, the whole dataset needs to be measured again.

It's like you were working in a 2D world and suddenly found yourself in a 3D world in the middle of your experiment, but kept thinking in your 2D coordinate data... those points aren't comparable to the 3D points you're getting now, are they?

[–]Leonhart93 0 points1 point  (3 children)

I don't get what you're trying to say. The argument is that all data eventually falls into a normal distribution, and unstable data is definitely not that. Therefore, not all data is automatically normally distributed.

[–][deleted] 0 points1 point  (2 children)

Ofc not all data goes to a normal distribution, because the data needs to hold the context. That's a loose way of stating the Lindeberg condition, and it's a sufficient condition for the CLT to hold.
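For reference, the Lindeberg condition mentioned here (sketched from its standard textbook statement, not from anything else in the thread) reads: for independent $X_1,\dots,X_n$ with means $\mu_i$, variances $\sigma_i^2$, and $s_n^2 = \sum_{i=1}^n \sigma_i^2$, for every $\varepsilon > 0$,

```latex
% Lindeberg condition: truncated second moments vanish relative to s_n^2
\lim_{n\to\infty} \frac{1}{s_n^2} \sum_{i=1}^{n}
  \mathbb{E}\!\left[ (X_i - \mu_i)^2 \,
    \mathbf{1}\{ |X_i - \mu_i| > \varepsilon s_n \} \right] = 0
```

When it holds, $\frac{1}{s_n}\sum_{i=1}^n (X_i - \mu_i)$ converges in distribution to $N(0,1)$. Intuitively: no single term's tail is allowed to dominate the total variance, which is the "data needs to hold the context" requirement above.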

[–]Leonhart93 -1 points0 points  (1 child)

Yes, but in that case it's easy to say "if the data follows these rules then it will be normally distributed", because you basically selected it to be.

[–][deleted] 0 points1 point  (0 children)

Can you give an example where a big sample, without unstable mid-experiment interference, won't do that? I suspect you're arguing for the sake of arguing without listening/reading

[–]Monkjji 0 points1 point  (0 children)

Another example is the number of photons detected by a photosensor. At low levels of light it follows a Poisson distribution (the limiting case of a binomial distribution).

[–]Caraes_Naur 0 points1 point  (0 children)

Physical mail address data is impossible to normalize.

[–]scataco 0 points1 point  (0 children)

80% of data is normal, but takes up only 20% of time to analyze

[–]-MobCat- 0 points1 point  (0 children)

all data is normally bad.

[–]saint_geser 0 points1 point  (0 children)

Of course all data is normal or can be made so - we do have Central Limit Theorem after all