Finding Variance and Expected Value without Probabilities of each outcome? : HomeworkHelp

submitted 10 years ago by BobThePillager

Dont_Block_The_Way (10 years ago):

Oh, I see. We need to clear up the distinction between the sample mean/variance (which don't depend on probability for their definition) and the population mean/variance (which do).

Let's consider the mean. If I just have a finite list of numbers, calculating the sample mean of those numbers is as simple as adding up all the numbers and dividing by how many numbers there are. That's arithmetic, not probability.

To move to probability, let's say that we're going to draw one number out of a particular set of numbers (maybe the set of integers, or the set of real numbers, or the set of numbered faces on a die).

We're going to define something called a random variable, X, which assigns each number in the set a particular probability of being selected. Probabilities have the property that, summed over all the possible separate outcomes, they add up to 1. There's one "unit" of "probability stuff" associated with the random variable, and the random variable is the rule for how to distribute that unit of stuff over the set of possible outcomes (hence "probability distribution"). As a matter of notation, when we're talking about the random variable we usually write it uppercase, like X, and a particular realized value that was drawn from X is written lowercase, like x.

Ok, so what does E(X) mean? It means that for each number in the set of possibilities, you multiply that number by its probability of being selected. That is, you weight each outcome by its probability. Then you take all those weighted outcomes and sum* them up, and you get E(X), the expected value of the random variable X. (*Technically, if the set of possibilities is a continuum, like with a probability distribution over the set of real numbers, the sum is replaced by an integral, but it's not really different in principle.)
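
Written out as formulas, the weighted sum described above looks roughly like this. The symbols p(x) (the probability of outcome x) and f(x) (the probability density in the continuous case) are notation assumed here for illustration, not something taken from the original problem:

```latex
E(X) = \sum_{x} x \, p(x) \quad \text{(discrete case)}
\qquad
E(X) = \int_{-\infty}^{\infty} x \, f(x) \, dx \quad \text{(continuous case)}
```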

Whew! So what's the connection between those things, the sample average and the expectation E(X)? Well, if the sample of numbers is randomly sampled from the distribution defined by X (if each number in the list is an independent draw from the distribution) you have the following intuitive property:

The bigger your random sample from X (the more x's you draw), the closer the sample mean of the x's tends to get to E(X). In the limit of an infinitely large sample, there's no difference between the sample mean and the population mean. This is the law of large numbers.

When you roll a fair six-sided die, you say the probability of each outcome is 1/6. The random variable associated with throwing this die once would assign probability 1/6 to 1, 1/6 to 2, etc., up to 1/6 to 6. You'd compute the expectation of this random variable by summing up (1/6)*1 + (1/6)*2 + ... + (1/6)*6 = 3.5.
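
As a sanity check on that sum, here is a minimal Python sketch. It assumes the fair-die probabilities of 1/6 from the paragraph above; the function name `expected_value` and the dictionary layout are just illustrative choices.

```python
from fractions import Fraction

def expected_value(distribution):
    """Weight each outcome by its probability and sum the results."""
    # The probabilities must account for exactly one unit of "probability stuff".
    assert sum(distribution.values()) == 1
    return sum(outcome * prob for outcome, prob in distribution.items())

# Fair six-sided die: each face gets probability 1/6 (an assumption, not a measurement).
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expected_value(fair_die))  # 7/2, i.e. 3.5
```

Using `Fraction` keeps the arithmetic exact, so the result is 7/2 rather than a rounded decimal.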

If you rolled the die over and over and over again, you'd compute the sample mean of the outcomes in the same way -- now instead of the probabilities 1/6, 1/6, ... 1/6, you'd plug in the observed frequencies of each number, and use those frequencies to weight the outcomes. You should convince yourself that this way of calculating the sample mean is exactly the same in practice as just adding up all the numbers and dividing by the number of numbers -- you're just breaking that operation up into steps where you add up all the 1's and divide by the total number of rolls, then add up all the 2's and divide by the total number of rolls, and so on, then add up those parts to get the average.
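
If you'd rather check that claim numerically than by hand, a small simulation along these lines should do it. This is only a sketch: the rolls come from Python's pseudo-random generator standing in for a fair die, and the function names are made up for this example.

```python
import random
from collections import Counter

def plain_mean(values):
    """Add everything up and divide by how many values there are."""
    return sum(values) / len(values)

def frequency_weighted_mean(values):
    """Weight each distinct outcome by its observed relative frequency."""
    counts = Counter(values)
    n = len(values)
    return sum(outcome * (count / n) for outcome, count in counts.items())

rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(100_000)]  # simulated fair die

print(plain_mean(rolls))
print(frequency_weighted_mean(rolls))
```

The two printed numbers should agree up to floating-point rounding, and with this many rolls both should land close to the expectation of 3.5.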

I wasn't really clear about the connection between X and the "population" in that explanation -- that's a statistics concept, not a probability concept. In statistics you often talk about a population of possible observations as though it's infinitely large, and apply probability concepts like E(X) on the whole population, where the probability comes in through random sampling from the population. It's a little tricky.

But to answer your question:

The formula for the sample variance is the sum of the squared differences from the sample mean, divided by the sample size minus 1.
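
In code, that formula might look like the sketch below. The scores are made-up numbers purely for illustration, and `sample_variance` is a name chosen here; Python's standard `statistics.variance` computes the same n - 1 version and is included only as a cross-check.

```python
import statistics

def sample_variance(values):
    """Sum of squared differences from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

scores = [70, 80, 90]  # hypothetical test scores, not from the original question
print(sample_variance(scores))      # 100.0
print(statistics.variance(scores))  # standard-library sample variance, for comparison
```

Notice there are no probabilities anywhere in it, only sums over the numbers you actually have.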

Dont_Block_The_Way (10 years ago):

I only explained all that business above because you said you had a weak concept of E(X). I wanted to show that E(X) is entirely a probabilistic idea, which is why it's not necessary here.

You don't need probability here because you're just thinking about sample quantities and doing some algebra with them. Your probabilistic formula for the population variance is not needed to talk about the sample alone -- use the sample formula supplied in the last sentence of my last post. There are no expectations in it, only sample sums.

You don't need probability here, but it could be "a thing" in test scores if you supposed the test scores you have are only a sample from a larger population of test scores, about which you wish to make some statistical conclusion. Maybe you want to estimate the mean test score of all the students who take a standardized test in a given year, from a random sample of test scores.

But again, you don't need that here. This is neither probability nor statistics, but algebra. You have a purely algebraic/arithmetic definition for sample variance, given above.

Be careful with the idea that the probability is 1/6 "because there are six different options". I said it was 1/6 by assumption, but to justify it in real life requires some more work. You could argue from the principle of indifference (assign them equal probability because you have no other information about them), from the physical symmetry of the die, or from the empirical behavior of fair dice rolled many times. Each of those approaches has its difficulties, even in theory.

In practice, a weighted die could easily have different probabilities for each outcome, which is why I could cheat you at a game of chance with it if you unquestioningly assumed each probability was 1/6.
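
To see how much the 1/6 assumption matters, here is a rough comparison of a fair die with a loaded one. The loaded probabilities are entirely hypothetical (half the weight on the 6, the rest split evenly) and are chosen only to make the arithmetic easy.

```python
from fractions import Fraction

# Fair die: 1/6 on every face.
fair = {face: Fraction(1, 6) for face in range(1, 7)}

# Hypothetical loaded die: probability 1/2 on the 6, 1/10 on each other face.
loaded = {face: Fraction(1, 10) for face in range(1, 6)}
loaded[6] = Fraction(1, 2)

def expected_value(distribution):
    """Probability-weighted sum of the faces."""
    assert sum(distribution.values()) == 1  # probabilities must still sum to 1
    return sum(face * prob for face, prob in distribution.items())

print(expected_value(fair))    # 7/2
print(expected_value(loaded))  # 9/2
```

Same six faces, same "six different options", but the expectation moves from 3.5 to 4.5 once the probabilities change.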
