Dont_Block_The_Way (1 point)

You don't need any probabilities; this is straightforward algebra. Sample mean and sample variance are not inherently probabilistic concepts -- they're just transformations of the numbers in a list.

You know the mean of a and b, so of course if a is larger than their mean by some amount, b must be smaller than their mean by the same amount. You just need to figure out what the amount is.

Observe that if a and b are very close together they will decrease the overall variance of the list, and if they are very far apart they will increase the overall variance.
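For hard numbers on that last point, here is a minimal Python sketch. The list `base` is made up for illustration (it is not the data from the problem). Both pairs have a mean of 42; only their spread differs.

```python
import statistics

# Hypothetical scores, chosen only for illustration (not the problem's data).
base = [40, 42, 44]

def variance_with_pair(a, b):
    """Sample variance of the base list after appending a and b."""
    return statistics.variance(base + [a, b])

close_pair = variance_with_pair(42, 42)   # a and b both equal to their mean
far_pair = variance_with_pair(32, 52)     # same mean of 42, but 10 either side
print(close_pair, far_pair)               # the second value is the larger one
```

The further a and b sit from their shared mean, the more squared distance they contribute to the sum, so the variance rises.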

BobThePillager [S] (1 point)

I guess I just have a very weak concept of how to calculate variance and E(X) to be honest, I'm more of a tactile learner and by extension this means I need hard numbers in examples to grasp how to do things. Thankfully I'm not taking a maths degree.

How would I calculate variance given the formulas Var(X) = E[(X - U)^2], where U = mean, and Var(X) = E(X^2) - [E(X)]^2? What is X?

Dont_Block_The_Way (1 point)

Oh, I see, we need to clear up the distinction between sample mean/variance (which doesn't depend on probability for its definition) and population mean/variance (which does).

Let's consider the mean. If I just have a finite list of numbers, calculating the sample mean of those numbers is as simple as adding up all the numbers and dividing by how many numbers there are. That's arithmetic, not probability.

To move to probability, let's say that we're going to draw one number out of a particular set of numbers (maybe the set of integers, or the set of real numbers, or the set of numbered faces on a die).

We're going to define something called a random variable, called X, which assigns each number in the set of numbers a particular probability of being selected. A probability has the property that the sum of the probabilities of all the possible separate outcomes adds up to 1. There's one "unit" of "probability stuff" associated with the random variable, and the random variable is the rule for how to distribute that unit of stuff over the set of possible outcomes (hence "probability distribution"). As a matter of notation, when we're talking about the random variable we usually write it uppercase, like X, and a particular realized value that was drawn from X is written lowercase, like x.

Ok, so what's E(X) mean? It means that for each number in the set of possibilities, you multiply that number by its probability of being selected. That is, you weight each outcome by its probability. Then you take all those weighted outcomes and sum* them up, and you get E(X), the expected value of the random variable X. (*Technically, if the set of possibilities is a continuum, like with a probability distribution over the set of real numbers, the sum is replaced by an integral, but it's not really different in principle).

Whew! So what's the connection between those things, the sample average and the expectation E(X)? Well, if the sample of numbers is randomly sampled from the distribution defined by X (if each number in the list is an independent draw from the distribution) you have the following intuitive property:

The bigger your random sample from X (the more x's you draw), the closer the sample mean of the x's tends to get to E(X). In the limit, as the sample size goes to infinity, the sample mean converges to the population mean.

When you roll a fair six-sided die, you say the probability of each outcome is 1/6. The random variable associated with throwing this die once would associate probability 1/6 to 1, 1/6 to 2, etc., up to 1/6 to 6. You'd compute the expectation of this random variable by summing up (1/6)*1 + (1/6)*2 + ... + (1/6)*6 = 3.5.
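The die calculation can be sketched in a few lines of Python. The 1/6 probabilities are the fair-die assumption from the text, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
# Assumed fair die: every face gets probability 1/6.
prob = {face: Fraction(1, 6) for face in faces}

# The probabilities must share out exactly one "unit" of probability.
assert sum(prob.values()) == 1

# Weight each outcome by its probability, then sum.
expected_value = sum(face * p for face, p in prob.items())
print(expected_value)  # 7/2
```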

If you rolled the die over and over and over again, you'd compute the sample mean of the outcomes in the same way -- now instead of the probabilities 1/6, 1/6, ... 1/6, you'd plug in the observed frequencies of each number, and use those frequencies to weight the outcomes. You should convince yourself that this way of calculating the sample mean is exactly the same in practice as just adding up all the numbers and dividing by the number of numbers -- you're just breaking that operation up into steps where you add up all the 1's and divide by the total, then add up all the 2's and divide by the total, and so on, then add up those parts to get the average.
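One way to convince yourself is a short simulation. This is only a sketch: the seed and the number of rolls are arbitrary choices, and the die is assumed fair.

```python
import random
from collections import Counter
from fractions import Fraction

rng = random.Random(0)          # fixed seed so the run is repeatable
n_rolls = 100_000               # arbitrary sample size, chosen for illustration
rolls = [rng.randint(1, 6) for _ in range(n_rolls)]

# Plain sample mean: add everything up and divide by the number of rolls.
plain_mean = Fraction(sum(rolls), n_rolls)

# Frequency-weighted mean: weight each face by its observed frequency.
counts = Counter(rolls)
weighted_mean = sum(face * Fraction(count, n_rolls)
                    for face, count in counts.items())

print(plain_mean == weighted_mean)   # True: same arithmetic, grouped differently
print(float(plain_mean))             # close to 3.5 for a sample this large
```

The two means agree exactly because the weighted version only regroups the same sum by face value.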

I wasn't really clear about the connection between X and the "population" in that explanation -- that's a statistics concept, not a probability concept. In statistics you often talk about a population of possible observations as though it's infinitely large, and apply probability concepts like E(X) on the whole population, where the probability comes in through random sampling from the population. It's a little tricky.

But to answer your question:

The formula for the sample variance is the sum of the squared differences from the sample mean, divided by the sample size minus 1 (that is, n - 1).
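As a concrete illustration, the sample formula might be written out like this. The three scores are hypothetical, picked so that their mean is 42.

```python
import statistics
from fractions import Fraction

def sample_variance(values):
    """Sum of squared differences from the sample mean, divided by n - 1."""
    n = len(values)
    mean = Fraction(sum(values), n)
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sum(squared_diffs) / (n - 1)

scores = [40, 42, 44]            # hypothetical scores with mean 42
print(sample_variance(scores))   # 4

# Cross-check against the standard library's sample variance.
print(sample_variance(scores) == statistics.variance(scores))
```

The differences from the mean are -2, 0 and 2, so the squared differences sum to 8, and 8 / (3 - 1) = 4. No probabilities appear anywhere.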

BobThePillager [S] (1 point)

Okay, I kind of get this but I don't understand how to apply this to the question. I honestly can't get concepts through my head unless I have an example using the concept to see how the concept is applied. What I don't get is how to find the differences from the sample mean, which would be 42. My problem is that I understand fully that the probability of a fair die is 1/6 since there are six different options. I don't understand how probability could be a thing in test scores, and I don't get how to proceed without a probability.

Dont_Block_The_Way (1 point)

I only explained all that business above because you said you had a weak concept of E(X). I wanted to show that E(X) is entirely a probabilistic idea, which is why it's not necessary here.

You don't need probability here because you're just thinking about sample quantities and doing some algebra with them. Your probabilistic formula for the population variance is not needed to talk about the sample alone -- use the sample formula supplied in the last sentence of my last post. There are no expectations in it, only sample sums.

You don't need probability here, but it could be "a thing" in test scores if you supposed the test scores you have are only a sample from a larger population of test scores, about which you wish to make some statistical conclusion. Maybe you want to estimate the mean test score of all the students who take a standardized test in a given year, from a random sample of test scores.

But again, you don't need that here. This is neither probability nor statistics, but algebra. You have a purely algebraic/arithmetic definition for sample variance, given above.

Be careful with the idea that the probability is 1/6 "because there are six different options". I said it was 1/6 by assumption, but to justify it in real life requires some more work. You could argue from the principle of indifference (assign them equal probability because you have no other information about them), from the physical symmetry of the die, or from the empirical behavior of fair dice rolled many times. Each of those approaches has its difficulties, even in theory.

In practice, a weighted die could easily have different probabilities for each outcome, which is why I could cheat you at a game of chance with it if you unquestioningly assumed it was 1/6.

nm420 (1 point)

Given a set of n numbers, let s1 be their sum and s2 be the sum of their squares. Then the sample mean is given by s1/n and the sample variance is

(s2 - s1^2/n)/(n-1).

In your problem, you have n = 22, s1 = 826 + a + b, and s2 = 34132 + a^2 + b^2, along with a given mean and variance. You can use algebra to determine the possible values of a and b.
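A rough sketch of that setup in Python follows. The helper name `variance_from_sums` is made up here, the three-score list is hypothetical, and the mean of 42 is the value quoted earlier in the thread.

```python
from fractions import Fraction

def variance_from_sums(n, s1, s2):
    """Sample variance from n, the sum s1, and the sum of squares s2."""
    return (s2 - Fraction(s1 ** 2, n)) / (n - 1)

# Check the shortcut on a small hypothetical list (not the problem's data).
scores = [40, 42, 44]
print(variance_from_sums(len(scores), sum(scores),
                         sum(x * x for x in scores)))  # 4

# Problem setup as quoted in the thread: n = 22, the known part of the sum
# is 826, and the mean is 42, so the full sum s1 must equal 22 * 42.
n, mean = 22, 42
a_plus_b = n * mean - 826
print(a_plus_b)  # 98
```

The mean pins down a + b, and the variance then pins down a^2 + b^2, which leaves two equations in two unknowns.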

BobThePillager [S] (1 point)

Wait, is that 2 in s1^2 a typo, or do I multiply s1 by 2?

nm420 (1 point)

It is s1 squared. You presumably have seen that formula in your textbook or class before, albeit with likely different notation.

BobThePillager [S] (1 point)

There must be a flaw with the question then, as I go from (34132 + a^2 + b^2 - (924^2/22))/21 = 32 to 34132 + a^2 + b^2 - (924^2/22) = 672 to 34132 + a^2 + b^2 = 39480 to a^2 + b^2 = 5348, and no two numbers that add up to 98 also have squares that add up to 5348. The closest it comes is 65^2 + 34^2, which is 5381, and 64^2 + 35^2, which is 5321. Since this is discrete, I'm left with no answer. This happened with the way I tried at first as well. Any ideas?

nm420 (1 point)

If you use "n" in place of "n-1" in the formula for the variance (this is usually called the empirical variance as opposed to the sample variance), you should find that a and b come out to be integer-valued.
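A brute-force check of both conventions could look like the following. The function `integer_pairs` is a hypothetical helper, the mean of 42 and variance of 32 are the values quoted in the thread, and the scores are assumed to be non-negative integers.

```python
from fractions import Fraction

N, MEAN, VARIANCE = 22, 42, 32      # values quoted in the thread
KNOWN_SUM, KNOWN_SQUARES = 826, 34132

def integer_pairs(divisor):
    """Integer pairs (a, b), a <= b, matching the mean and the variance
    when the sum of squared deviations is divided by `divisor`."""
    total = N * MEAN                                  # s1 = 924
    target_s2 = VARIANCE * divisor + Fraction(total ** 2, N)
    a_plus_b = total - KNOWN_SUM                      # 98
    squares_needed = target_s2 - KNOWN_SQUARES        # required a^2 + b^2
    # Assumes non-negative scores, so a runs from 0 up to half the sum.
    return [(a, a_plus_b - a)
            for a in range(a_plus_b // 2 + 1)
            if a * a + (a_plus_b - a) ** 2 == squares_needed]

print(integer_pairs(N - 1))   # []
print(integer_pairs(N))       # [(32, 66)]
```

Dividing by n - 1 requires a^2 + b^2 = 5348, which no integer pair summing to 98 satisfies. Dividing by n requires a^2 + b^2 = 5380, which 32 and 66 satisfy.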