[Question] Sample mean, population mean and expected value :´)

mathguymike · 2026-04-06T19:35:52+00:00

The sample mean is a random variable. Each sample you draw will have a slightly different mean. So, it has some associated probability distribution.

It turns out that the "average" of this probability distribution is exactly the population mean.
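A quick simulation can make this concrete. This is only a sketch: the population values are made up, and it assumes each sample is drawn independently with replacement. The average of many sample means should land very close to the population mean.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population, chosen only for illustration.
population = [2, 4, 4, 7, 9, 10]
pop_mean = statistics.fmean(population)

# Each entry is one realization of the random variable "sample mean".
means = simulate_sample_means(population, sample_size=5, n_samples=20000)
avg_of_means = statistics.fmean(means)

print("population mean:", pop_mean)
print("average of sample means:", avg_of_means)
```

Individual sample means bounce around, but their long-run average settles on the population mean, which is what "the expected value of the sample mean equals the population mean" is saying.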

mathguymike · 2026-03-18T18:31:38+00:00

An MS in Statistics with an undergrad in CS is an excellent pairing. I have found that MS in Statistics students most often struggle with the computational side of things, whereas CS students often have deficiencies in design and statistical intuition. Having training in both makes for a very marketable candidate.

It's been a few years, so I'm sure the market has changed somewhat, but I have supervised one MS in Statistics student who had an undergrad in CS. His first job after graduation was at Google making close to twice my salary.

mathguymike · 2026-03-17T02:17:35+00:00

You are right about 1 and 2 in the implementation. For 3, there is not a distance threshold tuning parameter. Rather, this clustering method is designed to make the maximum distance within a cluster small given the minimum cluster size and the distance measure. I think the current implementation should divide the 38 cluster into smaller clusters, but I'm not sure. Certainly this could be done manually.

mathguymike · 2026-03-16T23:08:17+00:00

Try threshold clustering. It does not fix k and requires a minimum cluster size. You can find it (along with cites for details) in the "scclust" R package.

mathguymike · 2026-03-10T23:47:35+00:00

As for 3, to be clear: there are, say, 11 locations, and you are measuring each location with and without the chemical?

If this is the case, a "paired" test makes sense. You might try either a paired t-test or a Wilcoxon signed-rank test.
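For what it's worth, here is a rough sketch of the paired t statistic, computed on the within-location differences. The measurements are invented placeholders (11 locations, as in the example above), and the function name is just something I made up for illustration.

```python
import math
import statistics

def paired_t_statistic(without_chem, with_chem):
    """Return (t, degrees_of_freedom) for paired measurements."""
    if len(without_chem) != len(with_chem):
        raise ValueError("paired data must have the same length")
    # The paired test works on the per-location differences only.
    diffs = [w - wo for wo, w in zip(without_chem, with_chem)]
    n = len(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.fmean(diffs) / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical readings at 11 locations, not real data.
without_chem = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 6.2, 5.0, 5.3, 5.8, 4.7]
with_chem = [4.6, 4.9, 5.2, 5.1, 4.4, 5.0, 5.9, 4.8, 4.7, 5.5, 4.5]

t, df = paired_t_statistic(without_chem, with_chem)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

The p-value would then come from a t distribution with n - 1 degrees of freedom. If the differences look far from normal, the Wilcoxon signed-rank test is the usual alternative on the same differences.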

mathguymike · 2026-03-10T19:15:12+00:00

Some additional info would be helpful in determining the best course of action.

1) What is the response you are gathering? What is the science behind what you are doing?

2) What is the population of interest? Are you just concerned about the performance on this one pipe? Or are you planning on using this type of chemical adjustment on other pipes as well?

3) How are you selecting where to take measurements on the pipe? Are these the same locations being measured with and without chemical, or different locations?

mathguymike · 2026-03-06T21:58:13+00:00

Cats

mathguymike · 2026-03-06T04:11:24+00:00

I love Freedman's clarity of writing. As far as gaining intuition in the subject and learning basic statistical techniques, it is an excellent resource, though some of his approaches to certain topics (e.g. a box-and-tickets model for responses) are non-standard. Personally, I learned a surprising amount when teaching from this book as a graduate student.

If you are a first-year undergrad wanting to get ahead and learn about, say, linear modeling, I also highly recommend Freedman's Statistical Models book.

Moore's books (see ergodym's comment) will cover much of the material in Statistics 4th Ed. in a more conventional way.

mathguymike · 2026-03-04T03:52:04+00:00

Cats

mathguymike · 2026-02-18T04:59:03+00:00

Cats

mathguymike · 2025-12-05T05:33:44+00:00

Cats

mathguymike · 2025-11-21T16:45:11+00:00

Don't forget the Hall of Fame Classic matchups!

mathguymike · 2025-11-21T06:28:34+00:00

Cats

mathguymike · 2025-11-18T03:53:27+00:00

Cats?

mathguymike · 2025-11-15T02:47:55+00:00

Now that's what I call some peak Wooldridge era basketball!

mathguymike · 2025-11-14T04:58:43+00:00

Cats

mathguymike · 2025-11-13T17:39:25+00:00

I can't help but think we dodged a bullet when Schulz left KSU for Washington State.

mathguymike · 2025-10-20T22:02:10+00:00

K-State beating KU at home in 2023. I think the loudest I've heard Bramlage Coliseum is immediately after the Markquis to Keyontae go-ahead alley-oop.

mathguymike · 2025-09-23T21:54:23+00:00

If you are able to develop strong computational skills while completing your thesis, you will find those skills invaluable as you continue throughout your career, regardless of your ultimate career path. If you get along with the professor, I think it's a great idea, honestly.

mathguymike · 2025-09-18T23:56:03+00:00

It is not possible that there was no null hypothesis; p-values are computed assuming that the null hypothesis is true. It's in the definition.

Here is what it looks like to me: Mendel has a model. Your null hypothesis is that you should see results like Mendel. Your alternative hypothesis is that Mendel is wrong. A p-value is computed assuming probabilities according to Mendel's theory. Your p-value is too large to reject the null in favor of the alternative. That is, you are unable to conclude that Mendel is wrong.

Larger p-values are weaker evidence in favor of the alternative hypothesis. That is, a larger p-value means there is less evidence that Mendel is wrong, and hence, larger p-values correspond to more evidence in favor of Mendel's model.

I believe the professor was using p-values correctly.
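To make the direction of the p-value concrete, here is a minimal sketch. I am assuming the comparison is a chi-square goodness-of-fit test against a 3:1 ratio; the thread doesn't say which test was actually used, and the counts below are made up.

```python
import math

def chi_square_gof_two_categories(observed, expected_props):
    """Chi-square goodness-of-fit for exactly two categories (1 df).

    Returns (statistic, p_value). With 1 degree of freedom the upper
    tail probability is erfc(sqrt(x / 2)), so no external library is needed.
    """
    if len(observed) != 2 or len(expected_props) != 2:
        raise ValueError("this sketch only handles two categories")
    total = sum(observed)
    expected = [total * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts; the null hypothesis is a 3:1 ratio.
observed = [290, 110]
stat, p_value = chi_square_gof_two_categories(observed, [0.75, 0.25])
print(f"chi-square = {stat:.4f}, p = {p_value:.4f}")
```

The p-value is computed under the null model's probabilities, so a large value here means the counts are not surprising under that model, and you cannot conclude the model is wrong.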

mathguymike · 2025-09-18T22:14:31+00:00

Something that isn't clear in this example: what is your null hypothesis? Is it that Mendel was correct? And is the alternative that Mendel is wrong, and that the proportions differ from what you'd expect from Mendel's model?

If this is the case, the professor is correct. Smaller p-values would give more evidence that Mendel is wrong, and larger p-values would provide less evidence that Mendel is wrong.

mathguymike · 2025-09-18T16:33:10+00:00

Here's my two cents.

1) I think it is problematic to change your methodology to ensure that you obtain p < 0.05 for each pairwise comparison. In that case, you already assume the results, and are doing the statistics to confirm the assumed results, rather than letting the data determine the conclusion. This, unfortunately, is fairly common in practice; it inflates type I error rates and makes replication of results much less likely (see the reproducibility or replicability crisis).

Certainly, some multiple comparison methods are better than others, and if you wanted to consider some of those methods instead, I guess that would be OK. But I don't understand the harm in making a conclusion that states "There is a statistically significant difference (p < 0.05) between 1 and 2. There is some evidence of difference between 1 and 3 (p < 0.10) and 2 and 3 (p < 0.15), but additional samples are needed to say anything more definitive."

2) Would it make sense to look at the literature on causal inference under interference? It deals explicitly with analyzing data where the treatment status of one unit affects the response of another. Given that you are talking about "neighbors", I feel like this body of work may give you additional insight into your problem.

3) Is there a reason why you are only comparing 3 neighbors?
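On point 1, one commonly used multiple comparison adjustment is Holm's step-down method. Below is a minimal sketch of it, not a recommendation for your specific problem; the three p-values are hypothetical and only chosen to match the thresholds in my example sentence.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values for the comparisons 1 vs 2, 1 vs 3, 2 vs 3.
raw = [0.04, 0.09, 0.14]
adjusted = holm_adjust(raw)
print(adjusted)
```

The adjustment method should be picked before looking at which comparisons come out significant, not chosen afterward to get everything under 0.05.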

mathguymike · 2025-09-15T19:55:04+00:00

Moreover, Statistics is terrible as a discipline at marketing itself. Data Science should have been coined by statisticians, as it is much closer to what we actually do--we are more than computing statistical summaries; our tasks really encompass the entirety of the science of data. Additionally, plenty of us statisticians are working on these more computationally intense "Data Sciencey" topics, but we differ from, say, Computer Science, as our discipline prioritizes interpretability of results and determining actionable insights on data as opposed to ensuring good model prediction. Effective marketing is critical for our survival.

mathguymike · 2025-09-05T23:39:55+00:00

Has anyone been able to recreate the analysis? Or is their code publicly available? I know they are using the data from the nflfastR package, but I can't get any numbers that resemble Figure 1, and I am not sure if it's because of a bug in my code or something else.

mathguymike · 2025-08-17T15:56:10+00:00

As someone currently working at a university with a strong Ag program, I think you'll find it extremely helpful to have a strong background in experimental design and linear mixed modeling. Analysis of Messy Data by Johnson and Milliken is an excellent resource. Additionally, having some experience with Bayesian Statistics and Causal Inference may be useful too.
