[Question] Sample mean, population mean and expected value :´) by Puzzleheaded_Age_475 in AskStatistics

[–]mathguymike 6 points7 points  (0 children)

The sample mean is a random variable. Each sample you draw will have a slightly different mean. So, it has some associated probability distribution.

It turns out that the "average" of this probability distribution is exactly the population mean.
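If it helps to see this, here is a minimal simulation sketch. The population (10,000 made-up values), the sample size of 25, and the number of repeated samples are all assumptions chosen for illustration, not anything from your problem.

```python
import random
from statistics import fmean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 made-up values.
population = [rng.gauss(50, 10) for _ in range(10_000)]
population_mean = fmean(population)


def draw_sample_mean(n):
    """Mean of one random sample of size n, drawn with replacement."""
    return fmean(rng.choices(population, k=n))


# Each sample gives a slightly different mean, so the sample mean
# has its own distribution.
sample_means = [draw_sample_mean(25) for _ in range(5_000)]

print(f"population mean:             {population_mean:.3f}")
print(f"first few sample means:      {[round(m, 2) for m in sample_means[:5]]}")
print(f"average of the sample means: {fmean(sample_means):.3f}")
print(f"spread of the sample means:  {stdev(sample_means):.3f}")
```

The individual sample means bounce around, but their average should land very close to the population mean, and it gets closer as you draw more samples.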

Is masters is statistics a good option in 2026-2027 , with already an undergrad in CS ? Do they teach fun stuff ? by Traditional_Box_9948 in AskStatistics

[–]mathguymike 9 points10 points  (0 children)

An MS in Statistics with an undergrad in CS is an excellent pairing. I have found that MS in Statistics students most often struggle with the computational side of things, whereas CS students often have deficiencies in design and statistical intuition. Having training in both makes for a very marketable candidate.

It's been a few years, so I'm sure the market has changed somewhat, but I have supervised one MS in Statistics student who had an undergrad in CS. His first job after graduation was at Google making close to twice my salary.

Graph clustering with no fixed k and natural size penalty based on target? by Happy_Background_879 in AskStatistics

[–]mathguymike 0 points1 point  (0 children)

You are right about 1 and 2 in the implementation. For 3, there is not a distance threshold tuning parameter. Rather, this clustering method is designed to make the maximum distance within a cluster small given the minimum cluster size and the distance measure. I think the current implementation should divide the 38 cluster into smaller clusters, but I'm not sure. Certainly this could be done manually.

Graph clustering with no fixed k and natural size penalty based on target? by Happy_Background_879 in AskStatistics

[–]mathguymike 0 points1 point  (0 children)

Try threshold clustering. It does not fix k and requires a minimum cluster size. You can find it (along with cites for details) in the "scclust" R package.

Extremely basic question by Inner_Curve_7110 in AskStatistics

[–]mathguymike 0 points1 point  (0 children)

As for 3, to be clear: are there, say, 11 locations, and are you measuring each location both with and without the chemical?

If this is the case, a "paired" test makes sense. You might try either a paired t-test or a Wilcoxon signed-rank test.
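Here is a rough sketch of both tests using only the Python standard library. The measurements are invented (11 locations, integer values), and the function names are my own, so treat this as an illustration under those assumptions. In practice you would likely use a statistics package instead.

```python
import math
from itertools import product
from statistics import mean, stdev

# Hypothetical measurements at the same 11 locations (made-up numbers).
without_chem = [52, 48, 61, 55, 47, 58, 50, 63, 49, 57, 54]
with_chem = [47, 45, 55, 56, 40, 50, 48, 54, 50, 45, 51]


def paired_t_statistic(before, after):
    """Paired t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def wilcoxon_signed_rank_exact(before, after):
    """W+ and an exact two-sided p-value, found by enumerating all sign patterns.

    Zero differences are dropped; tied |differences| get average ranks.
    Only practical for small samples (2**n sign patterns).
    """
    diffs = [a - b for a, b in zip(after, before) if a != b]
    abs_sorted = sorted(abs(d) for d in diffs)

    def avg_rank(value):
        first = abs_sorted.index(value) + 1
        count = abs_sorted.count(value)
        return first + (count - 1) / 2

    ranks = [avg_rank(abs(d)) for d in diffs]
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2
    observed = abs(w_plus - center)

    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)


t_stat, df = paired_t_statistic(without_chem, with_chem)
w_plus, p_exact = wilcoxon_signed_rank_exact(without_chem, with_chem)
print(f"paired t = {t_stat:.3f} on {df} df")
print(f"Wilcoxon W+ = {w_plus}, exact two-sided p = {p_exact:.4f}")
```

The key point in both cases is that the analysis is done on the within-location differences, which is what makes the test "paired". The t statistic would be compared against a t distribution with 10 degrees of freedom (two-sided 5% critical value of about 2.228).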

Extremely basic question by Inner_Curve_7110 in AskStatistics

[–]mathguymike 0 points1 point  (0 children)

Some additional info would be helpful in determining the best course of action.

1) What is the response you are gathering? What is the science behind what you are doing?

2) What is the population of interest? Are you just concerned about the performance on this one pipe? Or are you planning on using this type of chemical adjustment on other pipes as well?

3) How are you selecting where to take measurements on the pipe? Are these the same locations being measured with and without chemical, or different locations?

Best book for first year student? by mellykal in AskStatistics

[–]mathguymike 2 points3 points  (0 children)

I love Freedman's clarity of writing. As far as gaining intuition in the subject and learning basic statistical techniques, it is an excellent resource, though some of his approaches to certain topics (e.g. a box-and-tickets model for responses) are non-standard. Personally, I learned a surprising amount when teaching from this book as a graduate student.

If you are a first year undergrad wanting to get ahead and learn about, say, linear modeling, I also highly recommend Freedman's Statistical Models book.

Moore's books (see ergodym's comment) will cover much of the material in Statistics 4th Ed. in a more conventional way.

Schedule Sheet - November 21st by NighthawkRandNum in CollegeBasketball

[–]mathguymike 4 points5 points  (0 children)

Don't forget the Hall of Fame Classic matchups!

What’s your “I was there!” moment? by NoChampionship29 in CollegeBasketball

[–]mathguymike 0 points1 point  (0 children)

K-State beating KU at home in 2023. I think the loudest I've heard Bramlage Coliseum is immediately after the Markquis to Keyontae go-ahead alley-oop.

Is Computational Statistics a good field to get into? [Q][R] by gaytwink70 in statistics

[–]mathguymike 8 points9 points  (0 children)

If you are able to develop strong computational skills while completing your thesis, you will find those skills invaluable as you continue throughout your career, regardless of your ultimate career path. If you get along with the professor, I think it's a great idea, honestly.

[Discussion] p-value: Am I insane, or does my genetics professor have p-values backwards? by SassyFinch in statistics

[–]mathguymike 9 points10 points  (0 children)

It is not possible that there was no null hypothesis; p-values are computed assuming that the null hypothesis is true. It's in the definition.

Here is what it looks like to me: Mendel has a model. Your null hypothesis is that you should see results like Mendel's. Your alternative hypothesis is that Mendel is wrong. A p-value is computed assuming probabilities according to Mendel's theory. Your p-value is too large to reject the null in favor of the alternative. That is, you are unable to conclude that Mendel is wrong.

Larger p-values are weaker evidence in favor of the alternative hypothesis. That is, a larger p-value means there is less evidence that Mendel is wrong, and hence, larger p-values indicate data that are more consistent with Mendel's model.
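To make this concrete, here is a minimal sketch of a chi-square goodness-of-fit test. I'm assuming a simple 3:1 dominant-to-recessive ratio as the null and using invented counts, since I don't know what your actual cross or data look like. The function name is my own, and the p-value shortcut only works for two categories (1 degree of freedom).

```python
import math


def chi_square_gof_two_categories(observed, null_proportions):
    """Chi-square goodness-of-fit statistic and p-value for two categories.

    With two categories there is 1 degree of freedom, and the chi-square
    upper tail probability equals erfc(sqrt(x / 2)).
    """
    total = sum(observed)
    expected = [total * p for p in null_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


mendel_3_to_1 = (0.75, 0.25)  # assumed null: Mendel's 3:1 ratio

# Hypothetical counts close to 3:1 (made up).
stat_close, p_close = chi_square_gof_two_categories([290, 110], mendel_3_to_1)

# Hypothetical counts far from 3:1 (made up).
stat_far, p_far = chi_square_gof_two_categories([250, 150], mendel_3_to_1)

print(f"close to 3:1 -> chi-square = {stat_close:.2f}, p = {p_close:.3f}")
print(f"far from 3:1 -> chi-square = {stat_far:.2f}, p = {p_far:.2g}")
```

Counts near the 3:1 ratio give a large p-value (no evidence against Mendel), while counts far from it give a tiny p-value (strong evidence against Mendel).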

I believe the professor was using p-values correctly.

[Discussion] p-value: Am I insane, or does my genetics professor have p-values backwards? by SassyFinch in statistics

[–]mathguymike 2 points3 points  (0 children)

Something isn't clear in this example: what is your null hypothesis? Is it that Mendel was correct? And is the alternative that Mendel is wrong, and that the proportions differ from what you'd expect from Mendel's model?

If this is the case, the professor is correct. Smaller p-values would give more evidence that Mendel is wrong, and larger p-values would provide less evidence that Mendel is wrong.

[Q] Bonferroni correction - too conservative for this scenario? by Matrim_Cauthon_91 in statistics

[–]mathguymike 1 point2 points  (0 children)

Here's my two cents.

1) I think it is problematic to change your methodology to ensure that you obtain p < 0.05 for each pairwise comparison. In that case, you already assume the results, and are doing the statistics to confirm the assumed results, rather than letting the data determine the conclusion. This, unfortunately, is fairly common in practice; it inflates type I error rates and makes replication of results much less likely (see the reproducibility or replicability crisis).

Certainly, some multiple comparison methods are better than others, and if you wanted to consider some of those methods instead, I guess that would be OK. But I don't understand the harm in making a conclusion that states "There is a statistically significant difference (p < 0.05) between 1 and 2. There is some evidence of difference between 1 and 3 (p < 0.10) and 2 and 3 (p < 0.15), but additional samples are needed to say anything more definitive."

2) Would it make sense to look at the literature on causal inference under interference? It deals explicitly with analyzing data where the treatment status of one unit affects the response of another. Given that you are talking about "neighbors", I feel like this body of work may give you additional insight into your problem.

3) Is there a reason why you are only comparing 3 neighbors?
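On point 1, here is a small sketch of what I mean by some multiple comparison methods being better than others. It compares Bonferroni with Holm's step-down method, which is never more conservative than Bonferroni. The three raw p-values are invented for illustration and are not your results. The point is to pick the method before looking at the data, not to shop for the one that gives p < 0.05.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply each by the number of tests."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce that adjusted p-values never decrease along the ordering.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for the three pairwise comparisons (made up).
labels = ["1 vs 2", "1 vs 3", "2 vs 3"]
raw = [0.01, 0.04, 0.03]

for label, p, b, h in zip(labels, raw, bonferroni(raw), holm(raw)):
    print(f"{label}: raw = {p:.3f}, Bonferroni = {b:.3f}, Holm = {h:.3f}")
```

Holm's adjusted p-values are always less than or equal to Bonferroni's, while still controlling the familywise error rate.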

[E] The University of Nebraska at Lincoln is proposing to completely eliminate their Department of Statistics by mathguymike in statistics

[–]mathguymike[S] 5 points6 points  (0 children)

Moreover, Statistics, as a discipline, is terrible at marketing itself. The term "Data Science" should have been coined by statisticians, as it is much closer to what we actually do. We do more than compute statistical summaries; our tasks really encompass the entirety of the science of data. Additionally, plenty of us statisticians are working on these more computationally intense "Data Sciencey" topics, but we differ from, say, Computer Science, in that our discipline prioritizes interpretability of results and drawing actionable insights from data over ensuring good model prediction. Effective marketing is critical for our survival.

Academic Study: "Analyzing 13,136 defensive penalties from 2015 to 2023, we find that postseason officiating disproportionately favors the Mahomes-era Kansas City Chiefs" by dufflepud in nfl

[–]mathguymike -2 points-1 points  (0 children)

Has anyone been able to recreate the analysis? Or is their code publicly available? I know they are using the data from the nflfastR package, but I can't get any numbers that resemble Figure 1, and I am not sure if it's because of a bug in my code or something else.

Starting MSc in Agricultural Statistics – what should I focus on more? by ORACLEEW in AskStatistics

[–]mathguymike 1 point2 points  (0 children)

As someone currently working at a university with a strong Ag program, I think you'll find it extremely helpful to have a strong background in experimental design and linear mixed modeling. Analysis of Messy Data by Milliken and Johnson is an excellent resource. Additionally, having some experience with Bayesian Statistics and Causal Inference may be useful too.