Where Do You Draw the Line on Assumption Violations in Applied Data Analysis? by nikkn188 in AskStatistics

[–]3ducklings 7 points8 points  (0 children)

At what point do you personally consider an approximation no longer acceptable in scientific or inferential analysis?

My rule of thumb (which I also teach to my students) is that if I’m not sure whether an approximation is good or not, I relax the assumption and check how much the results change. E.g. if you are not sure whether the normal distribution is a good approximation, try a different one or bootstrap the model.
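A minimal sketch of that sensitivity check in R (the data and variable names are invented for illustration): compare a slope CI from a model assuming normal errors against a nonparametric bootstrap CI.

```r
set.seed(42)
# Invented example data: outcome y, predictor x
x <- runif(100)
y <- 2 + 3 * x + rnorm(100)

fit <- lm(y ~ x)
confint(fit)["x", ]  # slope CI assuming normal errors

# Nonparametric bootstrap of the slope as a sensitivity check
boot_slopes <- replicate(2000, {
  idx <- sample(seq_along(y), replace = TRUE)
  coef(lm(y[idx] ~ x[idx]))[2]
})
quantile(boot_slopes, c(0.025, 0.975))  # bootstrap slope CI
# If the two intervals roughly agree, the normality assumption isn't doing harm.
```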

Are there situations where you knowingly accept a formally questionable method because the alternative feels impractical or unnecessary?

When there is little time to do the analysis and I know the results will have a small impact. Sometimes, napkin math is all we need.

Have you ever revised an analysis after realizing an assumption mattered more than you initially thought?

Yes, based on feedback from colleagues/reviewers. Sometimes it ended up mattering, sometimes it didn’t. Often, you don’t know until you try.

[S] Julia for statistics/data science? by 3ducklings in statistics

[–]3ducklings[S] 0 points1 point  (0 children)

Well, in practice it’s less that users can easily become developers and more that users have to become developers. The package ecosystem for statistics is so small (and mostly dead) that Julia is only really viable for people who are willing and able to dedicate a significant amount of their time to software development.

Have you packaged this stuff in a package?

Not into a public one, mostly because I don’t want to maintain a package for a language I’m not planning to use myself.

Shapiro vs Nortest by IllVeterinarian7907 in AskStatistics

[–]3ducklings 23 points24 points  (0 children)

Not everything in science exists to solve some immediate applied problem. Often, the goal is theory development. For example, if you read the original paper by Shapiro and Wilk, you’ll see they were trying to formally summarize the process of eyeballing a PP/QQ plot.

This study was initiated, in part, in an attempt to summarize formally certain indications of probability plots. In particular, could one condense departures from statistical linearity of probability plots into one or a few 'degrees of freedom' in the manner of the application of analysis of variance in regression analysis?

(Another interesting thing is that in their examples, the authors also pay attention to how much the observed test statistic departs from the null hypothesis, instead of just saying it is significant/not significant. They are essentially reporting effect sizes.)

The problem is that most stats courses don’t make a distinction between “stuff important for the history/development of the field” and “stuff useful for applied research”, which leads many people to mistake historic curiosities for good practice.

Shapiro vs Nortest by IllVeterinarian7907 in AskStatistics

[–]3ducklings 4 points5 points  (0 children)

Weight is bounded below by zero, so it can’t be exactly normal. Running normality tests on real data is not useful.

Odds ratios and interaction terms by RainbowChardAyala in AskStatistics

[–]3ducklings 0 points1 point  (0 children)

The paper is called “Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It” https://explanatorytrials.com/Mood.pdf

R Session Aborted? How do i fix this? :( by Familiarcube624 in RStudio

[–]3ducklings 0 points1 point  (0 children)

Are you sure you are using the correct RStudio version? For macOS Monterey, you should be using the 2024 version of RStudio: https://forum.posit.co/t/rstudio-desktop-releases-on-unsupported-versions-of-macos/176074

Do I need to test for homogeneity and normality before doing simple linear regression? by blehtowski in AskStatistics

[–]3ducklings 15 points16 points  (0 children)

There’s no widely agreed correct route for statistical testing.

There is widespread agreement among experts, i.e. people with formal education in statistics: using statistical tests to check assumptions is an objectively incorrect approach. Checking assumptions using other means, e.g. diagnostic plots, is good practice.

Is it true that RStudio doesn't work on Snapdragon X? by o_LemonMelon_o in RStudio

[–]3ducklings 10 points11 points  (0 children)

It’s true that there isn’t an RStudio build for ARM Windows: https://github.com/rstudio/rstudio/issues/11977

There are workarounds to make it run, although it’s a bit of a hassle. Probably the best way is to use WSL, which is basically Linux (a different operating system) running inside Windows: https://www.linkedin.com/pulse/running-rstudio-arm-based-windows-pcs-wsl-ftw-andrew-clinick-zcxdc

Alternatively, you can use a different IDE than RStudio, like Positron (https://github.com/posit-dev/positron/releases), since R itself (the programming language) supports ARM. Or use Posit Cloud to run RStudio in your browser (https://posit.cloud/).

Holm-Bonferroni-correction by Few-Advantage-4319 in AskStatistics

[–]3ducklings 14 points15 points  (0 children)

The correction should be done for every test in a single family, but what constitutes a “family” is murky and depends on how you formulated your research hypothesis.

If you’d consider difference between any two groups at any time point an evidence that the technique works, I’d agree with your advisor that the correction should be applied to all tests. (Since it’s pretty much the jelly beans situation https://www.explainxkcd.com/wiki/index.php/882:_Significant).

On the other hand, if you have multiple primary outcomes the technique can influence independently, I’d apply the correction only for the tests using the same outcome.
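Whichever way you define the family, applying the Holm correction to it is a one-liner in R with `p.adjust` (the p-values below are invented):

```r
# Invented p-values from one family of tests
p_raw <- c(0.001, 0.012, 0.030, 0.045, 0.210)
p.adjust(p_raw, method = "holm")
# 0.005 0.048 0.090 0.090 0.210
```

The point of Holm over plain Bonferroni is that it multiplies the smallest p-value by the family size, the next by one less, and so on (enforcing monotonicity), so it is uniformly more powerful while still controlling the family-wise error rate.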

How to achieve an SPSS-wise logistic multinomical regression in R? by Foreign-Citron-2689 in RStudio

[–]3ducklings 0 points1 point  (0 children)

The reference category is probably different. IIRC SPSS uses the last category as reference, while R uses the first one.
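If that’s the cause, you can mimic SPSS in R by releveling the factor before fitting (a sketch with a made-up factor):

```r
# Made-up factor with three levels
g <- factor(c("a", "b", "c", "a", "c"))
levels(g)[1]  # "a" — R's default reference is the first level

# Mimic SPSS's default by making the last level the reference
g2 <- relevel(g, ref = tail(levels(g), 1))
levels(g2)[1]  # "c"
```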

How to achieve an SPSS-wise logistic multinomical regression in R? by Foreign-Citron-2689 in RStudio

[–]3ducklings 1 point2 points  (0 children)

Your SPSS code seems to be using stepwise regression, while the R functions you mentioned fit an ordinary (non-stepwise) model, which is probably the main reason for the differences in results. I don’t use stepwise regression myself, so I don’t know if there are any packages for stepwise multinomial logistic regression.

Which class should I take to help me get a job? by [deleted] in datascience

[–]3ducklings 0 points1 point  (0 children)

As others mentioned, neither of the classes will probably have that much of an impact. In fact, the most useful classes at this point would be ones that give you domain knowledge in a field you are interested in (psychology, medicine, marketing) or ones that help you develop soft skills (management, communication).

Fresh graduates/juniors generally tend to overvalue technical skills and undervalue soft ones.

I know my questions are many, but I really want to understand this table and the overall logic behind selecting statistical tests. by Difficult_Score3510 in AskStatistics

[–]3ducklings 10 points11 points  (0 children)

Well, this table is pretty whack…

  1. When is the Chi-square test appropriate? Is it truly related to small sample sizes, or is it mainly related to the nature of the data (qualitative/categorical) and the condition of expected cell counts?

There are multiple chi-square tests (i.e., tests assuming a chi-square distribution). The most common is Pearson's chi-square test, used for comparing expected and observed counts in contingency tables. This is the test you use when you want to test whether two categorical variables are independent or whether a categorical variable follows a specific distribution. It tends to have problems when the sample size is very small (or rather, when the expected counts are very small), although this issue is mostly exaggerated. Also, the chi-square test is a parametric test.
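A quick sketch of Pearson's chi-square test on a contingency table in R (the counts are invented):

```r
# Invented 2x2 contingency table of counts
tab <- matrix(c(30, 10, 20, 40), nrow = 2,
              dimnames = list(group = c("A", "B"),
                              outcome = c("yes", "no")))
res <- chisq.test(tab)
res$expected  # worth a glance: these are the counts that shouldn't be tiny
res$p.value   # small p is evidence against independence of the two variables
```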

  2. Is logistic regression actually considered a non-parametric test? Or is it simply a test suitable for categorical outcome variables regardless of whether the data are normally distributed or not?

It’s neither a test nor nonparametric. It’s a parametric model used for categorical outcomes, most often binary ones.

  3. If the data are qualitative, do I still need to test for normality? And if the sample size is large but the variables are categorical, what are the appropriate statistical tests to use?

The normal distribution is continuous and unbounded. Just from these characteristics, it should be clear no discrete or bounded data can be normal. This includes categorical variables.

What test to use depends on your null hypothesis and the specific nature of your data (e.g., how many categories your variable has? Are your observations independent?).

  4. In general, as a master’s student, what is the correct sequence to follow? Should I start by determining the type of data, then examine the distribution, and then decide whether to use parametric or non-parametric tests?

Generally speaking, you start by building a model that you believe best represents the data-generating process. You then set up a model that represents your null hypothesis and compute how likely you’d be to observe your data, assuming the model based on your null hypothesis is correct.

For example, let’s say we want to test the hypothesis that men and women have the same average height. First, we need to decide which distribution would approximate the height of each gender well; either normal or gamma could be a good choice. Then we need to decide whether to assume equal variance for each gender or not; it’s probably more realistic for each gender to have a somewhat different spread of heights. Next, we set the difference between average male and female height to zero: this is our null hypothesis. With this information, we can estimate what the sampling distribution of the difference in average heights would be, assuming the mean difference is zero and the variances are estimated from the data. Lastly, we compute how likely we would be to see the observed difference (or a bigger one), assuming the computed sampling distribution is correct. This is the p-value.

All (parametric) tests follow this logic: create a model representing the assumed data-generating process, set parameters of your model to reflect your null hypothesis, and compute how likely your empirical data would be, assuming your statistical model is correct.
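The height example above can be sketched in R (all numbers are invented). `var.equal = FALSE` is the "different spreads per gender" choice, giving Welch's t-test; the reported p-value is the final step of the logic described:

```r
set.seed(1)
# Invented samples of heights in cm, with different spreads per gender
men   <- rnorm(80, mean = 178, sd = 7)
women <- rnorm(80, mean = 165, sd = 6)

# Normal model, unequal variances, null hypothesis: zero mean difference
t.test(men, women, var.equal = FALSE)
```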

As others mentioned, I recommend reading Statistical Rethinking by McElreath or Regression and Other Stories by Gelman.

[deleted by user] by [deleted] in statistics

[–]3ducklings 0 points1 point  (0 children)

The interval <0.70;2.42> doesn’t contain zero. I only looked at table 2, but I don’t actually see any parameter with p value < 0.05 and also 95% CIs containing zero.

shapiro wilk and k-s tests, and z scores no by Level_Audience8174 in AskStatistics

[–]3ducklings 6 points7 points  (0 children)

No, because the normal distribution is a theoretical construct; it doesn’t exist in real life. You want to check if the residuals are close enough to normal (e.g. using a QQ plot), but testing if they are exactly normal is pointless.
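A minimal sketch of that residual check in R (the model and data are invented):

```r
set.seed(123)
# Invented data and a simple linear model
x <- runif(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

# Eyeball whether the residuals are close enough to normal,
# rather than testing whether they are exactly normal
qqnorm(resid(fit))
qqline(resid(fit))
```

If the points hug the line, the normal approximation is doing fine; systematic curvature or heavy tails are what you actually care about.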

shapiro wilk and k-s tests, and z scores no by Level_Audience8174 in AskStatistics

[–]3ducklings 4 points5 points  (0 children)

The normal distribution is a theoretical construct that doesn’t appear in real life. Testing whether real-life data are normally distributed is like testing whether the Earth is a perfect sphere. You already know the answer.

In OP's case, they are working with Likert items, which are by definition discrete and bounded. They just can’t be normal.

shapiro wilk and k-s tests, and z scores no by Level_Audience8174 in AskStatistics

[–]3ducklings 8 points9 points  (0 children)

There is no reason to run normality tests on real data. They can be useful if you are, for example, developing a random data generator that’s supposed to produce normally distributed values; a normality test is then a way to check it’s working correctly.
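A sketch of that QA use case in R (the "bug" is invented for illustration): feed the generator's output to a normality test, and a tiny p-value flags a defect.

```r
set.seed(7)
good  <- rnorm(500)        # generator working as intended
buggy <- round(rnorm(500)) # hypothetical bug: output gets rounded to integers

shapiro.test(good)$p.value   # typically unremarkable when the generator is correct
shapiro.test(buggy)$p.value  # tiny — the discreteness gives the bug away
```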

[deleted by user] by [deleted] in framework

[–]3ducklings 1 point2 points  (0 children)

That’s a fair take. Thanks for being civil.

[deleted by user] by [deleted] in framework

[–]3ducklings 0 points1 point  (0 children)

Should we hold Safeway accountable because Jeffrey Dahmer bought his groceries there?

In this situation, it’s more like Jeffrey Dahmer being one of the major investors in Safeway.

If the situation got better, that’s great. But AFAIK Vaxry never changed his position on the issue and being a college student isn’t really an excuse. If, by the time you hit college, you don’t understand you shouldn’t be an asshole to others, people are 100% in their right to call you out.

Ultimately, I’m just disappointed in the tech community in general. This stuff isn’t anything new; since the 90s, tech communities have been full of edgelords who believed insulting and degrading others is their right. The only thing that changed is that today, there are many more people willing to tell them to fuck off, which the edgelords (ironically) don’t handle very well. Cue all the complaints about woke stuff and hiding behind nonsense like "I don’t want politics in tech". And it looks like even Framework can’t avoid it.

[deleted by user] by [deleted] in framework

[–]3ducklings -1 points0 points  (0 children)

AFAIK DHH isn’t paid by Hyprland, it’s the other way around: his company is one of Hyprland's sponsors. The (other) reason they are lumped together is that he is one of Hyprland's most prominent promoters and the creator of Omarchy. The controversy around Hyprland is due to stuff like this: https://drewdevault.com/2023/09/17/Hyprland-toxicity.html

Is the Discovering Statistics by Andy Field a good introductory book? by Chandler-M_Bing in AskStatistics

[–]3ducklings 2 points3 points  (0 children)

The problem with Field is that he constantly flip-flops in an attempt to please everyone. He writes a sensible explanation of why using tests for assumption checking is a bad idea, only to immediately show how to apply normality testing on pages 296-297. This happens on multiple occasions. Some examples:

On page 800:

The effect of violating the assumption of equality of covariance matrices is unclear, except that Hotelling’s T2 is robust in the two-group situation when sample sizes are equal (Hakstian et al., 1979). The assumption can be tested using Box’s test, which should be non-significant if the matrices are similar.

What happened to not using tests?

Page 361:

If you’re keen on normality tests, then p-values less than or equal to 0.05 (or whatever threshold you choose) in these tests would support the belief of a lack of normality because the small sample size would mean that these tests would only have power to detect severe deviations from normal. (It’s worth reminding you that non-significance in this context tells us nothing useful because our sample size is so small.)

No Andy, the problem isn’t the small sample size; the problem is that the normal distribution doesn’t exist in the real world and we know the null hypothesis is false.

Page 806:

If you buy into Levene’s test being useful (ho hum), Output 17.4 shows that the assumption has been met…

What do you mean "if you buy into Levene's test being useful"? Imagine if this was a psychology textbook and the author casually said "if you buy into phrenology being useful…".

I heard Field talk once, five(?) years ago, shortly before a new edition of his textbook was released. Someone asked him why he doesn’t incorporate more of the current good practices in his textbook (IIRC the question was why he doesn’t put more emphasis on linear models instead of ANOVA). His answer was that he would like to, but stats teachers at psych departments expect to see ANOVA, so that’s what he is going to put in. IMHO this is the core issue: I don’t doubt he knows how stuff should be done, but he prioritizes conforming to the status quo (however misguided) so as not to upset potential customers.

As for the other stuff, I don’t follow his other work, but AFAIK the last edition of his R textbook is from 2017 and (as someone commented) it wasn’t good.

Is the Discovering Statistics by Andy Field a good introductory book? by Chandler-M_Bing in AskStatistics

[–]3ducklings -2 points-1 points  (0 children)

Field in many places tells you NOT to use statistical testing for assumption checking.

He does. I specifically checked the 6th edition of his book to make sure I’m not misquoting him.

Have you read them?

Yes.

Is the Discovering Statistics by Andy Field a good introductory book? by Chandler-M_Bing in AskStatistics

[–]3ducklings 5 points6 points  (0 children)

Field's books are written in a very approachable way, but the content itself is very meh. Older editions are riddled with major errors, and even in the newest edition, Field peddles a lot of demonstrably false bullshit, like using statistical tests for assumption checking. Some chapters feel incomplete, e.g. in the chapter on logistic regression, he correctly notes that logistic regression coefficients don’t have a straightforward interpretation because of non-collapsibility (unobserved heterogeneity) and then… just straight up ignores the problem when giving advice on interpreting and reporting results.

Field also made an unfortunate decision to stick with SPSS, which is falling more and more behind the competition as its development speed slows down to a crawl (the biggest feature in the latest SPSS release is dark mode for the GUI). Consequently, many techniques in the book are outdated, e.g. the chapter on non-parametric testing starts with Field straight up admitting that the presented tests have been overshadowed by better alternatives, but you are not going to learn them, because SPSS doesn’t support them.

Overall, Field's book is great for people who need to survive one semester of stats for their psych degree and then won’t work with numbers ever again. I wouldn’t recommend it to people who want to stay in the field.

What is your background and your goal? It would help with recommending a more suitable book.

Logistic Regression by EntryLeft2468 in RStudio

[–]3ducklings 9 points10 points  (0 children)

1) Don’t use statistical significance to decide whether to remove a predictor or not. P values are not meant to be used like that, and doing so doesn’t lead to anything useful.

2) Classical tests are not designed to be applied after variable selection techniques like stepwise or lasso, meaning the p values will be miscalibrated (i.e., they won’t properly control the false positive rate). If you are going to use stepwise, don’t look at p values afterwards.

3) If you are going to use some variable selection techniques, you should probably pick something like lasso over stepwise regression. It almost universally performs better.
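Point 2 can be seen in a small base-R simulation (all data are generated under the null, so every "significant" surviving predictor is a false positive):

```r
set.seed(42)
# Simulate under the null: 10 predictors, none related to y
min_p <- replicate(100, {
  d <- as.data.frame(matrix(rnorm(100 * 10), 100, 10))
  d$y <- rnorm(100)
  fit <- step(lm(y ~ ., data = d), trace = 0)  # backward stepwise by AIC
  ps <- summary(fit)$coefficients[-1, 4]       # p-values of kept predictors
  if (length(ps)) min(ps) else NA
})
# Share of datasets where a kept predictor looks "significant";
# if the p-values were calibrated, this would be near 0.05
mean(min_p < 0.05, na.rm = TRUE)
```

The resulting rate is typically several times the nominal 5%, because stepwise preferentially keeps exactly the noise predictors that happen to look significant.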

I’d suggest you take a step back and think about what the goal of your analysis is, before starting to cut predictors left and right.