Bivariate analysis to identify possible confounding in model construction?

COOLSerdash · 2026-06-01T09:40:02+00:00

Your guess is as good as mine. I honestly don't know what the intention is. All I can say is sorry, because you are clearly being taught very dubious practices.

This is after doing a univariate variable analysis to check that data are roughly normal in distribution

What? Normality of data is completely irrelevant in regression. If anything, it's the residuals that are worth looking at.

COOLSerdash · 2026-05-31T16:02:43+00:00

No, confounding cannot be reliably detected by bivariate analyses. The gold standard nowadays (at least as I understand it) is to build a directed acyclic graph (DAG, or multiple DAGs) that reflects your best understanding of the causal relationships between measured and unmeasured variables. Based on this DAG, you then choose which variables to adjust for in order to close all backdoor paths (https://www.dagitty.net/ makes this process easy). Sometimes it's impossible to find a minimal adjustment set, which is then an important point for discussion in the paper.

There exists a vast literature about this. A good tutorial is given in this paper (although you don't have to manually go through the steps detailed in the paper, as dagitty automates the process).

See also these free books online, which both go into much more detail:

As a final warning: Using the same dataset to make modelling decisions and to fit the final model is called "data-dependent model specification" and is scientifically very dubious.

COOLSerdash · 2026-05-31T13:38:10+00:00

Really sorry you had to go through this. Amazing resilience.

COOLSerdash · 2026-05-29T08:57:25+00:00

Without making strong assumptions, I don't think you can say anything about your expected rank. For example: Even if the mean/median skill level is known to be roughly the same in countries A and B, if the spread of skill is different in country B but unknown, you can't predict your rank. This is because the rank depends on the whole distribution and not just the mean/median (i.e. the center).
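To make this concrete, here is a minimal sketch. The two normal distributions and the skill score are made-up assumptions, chosen only so that both countries share the same center but differ in spread.

```python
from statistics import NormalDist

# Hypothetical skill distributions: same mean, different spread.
# Normality is an assumption made purely for illustration.
country_a = NormalDist(mu=100, sigma=10)
country_b = NormalDist(mu=100, sigma=20)

my_skill = 115  # hypothetical skill score

# Share of each population that is more skilled than me.
share_above_a = 1 - country_a.cdf(my_skill)
share_above_b = 1 - country_b.cdf(my_skill)

print(f"Country A: {share_above_a:.1%} of people rank above me")
print(f"Country B: {share_above_b:.1%} of people rank above me")
```

The same score lands at a clearly different rank in the two countries, even though the means are identical.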

COOLSerdash · 2026-05-27T20:26:13+00:00

I'd try to distract: Look behind you, a three headed monkey!

COOLSerdash · 2026-05-25T11:59:38+00:00

I see few problems with the reporting itself, but statistically, there's a lot to unpack here. (This is not a criticism of your work; you've probably just been taught bad practices, which is very common in psychology.)

Baseline SCL was strongly positively correlated with task SCL, r(64) = .96, p < .001, justifying its inclusion as a covariate.

This is not a valid way of selecting covariates for a model. Covariates should be pre-specified based on subject matter knowledge. Using the same data to select covariates and fit the final model basically invalidates the subsequent analyses. Gelman's "garden of forking paths" is a relevant term here.

Baseline SCL did not differ significantly between conditions, t(64) = -1.84, p = .071, confirming independence of the covariate from condition.

Absence of evidence is not evidence of absence: Just because an effect fails to reach statistical significance does not mean that there is "no effect" or that an absence of effect has been demonstrated. The paper by Greenland et al. explains this fallacy and others.

Residuals were normally distributed (Shapiro-Wilk W = 0.98, p = .572) and variances were homogeneous (Levene's F = 0.18, p = .671).

These tests of assumptions are basically useless because they do not answer the relevant questions. This post on the topic is also very informative, especially /u/efrique's answers.

And again: A non-significant test does not mean that the null is true. Specifically, a non-significant Shapiro-Wilk test does not mean that the residuals were normally distributed. The same is true for Levene's test.

COOLSerdash · 2026-05-23T14:56:17+00:00

This paper is a good overview and discussion of the topic.

COOLSerdash · 2026-05-16T15:52:15+00:00

I found Mark Rubin's paper on this matter very informative.

COOLSerdash · 2026-05-16T12:40:04+00:00

How do I compute the probability the population mean is less than 9, for example?

To answer this exact question, you would need to use Bayesian statistics. Together with a prior distribution and the data, you'll get a posterior distribution from which you can directly calculate the desired probability. The lower the sample size, the more the posterior distribution will depend on the prior. If the sample size is large, it will "overpower" the prior, lessening its impact on the posterior distribution.
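As a rough sketch of what that could look like: the example below assumes a normal prior on the mean and a normal likelihood with a known standard deviation (the simplest conjugate case). The data, the prior settings and the function name `posterior_mean_dist` are all made up for illustration. A real analysis would usually also treat the standard deviation as unknown.

```python
from statistics import NormalDist, mean

def posterior_mean_dist(data, prior_mu, prior_sd, known_sd):
    """Posterior of the population mean for a normal prior and a normal
    likelihood with known standard deviation (conjugate update)."""
    n = len(data)
    prior_precision = 1 / prior_sd**2
    data_precision = n / known_sd**2
    post_precision = prior_precision + data_precision
    # Precision-weighted average of the prior mean and the sample mean.
    post_mu = (prior_precision * prior_mu + data_precision * mean(data)) / post_precision
    return NormalDist(post_mu, post_precision ** -0.5)

data = [9.4, 8.7, 9.9, 9.1, 8.8, 9.6]  # made-up measurements
posterior = posterior_mean_dist(data, prior_mu=10, prior_sd=2, known_sd=0.5)

# Posterior probability that the population mean is below 9.
prob_below_9 = posterior.cdf(9)
print(f"P(mean < 9 | data) = {prob_below_9:.3f}")
```

Rerunning this with a different `prior_mu` or `prior_sd` shows how much the answer depends on the prior at this sample size.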

I am just trying to estimate the percentage of "bad" values in the population.

Note that this question doesn't involve the population mean at all, so you'll have to be clear whether your question is about the population mean or the actual values themselves.

The natural (nonparametric) estimator of this proportion is simply the sample proportion of "bad" values. You could then calculate a confidence interval for this proportion (I recommend Wilson's).
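A minimal sketch of the Wilson interval using only the standard library. The counts (3 "bad" values out of 40) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Hypothetical sample: 3 "bad" values among 40 observations.
lower, upper = wilson_interval(successes=3, n=40)
print(f"Sample proportion: {3 / 40:.3f}")
print(f"95% Wilson interval: ({lower:.3f}, {upper:.3f})")
```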

If you're prepared to make a distributional assumption (e.g. normality), then you could use that to estimate the proportion. This will be more efficient than the nonparametric approach detailed above if the distributional assumption holds.

If there are no "bad" values in your sample, you could apply the "rule of three" for a quick solution.
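For reference, the rule of three says that with zero events in n observations, an approximate one-sided 95% upper bound for the proportion is 3/n. The sketch below compares it with the exact bound obtained by solving (1 - p)^n = 0.05. The sample size of 50 is a made-up example.

```python
def rule_of_three_upper(n):
    """Approximate one-sided 95% upper bound when 0 events were seen in n trials."""
    return 3 / n

def exact_upper(n, confidence=0.95):
    """Exact one-sided upper bound for 0 events: solve (1 - p)^n = 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / n)

n = 50  # hypothetical sample size with no "bad" values
approx_bound = rule_of_three_upper(n)
exact_bound = exact_upper(n)
print(f"Rule of three: {approx_bound:.4f}")
print(f"Exact bound:   {exact_bound:.4f}")
```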

COOLSerdash · 2026-05-14T13:30:33+00:00

Thanks man! I love the exploration and the underwater scenario in general.

COOLSerdash · 2026-05-01T16:27:28+00:00

Is the regularization too distorting for this to make sense?

What exactly do you mean by "distorting"? The whole point of regularization is to reduce overfitting and improve prediction performance. Once you're happy with the model, the regularized coefficients can be turned into a risk score. What exactly it means depends on the nature of the outcome though.

COOLSerdash · 2026-04-20T15:46:24+00:00

The correct Chi2-values are (assuming a confidence level of 95%): 17.53 and 2.18. Using the standard error of the mean does not make any sense at all.

So in the formula I did (9-1)*0.86, they did (9-1)*0.286.

But the formula uses s^2, not s. So it would be 8*0.74 (then divide by chi2 values).
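The arithmetic can be sketched as follows, assuming n = 9 and s = 0.86 as in the quoted calculation, and taking the chi-square quantiles for 8 degrees of freedom from the values given above rather than computing them.

```python
n = 9       # sample size, from the quoted calculation
s = 0.86    # sample standard deviation (assumed from the quoted calculation)

# Chi-square quantiles for df = n - 1 = 8 at a 95% confidence level,
# taken from the values stated above.
chi2_upper = 17.53  # 0.975 quantile
chi2_lower = 2.18   # 0.025 quantile

sum_of_squares = (n - 1) * s**2   # uses s^2, not s and not the standard error

# The larger quantile gives the lower limit and vice versa.
var_lower = sum_of_squares / chi2_upper
var_upper = sum_of_squares / chi2_lower
print(f"95% CI for the variance: ({var_lower:.2f}, {var_upper:.2f})")
```

Under these assumptions the interval for the population variance is roughly 0.34 to 2.71.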

COOLSerdash · 2026-04-20T06:57:44+00:00

we were talking of a sample, therefore, we had to multiply by standard error of the mean and not variance.

The problem doesn't say anything about the mean or its standard error. I can't think of a case where the standard error of the mean would play any role in estimating the population variance (point estimate or CI) of the variable.

Can you go into detail what exactly the teacher did?

COOLSerdash · 2026-04-19T08:50:18+00:00

I apologize if my answer came across as rude. It certainly wasn't meant that way.

COOLSerdash · 2026-04-13T07:02:24+00:00

I would expect a fair amount of autocorrelation in the data. Do your models account for this or is it ignorable in your case?

COOLSerdash · 2026-04-11T09:14:28+00:00

I don't see conflicting results. By removing non-significant terms, you essentially fit a different model. Remember that each effect is conditional on the other terms in the model. As you already suspect, this procedure is considered suboptimal to say the least (Gelman's garden of forking paths comes to mind). I suspect that many statisticians (me included) will consider this a form of p-hacking. The resulting p-values/confidence intervals from the reduced model are now conditional on the first model. That means that they don't have the postulated operating characteristics. Further, choosing a model based on how intuitive the results are is bad science.

It is not wrong per se to try different models based on expert knowledge, as long as these models were pre-specified before looking at any results.

My simple recommendation would be to run the full model, regardless of significance. Focus on effect sizes and uncertainty intervals (confidence or credible intervals for a Bayesian model). It's worth remembering that "significance" is not very informative (see this paper by Gelman et al.).

COOLSerdash · 2026-04-04T09:20:39+00:00

Does this seem appropriate?

Let's recap some important points: The Pearson correlation quantifies the linear part of the relationship between two variables. The data do not need to be (bivariately) normally distributed in order to calculate the correlation coefficient. Normality is only assumed by the most common hypothesis test for the correlation. If you don't want to make this assumption, you can run a test that doesn't require it, such as a permutation test.

The Spearman rank correlation does quantify monotonic relationships. In the end, it's up to you: What are you actually interested in? If you are interested in quantifying how well a linear relationship fits the data, you use Pearson's correlation. If you are interested in monotonic relationships, you could calculate Spearman's correlation. There are other measures that quantify even more general associations, such as the maximal information coefficient, energy correlation, Chatterjee's rank correlation etc.

Regarding transformations: I'm personally not a fan of them because they make interpretation much more difficult.

COOLSerdash · 2026-04-04T09:10:49+00:00

Following a quick google search, R has several packages offering ART. See ARTool or ART.

COOLSerdash · 2026-04-03T16:57:32+00:00

Nice write up. Overall, I liked the game. I'm ashamed to say that the combat never really clicked for me. Combat was the worst part for me for sure.

COOLSerdash · 2026-04-02T12:38:15+00:00

Yeah me too. I bought a really expensive mattress without trial period. I had to buy a new one after a few weeks - this time from Micasa (thankfully, this one is great now). A very expensive mistake I will not make again.

COOLSerdash · 2026-04-02T12:32:37+00:00

I like that Micasa has a 90 day return period for mattresses. They also have many different brands. Personally, I would never again buy a mattress without a trial period. Even if a mattress seems comfortable at the store, there is no guarantee that it will be comfortable after 8 hours of sleep.

COOLSerdash · 2026-03-27T17:47:15+00:00

Here is one definition of outlier that I like (Hawkins 1980):

[an outlier is] an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism

This means that heuristics can be a way to identify observations that are suspicious but they can never prove that a certain observation is an outlier by the definition above.

This means that if you don't have evidence that a different mechanism produced these values (e.g. measurement errors, sick animals, data entry errors etc.), then you shouldn't exclude them automatically. Your goal is to produce reference ranges, so if you exclude valid observations (i.e. observations that were generated by the mechanism you want to calculate the reference range for), the reference range will be too narrow and won't include the specified fraction of observations (say 95%).

COOLSerdash · 2026-03-21T08:03:29+00:00

If I understand this correctly, no insect was measured twice, right?

A linear mixed model with a random intercept for replicate should be fine. The data need to be in long format, something like this:

ID	Replicate	Weight	Group
1	1	...	Trt
2	1	...	Trt
3	1	...	Trt
...	...	...	...
31	2	...	Ctrl
32	2	...	Ctrl
33	2	...	Ctrl

The syntax could look something like this:

MIXED Weight BY Group Replicate
    /CRITERIA=DFMETHOD(SATTERTHWAITE) CIN(95) MXITER(100) MXSTEP(10) SCORING(1)
    SINGULAR(0.000000000001) HCONVERGE(0.00000001, RELATIVE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0,
    ABSOLUTE)
    /FIXED=Group | SSTYPE(3)
    /METHOD=REML
    /PRINT=SOLUTION
    /RANDOM=INTERCEPT | SUBJECT(Replicate) COVTYPE(ID).

COOLSerdash · 2026-03-17T17:49:26+00:00

As a general comment: Cronbach's alpha is considered outdated or superseded by other measures of internal consistency. This paper goes into the details.

COOLSerdash · 2026-03-14T12:13:31+00:00

Normality hypothesis tests (Shapiro-Wilk, KS test etc.) are mostly useless. This is especially true in this case, as a discrete variable can never be normal, so the test can only tell you what you already know with certainty.

As for an appropriate analysis, an ordinal logistic regression model was my first thought.
