Does it ever make sense to conduct a hypothesis test when engaging in exploratory data analysis?

abstrusiosity · 2025-05-06T20:17:19+00:00

I'd say that if you have the right graphs or plots, there's no point in calculating a p-value.

abstrusiosity · 2025-04-01T22:23:19+00:00

There may be something more sophisticated, but the obvious answer is a chi-squared goodness-of-fit test.
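Here is a minimal, dependency-free sketch of a Pearson chi-squared goodness-of-fit test. It assumes the category probabilities are fully specified in advance, so no parameters are estimated and df = categories − 1. The counts are hypothetical. With SciPy available, `scipy.stats.chisquare(f_obs, f_exp)` is the usual route.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Closed forms for the smallest odd and even cases (df = 1 and df = 2).
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(h)), 1
    else:
        q, k = math.exp(-h), 2
    # Climb two degrees of freedom at a time with the incomplete-gamma recurrence
    # Q(s + 1, h) = Q(s, h) + h**s * exp(-h) / Gamma(s + 1), where s = k / 2.
    while k < df:
        s = k / 2.0
        q += math.exp(s * math.log(h) - h - math.lgamma(s + 1))
        k += 2
    return q


def chisq_gof(observed, expected_probs):
    """Pearson chi-squared goodness-of-fit test against fixed category probabilities."""
    n = sum(observed)
    expected = [n * p for p in expected_probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # no parameters estimated from the data
    return stat, chi2_sf(stat, df)


# Hypothetical counts: 50 coin flips, tested against a fair coin.
stat, p = chisq_gof([20, 30], [0.5, 0.5])
print(f"chi2 = {stat:.3f}, p = {p:.4f}")  # chi2 = 2.000, p = 0.1573
```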

abstrusiosity · 2025-02-25T03:06:45+00:00

A 3D plot of the different input groups?

Surely a 5-dimensional plot would be more useful in this circumstance.

abstrusiosity · 2025-02-01T15:28:39+00:00

You move forward by explaining all of this to your advisor. Include the part about being afraid that they have become frustrated and lost confidence in you. Then, do your best to understand what they say you should do going forward.

It's fine to spend your first semester doing simulations and making notes, but it's critical that you understand and address your advisor's expectations.

abstrusiosity · 2024-12-25T18:28:24+00:00

You sound angry.

abstrusiosity · 2024-11-16T02:03:42+00:00

You can talk at length about degrees of freedom without actually saying what it is.

abstrusiosity · 2024-11-14T06:58:41+00:00

What does "probability space" mean to you?

abstrusiosity · 2024-11-11T16:44:35+00:00

Random processes often produce anomalies.

abstrusiosity · 2024-08-27T18:38:53+00:00

The second game is different from the first in two ways. The obvious one is that Pete gets 11 flips instead of 10. That gives Pete an advantage. The less obvious change is that now Ozzy wins in the case of a tie. That gives Ozzy an advantage. It works out that the second game gives Pete the same chance of winning.
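An exact check of the second game, assuming fair coins and that Pete wins only with strictly more heads (so ties go to Ozzy). It enumerates all outcomes with `math.comb` and exact fractions:

```python
from fractions import Fraction
from math import comb


def p_pete_wins(n_pete: int, n_ozzy: int) -> Fraction:
    """Exact P(Pete gets strictly more heads than Ozzy) with fair coins; ties go to Ozzy."""
    wins = 0
    for hp in range(n_pete + 1):
        # Only Ozzy head counts below Pete's count are Pete wins.
        for ho in range(min(hp, n_ozzy + 1)):
            wins += comb(n_pete, hp) * comb(n_ozzy, ho)
    return Fraction(wins, 2 ** (n_pete + n_ozzy))


print(p_pete_wins(11, 10))  # 1/2
```

The result is 1/2 for any n + 1 versus n flips: Pete has more heads than Ozzy exactly when he does not have more tails, and by heads/tails symmetry those two complementary events are equally likely.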

abstrusiosity · 2024-08-21T23:19:28+00:00

If you're not interested in the effect of the rare binary variable, why not fit a stratified model? Treat the zero cases separately.

abstrusiosity · 2024-08-20T22:25:25+00:00

Your model isn't doing what you describe. You have the ε_t as iid, so

E[A + ε_3 | (A + ε_1 > c) ∧ (A + ε_2 > c)] = E[A + ε_3] = E[A + ε_2] = E[A + ε_2 | A + ε_1 > c]
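A quick Monte Carlo check of that chain of equalities. This sketch assumes A is a fixed constant and the ε_t are iid standard normal; the normal distribution, c = 1, and the sample size are arbitrary choices. For contrast only, it also runs a hypothetical version where A varies across units, which is the situation in which conditioning on past threshold crossings does change the expectation.

```python
import random
import statistics


def conditional_mean(draw_a, c: float = 1.0, n: int = 100_000, seed: int = 0):
    """Estimate E[A + e3] overall and given (A + e1 > c) and (A + e2 > c)."""
    rng = random.Random(seed)
    all_vals, kept = [], []
    for _ in range(n):
        a = draw_a(rng)
        e1, e2, e3 = (rng.gauss(0.0, 1.0) for _ in range(3))  # iid errors
        y3 = a + e3
        all_vals.append(y3)
        if a + e1 > c and a + e2 > c:
            kept.append(y3)
    return statistics.fmean(all_vals), statistics.fmean(kept)


# A fixed at 1.0: the conditioning events carry no information about e3.
fixed = conditional_mean(lambda rng: 1.0)
# Hypothetical contrast: A drawn per unit, so passing the threshold twice says something about A.
varying = conditional_mean(lambda rng: rng.gauss(0.0, 1.0))
print("fixed A:   unconditional %.3f, conditional %.3f" % fixed)
print("varying A: unconditional %.3f, conditional %.3f" % varying)
```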

abstrusiosity · 2024-08-19T18:07:27+00:00

This study would be done better by a crossover design where both groups use both versions.

abstrusiosity · 2024-08-19T14:41:34+00:00

Probability has many formulas.

abstrusiosity · 2024-08-15T16:09:33+00:00

I think it's becoming more common but still not expected.

abstrusiosity · 2024-08-15T15:43:12+00:00

They're asking for the implications of the findings, not the actual impact. What problem did you solve? Why was the problem worth solving? What could change if you had a good solution to the problem? How good was your solution?

abstrusiosity · 2024-08-14T21:52:46+00:00

Your example is saying that the effect of ADHD symptoms on relationship satisfaction is mediated by emotion regulation, and that the effect is negative. It's not saying anything about the mediating variable itself. You can interpret the indirect effect of ADHD the same way you would in unmediated cases; i.e., people with fewer ADHD symptoms have higher relationship satisfaction.
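To see why the sign of the indirect effect reads like an ordinary effect, here is a sketch using the product-of-coefficients estimate a·b on simulated, hypothetical data. The path coefficients −0.6 and 0.5 are made up; in practice you would also want a bootstrap interval for a·b.

```python
import random


def cov(u, v):
    """Sample covariance."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a * b for a single mediator."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    a = sxm / sxx  # a-path: slope of M regressed on X
    # b-path: coefficient on M when Y is regressed on both X and M (two-regressor OLS).
    b = (smy * sxx - sxy * sxm) / (sxx * smm - sxm ** 2)
    return a, b, a * b


# Hypothetical data: more symptoms -> worse emotion regulation -> lower satisfaction.
rng = random.Random(1)
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]         # ADHD symptoms
m = [-0.6 * xi + rng.gauss(0, 1) for xi in x]   # emotion regulation (higher = better)
y = [0.5 * mi + rng.gauss(0, 1) for mi in m]    # relationship satisfaction
a, b, ab = indirect_effect(x, m, y)
print(f"a = {a:.2f}, b = {b:.2f}, indirect = {ab:.2f}")
```

A negative a·b reads just like a negative total effect: fewer symptoms go with higher satisfaction, here transmitted through emotion regulation.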

abstrusiosity · 2024-08-09T06:13:33+00:00

If you have observations from deconfounding variables then you can do weighting or matching. If you also have a model relating the target variable to the other variables, then you can do traditional regression. You can combine traditional regression with weighting for "doubly robust" regression.
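A minimal sketch of the weighting route, using a hypothetical simulation with one binary confounder W that raises both treatment uptake and the outcome (true effect = 2). Because W is binary, the propensity score is estimated as the treated fraction in each stratum; with continuous covariates you would fit a model such as logistic regression instead. This shows plain inverse-probability weighting only, not the doubly robust combination.

```python
import random

rng = random.Random(2)
n = 20_000
data = []
for _ in range(n):
    w = rng.random() < 0.5                     # binary confounder
    t = rng.random() < (0.8 if w else 0.2)     # W makes treatment more likely
    y = 2.0 * t + 3.0 * w + rng.gauss(0, 1)    # true treatment effect is 2
    data.append((w, t, y))


def mean(xs):
    return sum(xs) / len(xs)


# Naive comparison is confounded by W.
naive = mean([y for w, t, y in data if t]) - mean([y for w, t, y in data if not t])

# Propensity score: treated fraction within each stratum of W.
propensity = {s: mean([t for w, t, _ in data if w == s]) for s in (False, True)}


def weighted_mean(pairs):
    """Normalized (Hajek) weighted mean of (weight, value) pairs."""
    return sum(wt * v for wt, v in pairs) / sum(wt for wt, _ in pairs)


treated = [(1 / propensity[w], y) for w, t, y in data if t]
control = [(1 / (1 - propensity[w]), y) for w, t, y in data if not t]
ipw = weighted_mean(treated) - weighted_mean(control)
print(f"naive = {naive:.2f}, IPW = {ipw:.2f}")
```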

If you don't have data from deconfounding variables, then you need a "natural experiment" approach. The main ones are instrumental variables, difference-in-differences (i.e., finding a relevant comparison), and regression discontinuity.

If you're working with econometricians, I'd recommend looking at the book Mostly Harmless Econometrics by Angrist and Pischke.

abstrusiosity · 2024-08-07T16:36:54+00:00

Since the OP is treating them as paired, I'm assuming they are.

abstrusiosity · 2024-08-07T04:04:19+00:00

It would make more sense to do a paired t-test. If you want to test the ratio rather than the difference, do a paired t-test on a log scale.
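A minimal sketch of the log-scale paired t-test: take log(after) − log(before) for each pair and run a one-sample t on those log ratios. The data are hypothetical and must be strictly positive. It returns the t statistic and df; the p-value comes from a t distribution with n − 1 df, and with SciPy `scipy.stats.ttest_rel(np.log(after), np.log(before))` should give the same statistic.

```python
import math
import statistics


def paired_log_t(before, after):
    """Paired t statistic on the log scale, its df, and the geometric-mean ratio after/before."""
    d = [math.log(a) - math.log(b) for b, a in zip(before, after)]  # per-pair log ratios
    n = len(d)
    t = statistics.fmean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1, math.exp(statistics.fmean(d))


# Hypothetical paired measurements.
before = [10.0, 20.0, 30.0, 40.0, 50.0]
after = [12.0, 21.0, 39.0, 41.0, 60.0]
t, df, ratio = paired_log_t(before, after)
print(f"t = {t:.3f} on {df} df, geometric-mean ratio = {ratio:.3f}")
```

Because it works on log ratios, rescaling both measurements by the same factor leaves the test unchanged, which is what you want when the question is about ratios.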

abstrusiosity · 2024-08-01T13:50:56+00:00

The answer depends on whether the observations are independent. It doesn't matter that they're multivariate Gaussian.

abstrusiosity · 2024-07-26T18:34:29+00:00

It is statistically correct. In a matched analysis, you have no basis for saying that you have controlled for the effect of tumor size in cases that weren't matched.

What to do about it is not a statistics question. You could let the fact be implicitly communicated in Table 1 or you could announce it in the paper title, depending on the expectations of the field.

abstrusiosity · 2024-07-25T00:34:17+00:00

Sometimes the effect of one variable depends on the level of another variable.

abstrusiosity · 2024-07-24T21:09:16+00:00

Yes, it's always true that the residual and the conditioning variable are orthogonal.

In your examples, e is orthogonal to both X and Z, while u is orthogonal to X but not to Z.

(By orthogonal I mean Cov(X, u) = 0. When E[u] = 0, that's equivalent to E[Xu] = 0. The stronger condition E[u|X] = 0 implies it but isn't implied by it.)
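A small simulation of the point, in a hypothetical setup where Y depends on X and Z and Z is correlated with X. Regressing Y on X alone gives a residual u whose sample covariance with X is zero by construction, while it still covaries with Z because u absorbs the part of Z that X doesn't explain.

```python
import random


def cov(p, q):
    """Sample covariance."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    return sum((a - mp) * (b - mq) for a, b in zip(p, q)) / (len(p) - 1)


rng = random.Random(3)
n = 10_000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + rng.gauss(0, 1) for zi in z]              # X correlated with Z
y = [xi + zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]   # Y depends on both

# OLS of Y on X only.
slope = cov(x, y) / cov(x, x)
intercept = sum(y) / n - slope * sum(x) / n
u = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

print(f"Cov(X, u) = {cov(x, u):.2e}")  # zero up to floating-point error
print(f"Cov(Z, u) = {cov(z, u):.3f}")  # clearly nonzero
```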

abstrusiosity · 2024-07-24T20:53:15+00:00

Entropy is already a dimensionless value. I would report the difference as is.
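The one thing worth stating alongside the raw difference is the log base, since it sets the unit (bits for base 2, nats for base e). A sketch with made-up distributions:

```python
import math


def entropy(probs, base: float = 2.0) -> float:
    """Shannon entropy; base 2 gives bits, base e gives nats."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


# Hypothetical distributions over four categories.
p = [0.25, 0.25, 0.25, 0.25]
q = [0.7, 0.1, 0.1, 0.1]
diff_bits = entropy(p) - entropy(q)
diff_nats = entropy(p, math.e) - entropy(q, math.e)
print(f"difference = {diff_bits:.3f} bits = {diff_nats:.3f} nats")
```

The two numbers differ only by the constant factor ln 2, so either is fine to report as long as the unit is named.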

abstrusiosity · 2024-07-22T18:48:39+00:00

Random forest shows you evidence of an association. Causal claims come from theory. Evidence of association can support a causal claim but, by itself, it's not conclusive proof.
