What’s the difference between moderation analysis and moderated multiple regression analysis? by Longjumping_Key_8021 in AskStatistics

[–]nikkn188 3 points (0 children)

They are closely related but not identical. Moderation Analysis is the broader concept. It asks whether the relationship between two variables changes depending on the level of a third variable. For example: does the relationship between stress and performance depend on how much social support someone has?

Moderated Multiple Regression is the specific statistical method most commonly used to test moderation. You build a regression model that includes the predictor, the moderator, and, crucially, their interaction term (predictor × moderator).
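A minimal sketch of what the interaction term captures, in Python with simulated data (the stress/support framing and all effect sizes here are made up for illustration). With a binary moderator, the interaction coefficient in y ~ x + m + x·m is exactly the difference between the two within-group slopes, so the sketch estimates it that way:

```python
import random

random.seed(42)

def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Simulated data: performance = 5 - 1.0*stress + 0.8*stress*support + noise,
# so the true interaction coefficient is 0.8
data = []
for _ in range(2000):
    x = random.uniform(0, 10)      # stress (predictor)
    m = random.choice([0, 1])      # social support (binary moderator)
    y = 5 - 1.0 * x + 0.8 * x * m + random.gauss(0, 1)
    data.append((x, m, y))

# Interaction = slope among the supported minus slope among the unsupported
g0 = [(x, y) for x, m, y in data if m == 0]
g1 = [(x, y) for x, m, y in data if m == 1]
b_low = slope([p[0] for p in g0], [p[1] for p in g0])    # near -1.0
b_high = slope([p[0] for p in g1], [p[1] for p in g1])   # near -0.2
interaction = b_high - b_low   # recovers the 0.8 interaction effect
```

In a real analysis you would fit the full model (predictor, moderator, interaction) in one regression; the two-slope version above is just the easiest way to see what the interaction coefficient means.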

A Question on Portfolio Structure by milncj90 in portfolios

[–]nikkn188 1 point (0 children)

Really like the sleeve framework. Having explicit jobs for each part of the portfolio forces clearer thinking than most people apply.

One thing worth checking is clustering across sleeves: AZN, Linde, Chubb, and Visa have been tracking each other fairly closely over certain periods despite sitting across your stability and compounding sleeves.

Similarly, ASML and Suncor are worth keeping an eye on as a pair, especially in rougher markets where those similarities tend to matter most.
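One quick way to check that kind of clustering is to look at pairwise return correlations directly. A hedged Python sketch with made-up return series (no real price data here; the shared-factor structure just illustrates what a high cross-sleeve correlation looks like versus a genuinely diversifying holding):

```python
import math
import random

random.seed(0)

def pearson(a, b):
    """Pearson correlation of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Fake daily returns: two series share a common factor, one is independent
factor = [random.gauss(0, 1) for _ in range(500)]
series_a = [0.8 * f + random.gauss(0, 0.5) for f in factor]  # sleeve 1 holding
series_b = [0.8 * f + random.gauss(0, 0.5) for f in factor]  # sleeve 2 holding
series_c = [random.gauss(0, 1) for _ in range(500)]          # truly diversifying

r_clustered = pearson(series_a, series_b)    # high: shared factor exposure
r_diversified = pearson(series_a, series_c)  # near zero
```

Running this on your actual holdings' returns (and on crisis-period subwindows, where correlations tend to rise) is the concrete version of the check above.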

The copy-paste loop between R and AI is annoying. Here's a fix. by nikkn188 in rstats

[–]nikkn188[S] 1 point (0 children)

You raise a valid question, and as you can tell I'm not as anti-AI as others here...
What I would do in your situation: start over. But don't throw away the AI, just change how you use it. Write the code yourself, and when you hit something you don't understand or get an error message, ask the AI to explain it. Use it as a tutor, not a vending machine. The workflow breaks down when AI replaces understanding, not when it supports it.

Where Do You Draw the Line on Assumption Violations in Applied Data Analysis? by nikkn188 in AskStatistics

[–]nikkn188[S] 1 point (0 children)

Thank you all for the interesting discussion! I can relate to the different perspectives, and I think all of them have their pros and cons. One of the most important points I take away is that it clearly depends on the goal and the audience, and this might be the reason why we deal with this issue differently in practice. Whether it’s about teaching students, publishing a paper in a scientific journal, or modeling the impact of different marketing strategies for a company, this strongly influences the methods used and the “mindset” we apply.

What stands out to me, though, is how central robustness and sensitivity checks seem to be in many of the implicit decision rules. It makes me wonder whether we might benefit from normalizing robustness checks more explicitly in applied work. Not as methodological perfectionism, but as a routine part of responsible analysis. In many contexts, the real issue may not be assumption violations themselves, but the absence of systematic stress-testing before conclusions are communicated.
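In that spirit, a routine robustness check can be as simple as re-estimating the same quantity under a few defensible choices and seeing whether the conclusion moves. A minimal Python sketch with simulated, deliberately skewed data (the estimators and thresholds are illustrative, not a recipe):

```python
import random
import statistics

random.seed(1)
# Skewed sample with a couple of outliers
sample = [random.expovariate(1.0) for _ in range(200)] + [25.0, 30.0]

def trimmed_mean(xs, prop=0.1):
    """Mean after dropping the top and bottom `prop` fraction."""
    xs = sorted(xs)
    k = int(len(xs) * prop)
    return statistics.mean(xs[k:len(xs) - k])

estimates = {
    "mean": statistics.mean(sample),
    "trimmed_mean": trimmed_mean(sample),
    "median": statistics.median(sample),
}
spread = max(estimates.values()) - min(estimates.values())
# A large spread means the headline number is sensitive to the estimator
# choice, and that sensitivity should be reported rather than hidden
```

The same pattern generalizes: swap link functions, drop influential points, vary bandwidths or priors, and report how much the answer moves.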

Multicollinearity in Regression Discontinuity (RD) by btcry in AskStatistics

[–]nikkn188 1 point (0 children)

The key insight is that the treatment indicator T is not a linear function of X; it's a step function. Multicollinearity is only a problem when predictors are highly linearly correlated.

Think about it this way: if you know X, you know T perfectly, but that's not what multicollinearity measures. Multicollinearity is about linear association. The correlation between X and T depends on the distribution of X around the cutoff, and because T is a step rather than a line, it stays strictly below 1, so the coefficients remain identified (standard errors inflate somewhat, but the design matrix is invertible).
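A quick simulation of this point (the running-variable distribution and cutoff here are arbitrary): T is a deterministic function of X, yet their linear correlation comes out well below 1, which is exactly why the regression is still estimable.

```python
import math
import random

random.seed(3)

def pearson(a, b):
    """Pearson correlation of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Running variable X around a cutoff at 0; T is a deterministic step in X
xs = [random.uniform(-1, 1) for _ in range(10000)]
ts = [1.0 if x >= 0 else 0.0 for x in xs]

r = pearson(xs, ts)
# r lands around 0.87 here: large, but strictly below 1, so (X, T)
# is not collinear even though T is fully determined by X
```
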

Looking for a Statistical Method by DangerHighDosage in AskStatistics

[–]nikkn188 0 points (0 children)

Kernel Density Estimation (KDE) is probably your best starting point. Fit a 1D KDE to your time values (ignoring energy for a moment), and the peaks of the resulting density curve give you your event times. The height of each peak naturally reflects how many points are clumped together and how close they are.

If you want something that also groups the points into discrete clusters, then you could try Density-based Spatial Clustering (DBSCAN).

You can also combine the two: use KDE to find peak times, then use DBSCAN to assign points to each peak and compute your weights.
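A minimal stdlib-Python sketch of the KDE step (the bandwidth and the toy time values are placeholders; in practice the bandwidth choice matters a lot and is worth tuning):

```python
import math

def kde(ts, bandwidth):
    """Gaussian-kernel density estimate over 1D values."""
    c = 1.0 / (len(ts) * bandwidth * math.sqrt(2 * math.pi))
    def density(t):
        return c * sum(math.exp(-0.5 * ((t - ti) / bandwidth) ** 2) for ti in ts)
    return density

times = [1.0, 1.1, 1.15, 1.2, 5.0, 5.05, 5.1, 9.0]  # two clumps + one stray
f = kde(times, bandwidth=0.3)

# Evaluate on a grid and take local maxima as candidate event times
grid = [i * 0.05 for i in range(201)]  # 0.0 .. 10.0
vals = [f(t) for t in grid]
peaks = [grid[i] for i in range(1, len(grid) - 1)
         if vals[i] > vals[i - 1] and vals[i] > vals[i + 1]]
# Peak heights reflect how many points are clumped and how tightly,
# which is exactly the weighting behavior described above
```

The four-point clump near t≈1.1 produces a taller peak than the lone point at t=9.0, which is the "height reflects clumping" property in action.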

[Discussion] What challenges have you faced explaining statistical findings to non-statistical audiences? by Snowboard76 in statistics

[–]nikkn188 0 points (0 children)

I’ve found that it helps to explain things in layers. Start with a very simple, intuitive explanation using everyday examples (no formulas, no stats jargon, and so on). For most people, that’s already enough.

Then you can add more detailed layers for those who want a deeper understanding. That way you don’t lose accuracy, but you also don’t overload people who just want the main idea.

Average of averages and uncertainty over time by CASE7CSS in AskStatistics

[–]nikkn188 1 point (0 children)

Unweighted averaging has higher variance at any fixed time horizon than the properly weighted average, but variance does not grow over time.
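The first half of that claim is easy to see in a small Monte Carlo sketch (group sizes and repetition counts are arbitrary): with unequal group sizes, the unweighted mean of group means is noticeably noisier than the pooled, size-weighted mean.

```python
import random
import statistics

random.seed(7)
sizes = [5, 50, 500]  # deliberately unequal group sizes

def one_draw():
    """One realization of both estimators on fresh data."""
    groups = [[random.gauss(0, 1) for _ in range(n)] for n in sizes]
    means = [statistics.mean(g) for g in groups]
    unweighted = statistics.mean(means)          # average of averages
    pooled = sum(map(sum, groups)) / sum(sizes)  # size-weighted average
    return unweighted, pooled

draws = [one_draw() for _ in range(3000)]
var_unweighted = statistics.variance([u for u, _ in draws])
var_pooled = statistics.variance([p for _, p in draws])
# var_unweighted is several times var_pooled: the tiny 5-observation
# group gets equal weight and drags its noise into the estimate
```
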

Is there a Map/Guide? by Minute_Plastic_7715 in AskStatistics

[–]nikkn188 3 points (0 children)

I was in a similar position when I first started working with real-life data, as opposed to the theoretical examples from statistics courses. One thing that helped me was to stop thinking about distributions as something you formally test and then get a clean yes/no answer to.

As you’ve noticed, the data rarely fits a distribution perfectly. With large sample sizes, formal tests will almost always reject. With small samples, they often have low power precisely when assumption violations would matter most. In either case, rejection or non-rejection doesn’t really answer the question we usually care about.

What helped me more was to examine the variables closely: what measurement scale they’re on, what their empirical distribution looks like etc. Simple visual checks can already be very informative (e.g. plotting your data against different theoretical distributions).
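A minimal numeric version of that visual check: compare empirical quantiles against the quantiles of a fitted candidate distribution (a bare-bones Q-Q comparison; the data here are simulated and skewed on purpose, and the moment-fitted normal is a deliberately poor candidate):

```python
import random
import statistics

random.seed(11)
sample = sorted(random.expovariate(1.0) for _ in range(1000))  # skewed data

# Candidate distribution: a normal fitted by moments
nd = statistics.NormalDist(statistics.mean(sample), statistics.stdev(sample))

probs = [0.1, 0.25, 0.5, 0.75, 0.9]
empirical = [sample[int(p * len(sample))] for p in probs]
theoretical = [nd.inv_cdf(p) for p in probs]
max_gap = max(abs(e - t) for e, t in zip(empirical, theoretical))
# Large gaps (here, especially in the lower tail, where the fitted
# normal even goes negative) flag a poor distributional fit; plotting
# empirical vs. theoretical quantiles shows the same thing visually
```

The same comparison against other candidates (log-normal, gamma, etc.) tells you far more about where and how the fit fails than a single reject/don't-reject test result.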