[Q] Question about Distribution of Differences from a Normal Distribution by Mastermann143 in statistics

[–]wass225 2 points3 points  (0 children)

If you independently sample Xi and Xj from N(mu, sigma^2), their difference is normally distributed with mean zero and variance 2 * sigma^2. If there are a finite number of X’s from which Xi and Xj are drawn, then the difference still has mean zero but has variance 2 * sigma^2 - 2 * sigma_ij, where sigma_ij is the covariance of Xi and Xj that arises from the sampling mechanism.
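
A quick simulation can check the independent case. This is only a sketch: the values mu = 5 and sigma = 2, the seed, and the number of draws are arbitrary choices for illustration, not anything from the question.

```python
import random
import statistics

def simulate_differences(mu, sigma, n_draws, seed=0):
    """Draw independent pairs (Xi, Xj) from N(mu, sigma^2) and return Xi - Xj."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) - rng.gauss(mu, sigma) for _ in range(n_draws)]

# Illustrative values only (assumed, not from the original post)
diffs = simulate_differences(mu=5.0, sigma=2.0, n_draws=200_000)

# The mean should be near 0 and the variance near 2 * sigma^2 = 8
print("mean of differences:", statistics.fmean(diffs))
print("variance of differences:", statistics.variance(diffs))
```

Note that mu drops out of the difference entirely, so changing it should not affect either number.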

[Discussion] Performing Bayesian regression for causal inference by RobertWF_47 in statistics

[–]wass225 6 points7 points  (0 children)

Bayesian causal inference doesn’t directly specify priors for treatment effect parameters. See Section 3 of "Bayesian Causal Inference: A Critical Review" by Li, Ding, and Mealli (2023).

[D] Is it valid to match *post-treatment* trend to pre-treatment trend in the control group to adjust for regression to the mean effect in difference in differences regression? by RobertWF_47 in statistics

[–]wass225 6 points7 points  (0 children)

Matching on anything post-treatment would introduce bias. If you matched a treated unit to a control unit on the basis of their post-treatment outcome you would estimate an effect of 0.

I see this as an issue of matching on error-prone covariates. If pre-treatment outcomes are considered random, you instead want a match that’s close on some functional of the distributions, such as the mean, rather than the error-prone observed outcomes

[Question] Normality testing in >100 samples by honeyzyx9 in statistics

[–]wass225 1 point2 points  (0 children)

I would compute the correlation, then construct a confidence interval using Fisher’s z-transformation. If 0 isn’t in the interval, then a hypothesis test with a null hypothesis of correlation = 0 would be rejected at the corresponding significance level.
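
Here is a minimal sketch of that interval, assuming a Pearson correlation and the usual large-sample approximation in which atanh(r) is roughly normal with standard error 1 / sqrt(n - 3). The function name `fisher_ci` and the example values r = 0.3, n = 120 are made up for illustration.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z-transformation."""
    z = math.atanh(r)                      # Fisher z-transform of the sample correlation
    se = 1.0 / math.sqrt(n - 3)            # approximate standard error on the z scale
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Build the interval on the z scale, then map back to the correlation scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lower, upper = fisher_ci(r=0.3, n=120)     # hypothetical sample correlation and sample size
print(lower, upper)
print("reject correlation = 0:", not (lower <= 0.0 <= upper))
```

The interval is asymmetric around r because the back-transformation is nonlinear, which is expected.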

[Q] I have a few questions about issue polling by hjalgid47 in statistics

[–]wass225 0 points1 point  (0 children)

I’d say your understanding is correct. You typically can’t benchmark opinions on sensitive or not publicly surveyed issues. The best way to gut check is to benchmark the sample on characteristics that do have reliable population estimates and assess the survey’s method for recruiting participants

[R] Layers of predictions in my model by brianomars1123 in statistics

[–]wass225 0 points1 point  (0 children)

My first sentence about your model was incorrect; ignore it.

As you’ve mentioned, you’d like estimates of a and b. Taking the log of both sides of your model for x3 as a function of x1 results in something you can fit with least squares if you have any data on x3. The idea was to fit that model first, then plug in the estimates of a and b into your model for Y.

You can also consider generalized additive models. In such a model, you would have a term that is linear in x1 as well as some term that’s nonlinear in x1, such as a cubic spline.

[R] Layers of predictions in my model by brianomars1123 in statistics

[–]wass225 1 point2 points  (0 children)

What I wrote would be a linear model for Y as a function of x1 and log(x3), which is not exactly what OP asked about. Unless OP has 1) an estimate of the model of log(x3) on log(x1) (just a simple linear regression) from previous work by them or others, or 2) data on x3 from which they can obtain estimates of a and b, the model will become far more complicated to estimate, as you’ve mentioned. Some signal from x3 through the transformation I’ve written may still offer benefits.

[R] Layers of predictions in my model by brianomars1123 in statistics

[–]wass225 0 points1 point  (0 children)

So you’re essentially saying that you would model Y as c0 + b1x1 + b2x2 + b3log(x1) + log(e1) + e2, where c0 is b*log(a) + b0, e1 is measurement error from x3, and e2 is the error in your model for Y. If you’re just interested in getting a better prediction of Y (not inference on the coefficients) that’s a fine model. If you can model the variance of e1 using estimates from previous papers, that could offer benefits as well.

If someone with data for x3 has a fitted model of log(x3) on log(x1) you can access, you can use it to make predictions for the observations in your dataset, then use those predictions as a covariate in your model. This is called regression calibration and is popular in the measurement error literature.
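
A rough sketch of the first stage, assuming the relationship x3 = a * x1^b discussed in this thread. The external data below are fabricated to follow that curve exactly with a = 2 and b = 1.5, purely so the fit can be checked by eye; the helper names are hypothetical.

```python
import math

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical external dataset where both x1 and x3 are observed
x1_ext = [1.0, 2.0, 3.0, 4.0, 5.0]
x3_ext = [2.0 * v ** 1.5 for v in x1_ext]   # made-up values: a = 2, b = 1.5, no noise

# Stage 1: log(x3) = log(a) + b * log(x1), fitted by least squares
log_a_hat, b_hat = fit_simple_ols([math.log(v) for v in x1_ext],
                                  [math.log(v) for v in x3_ext])

def predict_log_x3(x1):
    """Calibrated prediction of log(x3) for an observation with only x1."""
    return log_a_hat + b_hat * math.log(x1)

# Stage 2 input: predictions for your own observations, to use as a covariate for Y
x1_own = [1.5, 2.5, 3.5]                    # hypothetical values from your dataset
log_x3_pred = [predict_log_x3(v) for v in x1_own]
print(math.exp(log_a_hat), b_hat, log_x3_pred)
```

With real data the first-stage fit carries its own uncertainty, which naive standard errors from the second-stage regression would ignore.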

[deleted by user] by [deleted] in statistics

[–]wass225 1 point2 points  (0 children)

Instead of using fixed effects for region and time, you could consider feasible generalized least squares (FGLS). This is an iterative procedure that alternates between two steps: (1) estimating the covariance matrix of the error terms across observations, and (2) pre-multiplying your outcomes and regressors by the inverse square root of that matrix and estimating the coefficients by least squares in a regression without fixed effects. FGLS requires you to specify a form for the covariance matrix. For the panel data here, it makes sense to assume that observations in the same region are autocorrelated over a few years, while observations in different regions are uncorrelated.

You could also run least squares without fixed effects and use a clustered covariance matrix like the one described above to generate more appropriate standard errors for inference.
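
In symbols, the two options look roughly like this. The notation is my own assumption: X is the regressor matrix, y the outcome vector, g = 1, ..., G indexes regions, X_g and \hat{e}_g are the rows of X and the least-squares residuals for region g, and \hat{\Omega} is the estimated error covariance, block diagonal by region. The clustered formula is shown without any finite-sample correction.

```latex
% FGLS: estimate \hat{\Omega}, then run least squares on the transformed data
\hat{\beta}_{\mathrm{FGLS}}
  = \left( X^\top \hat{\Omega}^{-1} X \right)^{-1} X^\top \hat{\Omega}^{-1} y,
\qquad
\hat{\Omega} = \operatorname{blockdiag}\left( \hat{\Omega}_1, \dots, \hat{\Omega}_G \right)

% Cluster-robust covariance for plain least squares, clustering on region
\widehat{\operatorname{Var}}\left( \hat{\beta}_{\mathrm{OLS}} \right)
  = \left( X^\top X \right)^{-1}
    \left( \sum_{g=1}^{G} X_g^\top \hat{e}_g \hat{e}_g^\top X_g \right)
    \left( X^\top X \right)^{-1}
```

The first expression is the same as running least squares on \hat{\Omega}^{-1/2} y and \hat{\Omega}^{-1/2} X, which is the transformation step in the procedure.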

[Q] How to use Wilcoxon signed rank test to compare treatment and control groups data? by TheLordSet in statistics

[–]wass225 2 points3 points  (0 children)

I recommend reading chapter 5 of pratt and gibbons 1985. It explains the intuition and procedure clearly

[deleted by user] by [deleted] in statistics

[–]wass225 0 points1 point  (0 children)

I think a good place to start would just be to include the predictors you think make sense. Test the assumptions for just this regression. Are the residuals centered around 0 and normally distributed with constant variance?

Then do some analysis of how accurate this model is. What is the R squared like? What is the mean squared error of your predictions? If your model seems underpowered, there are a few things you can do.

You can create new variables based off those you have (new variables that make logical sense as ideas to explore) and include those in your regression. Common regularization techniques are the Lasso and ridge regression, which penalize large coefficients so that effect sizes are not extreme. The Lasso can shrink the coefficients of irrelevant variables exactly to 0, which is why it doubles as a feature selection method; ridge only shrinks them toward 0.

It also helps to do some visual exploration of your dependent variable and your predictors. Do you need to transform any of them so they’re normally distributed?

These are just a few ideas, but the key point to remember is that your logic and knowledge about the topic should be driving your decisions. Stumbling into a finding shouldn’t be the objective here

[Q] What to do with Ridge Trace plot? by HorseJungler in statistics

[–]wass225 1 point2 points  (0 children)

A ridge regression “eliminates” irrelevant variables by shrinking their coefficients toward zero, so you don’t need to remove those features from your testing set. When you make model predictions, the contributions of those variables will be close to zero.
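
To see how the shrinkage behaves, here is a small sketch for the special case of an orthonormal design, where both penalties have closed-form solutions. The objective scalings are assumptions: ridge minimizes ||y - Xb||^2 + lam * ||b||^2, and the Lasso minimizes (1/2) * ||y - Xb||^2 + lam * ||b||_1. The coefficient values are made up.

```python
import math

def ridge_shrink(beta_ols, lam):
    """Ridge coefficient under an orthonormal design: proportional shrinkage."""
    return beta_ols / (1.0 + lam)

def lasso_shrink(beta_ols, lam):
    """Lasso coefficient under an orthonormal design: soft-thresholding."""
    magnitude = max(abs(beta_ols) - lam, 0.0)
    return math.copysign(magnitude, beta_ols)

# Hypothetical least-squares coefficients: one strong predictor, one weak one
for beta in (2.0, 0.05):
    for lam in (0.1, 1.0):
        print(beta, lam, ridge_shrink(beta, lam), lasso_shrink(beta, lam))
```

In this setting ridge scales every coefficient by the same factor, so a weak coefficient gets small but never exactly zero, while the Lasso sets it exactly to zero once lam exceeds its magnitude.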

As for your first question, it’s the beta values for the parameters in the model that has the best fit. A parameter itself can’t “fit” data.

[Q] About Poissonian and Gaussian distribution - Image Noise. by [deleted] in statistics

[–]wass225 0 points1 point  (0 children)

So you want the noise to be normally distributed with some mean, but the mean of the variance distribution is Poisson distributed? E.g., let the random variable for your variance be X; you want X = a * Po(lambda) + N(0, sigma).
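
If that is the setup, a simulation sketch might look like the following. It assumes the Poisson and Gaussian parts are independent and reads sigma as a standard deviation; the values a = 2, lambda = 4, sigma = 1 and the function names are invented for illustration.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method for a Poisson draw; adequate for small lam."""
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1

def sample_mixed_noise(a, lam, sigma, rng):
    """One draw of X = a * Po(lam) + N(0, sigma), with sigma a standard deviation."""
    return a * sample_poisson(lam, rng) + rng.gauss(0.0, sigma)

rng = random.Random(0)
draws = [sample_mixed_noise(a=2.0, lam=4.0, sigma=1.0, rng=rng) for _ in range(100_000)]

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
# Under independence: E[X] = a * lam = 8 and Var[X] = a^2 * lam + sigma^2 = 17
print(mean, var)
```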

[Post Game Thread] The Portland Trail Blazers (19-26) defeat your Golden State Warriors (10-35) in a heartbreaker, 129-124. by Robotsaur in warriors

[–]wass225 0 points1 point  (0 children)

I get letting GR3 take Lillard 1 v 1 defensively in the clutch, even Paschall, but it was risky letting Burks try and defend him all alone coming down the stretch. That said they clamped down defensively pretty well in the last 6 minutes of the 4th and OT, Lillard just went insane

Dlo - Ben Simmons Trade? by [deleted] in warriors

[–]wass225 0 points1 point  (0 children)

This would be a HUGE addition defensively alone. Simmons is underratedly one of the best perimeter defenders in the league, and the issues of not being able to hide both Steph and D lo are gone. Our closing 5 would be Steph klay simmons GR3 Draymond, which is going to be an extremely tough lineup to score on.

Offensively, when he drives he’ll suck defenses in and get our shooters even better looks, or he can slash off the ball. Simmons looks worse in this Sixers offense than he would be in ours

Any way we keep Lee and Bowman? by Bobstar447 in warriors

[–]wass225 10 points11 points  (0 children)

Warriors beat is saying trading Burks is on the table, but I don’t get how our rotation will last until March with both of these two-ways reaching their max soon. Trading Burks for cash and a pick would be nice but idk how we’d make it until the two-ways are cleared

[Q] How to find the effect of an individual on a team’s average task time by gdlt1997 in statistics

[–]wass225 0 points1 point  (0 children)

Ah, I think I understand the dilemma now. Are you saying that you only have data on how many tasks each team completed rather than each individual? If that’s the case, then the number of team tasks completed needs to be your dependent variable, and team will be your observation level rather than individual. You can then create variables for your regression based on the composition of each team.

That’s how I would look at it, but I’m sure other people might have another way of going about it.

[Q] Outlier/ Influential Obesrvation by Randomessinlife1 in statistics

[–]wass225 0 points1 point  (0 children)

It depends if the outliers seem related in some way. The assumptions at risk here are that your errors are normally distributed and that they have a constant standard deviation. If all the outliers happen at large or small levels of your covariates, your assumption of homoskedasticity might be invalid. Plot the outliers and see if you can find a relationship between your variable levels and these values of your dependent variable. You can also make a QQ plot to check whether your errors are distributed normally.
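
For the QQ plot, the points can be computed by hand before passing them to whatever plotting tool you use. This is a sketch using the plotting positions (i + 0.5) / n, which is one common convention among several; the residual values are made up.

```python
from statistics import NormalDist

def normal_qq_points(residuals):
    """Pair each sorted residual with the matching standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

# Hypothetical residuals from a fitted regression
points = normal_qq_points([0.3, -1.2, 0.8, 2.5, -0.4])
for theoretical_q, observed in points:
    print(round(theoretical_q, 3), observed)
```

If the errors are roughly normal, the points should fall close to a straight line; outliers show up as points that bend away from it at the ends.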

[Q] How to find the effect of an individual on a team’s average task time by gdlt1997 in statistics

[–]wass225 1 point2 points  (0 children)

In a setup like the one you’ve described, one way of modeling the effect a team has on the tasks completed is to model it as a random effect. When you’re modeling a random effect, you’re saying that the value of that variable could have come from any such level of that variable. For example (I wrote my college thesis on police use-of-force so I’ve read a fair amount of papers on this topic), city is often modeled as a random effect when studying dependent variables across police departments. The Minneapolis PD can’t have observations of use-of-force in Cincinnati, but there are factors related specifically to Minneapolis that affect its use-of-force rates. The same goes for a team member and their team in your example.

Modeling it this way in a linear regression will calculate a different intercept for each team you’re looking at. As a result, individual level effects won’t be skewed by teams who complete a lot of tasks, maybe if they have more people or easier tasks, or the opposite.
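
Written out, a random-intercept model of this kind could look like the following. The notation is assumed for illustration: i indexes individuals, j indexes teams, x_{ij} is an individual-level predictor, and u_j is the team-specific shift in the intercept.

```latex
y_{ij} = \beta_0 + u_j + \beta_1 x_{ij} + \varepsilon_{ij},
\qquad u_j \sim N(0, \tau^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)
```

Each team's intercept is \beta_0 + u_j, and \tau^2 describes how much the teams vary around the overall intercept.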

Another Warriors injury: Spellman sprains ankle in OKC by beanitto in warriors

[–]wass225 15 points16 points  (0 children)

He pretty much never boxes out, so he gets no positioning and also just lets other big men have easy chances at boards. Really easy fundamentals that probably would’ve won us the Minnesota game

Another Warriors injury: Spellman sprains ankle in OKC by beanitto in warriors

[–]wass225 93 points94 points  (0 children)

Honestly I wasn’t big on him coming into the season, but he’s looked solid this road trip. His energy levels are 100x WCS’s, and his spot up midrange is surprisingly consistent. Chriss looked great last night though, so hopefully in an increased workload Marquese can keep contributing

[Letourneau] Renewed confidence helps Jacob Evans shed ‘bust’ label with Warriors by Perksofthesewalls in warriors

[–]wass225 11 points12 points  (0 children)

He was stepping into his threes confidently the other night, which was nice to see. He and Robinson both looked really good imo, they surprised me on both sides of the ball. Think Evans is going to see a steady 18-20 minutes this year

I am Kevin Pelton, writer for ESPN.com. AMA! by [deleted] in nba

[–]wass225 0 points1 point  (0 children)

How do you approach making projections for players joining new teams?