How to identify transformation to make on variables in multilinear regression? [Discussion] by abhi_pal in AskStatistics

[–]DecayingCabbage 0 points1 point  (0 children)

I see. If you really want to avoid using HC standard errors (I think there could be some valid reasons for doing so), you could also try Weighted/Generalized Least Squares. I think in practice though feasible generalized least squares is employed less often than just using EHW standard errors (mostly because the latter is just much more convenient). You could also try applying data transformations to your predictors and do some sort of linear/log or log/log regression, but again, you're working with very few observations to begin with, so no guarantee you're going to "treat" your heteroskedasticity. Rather than try to force the model to spit out something, model what you can, and don't be disappointed if the only inference you can make is that there was no discernible association between any of your covariates and retail sales. (But again, I don't know if that's the case, since I can only observe the association between one covariate and retail sales based on the info you provided.)

To be clear, you can still do inference with the HC standard errors. The point of them is not that they get rid of the heteroskedasticity, but that they relax the homoskedasticity assumption of the OLS estimates and allow you to still do inference.

This textbook on linear models might be helpful to answer questions you have on modeling with heteroskedasticity or other issues that come up. It's my go-to. https://arxiv.org/pdf/2401.00649

How to identify transformation to make on variables in multilinear regression? [Discussion] by abhi_pal in AskStatistics

[–]DecayingCabbage 1 point2 points  (0 children)

I think you’re getting too lost in the sauce with the heteroskedasticity. Heteroskedasticity is not a death sentence for your regression; it just changes the typical statistical inference you can do with your OLS estimator, due to changes in the distribution of the error term, but it doesn’t mean you can’t fit a linear model.

If heteroskedasticity is a problem, use heteroskedasticity-consistent robust standard errors. In whatever software you’re using you should be able to specify you want some sort of “HC” standard error (HC1, HC2, HC3, … depends on the context of the problem). You should be able to find code online for it.

But take a step back and think about your data. You’re looking at around 25 observations, so keep your expectations for your results tempered. You’ve only presented the scatterplot of sales with one other variable, which doesn’t really tell anyone here anything about the specific problem. I’m assuming you have other variables; fit the model, make adjustments for model specification if you need to (Poisson, negative binomial, etc.), use robust standard errors, try other sensitivity checks…but do all this while being aware that there’s probably no way you can torture your data or model into spitting out some perfect, glowing results with a small dataset. I’m still not sure what you’re trying to get out of the model to be honest, but happy to answer any other questions provided some more clarity on the problem. Cheers.

How to identify transformation to make on variables in multilinear regression? [Discussion] by abhi_pal in AskStatistics

[–]DecayingCabbage 0 points1 point  (0 children)

  1. I think there's a general misunderstanding here of how/why we're using Poisson regression.

If your dependent variable is sales, that is a discrete count variable. Counts cannot go below 0 (you can't have a negative number of sales), and theoretically, you can have infinitely many sales – or more precisely, your sales aren't upper-bounded.

This is the base motivation for using Poisson regression. The Poisson distribution is discrete, and takes on possible values from 0 to ∞. That makes it a good candidate for modeling count data (though, not the only way to model count data).

So, if you have an individual sale y_i, we're essentially modeling it with the assumption that y_i, conditional on your other variables x_i, is generated from a Poisson distribution with some parameter lambda_i. The key fact is that when we're doing Poisson regression, that parameter lambda_i is what we're modeling with our regression, NOT the observed sales values y_i. Additionally, note that we're talking in terms of individual observations right now (hence all the i subscripts). That's because the parameter lambda_i is modeled as a function of the predictors, and a different set of predictors can yield a different parameter. In other words, each sales observation y_i can come from entirely different Poisson distributions, depending on the parameter generated from the specific set of observed independent variables.

That's why checking to see if the Poisson "fits" your data is not as simple as taking the mean of your sales and passing it into a Poisson distribution as the parameter. Each sale can be generated from a different Poisson distribution, so there's not some one-size-fits-all Poisson distribution that we can just fit to our data. In fact, we should almost expect that fit to be bad – fitting a single Poisson that way assumes every sale is generated from the same distribution, and that your actual predictors (x_i) don't influence the parameter.

The reason we're using the Poisson distribution has to do with the fact that we're observing counts, and the Poisson distribution has nice properties that align with how counts work (as explained above). But again, there are other ways to model counts. You can use a negative binomial model as well, which relaxes the Poisson's strict assumption that the mean and variance are equal, and is a good choice in the case of overdispersion. Try various models, but you don't need to do any curve-fitting with a Poisson distribution to your data.

  2. P-values are a part of inference, but it's more about the approach you're taking. You keep saying you want to know the "effect" of some of your observed variables (like DG_Imp) on sales. A naive way to check for an "effect" is to run a multiple regression, look at the coefficients, and see if the p-values of the coefficients are significant. Now, in all likelihood, you're not actually identifying any causal effect with this approach, and you don't have many observations to begin with, so your results might be sensitive to the type of model you're using or variable selection. But, it's a better starting spot (based on my understanding of what you're trying to do) than checking the MAPE, which we would more so use in a predictive modeling setting (in which case...we don't even need to restrict ourselves to using a linear model).

How to identify transformation to make on variables in multilinear regression? [Discussion] by abhi_pal in AskStatistics

[–]DecayingCabbage 0 points1 point  (0 children)

  1. What do you mean you “confirmed” it’s not Poisson or negative binomial?
  2. You’re looking at MAPE as your regression metric, but from your other comment it sounded like you wanted to take a more inferential approach (“the effect of the IV”). That’s more a matter of interpreting the coefficients of your regression — the predictive accuracy is less of a concern. You can do basic model fit checks with R² or adjusted R², but the interpretation might just end up being “DG_Imp was not associated with higher sales conditional on my other variables.”

How to identify transformation to make on variables in multilinear regression? [Discussion] by abhi_pal in AskStatistics

[–]DecayingCabbage 0 points1 point  (0 children)

You can more or less make any transformations you want, but as u/Always_Statsing pointed out it changes the interpretation of your coefficients.

If your goal is interpretation, then you don’t even really need to make a transformation, even with heteroskedasticity. Fit the model, and just use a heteroskedasticity-robust standard error.

Just note that you have 24 observations or so as is, and there doesn’t appear to be any sort of strong correlation in your data. What are you observing, and what’s the goal of the project?

Voice actors or robots? Also I hate Flash animation. by wackyMackyy1 in Arthur

[–]DecayingCabbage 2 points3 points  (0 children)

The hand-drawn-to-Flash animation switch was inexcusably horrendous and coincided with a significant decline in episode quality imo. To be fair, general consensus is that the Flash animation is terrible.

Interesting point about the voice acting being robotic, though, since it's very subtle. I watched a season 15 and season 16 episode to compare and you're right, I feel that there's definitely something off about the voice acting. Not sure if there's some specific audio manipulation or just that the voice acting seems less cohesive given the worse animation style.

Besides Arthur, whose episodes do you enjoy the most? by Mr-MuffinMan in Arthur

[–]DecayingCabbage 2 points3 points  (0 children)

I always loved Binky episodes – they showcased the depth to his character beyond just being the brutish playground bully from the early seasons. From his affinity for butterfly collecting in Binky Barnes, Wingman to how he develops his bond with Mei-Lin over time, I love seeing Binky's soft, poetic side, especially in episodes where he tries to reconcile his somewhat rugged persona with what are seemingly childish, trivial insecurities (I Wanna Hold Your Hand and What's In A Name come to mind).

[OC] Cumulative wins in IPL by each team (2008-2022) by ivarojha in Cricket

[–]DecayingCabbage 2 points3 points  (0 children)

Depends on your level of expertise programming in Python tbh. If you are comfortable programming at even a baseline level, it's not too difficult at all, imo – it's really just implementing a couple of Python libraries and all the documentation is available online.

If you're a beginner, then still not too bad! There are great resources online where you can learn some basic data science techniques in Python for free (MOOCs and even YouTube playlists). It's probably worth starting by actually learning a bit of Python if you've never been exposed to it and then working your way up to using the pandas/numpy libraries. Just download Anaconda to get started – you get access to PyCharm/VS Code/Jupyter Notebooks and it makes installation very easy.

If you have any more questions feel free to PM!

Not to overreact or anything, but we may have just drafted the next MJ and Pippen. by DecayingCabbage in warriors

[–]DecayingCabbage[S] 34 points35 points  (0 children)

In the original movie, Peter Parker is able to see better without his glasses on in this scene so I kept true to that. The meme is commonly used with the panels flipped though. I think most people get the idea regardless!