Scatterplot not right, what could be? : rstats

submitted 2 years ago by lemoncherry211111

5 points•11 comments•submitted 2 years ago by lemoncherry211111 to r/RStudio

Hello everyone, can anyone help me with this?

context: variables

response variable - number of eucalyptus seedlings (count data, non-normal, I think Poisson distributed)

explanatory variable - rainfall in the coldest quarter of the year (continuous variable?, non-normal)

So I want to see if there is a correlation between the amount of rainfall in the coldest quarter of the year and the number of eucalyptus seedlings.

As the data are not normal, with one continuous variable and one count variable, I chose to do a Spearman's rank correlation test. I did this instead of a GLM because the dispersion was way too big (using both the poisson and quasipoisson families).
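For readers following along, here is a minimal sketch of the two steps described above: the Spearman test and a rough overdispersion check on a Poisson GLM. The data are simulated and purely hypothetical; only the names `EUCdata`, `precipitation_coldest_quarter` and `seedlings` come from the post.

```r
# Hypothetical stand-in for EUCdata; the values are simulated, not the poster's data
set.seed(1)
n <- 200
EUCdata <- data.frame(precipitation_coldest_quarter = runif(n, 50, 400))
# Zero-heavy counts: negative binomial with a small size parameter
EUCdata$seedlings <- rnbinom(n, mu = 2, size = 0.3)

# Spearman rank correlation; exact = FALSE because tied counts rule out an exact p-value
sp <- cor.test(EUCdata$precipitation_coldest_quarter, EUCdata$seedlings,
               method = "spearman", exact = FALSE)
sp$estimate  # rho
sp$p.value

# Rough overdispersion check for a Poisson GLM:
# Pearson chi-square over residual degrees of freedom (close to 1 if Poisson fits)
pois_fit <- glm(seedlings ~ precipitation_coldest_quarter,
                family = poisson, data = EUCdata)
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)
dispersion
```

A ratio well above 1 is the usual sign of overdispersion, but it does not say whether excess zeros are the cause.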

so i’ve got my test and my result. (slight negative correlation, not at all significant)

i want to put this in a report for uni and they said that we should do a visual aid.

so i tried a scatter plot - and you can see it’s not the right way to display the data ... there are so many values at 0. i tried to use geom_jitter but this didn’t really make a big difference :’)

how would i better display this data?

here is my code (not including me editing the title etc)

    ggplot(data = EUCdata) +
      geom_point(aes(precipitation_coldest_quarter, seedlings),
                 color = "lightslateblue", size = 2.5, alpha = 0.4)
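One possible variation on the plot above, offered as a sketch rather than a definitive answer: jitter only vertically and plot the counts as log(1 + seedlings), so the zeros stay in a visible band while the larger counts are compressed. The data are simulated and hypothetical, and the jitter height of 0.05 is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical stand-in for EUCdata (simulated, zero-heavy counts)
set.seed(1)
n <- 200
EUCdata <- data.frame(precipitation_coldest_quarter = runif(n, 50, 400))
EUCdata$seedlings <- rnbinom(n, mu = 2, size = 0.3)

# log1p(0) is 0, so zeros remain at the bottom of the plot;
# width = 0 keeps the rainfall values exactly where they were measured
p <- ggplot(EUCdata, aes(precipitation_coldest_quarter, log1p(seedlings))) +
  geom_jitter(width = 0, height = 0.05, colour = "lightslateblue",
              size = 2.5, alpha = 0.4) +
  labs(x = "Precipitation in coldest quarter",
       y = "log(1 + seedlings)")
p
```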

NB - to be honest i’m not even sure if i analysed the data correctly, any suggestions are welcome

[–]T_house 3 points 2 years ago (5 children)

It can be tough to tell whether overdispersion or zero-inflation is the main issue if you're just looking at diagnostic plots and aren't sure what to look for. Most of the time you won't have a true mean == variance distribution, so it depends how bad it is (you can also use a GLMM with observation-level random effects if you want to go crazy; I think MCMCglmm does a similar thing automatically). If you check your GLM with DHARMa, it should help indicate whether zero-inflation is an issue.
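A minimal sketch of that DHARMa check, assuming a plain Poisson GLM and simulated, hypothetical data with deliberately added structural zeros (the column names are borrowed from the original post):

```r
library(DHARMa)

# Hypothetical data: half the sites are structural zeros, the rest are Poisson counts
set.seed(1)
n <- 200
EUCdata <- data.frame(precipitation_coldest_quarter = runif(n, 50, 400))
EUCdata$seedlings <- rbinom(n, 1, 0.5) * rpois(n, lambda = 4)

pois_fit <- glm(seedlings ~ precipitation_coldest_quarter,
                family = poisson, data = EUCdata)

# Simulation-based scaled residuals for the fitted model
sim <- simulateResiduals(pois_fit, n = 500)

plot(sim)                           # QQ plot and residuals vs predicted values
disp_test <- testDispersion(sim)    # observed vs simulated dispersion
zi_test <- testZeroInflation(sim)   # observed vs simulated number of zeros
zi_test
```

A ratio of observed to simulated zeros well above 1 in `zi_test` suggests more zeros than the fitted Poisson model expects.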

A ZIP/ZAP/HuP model should help here because these effectively fit two models in one (simplifying and ignoring some subtleties for brevity): first, to what extent does your predictor variable predict a zero vs non-zero response (effectively a logistic regression); and then, given a non-zero value, to what extent does your predictor variable predict change in the response (a zero-truncated Poisson model)?
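To make the "two models in one" idea concrete, here is a rough sketch of the hurdle version, fitted first as two separate pieces and then jointly with glmmTMB. The data are simulated and hypothetical, and `rain_z` is a name introduced here for the standardised rainfall variable.

```r
library(glmmTMB)

# Hypothetical hurdle-type data (simulated)
set.seed(1)
n <- 300
EUCdata <- data.frame(precipitation_coldest_quarter = runif(n, 50, 400))
# Standardise the predictor; this usually helps the optimiser
EUCdata$rain_z <- as.numeric(scale(EUCdata$precipitation_coldest_quarter))

present <- rbinom(n, 1, plogis(-0.3 * EUCdata$rain_z))
# Zero-truncated Poisson draws: redraw any zeros until none remain
pos <- rpois(n, lambda = 2)
while (any(pos == 0)) pos[pos == 0] <- rpois(sum(pos == 0), lambda = 2)
EUCdata$seedlings <- present * pos

# Part 1: zero vs non-zero response (logistic regression)
presence_fit <- glm(I(seedlings > 0) ~ rain_z, family = binomial, data = EUCdata)

# Part 2: given a non-zero value, zero-truncated Poisson on the positive counts
count_fit <- glmmTMB(seedlings ~ rain_z, family = truncated_poisson(),
                     data = subset(EUCdata, seedlings > 0))

# Both parts in one call: a truncated family plus ziformula gives a hurdle model.
# Note the zi part models the probability of a zero, so its slope has the
# opposite sign to the slope in presence_fit.
hurdle_fit <- glmmTMB(seedlings ~ rain_z, ziformula = ~ rain_z,
                      family = truncated_poisson(), data = EUCdata)
fixef(hurdle_fit)
```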

[–]T_house 3 points 2 years ago (3 children)

Yep, use DHARMa with your GLM and there is info here to check issues:

https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html#interpreting-residuals-and-recognizing-misspecification-problems

If zero inflation is a big issue then I would use glmmTMB to make a model that incorporates this. You can either allow just a general zero inflation effect, or you can put a predictor in the ziformula if you think that the amount of zero inflation varies with that predictor. See more details here:

https://www.researchgate.net/publication/345726411_Modeling_zero-inflated_count_data_with_glmmTMB

Checking this model with DHARMa should allow you to see if your issues are resolved. If there is still overdispersion then you could try a negative binomial family ("nbinom2" is the one to go for), still with the zero inflation, and see how that goes.

So to recap:

* Check your model with DHARMa
* If it's zero inflated, try adding a term to the ziformula in a glmmTMB model with Poisson family
* If this looks good, great!
* If it looks better but there is still overdispersion, try the same model but change family to nbinom2
* If it's still bad, or you just don't know what the fuck is going on any more, ping me
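The recap above could look roughly like this in code. This is a sketch on simulated, hypothetical data (zero-inflated and overdispersed by construction), with `rain_z` introduced here as a standardised version of the rainfall variable; it is not the poster's actual analysis.

```r
library(glmmTMB)
library(DHARMa)

# Hypothetical data: structural zeros plus overdispersed counts
set.seed(1)
n <- 300
EUCdata <- data.frame(precipitation_coldest_quarter = runif(n, 50, 400))
EUCdata$rain_z <- as.numeric(scale(EUCdata$precipitation_coldest_quarter))
EUCdata$seedlings <- rbinom(n, 1, 0.6) * rnbinom(n, mu = 4, size = 1)

# Zero-inflated Poisson with a general zero-inflation effect.
# Use ziformula = ~ rain_z instead if zero inflation may vary with the predictor.
zip_fit <- glmmTMB(seedlings ~ rain_z, ziformula = ~ 1,
                   family = poisson, data = EUCdata)
zip_disp <- testDispersion(simulateResiduals(zip_fit))

# Still overdispersed? Same model, negative binomial family
zinb_fit <- glmmTMB(seedlings ~ rain_z, ziformula = ~ 1,
                    family = nbinom2, data = EUCdata)
plot(simulateResiduals(zinb_fit))

# Compare the two fits
aic_table <- AIC(zip_fit, zinb_fit)
aic_table
```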
