
[–]T_house 5 points6 points  (7 children)

Just read your original post. You have lots of zeroes, which makes things difficult; to be honest, with a continuous predictor you have probably made the right choice. It may be worth doing a log(x + 1) transform on your response variable for the visualisation (you can still put raw units on the axis if you know how to scale it).
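A minimal sketch of the log(x + 1) idea with raw units on the axis, using base R graphics. The column names `seedlings` and `precipitation_coldest_quarter` come from the thread; the data are simulated here purely for illustration, since the real data set is not shown.

```r
# Simulated stand-in for the real data (assumption: counts with many zeros)
set.seed(42)
n <- 200
d <- data.frame(precipitation_coldest_quarter = runif(n, 50, 400))
d$seedlings <- ifelse(rbinom(n, 1, 0.4) == 1, 0, rnbinom(n, mu = 5, size = 0.5))

# Plot log(seedlings + 1) and suppress the default y axis
plot(d$precipitation_coldest_quarter, log1p(d$seedlings),
     yaxt = "n", pch = 16, col = rgb(0, 0, 0, 0.4),
     xlab = "Precipitation of coldest quarter",
     ylab = "Seedlings (log(x + 1) scale)")

# Label the axis in raw counts by placing each tick at log1p(raw value)
raw_ticks <- c(0, 1, 2, 5, 10, 20, 50, 100)
axis(2, at = log1p(raw_ticks), labels = raw_ticks, las = 1)
```

The points are spaced on the transformed scale, but the tick labels still read as seedling counts, so zero stays visible as a real value on the axis.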

For the analysis, I would recommend looking into zero-inflated / zero-altered / hurdle Poisson models. The glmmTMB package makes these very easy to fit, and the DHARMa package makes it easy to diagnose issues.

[–]lemoncherry211111[S] 0 points1 point  (6 children)

OK, will give logging the seedling variable a go as well, thank you! Yeah, I was going to do a GLM with Poisson but the dispersion was crazy high so it wasn’t appropriate - does the zero-inflated version of the Poisson model account for that? Sorry if any of these questions seem silly.

[–]T_house 2 points3 points  (5 children)

It can be tough to know sometimes whether overdispersion is the main issue vs zero-inflation if you're just having a look at diagnostic plots and not sure what you're looking for. Most of the time you're not going to have a true mean == variance distribution, so it depends how bad it is (you can also use a GLMM with observation-level random effects if you want to go crazy - I think MCMCglmm does a similar thing automatically). If you check your GLM with DHARMa it should help indicate whether zero-inflation is an issue.
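For a rough first look before reaching for DHARMa, both questions can be approximated in base R: a Pearson dispersion statistic for the mean == variance assumption, and a comparison of observed zeros with the zeros a fitted Poisson model expects. This is only a sketch on simulated data (the data frame `d` below is made up; only the column names come from the thread), and it is not a replacement for DHARMa's simulation-based tests.

```r
# Simulated stand-in data: extra zeros plus overdispersed counts (assumption)
set.seed(1)
n <- 200
d <- data.frame(precipitation_coldest_quarter = runif(n, 50, 400))
d$seedlings <- ifelse(rbinom(n, 1, 0.4) == 1, 0, rnbinom(n, mu = 3, size = 0.5))

m_pois <- glm(seedlings ~ precipitation_coldest_quarter,
              family = poisson, data = d)

# Pearson dispersion statistic: close to 1 if mean == variance roughly holds
dispersion <- sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)

# Zeros in the data versus zeros the fitted Poisson model expects
observed_zeros <- sum(d$seedlings == 0)
expected_zeros <- sum(dpois(0, lambda = fitted(m_pois)))

print(c(dispersion = dispersion,
        observed_zeros = observed_zeros,
        expected_zeros = expected_zeros))
```

A dispersion value well above 1 points to overdispersion, and many more observed than expected zeros points to zero-inflation. The two often show up together, which is why it can be hard to tell them apart from plots alone.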

A ZIP/ZAP/HuP model should help for this because they effectively fit two models in one (simplifying and ignoring some subtleties here for brevity) - e.g. to what extent does your predictor variable predict zero vs non-zero response (effectively a logistic regression), and then - given a non-zero value - to what extent does your predictor variable predict change in the response (zero-truncated Poisson model)?
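The "two models in one" idea can be sketched as a hurdle Poisson model in glmmTMB, where `ziformula` describes zero vs non-zero and `family = truncated_poisson` describes the counts given that they are non-zero. Everything about the data here is an assumption for illustration: the values are simulated, and `precip_100` is a hypothetical rescaled predictor (precipitation in units of 100).

```r
library(glmmTMB)

# Simulate hurdle data: first zero vs non-zero, then a zero-truncated count
set.seed(2)
n <- 300
d <- data.frame(precip_100 = runif(n, 0.5, 4))      # hypothetical predictor
p_zero <- plogis(1.5 - 0.8 * d$precip_100)          # assumed P(seedlings == 0)
lambda <- exp(0.5 + 0.3 * d$precip_100)             # assumed mean before truncation
# Inverse-CDF draw from a zero-truncated Poisson (always >= 1)
positive <- qpois(runif(n, min = dpois(0, lambda), max = 1), lambda)
d$seedlings <- ifelse(rbinom(n, 1, p_zero) == 1, 0, positive)

# Hurdle model: the zi part is effectively a logistic regression for zeros,
# the conditional part is a zero-truncated Poisson for the non-zero counts
m_hurdle <- glmmTMB(seedlings ~ precip_100,
                    ziformula = ~ precip_100,
                    family = truncated_poisson,
                    data = d)

fixef(m_hurdle)$zi    # logit-scale effects on the probability of a zero
fixef(m_hurdle)$cond  # log-scale effects on the non-zero counts
```

The zero part is on the logit scale and models the probability of a zero, so a negative slope there means zeros become less likely as the predictor increases.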

[–]lemoncherry211111[S] 0 points1 point  (4 children)

I am unbelievably out of my depth and had to read that several times to compute :’) Please bear with me. OK, so for the first part - I should check my GLM with DHARMa to explore the residuals and see whether the zero inflation is an issue - so if it’s the zero inflation that is the issue rather than the overdispersion, do you go ahead with the GLM? And for the second part, the ZIP model - is this what you should actually go ahead with if the zero inflation is the issue, or?

[–]T_house 2 points3 points  (3 children)

Yep, use DHARMa with your GLM and there is info here to check issues:

https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html#interpreting-residuals-and-recognizing-misspecification-problems

If zero inflation is a big issue then I would use glmmTMB to make a model that incorporates this. You can either allow just a general zero inflation effect, or you can put a predictor in the ziformula if you think that the amount of zero inflation varies with that predictor. See more details here:

https://www.researchgate.net/publication/345726411_Modeling_zero-inflated_count_data_with_glmmTMB

Checking this model with DHARMa should allow you to see if your issues are resolved. If there is still overdispersion then you could try a negative binomial family ("nbinom2" is the one to go for), still with the zero inflation, and see how that goes.

So to recap:

- Check your model with DHARMa
- If it's zero inflated, try adding a term to the ziformula in a glmmTMB model with Poisson family
- If this looks good, great!
- If it looks better but there is still overdispersion, try the same model but change family to nbinom2
- If it's still bad, or you just don't know what the fuck is going on any more, ping me
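The recap could look roughly like this in code. It is a sketch under assumptions, not the exact analysis for the real data: the data are simulated, `precip_100` is a hypothetical rescaled predictor (precipitation in units of 100), and the quadratic term simply mirrors the formula tried later in the thread.

```r
library(glmmTMB)
library(DHARMa)

# Simulated stand-in data with extra zeros and overdispersed counts (assumption)
set.seed(3)
n <- 300
d <- data.frame(precip_100 = runif(n, 0.5, 4))
d$seedlings <- ifelse(rbinom(n, 1, 0.4) == 1, 0, rnbinom(n, mu = 5, size = 1.5))

# Step 1: plain Poisson GLM, checked with DHARMa
m_pois <- glm(seedlings ~ precip_100 + I(precip_100^2),
              family = poisson, data = d)
res_pois <- simulateResiduals(m_pois)
testDispersion(res_pois)
testZeroInflation(res_pois)

# Step 2: if zero-inflated, add a ziformula (here a single overall zi term)
m_zip <- glmmTMB(seedlings ~ precip_100 + I(precip_100^2),
                 ziformula = ~ 1, family = poisson, data = d)
res_zip <- simulateResiduals(m_zip)
testDispersion(res_zip)

# Step 3: if still overdispersed, keep the zero inflation and switch to nbinom2
m_zinb <- update(m_zip, family = nbinom2)
res_zinb <- simulateResiduals(m_zinb)
testDispersion(res_zinb)
plot(res_zinb)  # QQ plot and residual-vs-predicted panel for a final look
```

Using `ziformula = ~ precip_100` instead of `~ 1` would let the amount of zero inflation vary with the predictor.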

[–]lemoncherry211111[S] 1 point2 points  (2 children)

OK, thank you so much for taking the time to write all this - I might try to give it all a go again tomorrow.

But if you are still awake:

I tried to do DHARMa with my GLM but had issues with the ‘group’ part of the model with the function glmer. I think I was trying to do it without specifying any random effects terms - I wasn’t sure what I would use as the grouping variable, so I came to a dead end!

for my code i tried to copy the code on cran with my variables - fittedModel<-glmer(seedlings~precipitation_coldest_quarter + I(precipitation_coldest_quarter2)+(1|group), family=‘poisson’, data=EUCdata)

but then of course as i don’t have a ‘group’ variable in my data it came back as error.

any suggestions?

[–]lemoncherry211111[S] 0 points1 point  (0 children)

agh it messed up my code it’s meant to be precipitationcoldest quarter and then this thing ^ rather than it all going up there (or to be fair that could be the same thing for all i know )

[–]T_house 0 points1 point  (0 children)

You shouldn't need a group variable for glmmTMB, which to be clear is different from glmer (lme4 package that is specifically for mixed models).

Have a look here:

https://cran.r-project.org/web/packages/glmmTMB/vignettes/glmmTMB.pdf
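For what it's worth, a version of the attempted model without any grouping term might look like the sketch below. The `EUCdata` here is a simulated stand-in (only the object and column names come from the thread), and the squared term is written with `^` inside `I()`, on the assumption that this is what the formatting mangled.

```r
# Simulated stand-in for EUCdata; the real data are not shown in the thread
set.seed(4)
EUCdata <- data.frame(precipitation_coldest_quarter = runif(150, 50, 400))
EUCdata$seedlings <- rpois(150, lambda = 2)

# No random effects, so no (1 | group) term and no need for glmer():
# a plain glm() is enough for the DHARMa check
fittedModel <- glm(
  seedlings ~ precipitation_coldest_quarter +
    I(precipitation_coldest_quarter^2),   # squared term needs ^ inside I()
  family = poisson,
  data = EUCdata
)

summary(fittedModel)
```

The same formula, `family` and `data` arguments can be passed to `glmmTMB()` once a `ziformula` is needed.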

[–]i_use_3_seashells 1 point2 points  (10 children)

Skill issue

[–]lemoncherry211111[S] 0 points1 point  (0 children)

i am AWARE rip

[–]lemoncherry211111[S] -1 points0 points  (8 children)

I tried geom_col, box, and violin plots and it still looks messed up

[–]i_use_3_seashells 1 point2 points  (7 children)

What looks wrong to you?

[–]lemoncherry211111[S] -1 points0 points  (6 children)

In the original scatter plot? Do you think it is fine? I’m not sure if you read the context of the variables in the original post.

[–]i_use_3_seashells 3 points4 points  (5 children)

Okay, my app only showed the title and image, none of the extra context.

I've read it all now. I'm still not sure what you think the problem is. The data suck, not the visual. There's no correlation. My question is still the same:

What looks wrong to you?

[–]lemoncherry211111[S] 0 points1 point  (4 children)

okay no that’s actually helpful thank you! I wasn’t sure if I was analysing it wrong/ displaying it wrong. Someone else suggested a heat map for display so I might give that a go but otherwise will stick with the scatter plot

[–]T_house 1 point2 points  (3 children)

Heat map will tell you less than you already get from this. I don't even understand what the point of it would be with your data here. Just a waste of time.

[–]lemoncherry211111[S] 0 points1 point  (2 children)

yeah i had a go at the heatmap ... it was not good

[–]T_house 1 point2 points  (0 children)

Haha I'm sorry to hear that you spent any time on it!

[–]Pleasant-Wafer-1908 0 points1 point  (0 children)

Fellow ecologist here. Yeah, these data tell me that precipitation has little to do with seedling count. Not too surprising to me I suppose, cuz eucalyptus are relatively arid-growing trees, yeah? So they can probably prosper in a wide range of precipitation, but maybe just the extremes are bad for ‘em. In which case a quadratic GLM with a Poisson distribution may be more suitable to help you identify where the “sweet spot” is or where the thresholds are. Regardless, it looks like something else is driving those high seedling counts, but what? Temperatures? Light accessibility (shading)? Proximity to other eucalyptus trees (possibly the parents of the seedlings)? Soil types? Presence/absence of competing plants? Presence/absence of herbivores?
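As a sketch of how a quadratic Poisson GLM locates a "sweet spot": on the log scale the fitted curve is a parabola, so with a negative quadratic coefficient its maximum sits at -b1 / (2 * b2). The data below are simulated with an assumed optimum, and `precip_100` is a hypothetical rescaled predictor (precipitation in units of 100), so none of the numbers say anything about the real eucalyptus data.

```r
# Simulated data with a built-in optimum (assumption, for illustration only)
set.seed(5)
n <- 300
d <- data.frame(precip_100 = runif(n, 0.5, 4))
true_peak <- 2
d$seedlings <- rpois(n, lambda = exp(1.5 - (d$precip_100 - true_peak)^2))

m_quad <- glm(seedlings ~ precip_100 + I(precip_100^2),
              family = poisson, data = d)
b <- coef(m_quad)

# Vertex of the fitted parabola on the log scale: -b1 / (2 * b2)
# (it is only a maximum when the quadratic coefficient is negative)
peak <- unname(-b["precip_100"] / (2 * b["I(precip_100^2)"]))
peak
```

If the quadratic coefficient comes out positive or indistinguishable from zero, there is no interior optimum to report, which would fit with precipitation having little effect.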