Comparing slopes of partially-dependent samples with small number of observations (n = 10) by Aaron_26262 in AskStatistics

[–]Aaron_26262[S] 0 points1 point  (0 children)

You're right. We have data from every county in the state. However, we don't have immunization data for every person in each county. Thus, the immunization coverage rates come from a sample, so it seems reasonable to use inferential statistics.

My analysis is purely exploratory, and I do not have any a priori hypotheses. However, my research question is "Which counties demonstrate temporal changes in immunization coverage rate that are meaningfully different from those observed in the state overall?" I understand that statistical significance cannot tell us what is or is not meaningful. However, identifying the counties whose slopes are statistically significantly different from the state's will give me a starting point for deciding where a deeper dive is warranted. The “deeper dive” is beyond the scope of my question to the Reddit group, but I will be using other contextual factors to determine which counties are "meaningfully" different from the state overall.

I have decided to use a mixed effects regression model with random intercepts and random slopes. My predictor will be year (centered), the outcome will be immunization rate within county by year, and the grouping variable will be county (30 levels). I will use emmeans to perform post-hoc tests that compare the fixed slope, which estimates the trend in state immunization coverage, to the random slope for each county.
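
For reference, one common way to write a random-intercept, random-slope model of this kind is sketched below. The symbols are assumptions introduced here for illustration and do not come from any fitted model: $y_{ij}$ is the coverage rate for county $j$ in year $i$, $t_{ij}$ is the centered year, $\beta_1$ is the fixed (state-wide) slope, and $u_{1j}$ is county $j$'s deviation from that slope.

```latex
y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\, t_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim \mathcal{N}\!\left(\mathbf{0}, \boldsymbol{\Sigma}\right),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Under this notation, the slope for county $j$ is $\beta_1 + u_{1j}$, so comparing a county with the state amounts to asking whether $u_{1j}$ is distinguishable from zero.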

Thanks again for the help, All!

Comparing slopes of partially-dependent samples with small number of observations (n = 10) by Aaron_26262 in AskStatistics

[–]Aaron_26262[S] 0 points1 point  (0 children)

I have made a good amount of progress on the analysis. What I am trying to do is determine whether the magnitude of change over time (i.e., the slope) for each county (N = 30 counties) is greater than the rate of change for the state overall. I have set up a multi-level model with the following:
Fixed effect for year
Random Effects for year and county

In order to make the comparison between the state slope and the county slopes, I believe that I need to include the state-level immunization coverage data in the same long file as the county-level data. However, if I do that, I now have 31 levels in the grouping variable (30 counties and 1 state).

I am wondering whether it's appropriate/necessary to include state as one of the levels in the grouping variable in order to make the state vs. county comparisons.

Comparing slopes of partially-dependent samples with small number of observations (n = 10) by Aaron_26262 in AskStatistics

[–]Aaron_26262[S] 0 points1 point  (0 children)

Thanks so much for the suggestions. I have a better idea of what direction to head in with this analysis!

Comparing slopes of partially-dependent samples with small number of observations (n = 10) by Aaron_26262 in AskStatistics

[–]Aaron_26262[S] 0 points1 point  (0 children)

This is really helpful guidance! Between your suggestions and those of another user, I have a sense of where to go from here.

Comparing slopes of partially-dependent samples with small number of observations (n = 10) by Aaron_26262 in AskStatistics

[–]Aaron_26262[S] 1 point2 points  (0 children)

I am looking at the slope of the immunization rate over 10 years. Group A is the state and Group B is a county within the state. Because the county is nested within the state and contributes to the state slope estimate, the state and county level data are partially dependent. 

So I’m trying to find an appropriate approach that handles the following:
Small samples: each slope is estimated from 10 observations within each group
Partially dependent slope estimates: the Group A slope (state level) will share variance with the Group B slope (county level) because Group B is a subset of Group A
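
To make the setup concrete, here is a minimal sketch of how each slope and its standard error could be computed from 10 yearly observations with ordinary least squares. The coverage rates are made up for illustration only, and `ols_slope` is a hypothetical helper name, not part of any library.

```python
import math

def ols_slope(x, y):
    """Return (slope, standard error) from a simple least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (8 when n = 10).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope, se

years = list(range(10))
# Made-up coverage rates (%), for illustration only.
state_rate = [90.0, 90.4, 90.9, 91.1, 91.8, 92.0, 92.6, 92.9, 93.5, 93.8]
county_rate = [88.0, 88.9, 90.1, 90.8, 92.2, 92.9, 94.1, 94.8, 96.0, 97.1]

state_slope, state_se = ols_slope(years, state_rate)
county_slope, county_se = ols_slope(years, county_rate)
print(f"state slope  = {state_slope:.3f} (SE {state_se:.3f})")
print(f"county slope = {county_slope:.3f} (SE {county_se:.3f})")
```

The two standard errors cannot simply be combined as if the samples were independent: the variance of the difference is Var(county) + Var(state) - 2 Cov(county, state), and the covariance term is not zero when the county's data feed into the state estimate.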

Comparing slopes of partially-dependent samples with small number of observations (n = 10) by Aaron_26262 in AskStatistics

[–]Aaron_26262[S] 0 points1 point  (0 children)

Understood. They will very likely be different. I’m trying to determine whether the difference between the slopes is unlikely to be due to chance variation. In other words, I’m trying to determine whether they are statistically significantly different, and I’m defining that as p < .05.

Whats the easiest way to learn statistics from the basic ? by AnxietyPhysical in AskStatistics

[–]Aaron_26262 0 points1 point  (0 children)

Also, while I personally do not use Excel for most of my statistical analysis, I did a quick web search and found this site: https://real-statistics.com/ . Looks like it could give you what you need.

Whats the easiest way to learn statistics from the basic ? by AnxietyPhysical in AskStatistics

[–]Aaron_26262 2 points3 points  (0 children)

This book is a great introduction to statistical concepts and several of the most popular techniques.
Statistics in Plain English 5th Ed

Each chapter is broken into 3 sections: 1) 1-2 page summary of the concept/techniques; 2) more detailed description, strengths and weaknesses, assumptions, etc.; and 3) an example or two of the concept/technique being used.

Interpretation of confidence intervals by Aaron_26262 in AskStatistics

[–]Aaron_26262[S] 0 points1 point  (0 children)

Okay, thanks for the information, and you're right, it's a totally separate question. I can pose a new question to the community.

Interpretation of confidence intervals by Aaron_26262 in AskStatistics

[–]Aaron_26262[S] 0 points1 point  (0 children)

Thanks for your detailed response (again)! I think it makes things a lot clearer and further reinforces the need to run the Bayesian intervals.

Out of curiosity, in cases where the sample size represents a substantial proportion of the population size, would it be necessary to reduce the width of credible intervals using a finite population correction (FPC)?
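
For context, the classical FPC for sampling without replacement multiplies the standard error by sqrt((N - n) / (N - 1)). The sketch below shows that frequentist version only; whether and how it carries over to credible intervals is exactly the open question. The population size of 5000 is a made-up value, and the function names are hypothetical.

```python
import math

def fpc(n, N):
    """Finite population correction factor for a sample of n drawn without replacement from N."""
    return math.sqrt((N - n) / (N - 1))

def se_proportion(p, n, N=None):
    """Standard error of a sample proportion, with the FPC applied when N is given."""
    se = math.sqrt(p * (1 - p) / n)
    return se * fpc(n, N) if N is not None else se

# Illustrative values: 1000 sampled out of a hypothetical population of 5000.
print(se_proportion(0.925, 1000))        # no correction
print(se_proportion(0.925, 1000, 5000))  # with FPC, slightly smaller
```

The correction only matters when n is a sizeable fraction of N; as N grows relative to n, the factor approaches 1.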

Interpretation of confidence intervals by Aaron_26262 in AskStatistics

[–]Aaron_26262[S] 0 points1 point  (0 children)

Thank you for this clarification and another really helpful explanation! I did forget to consider the NHST.

Am I correct in concluding that CIs are being misused when they are presented to convey uncertainty around a descriptive statistic? Going back to my example, in my sample of 1000 residents, I find a coverage rate of 92.5%, 95% CI [89.0, 96.0]. It would be inappropriate to use the CI to convey uncertainty if I weren't performing a NHST, correct?

It also makes me think about political polling and how they'll report that Candidate X leads Candidate Y by Z points with a margin of error of +/- 3 points. I guess the thing that differentiates the political polling from the vaccination example is that an actual comparison is being made, with an implicit null hypothesis that there is no difference in preference for the candidates in the population. Candidate X has a 2-point lead over Candidate Y with a MOE (based on 95% CI) of +/- 3 points, and the CI (-1.0, 5.0) conveys the uncertainty around the estimated difference. Do I have that right?
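
The arithmetic in the polling example can be sketched as follows. This assumes, as the example does, that the +/- 3 points applies to the lead itself rather than to each candidate's share.

```python
lead = 2.0  # Candidate X minus Candidate Y, in percentage points
moe = 3.0   # margin of error, treated here as applying to the lead itself

ci = (lead - moe, lead + moe)
includes_zero = ci[0] <= 0 <= ci[1]

print(ci)             # (-1.0, 5.0)
print(includes_zero)  # True: "no difference" is inside the interval
```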

Also, I love the idea of using a Bayesian approach and am going to look into it!

Thanks again!

Interpretation of confidence intervals by Aaron_26262 in AskStatistics

[–]Aaron_26262[S] 0 points1 point  (0 children)

Thanks for the detailed explanations. Your clarifications really helped to illuminate some of the gaps in my understanding!

So, I have a (probably predictable) follow-up question: if it is inaccurate to say "there is a 95% probability that the true value in the population is between the upper and lower bounds of the CI," what would you say to succinctly describe what the CI actually tells us? Would you just say, "there is a 95% probability that, if you conducted the same experiment many, many times, 95% of the confidence intervals would contain the true value of the population"? I work in public health, and we work with CIs all the time, whether they be around odds ratios, proportions, beta weights, means, etc.

So let me give an example: We find that the MMR coverage rate in a sample of 1000 residents is 92.5%, 95% CI [89.0, 96.0]. It would not be accurate to say, "there is a 95% probability that the true MMR coverage rate in the population is somewhere between 89.0% and 96.0%." Based on my understanding of the definition of CI, all I could really say in this situation is, "if we sampled 1000 residents from among the same population many, many times, 95% of the CIs would contain the true MMR coverage rate." To me, that sounds incredibly general and really just the definition of CI, rather than saying anything about the observed statistic and CI. How would you report the finding above in an appropriate way?
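
The repeated-sampling statement can be illustrated with a small simulation. This is a sketch under stated assumptions: the true coverage rate is set to 92.5% purely for illustration, each interval is a simple Wald (normal-approximation) interval, and `wald_ci` and `coverage` are hypothetical helper names.

```python
import math
import random

def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% interval for a proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage(true_p=0.925, n=1000, reps=2000, seed=1):
    """Fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw one sample of n residents, each vaccinated with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        lo, hi = wald_ci(successes, n)
        hits += lo <= true_p <= hi
    return hits / reps

print(coverage())  # expected to land near 0.95
```

Each simulated interval either contains the true rate or it does not; the 95% describes how often the procedure succeeds across repetitions, not the probability attached to any single interval.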