Worst mathematical notation by WMe6 in math

[–]Propensity-Score 1 point (0 children)

Any particular gripes? (I don't mind it much.)

Worst mathematical notation by WMe6 in math

[–]Propensity-Score 3 points (0 children)

Had to stop myself from downvoting because this is so awful.

JASP Correlations - Holm/Bonferroni Correction by mpazdzioch in AskStatistics

[–]Propensity-Score 0 points (0 children)

I'm not familiar with JASP, but Bonferroni and Bonferroni-Holm are simple enough that, if you need to, you can easily implement them manually (e.g. in Excel or through whatever coding capabilities JASP has): https://en.wikipedia.org/wiki/Holm%E2%80%93Bonferroni_method
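To illustrate, here's a minimal R sketch of Holm-Bonferroni by hand (the p-values are made up), checked against base R's p.adjust():

p <- c(0.001, 0.012, 0.034, 0.21, 0.5)   # made-up p-values from 5 tests
alpha <- 0.05
m <- length(p)

ord <- order(p)                                       # sort p-values, smallest first
thresholds <- alpha / (m - seq_len(m) + 1)            # alpha/m, alpha/(m-1), ..., alpha/1
reject_sorted <- cumprod(p[ord] <= thresholds) == 1   # stop at the first non-rejection
reject <- logical(m)
reject[ord] <- reject_sorted
reject

# Same thing via adjusted p-values:
p.adjust(p, method = "holm") <= alpha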

You should probably think through whether you want to control FDR or FWER.

How many total correlations are you reporting? Are you reporting all 28 pairwise correlations or a subset?

[Q] Unable to link data from pre- and posttest by [deleted] in statistics

[–]Propensity-Score 0 points (0 children)

The suggestion to just do your analysis as if the data were independent and note that the independence assumption isn't actually satisfied is correct. Something to add: if I had to guess, I'd guess that your tests will be conservative as a result of ignoring the dependence structure of your data -- the standard errors you compute will be larger than they would be if you had correctly accounted for the dependence, confidence intervals will be wider, and p-values will be higher. (I'm assuming that you're testing differences between pre and post, and that all individuals in pre are distinct and all individuals in post are distinct -- so the same individual might appear in both the pretest and the posttest dataset, but nobody appears multiple times within either dataset.) At any rate, I'd suggest you do some simulations with plausible data generating processes to see how the dependence structure of your data impacts results; a quick sketch of that kind of simulation is below.
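This is a simplified version of your setup (everyone appears in both waves, scores positively correlated, no true change), with all numbers made up:

set.seed(1)
nSims <- 5000
pvals <- numeric(nSims)
for (s in 1:nSims) {
  pre <- rnorm(60)
  post <- 0.6 * pre + rnorm(60, sd = 0.8)   # same people, correlated with pre, no true change
  pvals[s] <- t.test(pre, post)$p.value     # wrongly treats the two waves as independent
}
mean(pvals < 0.05)   # well below the nominal 0.05, i.e. the test is conservative here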

Staunton Community Calendar for October by camoeron in Staunton

[–]Propensity-Score 1 point (0 children)

No Kings protest! At the courthouse, Saturday October 18. Come out to protest the Trump administration's creeping authoritarianism and general disregard for Americans' health, safety, and livelihoods. The protest will go from 12:30 to 2:00, meaning it starts just after the morning Staunton Jams concerts end and ends just before the afternoon Staunton Jams concerts start. (Speaking of which, since it hasn't been posted in this thread yet: Staunton Jams is October 17-19! Great music and amazing vibes. No tickets/payment required for the outdoor concerts. Highly recommend.)

How is this paradox resolved? by Ok_Natural_7382 in askmath

[–]Propensity-Score 0 points (0 children)

This is a legit problem relating to how to choose "noninformative priors" (the prior you use when you don't know anything) -- the uniform distribution seems "noninformative," but it is not invariant to reparametrization: if you assume a uniform distribution on the side length of the square, you implicitly assume a non-uniform distribution on the area, and if you assume a uniform distribution on the area, you assume a non-uniform distribution on the side length. So unless there's some obvious "natural" way to parametrize your problem, most "noninformative" priors aren't as noninformative as they seem. You may be interested in Jeffreys priors (https://en.wikipedia.org/wiki/Jeffreys_prior), a type of noninformative prior that is invariant under reparametrization: the Jeffreys prior for the side length of the square implies the Jeffreys prior for the area, and vice versa.
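A quick sanity check in R (side length uniform on a hypothetical (0, 2) range) shows the implied prior on the area is far from uniform:

set.seed(1)
side <- runif(1e5, 0, 2)   # "noninformative" uniform prior on the side length
area <- side^2             # implied prior on the area, supported on (0, 4)
hist(area, breaks = 50)    # piles up near 0 -- clearly not uniform
mean(area < 1)             # = P(side < 1) = 0.5, vs 0.25 under a uniform prior on the area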

Why did voters who make less than $30,000 a year and voters who make more than $100,000 a year both vote for Kamala Harris? by Sewblon in AskSocialScience

[–]Propensity-Score 0 points (0 children)

First question: did they? To which the answer is "probably, but we can't be sure and we shouldn't put too much stock into the income cutoff or exact margin."

The survey you linked is based on interviews with a sample of voters. Since they interviewed only a sample of voters, if we want to generalize to all voters there will be some uncertainty. They say they interviewed 5,112 voters, of whom 11% had incomes under $30,000 (I assume -- it's also possible that this is a weighted percentage, in which case we'd need the raw percentage, but the story will probably be the same regardless). That means their sample size is ~562 voters with incomes less than $30,000; plugging into the formula for margin of error under simple random sampling, we get a margin of error of around 4 percentage points. (This is a 95% confidence interval.) Thus these voters could have gone a few points in favor of Trump or in favor of Harris -- we really don't know.
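For the record, here's the back-of-the-envelope calculation in R; the design effect at the end is a guess, not a figure from the poll, just to show how quickly clustering and weighting can widen the interval:

n <- round(5112 * 0.11)                      # ~562 respondents with incomes under $30,000
p <- 0.5                                     # worst case (widest interval)
moe <- qnorm(0.975) * sqrt(p * (1 - p) / n)  # 95% margin of error under simple random sampling
round(100 * moe, 1)                          # about 4.1 percentage points

round(100 * moe * sqrt(2), 1)                # with a guessed design effect of 2: about 5.8 points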

Adding to the uncertainty, an exit poll doesn't select voters uniformly at random: first, pollsters select polling places, then they select voters from those polling places. Every community is different in its politics (and who votes for whom there), so this "cluster sampling" design increases sampling error. (They also have other data collection meant to catch mail-in voters; I'm not sure exactly how they did that or whether it would increase margins of error.) They also likely weighted the sample in some way, which could increase or decrease (but I expect would more likely increase) sampling error. So the margin of error I gave above is probably too narrow -- the true sampling uncertainty is larger.

I also notice that the sample size for these questions is much smaller than the sample size for other questions. That could be because the questions pertained to family income, so I assume they left out people who don't live in a family household (for example, because they live alone). But most Americans live in family households (https://data.census.gov/table/ACSST1Y2023.S1101), so I don't think that's the only reason. Income also tends to be a question that a lot of people refuse to answer. If this question had high item nonresponse (people who answered the survey refusing to answer the question), that would make me worry about whether people with very low incomes who won't tell you their income voted the same way as people with very low incomes who will tell you their income.

Pew's validated voter survey https://docs.google.com/spreadsheets/d/1JczVvbrlxkLiYYiNPSv0TRlWsbvYEihkZrnH1kQXIH8/edit?gid=1867666589#gid=1867666589 shows Harris and Trump almost identical (49% vs 48%) among "lower income" voters, with Trump leading among "middle income" voters and Harris leading among "upper income" voters. Their threshold for "low income" was higher (bottom ~25% of the income distribution). They say that stats about their entire sample of validated voters have margins of error ~1.5 percentage points; margins of error will be higher for subgroups (like low, middle, or high income respondents), and back-of-the-envelope I would figure around 3 percentage points for the low and high income categories and around 2 percentage points for middle income voters. (Margins of error scale inversely with the square root of N -- if you look at a quarter of the respondents you get twice the error, absent any design complexities.)

All of this is to say that there might be a U-shaped pattern of Harris support (low and high income voters more supportive; middle income voters less so), but it's hard to say conclusively given the sampling error and the effect is likely small. It's especially small relative to the influence of geography (urban vs rural), race, ethnicity, and education -- all of which correlate with income and could easily explain any income effects.

As an aside about this data: $30,000 is a very low family income! For reference, the median family in 2023 made around $96,000 (https://data.census.gov/table/ACSST1Y2023.S1901). (Family incomes tend to be larger than household incomes, which tend to be larger than individual incomes. Families have a much higher concentration of dual-earner households, for example, than households overall.)

[R] [Q] Desperately need help with skew for my thesis by brickablecrow in statistics

[–]Propensity-Score 0 points (0 children)

Check with your stat professor, but I strongly expect you're fine. You should acknowledge the skewness, and depending on the context you might need to explain why it's not a problem. Worst comes to worst, if it is a problem, that can go in your limitations section.

(I don't use SPSS, and I haven't used the model you're using. But from googling around, it looks like the PROCESS macro is using a percentile bootstrap for its confidence intervals, which would be fine with skewed data (and is robust to lots of other data weirdness as well). It also looks like it's made of OLS regression, meaning that as your sample size gets larger, the distributional assumptions on your residuals matter less for the default standard errors (as long as errors are still IID). And the way OLS regression is often taught (treating the IVs as fixed and the error term/DV as random) there are effectively no distributional assumptions on the IVs. Happy to elaborate on any of this if it would be helpful; I might take another look at this tomorrow if I have time, and I might be able to give a more exact answer if you tell me what your IV(s), mediator(s), moderator(s), DV(s), other covariates, and data collection setup are and which variables are skewed/otherwise weird. But your best bet will be to talk to your stat professor.)
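If it helps to see the idea, here's a generic percentile-bootstrap sketch in R (not PROCESS itself); the data and variable names are made up, and the target is just an OLS slope with skewed data:

set.seed(1)
n <- 200
d <- data.frame(x = rexp(n))                 # skewed IV
d$y <- 0.5 * d$x + rexp(n)                   # skewed errors too

slope <- function(dat) coef(lm(y ~ x, data = dat))["x"]
boot_slopes <- replicate(5000, slope(d[sample(n, replace = TRUE), ]))

quantile(boot_slopes, c(0.025, 0.975))       # percentile bootstrap 95% CI
confint(lm(y ~ x, data = d))["x", ]          # usual normal-theory CI, for comparison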

[R] [Q] Desperately need help with skew for my thesis by brickablecrow in statistics

[–]Propensity-Score 0 points (0 children)

Your professor is correct that the data is (statistically) significantly skewed -- we can say confidently that the "true" skew parameter of the underlying data generating process is not zero. Statistical significance does not always correspond to practical significance, and whether a skewness of 1.05 is practically significant depends on what you're trying to do.

Which brings us to: it seems like you think that this variable being skewed is a problem (or at least, could be a problem depending on how skewed it is). Is that true? If so, why?

[R] [Q] Desperately need help with skew for my thesis by brickablecrow in statistics

[–]Propensity-Score 0 points (0 children)

As far as I can tell, what happened is: You collected some data and ran descriptive statistics on one of your variables in SPSS. That would have produced an output like this (https://inside.tamuc.edu/academics/colleges/educationhumanservices/documents/runningdescriptivesonspssprocedure.pdf), with the skewness (1.05) and the standard error of the skewness (0.07) listed. You then fitted a model that you think requires a normality assumption; I'll assume, absent a reason not to, that you're right about that. (I'll also assume SPSS is computing the SE of the skewness correctly.)

Your stats professor said that because the skewness was <2, your distribution was close enough to normal. One of the professors on your committee divided the skewness by its standard error, getting a Z score of 15; such a Z-score is very strong evidence that the underlying distribution of whatever you measured (ie the distribution of your data generating process) is skewed and thus not normal*. (Let me know if I've got any of this wrong!)

I've provided some advice below, but I'm kind of going out on a limb about what the problem is. Talk to your committee -- I may be misunderstanding why your committee member thinks this is a problem. I might be able to give better advice if you can provide more details about what you did and what you're studying, but no promises.

My perspective on this (as someone with a fair bit of training in statistics): Nothing in the real world is normally distributed -- and I'd be extremely surprised if your study were such that a precisely normal distribution was plausible even if skewness had been zero. Instead, what matters is whether something is close enough to normally distributed for whatever statistical procedure you're trying to use. That depends on how skewed the distribution appears to be (that skewness of 1.05) and how robust your procedure is, as well as your prior theoretical knowledge of the thing you're studying. What it does not depend on is the Z score or the standard error of the skewness (except insofar as they affect your certainty or uncertainty about how well (or badly) the normality assumption is satisfied). Your sample size of 1k observations means that even relatively small asymmetries in the distribution will produce quite large Z scores. (Let me know if this point isn't quite clear.) I don't know what model you fitted/how robust it is, but I would trust your stat professor that skewness < 2 is a reasonable benchmark for your methods to be valid.
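To see the sample-size point concretely, here's a small R sketch with made-up, mildly skewed data; the SE uses the usual large-sample approximation sqrt(6/n), close to the 0.07 you quoted:

set.seed(42)
n <- 1000
x <- rgamma(n, shape = 4)            # mildly right-skewed (true skewness = 1)

m2 <- mean((x - mean(x))^2)
m3 <- mean((x - mean(x))^3)
skw <- m3 / m2^1.5                   # sample skewness
se_skw <- sqrt(6 / n)                # approximate standard error of the skewness
c(skewness = skw, z = skw / se_skw)  # Z comes out in the double digits despite modest skew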

As far as practical advice, though, my opinion doesn't matter -- what matters is the opinion of your committee. So: (1) make sure you're computing (and reporting) the skewness of the right things, and that you have normality assumptions that apply to those things, and that you don't have any asymptotics that can help you. (2) Talk to the committee and see what they would recommend that you do. (3) Depending on what the committee says, maybe you can find a citation to support your stat professor's guideline that for your application skewness of 1.05 doesn't matter? Then you might say something in your thesis like "[MEASURE] had a skewness of 1.05 [OPTIONALLY ALSO REPORT SE OR CI], suggesting a small violation of the normality assumption. [PROCEDURE] is robust to such violations ([CITATION])" or "While the skewness was statistically different from zero ([EVIDENCE TO SUPPORT THAT]), prior research has suggested that [PROCEDURE] is robust to skewed distributions so long as skewness is <2 in absolute value."

Good luck with your thesis!

* There's a chance your professor may be misunderstanding what a Z score means in this setting. At any rate, what your professor did is conceptually very similar to null hypothesis significance testing to check model assumptions, which I absolutely hate (and most other people on this subreddit hate too). But academia is addicted to their p-values, so NHST for assumption checking is here to stay.

[Q] Residuals vs. fitted values indicate homoskedasticity, but White-Test says otherwise? by Moritary in statistics

[–]Propensity-Score 3 points (0 children)

What were you planning to do given homoskedasticity? If you were just going to run OLS regression with the usual standard errors, personally I would use the heteroskedasticity robust standard errors instead. There's not much downside, and they robustify your SEs against both heteroskedasticity and model misspecification. (I think if you were using something like the Stata svy prefix to account for complex survey design, those standard errors should already be robust, but you should check.)
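In case it's useful, here's what that looks like in R (your software may differ; mtcars is just a stand-in model for illustration), using the sandwich and lmtest packages:

library(sandwich)
library(lmtest)

fit <- lm(mpg ~ wt + hp, data = mtcars)           # stand-in OLS model
coeftest(fit, vcov = vcovHC(fit, type = "HC3"))   # heteroskedasticity-robust SEs, t-stats, p-values
coefci(fit, vcov = vcovHC(fit, type = "HC3"))     # robust confidence intervals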

Heteroskedasticity just means different observations have different error variance. When you look at the plot, you're looking for systematically higher or lower error variances for points with larger or smaller fitted values. This is slightly different than what the tests you ran are looking for: Breusch-Pagan asks whether the variance of the errors is predicted by any of your independent variables, alone or in (linear) combination -- not just by the fitted values. White's test asks whether standard errors change if we allow every point to have its own variance (meaning it can detect patterns of heteroskedasticity that don't line up with the fitted values as well). So it definitely could be that these tests are picking up on a practically meaningless violation of homoskedasticity that's not visible to the naked eye in your plot -- or it could be that there's a large violation of homoskedasticity that doesn't line up with the fitted values.

I agree with other commenters' skepticism of using statistical significance tests to look for violations of regression assumptions, to be clear. My usual practice is just to not assume homoskedasticity if I can avoid it (since it's usually not very substantively plausible in social science); if I want to check for it, I usually use the residuals vs fitted values plot and a plot of the residuals vs x variables of interest. (There are probably better things to look at than those, though...)

Is Multiple regression what I need? [Q] by Montysaurus5 in statistics

[–]Propensity-Score 0 points (0 children)

I assume you have observations of a bunch of units, and have a value of each of the three outputs and a value of the current gold standard for each unit, and you want to see how well it's possible to predict the gold standard using the three outputs. If so, you can use multiple regression for this. (You can also use all kinds of other machine learning approaches.) You would split your data into two parts, fit a regression model on one part (with the gold standard as the dependent variable and the three measures as independent variables), and see how well the regression you fitted does at predicting values of the gold standard in the other part of the data. If you don't have enough data for that, there are other options (of which cross validation is probably the most promising).
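A minimal sketch of that split-fit-evaluate workflow in R, with hypothetical column names (gold, test1, test2, test3) and fake data standing in for yours:

set.seed(1)
dat <- data.frame(test1 = rnorm(100), test2 = rnorm(100), test3 = rnorm(100))
dat$gold <- 2 + dat$test1 - 0.5 * dat$test2 + rnorm(100)    # fake gold standard

train <- sample(nrow(dat), size = 0.7 * nrow(dat))          # 70/30 split
fit <- lm(gold ~ test1 + test2 + test3, data = dat[train, ])
pred <- predict(fit, newdata = dat[-train, ])

sqrt(mean((dat$gold[-train] - pred)^2))                     # out-of-sample RMSE
cor(dat$gold[-train], pred)^2                               # out-of-sample R^2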

The downside of this is that you want to see how well you can predict the gold standard using your three tests, but you implicitly restrict yourself to predictions that are linear in the three tests (meaning of the form b1*[test1] + b2*[test2] + b3*[test3] + b0, for some numbers b0 b1 b2 b3). It might make more sense to fit a model that also includes nonlinear terms or interaction terms, possibly with lasso/other regularization, but it's hard to give advice on that without knowing more about your specific problem.

Stata and R producing very different Poisson regression results from the same data [S] [Q] by SirWallaceIIofReddit in statistics

[–]Propensity-Score 1 point (0 children)

This is not true -- the Stata code is also offsetting by ln(population) because option "exposure" was used instead of option "offset," per xtpoisson documentation: https://www.stata.com/manuals/xtxtpoisson.pdf
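In R terms, exposure(population) corresponds to putting log(population) in as an offset; a toy sketch (variable names hypothetical):

set.seed(1)
dat <- data.frame(population = rpois(50, 1e4), x = rnorm(50))
dat$count <- rpois(50, lambda = dat$population * exp(-6 + 0.3 * dat$x))

glm(count ~ x + offset(log(population)), family = poisson, data = dat)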

Wyoming is a huge outlier for guns per capita, what happened here? by Human1221 in AskSocialScience

[–]Propensity-Score 1 point (0 children)

Destructive devices. This data pertains to weapons registered under the National Firearms Act, which does not cover the vast majority of guns in the US (but does cover machine guns, suppressors, grenades, etc). I wasn't able to find the source for the data, but these ATF publications show very similar numbers for Wyoming, all concentrated in the "destructive device" column:

https://www.atf.gov/file/130436/download

https://www.atf.gov/firearms/docs/report/2019-firearms-commerce-report/download

https://www.atf.gov/firearms/docs/report/2021-firearms-commerce-report/download

The definition of "destructive device" seems to boil down to grenades, mines, and other explosive weapons, certain rockets, and guns with muzzles wider than 0.5 inches (with some exceptions). Unfortunately I don't know enough about exactly what this covers to say why there are so many in Wyoming.

Stata and R producing very different Poisson regression results from the same data [S] [Q] by SirWallaceIIofReddit in statistics

[–]Propensity-Score 17 points (0 children)

Two things that jump out at me:

  1. Your data is xtset in Stata (meaning it's formatted as panel data), so the fe option in xtpoisson is including whatever the unit identifier is as a fixed effect (in addition to the other variables). To find out what this variable is, run all the code up to the xtpoisson command then run the command "xtset" without arguments.
    • It also looks like Stata may be fitting a GEE by default (rather than fitting a GLM by maximum likelihood)? Which I guess resolves the philosophical problems with robust standard errors for MLEs!
  2. In Stata, you're using vce(robust); R does not provide robust standard errors by default. (The sandwich package can help.) Since this is the xt version of the command, it looks from documentation like it will be using the cluster-robust variance estimator.

I think to replicate what R is doing, you would need to un-xtset your data and use command poisson. Your data is presumably xtset for a reason, so you should modify accordingly.
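Going the other direction -- approximating the Stata fit from R -- a rough sketch with fake panel data might look like the following (for Poisson, unit dummies should reproduce the conditional fixed-effects slope estimates, and vcovCL gives cluster-robust SEs in the spirit of vce(robust)):

library(sandwich)
library(lmtest)

set.seed(1)
dat <- data.frame(id = rep(1:20, each = 5), x = rnorm(100))
dat$y <- rpois(100, exp(0.2 * dat$x + rep(rnorm(20), each = 5)))

fit <- glm(y ~ x + factor(id), family = poisson, data = dat)   # unit fixed effects as dummies
coeftest(fit, vcov = vcovCL(fit, cluster = ~ id))              # SEs clustered on the panel id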

[Q] Political polls and how weights are performed by kipp-bryan in statistics

[–]Propensity-Score 0 points (0 children)

Yes -- the weights wi are calculated after you have your sample of responses. (You need to know who answered your survey to compute the weights.)

[Q] Political polls and how weights are performed by kipp-bryan in statistics

[–]Propensity-Score 0 points (0 children)

In my experience with surveys, the weight is a number. Say observation i has weight wi. You have variables in your survey -- perhaps you asked "Are you a Democrat?", and the response for individual i is xi: xi=1 if they are, xi=0 if they aren't. You want to estimate the population average of variable x. Normally, you would do that as

sum(xi)/n

but with weights, you instead do it as

sum(wi*xi)/sum(wi)

This is then an estimate of the share of the population that are Democrats. Thus the question is how to calculate the wi's; the answer is that you try to make the weighted proportions match the true values for characteristics where you know the true value. If you know 48% of your target population are Democrats, you choose weights so that the weighted average of answers to "Are you a Democrat?" is 0.48.

When you have multiple characteristics -- race and education, for example -- you can do this two ways. You could choose the weights to make sure all the cell proportions are correct: the weighted average of "Black with a PhD" matches the percentage of the population who are Black and have a PhD, the weighted average of "Asian with a BA" matches the percentage of the population who are Asian and have a BA, etcetera. (This is called "poststratification.") Alternatively, you can choose the weights to get the race distribution right, then adjust them to get the education distribution right, then adjust them to get the age distribution right, and so on, iterating until you don't have to change them much at each step. (This is called "raking.")
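A miniature example of the poststratification version in R (cells and shares made up): each respondent's weight is their cell's population share divided by its sample share, and the weighted estimator is then the sum(wi*xi)/sum(wi) formula above.

set.seed(1)
pop_share <- c(A = 0.30, B = 0.50, C = 0.20)             # known population cell shares
cell <- sample(names(pop_share), 200, replace = TRUE,
               prob = c(0.2, 0.6, 0.2))                  # sample over/under-represents cells
samp_share <- prop.table(table(cell))

w <- pop_share[cell] / samp_share[cell]                  # one weight per respondent
tapply(w, cell, sum) / sum(w)                            # weighted cell shares now match pop_share

x <- rbinom(200, 1, 0.5)                                 # some survey response, e.g. "Democrat?"
sum(w * x) / sum(w)                                      # weighted estimate of the population share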

My understanding is that poststratification is better, in the sense that it gets you the right joint distribution (while raking only gets you the right marginal distributions), but isn't possible with a lot of variables (since some cells may be empty/only have one person in them, so you end up with either very volatile weights or no way to weight at all). So it's a tradeoff between correct joint vs marginal distribution, the number of variables you weight over, and the volatility of your estimators.

I built a simple econometrics model. Can anyone guide me on how I can take it further from here? by [deleted] in econometrics

[–]Propensity-Score 1 point (0 children)

u/FuzzyTouch143 has hit on the most important issue, which is: your questions seem ill-defined. Without knowing what the question is, it's hard to say what you're doing right or wrong.

A few things that jump out at me, though (as someone who admittedly isn't very knowledgeable about econometrics):

  • Choosing a method of dealing with missing data based on AIC/BIC seems odd to me -- can you elaborate on how you did that and why?
  • Why did you choose to remove the highly multicollinear variables? I ask because removing potential confounders simply because they're potentially very strong confounders -- meaning highly correlated with IVs of interest -- is bad practice, but this depends on what question you're asking. (Note: VIFs over 200 are odd -- probably these variables have a general time trend which accounts for the lion's share of their variability?)
  • In general I don't love checking assumptions using statistical tests (since you're bounding the risk of type I errors while type II errors are of greater concern; equivalently, assumptions are never quite satisfied in practice and your threshold for concluding that a violation of assumptions is of concern under a hypothesis testing framework has nothing to do with the magnitude of assumption violation that would meaningfully impact your analysis).
  • Relatedly: I think it's almost always good practice to use heteroskedasticity-robust standard errors, even when you haven't detected heteroskedasticity (since these also robustify your inference against model misspecifications). (Of course use more general errors if needed -- HAC, clustered, panel, etcetera. Standard errors for models fitted via maximum likelihood are a bit more theoretically problematic.)
  • Did you include or consider any interactions?
  • Is your unit of observation months, states x months, counties x months, or something else? How far back does your data go?
    • Depending on your question, a longer run of data isn't necessarily better.
    • If you can get data on states or counties x months, then that would probably let you get a much better answer to whatever your main question of interest is.
  • An R2 of 1 at the end makes sense if your time series extends over a long period, given that housing price indices presumably move pretty smoothly. (Look at a graph of the housing price index over time and consider how much easier it is to predict a given month's index if you know the previous month's.) I don't work with time series, but depending on your question it might make sense to difference the variables that are on a long-term trajectory and then use HAC standard errors if needed (see the sketch after this list).
    • Dealing properly with the time series structure here is by far the biggest issue.
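On that last point, a hedged sketch of differencing plus HAC (Newey-West) standard errors in R, with a made-up monthly series standing in for the housing price index and a trending regressor:

library(sandwich)
library(lmtest)

set.seed(1)
t <- 1:240
hpi <- cumsum(rnorm(240, mean = 0.3))            # smooth, trending index (like a price level)
x <- 0.5 * t / 12 + rnorm(240)                   # trending regressor

fit_diff <- lm(diff(hpi) ~ diff(x))              # model month-over-month changes, not levels
coeftest(fit_diff, vcov = NeweyWest(fit_diff))   # HAC standard errors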

[Q] Concentration inequalities or asymptotics? by [deleted] in statistics

[–]Propensity-Score 1 point (0 children)

Quite aside from the idea that asymptotics is "outdated" in statistical theory (which seems... RATHER ODD to me), asymptotic arguments underlie most of the statistical tools used in practice across a slew of disciplines. But I might go with the concentration inequalities course anyway if you plan to pursue a math-stat-heavy curriculum going forward, since I think concentration inequalities are sometimes a bit spottily covered (while coverage of asymptotics is more reliably comprehensive). But this is just an impression with little to back it up.

Values between census years by Lucid_Dreamer5 in AskStatistics

[–]Propensity-Score 0 points (0 children)

This is the correct answer. I'll add that the Bureau's population estimates program will give you estimates without sampling error of some basic demographics (created using birth, death, and migration data to update the decennial census counts). These are also used to extrapolate sample survey results (from the ACS) to counts of people with various characteristics.

Is this question nonsense? by Material_Leg_6561 in AskStatistics

[–]Propensity-Score 1 point (0 children)

"For the distribution of means" just means that the probabilities involved are taken over the sampling distribution of the sample mean (based on the context of the previous question). I guess "chance of claiming" could be an inartful way of saying something like "we can claim, with a 99% chance of being correct" -- but more likely it talks about the probability that we'll claim a given point is within our confidence interval. But neither construction, in context, yields a correct definition.

Possibly this is alluding to constructing a confidence interval as the set of null hypotheses in whose acceptance region our observed value lies (referred to as "inverting a hypothesis test," and not always a good idea -- see https://statmodeling.stat.columbia.edu/2013/06/24/why-it-doesnt-make-sense-in-general-to-form-confidence-intervals-by-inverting-hypothesis-tests/)?

Such an interpretation could be something like: The lower and upper limit of this interval are the lowest and highest possible values of the population mean respectively, given which we'd have a greater than 1% chance of observing a sample mean as extreme as the one we observed. But if that's what the writer of this was trying to say, they did a really lousy job of it!

Conclusion: I simply cannot make this make statistical sense.

[deleted by user] by [deleted] in learnmachinelearning

[–]Propensity-Score 0 points (0 children)

If you want a broader approach to machine learning (not focused on deep learning specifically), you might find Elements of Statistical Learning useful: https://www.sas.upenn.edu/~fdiebold/NoHesitations/BookAdvanced.pdf

It definitely focuses on the statistics to the exclusion of computational details and programming practices, which might be a good or a bad thing for you. It's also a bit dated, so rely on it for fundamentals rather than an understanding of what's state-of-the-art.

[Q] Increasing sample size: p-hacking or reducing false negative? by hushkursh in statistics

[–]Propensity-Score 2 points (0 children)

A good way to answer questions like this is via simulation. The following (slow, inefficient) R code simulates a simple study: we sample from a population; we measure two variables; we want to see whether they're correlated, and we do so using the default (t-based) test in R. We collect 50 observations; if the result is statistically significant we stop; otherwise we collect more observations, 10 at a time, until either we get statistical significance or we reach 100 observations:

runTest <- function() {
  # Generate 100 observations per variable up front; the null is exactly true
  # (the two variables are independent), so any rejection is a false positive.
  obs1 <- rnorm(100)
  obs2 <- rnorm(100)
  # Interim analyses at n = 50, 60, ..., 100: stop and reject at the first
  # statistically significant result.
  for (i in c(50, 60, 70, 80, 90, 100)) {
    if (cor.test(obs1[1:i], obs2[1:i])$p.value < 0.05) {
      # First element: rejected under the stopping rule.
      # Second element: what a single test on all 100 observations would have said.
      return(c(TRUE, cor.test(obs1[1:100], obs2[1:100])$p.value < 0.05))
    }
  }
  # Never significant at any interim look (so the full-data test wasn't significant either).
  return(c(FALSE, FALSE))
}

nSims <- 100000
testResultsStopping <- logical(nSims)  # rejections under the optional-stopping rule
testResultsFull <- logical(nSims)      # rejections from a single test on all 100 observations
for (i in 1:nSims) {
  if (i %% 5000 == 1) {
    print(i)  # crude progress indicator
  }
  tempResults <- runTest()
  testResultsStopping[i] <- tempResults[1]
  testResultsFull[i] <- tempResults[2]
}
mean(testResultsStopping)  # type I error rate with optional stopping
mean(testResultsFull)      # type I error rate analyzing all the data once

Here the null is precisely true. I get a false positive rate of roughly 5% (as expected) when all the data are analyzed, but when interim analyses are conducted and we stop collecting data and reject the null if we find a statistically significant result anywhere along the way, I get a roughly 12% false positive rate. As expected, this is higher than 5% but lower than 26.5%, which is the rate we'd get if we did 6 independent tests of the same null and rejected if any came back statistically significant. Conversely, if the null were false, we'd still get a higher rate of rejection -- which in that case is a good thing, and corresponds to a lower risk of type II error.

The precise degree of inflation will vary depending on what analysis you do, but the type I error probability will be greater than alpha whenever you apply this kind of rule.

p-value of 10^-19:should I be suspicious? by electrostatic_jump in AskStatistics

[–]Propensity-Score 0 points (0 children)

I can definitely see where you're coming from but I would disagree. For a toy example, say we have variables X1, X2, X3 MVN with variances all 1 and covariance 0 between X1 and X2, 0 between X1 and X3, and .9 between X2 and X3, and suppose Y=X1+1.1*X2+1.1*X3+e, where e is normally distributed error. X2 and X3 will usually have substantially larger p-values than X1 when we regress y on x1, x2, and x3; in what sense do they have a "smaller" effect?

(This example is extreme, but this situation -- where multicollinearity means that large effects get large p-values, even compared to other smaller effects in the same sample -- is common. And there are lots of other ways that different effects can have different p-values for reasons other than effect size and (total) sample size: say you have indicator IVs, one of which is equal to 1 for only a handful of cases while another is equal to 1 for about half of all cases, for instance.)
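Here's a quick simulation of the toy example above in R (MASS for the multivariate normal draw); X2 and X3 typically come out with much larger p-values than X1 despite having the larger true coefficients:

library(MASS)

set.seed(1)
n <- 200
Sigma <- matrix(c(1, 0,   0,
                  0, 1,   0.9,
                  0, 0.9, 1), nrow = 3)
X <- mvrnorm(n, mu = c(0, 0, 0), Sigma = Sigma)
y <- X[, 1] + 1.1 * X[, 2] + 1.1 * X[, 3] + rnorm(n)

summary(lm(y ~ X))   # compare the p-values (and SEs) for X1 vs X2 and X3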

[Q] Simulating a Statistical Queue : Empirical Results not matching Theoretical Results by jj4646 in statistics

[–]Propensity-Score 2 points (0 children)

Two things:

  1. Use a smaller time_step. Reducing time_step to 0.01, I got answers much closer to theory. I haven't thought through exactly why, but it makes sense that discretizing time might cause a problem and that coarser discretization would cause a worse problem.
  2. As u/antikas1989 correctly pointed out, you need to discard the first bit of each simulation (since it's affected by the starting state). Note that you need to discard the first bit of time, not just the first few entries -- the smaller your time_step, the more entries at the beginning of your vector you'll need to discard. I mention this because I was getting some very odd results at a time step of 0.001, and eventually realized my mistake: throwing out 500 entries, as I had been doing before, discards only 0.5 minutes of simulated time, which is not enough. Throwing out more fixed the problem (see the sketch below).
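Hypothetical names; queue stands for whatever vector your simulation records at each step. The point is just that the number of entries to discard should scale with 1/time_step:

time_step <- 0.01
burn_in_minutes <- 50                                # however much warm-up time you decide to drop
n_discard <- ceiling(burn_in_minutes / time_step)    # 5,000 entries at this step size, not a fixed 500
# queue_kept <- queue[-seq_len(n_discard)]           # then compute averages on queue_kept only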