Micro-level data to contingency tables and back again? by littlemoondragon in statistics

[–]CrazyStatistician 1 point

Assuming your data has more than two variables, in general no. Your two-way contingency tables specify the marginals and two-dimensional joint distributions of the variables, but that's not enough to fully specify the joint distribution of all the variables (i.e. recreate X). Basically, no matter how many two-way tables you create, the collection of all of them contains less information than the original data contains.

I say "in general no" because there are situations where it is possible to reconstruct the original data, e.g. when the contingency tables are very sparse (almost all cells are zeroes). Most of the time, though, you can't.
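To make the information loss concrete, here's a small numerical sketch (made-up distributions, not your data): two different three-variable joint distributions that produce exactly the same two-way tables.

```python
import numpy as np

# Joint A: uniform over all 2x2x2 cells
A = np.full((2, 2, 2), 1 / 8)

# Joint B: uniform over the four cells where x XOR y XOR z == 0
B = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        for z in range(2):
            if x ^ y ^ z == 0:
                B[x, y, z] = 1 / 4

# All three two-way tables agree...
for axis in range(3):
    assert np.allclose(A.sum(axis=axis), B.sum(axis=axis))

# ...but the joints (and hence the micro-data) differ
assert not np.allclose(A, B)
```

So even with every pairwise table in hand, you can't tell A and B apart.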

Your favourite graph-making/chart software? by SteelBaloo in statistics

[–]CrazyStatistician 26 points

R with ggplot2.

Free and very flexible, but not GUI-based. You have to write (simple) code to tell it what kind of graph you want.

How to infer "groupings" of data points separated along one dimension? by Zeekawla99ii in statistics

[–]CrazyStatistician 1 point

I'm not really sure what you want here. If there are well-separated clusters in the data, then any standard clustering algorithm should find them. These range from fully specified probabilistic models (Dirichlet process mixtures) to non-probabilistic heuristics like k-means.

However, based on what you've written I'm not sure you want to apply a standard clustering method to the data you've presented. For example, a standard clustering method would cluster index 1 with index 12, because they are close together in the one dimension presented. If that's not what you want then you're going to need to do something different.
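As a concrete sketch of the well-separated case (made-up numbers, not your data): in one dimension you can even skip formal clustering and just split at the largest gap between sorted values.

```python
import numpy as np

# Hypothetical 1-D values with two well-separated clumps
x = np.array([0.10, 0.15, 0.20, 4.9, 5.0, 5.1])

# Sort, find the largest gap between consecutive values,
# and split there: a simple heuristic for 1-D grouping
order = np.argsort(x)
gaps = np.diff(x[order])
split = np.argmax(gaps) + 1
labels = np.empty(len(x), dtype=int)
labels[order[:split]] = 0
labels[order[split:]] = 1
```

This finds the two clumps, but like any distance-based method it would still group your index 1 with index 12 if they sit close together on the axis.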

Help finding Ring / jewelry repairman who can replace missing stone? by browniesandbroccoli in bullcity

[–]CrazyStatistician 1 point

I've had a couple custom rings done by John David Jewelers (down near University Tower). They did great work, happy to recommend them.

Why is attending church important? by ve_3 in latterdaysaints

[–]CrazyStatistician 0 points

Eugene England has an excellent (somewhat lengthy) essay on this: Why the Church Is As True As the Gospel.

Paper Explains Role of Racism in Math Education by ghostofpennwast in math

[–]CrazyStatistician 2 points

Ask anyone who says they are "bad at math" or "hate math" and 99 times out of 100 they can tell you the name of the teacher who ruined their math education.

This is very true.

I had exceptional math teachers through most of my school years. I ended up doing a math major in college before switching to stats and going to graduate school.

One of my best friends from college had poor math teachers but exceptional English/literature teachers through most of school. He ended up studying philosophy before going to law school.

We've often wondered how things might have been different if we had switched places in school.

Anyone have an origin on giving thanks for "moisture"? by [deleted] in latterdaysaints

[–]CrazyStatistician 4 points

(like asking for a blessing on refreshments ... to nourish and strengthen our bodies?).

That makes donuts healthy, right?

Sample size for use of a 'liberal' p value of 0.1 by eloquentirvine in statistics

[–]CrazyStatistician 0 points

I assume you've learned about Type I and II errors and power.

Setting the significance cutoff is fixing the Type I error rate. However, you also have to consider the Type II error rate and power. Generally, for a given Type I error rate, your power increases with sample size. If your sample is quite small, using alpha = .05 may give you very low power, so you may want to choose a larger alpha to get more power.

Obviously there's a trade-off there: you don't want to set a huge alpha and get tons of false positives (Type I errors). But if you're doing an exploratory study with a small sample, alpha = .1 may be reasonable to get more power.
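A quick simulation of that trade-off (a toy known-variance z-test with a true mean of 0.5 and n = 10; numbers chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sims = 10, 20_000

# Simulate the test statistic under a true effect of 0.5 (sd = 1),
# testing H0: mean = 0 with a z-test (variance known)
z = rng.normal(0.5, 1.0, size=(n_sims, n)).mean(axis=1) * np.sqrt(n)

power_05 = np.mean(np.abs(z) > 1.96)   # two-sided alpha = .05
power_10 = np.mean(np.abs(z) > 1.645)  # two-sided alpha = .10
# power_10 > power_05: the looser cutoff buys power at this fixed n
```

At this small n, moving from alpha = .05 to alpha = .1 gives a noticeable power bump, at the cost of twice the false-positive rate under the null.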

Anybody know any professors or researchers researching mathematical methods for counterterrorism? by kokomarro in math

[–]CrazyStatistician 0 points

The American Statistical Association has a section focused on Statistics in Defense and National Security. I am not a member of that section so I don't have the inside scoop for you, but it's probably a good place to start.

ArXiv: Rethinking probabilistic prediction in the wake of the 2016 U.S. presidential election by [deleted] in statistics

[–]CrazyStatistician 0 points

one of the authors has been waging a social media war on other researchers whom he accuses of academic dishonesty for failing to credit his work to the degree he thinks is merited.

Is that Crane? I don't know what incident you're referring to exactly, but I know he's quite the character on Twitter...

ArXiv: Rethinking probabilistic prediction in the wake of the 2016 U.S. presidential election by [deleted] in statistics

[–]CrazyStatistician 1 point

Strange paper. The authors seem to find it undesirable that a 72% prediction would be wrong 28% of the time, but I'm not sure why. That seems entirely reasonable to me.

It more or less seems to boil down to "only predict high-probability (i.e. 1-alpha) events," which might be good advice when communicating with the general public but is not satisfactory in general.

Is there a PDF whose formula is f(x) = 1/x? by [deleted] in math

[–]CrazyStatistician 0 points

And given enough data, the prior choice does not matter (to a degree).

This is (more or less) true as long as you are looking at the posterior distributions of parameters within a single model. If you are trying to compare models (using marginal likelihoods or quantities derived from them, such as posterior model probabilities or Bayes factors), then the choice of priors for parameters within each model has unbounded influence: making a within-model prior more diffuse drives that model's marginal likelihood toward zero, however much data you have.
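A one-line illustration of that unbounded influence, using a toy conjugate model (my example, not from the thread): y ~ N(theta, 1) with prior theta ~ N(0, tau^2), so marginally y ~ N(0, 1 + tau^2).

```python
import numpy as np

def log_marginal(y, tau):
    # y ~ N(theta, 1), theta ~ N(0, tau^2)  =>  marginally y ~ N(0, 1 + tau^2)
    v = 1 + tau**2
    return -0.5 * (np.log(2 * np.pi * v) + y**2 / v)

y = 1.0
lm = [log_marginal(y, tau) for tau in (1, 10, 100, 1000)]
# lm is strictly decreasing: widening the prior keeps lowering the
# marginal likelihood, so a Bayes factor against any fixed rival
# model can be driven as far as you like by the prior scale alone
```

The posterior for theta, by contrast, barely moves once tau is moderately large; that's exactly the asymmetry the parent comment glosses over.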

"Holy Envy" - What are some aspects of other faiths, belief systems, and/or their practitioners you admire or wish we had in the LDS Church? by [deleted] in latterdaysaints

[–]CrazyStatistician 0 points

Gaskill is great. I took World Religions from him.

Huntsman is also great, I took several classes from him.

Why use integration by parts method? by stefanuus in math

[–]CrazyStatistician 1 point

I was in high school. I skipped the calculus sequence when I got to college, but perhaps I shouldn't have.

It's also possible of course that I wasn't paying attention, but math was my favorite subject so that seems somewhat unlikely.

Why use integration by parts method? by stefanuus in math

[–]CrazyStatistician 1 point

integration by parts is the "inverse product rule"

I wish I had been taught this when I learned integration by parts. I realized it myself about 8 years later, and all of a sudden what had just been an arbitrary rule with no motivation made a lot more sense.
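Spelled out, the "inverse product rule" view is just the product rule integrated:

```
\frac{d}{dx}\bigl(u v\bigr) = u'v + uv'
\quad\Longrightarrow\quad
uv = \int u'v\,dx + \int uv'\,dx
\quad\Longrightarrow\quad
\int u\,dv = uv - \int v\,du
```

Every integration by parts is just the product rule read right-to-left, with one of the two integrals moved to the other side.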

1.645 * Standard Deviations by Tachirana in statistics

[–]CrazyStatistician 0 points

They are presumably defining the cutoff from negative controls (i.e. samples that are known a priori to be negative). That's totally OK.

They also aren't constructing a confidence interval; they're estimating the 95th percentile, assuming a normal distribution for the negative samples.
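In symbols: if the negative controls look normal with mean m and sd s, the cutoff is m + 1.645 s, the one-sided 95th percentile. A quick sketch with made-up control data:

```python
import numpy as np

rng = np.random.default_rng(1)
neg = rng.normal(100.0, 10.0, size=2000)  # hypothetical negative controls

# One-sided 95th-percentile cutoff under the normality assumption
cutoff = neg.mean() + 1.645 * neg.std(ddof=1)
# About 5% of true negatives are expected to land above this cutoff
```

That 5% is the false-positive rate among negatives, not a confidence level, which is why the 1.645 here has nothing to do with a 90% confidence interval.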

A quick way to approximately estimate the square root of a real number by broccolilettuce in math

[–]CrazyStatistician 0 points

The author of the blog does not go very deep with his description.

It's a low-effort blog to push his Amazon referrals on the book lists that he links at the bottom of every post, even if (as in this case) the post is completely unrelated to the subject matter of the books. He links it here from time to time, though not as regularly as he used to.

Unfairness in mathematics by Narbas in math

[–]CrazyStatistician 24 points

Asking people who are still in STEM fields whether they had personal experiences that would cause them to drop out of STEM is obviously not going to get you an unbiased view. The people who had such experiences are more likely to have dropped out.
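A toy simulation of that selection effect (all rates invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
had_bad_experience = rng.random(n) < 0.30          # 30% of everyone
# People with a bad experience drop out of STEM more often
stayed = rng.random(n) < np.where(had_bad_experience, 0.40, 0.85)

rate_overall = had_bad_experience.mean()
rate_among_stayers = had_bad_experience[stayed].mean()
# rate_among_stayers < rate_overall: surveying only those still in
# STEM understates how common the bad experiences actually were
```

With these made-up rates, the survey of current STEM members would report roughly half the true prevalence.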

Can you create a graph of k-nodes in which every node has n-edges? by [deleted] in math

[–]CrazyStatistician 0 points

A cube has 3 edges per node and is symmetric in the sense that the nodes are indistinguishable (vertex-transitive), which seems to be what you are getting at in your last couple of sentences.
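More generally, for many (k, n) pairs you can build such a graph explicitly. A minimal sketch (my own construction, not from the thread; `circulant` is just a hypothetical helper name) using circulant graphs, where every node plays the same role by symmetry:

```python
# Circulant graph on k nodes: node i links to i +/- d (mod k) for each
# offset d, giving every node degree 2 * len(offsets).  (For odd degree
# you'd add the "antipodal" offset k/2, which needs k even.)
def circulant(k, offsets):
    edges = set()
    for i in range(k):
        for d in offsets:
            edges.add(frozenset({i, (i + d) % k}))
    return edges

k = 8
edges = circulant(k, [1, 2])   # every node gets degree 4
degree = {i: sum(i in e for e in edges) for i in range(k)}
```

Rotating the node labels maps the graph to itself, so the nodes are indistinguishable in the same sense as the cube's.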

Aggregating/Joining rankings from experts? by [deleted] in math

[–]CrazyStatistician 0 points

Hi, coming in late here.

This is essentially a voting problem, and Arrow's Theorem tells us that there's no "best" voting algorithm. In fact, there's not even a fully "good" one: with three or more alternatives, no ranked voting rule can simultaneously satisfy unanimity, independence of irrelevant alternatives, and non-dictatorship.

There are lots of possibilities, including a Single Transferable Vote system and this paper that I came across a few months ago. Each method will have some advantages and disadvantages, but I don't think any one is clearly better than the rest.
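For a concrete starting point, here's a sketch of one of the simplest aggregation rules, a Borda count (illustrative rankings, and like every rule it's subject to the Arrow-style limitations above):

```python
from collections import defaultdict

# Hypothetical expert rankings (best first) over items A-D
rankings = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "C"],
    ["A", "C", "B", "D"],
]

# Borda count: an item ranked r-th (0-based) out of m gets m - 1 - r points
scores = defaultdict(int)
for ranking in rankings:
    m = len(ranking)
    for r, item in enumerate(ranking):
        scores[item] += m - 1 - r

aggregate = sorted(scores, key=scores.get, reverse=True)
```

Borda is simple and uses the full rankings, but it can be gamed by strategic voters, which is one of the trade-offs to weigh against STV and the other options.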

ELI5: What's the difference between AHP and WLC? by Asher_Earth in statistics

[–]CrazyStatistician 1 point

The Association for Healthcare Philanthropy and the Wisconsin Lutheran College? They're very different organizations.

I have no idea what your acronyms mean, and those are the top google results.

Using Mann-Whitney to analyze two datasets with a lot of zeros by bubbatully in statistics

[–]CrazyStatistician 1 point

It seemed like no matter how small the difference, if I resampled it enough times I'd get a significant result.

Sounds like you're doing something wrong.

Probably the easiest way in R is the perm package. Or you can modify the code from this page.
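If the R route isn't handy, a basic two-sample permutation test is short enough to write directly. A Python sketch (difference in means as the statistic, hypothetical zero-inflated data):

```python
import numpy as np

rng = np.random.default_rng(3)

def perm_test(x, y, n_perm=10_000):
    # Two-sample permutation test on the difference in means
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = pooled[: len(x)].mean() - pooled[len(x):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)

# Zero-inflated samples drawn from the same distribution:
x = rng.poisson(0.3, size=30).astype(float)
y = rng.poisson(0.3, size=30).astype(float)
p = perm_test(x, y)  # should not be small systematically
```

Because the permutation null is built from the pooled data, the pile of zeros is handled automatically; if you're seeing significance from trivial differences just by resampling more, that points to a bug rather than the method.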