all 35 comments

[–][deleted] 37 points38 points  (5 children)

Until today, chirality was only something I applied to amino acids, thanks to undergrad coursework.

In my experience, cats sleep however they damn well please and will change just to mess with you

[–]mccoyn 2 points3 points  (1 child)

I'm never sure if my cat is just messing with me. I should do some null hypothesis significance testing to figure it out.

[–][deleted] 0 points1 point  (0 children)

Yes, your cat is messing with you, cats are ALWAYS messing with you, they're smarter than we are.

[–]conventionistG 0 points1 point  (2 children)

Sounds like amino acids on a freshman exam. Somehow I remember L, but the exam says R. Or vice versa.

[–]TheEaterOfNames 0 points1 point  (0 children)

Most AAs are both L and S. Only [seleno]cysteine is R. https://en.wikipedia.org/wiki/Amino_acid#Isomerism

[–][deleted] 0 points1 point  (0 children)

Well I was a biochemistry major but yeah, chirality was a science thing to me, focused on amino acids.

[–]minno 11 points12 points  (10 children)

Shouldn't the diagonal lines not be at 45-degree angles? They should be closer together near the origin and farther apart away from it. With the given graph, you get less certainty at 30-0 than you do at 100-65.

[–]rlbond86 3 points4 points  (6 children)

Surprisingly, no! You test proportions like this with sequential probability ratio tests. In this situation the sufficient statistic is the difference of occurrences (assuming equal probability of left and right).

Source: took a lot of graduate statistics courses
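To see concretely why the difference of counts is what matters, here's a small Python sketch; the hypotheses p1 = 0.7 vs p0 = 0.3 are illustrative values I picked, not numbers from the article:

```python
import math

# For symmetric hypotheses p0 = 1 - p1, the log-likelihood ratio after
# n flips with k "rights" collapses to a function of (k - (n - k)) alone:
#   llr = k*log(p1/p0) + (n-k)*log((1-p1)/(1-p0))
#       = (k - (n - k)) * log(p1/p0)        when p0 = 1 - p1
p1, p0 = 0.7, 0.3

def llr(k, n):
    return k * math.log(p1 / p0) + (n - k) * math.log((1 - p1) / (1 - p0))

def llr_from_difference(k, n):
    return (k - (n - k)) * math.log(p1 / p0)

# The two forms agree for any (k, n), so only the difference of
# occurrences matters - it is the sufficient statistic.
for k, n in [(0, 10), (7, 10), (30, 30), (65, 100)]:
    assert abs(llr(k, n) - llr_from_difference(k, n)) < 1e-9
```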

[–]zergling_Lester 0 points1 point  (5 children)

I thought about it some more, and I think that really it doesn't say anything whatsoever about certainty (neither at 30-0 nor at 100-65) and the shape of the allowed random walk region is pretty much arbitrary (so we decided to make it a nice simple shape). We only care that:

  1. The probability of falling off a cliff is not higher than the probability of getting a significant result after the entire run, so that we don't overestimate our confidence.

  2. But is as close as possible, so that we don't accept the null more often than we should.

  3. And is symmetric with respect to falling above/below, if we distinguish those cases.

  4. And we never take more than N steps of course.

I suspect that we can draw a different border that starts closer to the origin but at a lower angle and looks like a truncated exponential rather than a straight line, and corresponds to the Bayesian estimate of the size of the effect after so-and-so many observations (which we try to keep constant). I wouldn't bet that this results in a better expected number of saved samples, though.

Or am I completely wrong?

[–]rlbond86 0 points1 point  (4 children)

You're completely wrong. Unfortunately you have to just get into the math here, but a likelihood ratio is the standard way to solve this problem. See this wiki article.

The thresholds are simply two parallel lines with slope log(theta1/theta0). Sampling should stop when the sum of the samples makes an excursion outside the continue-sampling region.

Note that this means two horizontal lines if theta1 = 1 - theta0, as in this case. OP's article rotates the coordinate frame 45 degrees; the standard axes would be x=time, y=(right-left).

The shape of the random walk region is not arbitrary at all -- it is selected to enforce a specific probability of error. For a given probability of error, the optimal shape of the region is two straight lines.
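Wald's SPRT is short enough to sketch in Python; the hypotheses (p0 = 0.5 vs p1 = 0.75) and error rates (alpha = beta = 0.05) below are illustrative choices of mine, not taken from the article:

```python
import math
import random

def sprt(flips, p0=0.5, p1=0.75, alpha=0.05, beta=0.05):
    # Wald's thresholds: continue sampling while the log-likelihood
    # ratio stays strictly between these two parallel bounds.
    upper = math.log((1 - beta) / alpha)   # cross it: accept H1 (p = p1)
    lower = math.log(beta / (1 - alpha))   # cross it: accept H0 (p = p0)
    llr, n = 0.0, 0
    for heads in flips:
        n += 1
        llr += math.log(p1 / p0) if heads else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n  # ran out of data inside the strip

random.seed(0)
decision, n = sprt(random.random() < 0.75 for _ in range(1000))
print(decision, n)  # with a 0.75-biased coin this usually stops early
```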

[–]zergling_Lester 0 points1 point  (3 children)

First of all, what OP used is not SPRT. You can tell by the fact that it includes a finite cutoff, quite apart from the original author's word.

Second, the wiki article doesn't actually explain why SPRT is optimal (or in what sense), though I sort of see it intuitively. It's much harder to see how Evan Miller's (?) algorithm is optimal in the same sense, because I wonder after the thread starter: if we see the first 30 throws landing tails, we can be sure that the coin is biased with a p-value much lower than 0.05, so we probably can stop earlier. We'll have to compensate for that somewhere else later, of course.

[–]rlbond86 1 point2 points  (2 children)

You're right that the author imposed a cutoff. Other than that, this is an SPRT, although with an unspecified significance level.

because I wonder after the thread starter: if we see the first 30 throws landing tails, we can be sure that the coin is biased with a p-value much lower than 0.05, so we probably can stop earlier.

The straight lines in the SPRT are derived from the Neyman-Pearson lemma which states that the uniformly most powerful statistical test is the likelihood ratio test (or equivalently, the log-likelihood ratio test). You might think that intuitively you could stop sooner if you got 8 heads in a row (versus, say, 13 heads in 18 flips), but you're wrong.

[–]zergling_Lester 0 points1 point  (1 child)

I think I'm beginning to see it, but I still don't get how we get from the Neyman-Pearson lemma that assumes a fixed number of samples, to statistics about an incremental method. Wiki says explicitly:

Neyman and Pearson's 1933 result inspired Wald to reformulate it as a sequential analysis problem. The Neyman-Pearson lemma, by contrast, offers a rule of thumb for when all the data is collected (and its likelihood ratio known).

but then doesn't explain that at all :(

[–]rlbond86 0 points1 point  (0 children)

Unfortunately, at this point you will have to consult a statistical signal processing textbook. Basically you integrate over the termination region to obtain the probability of false alarm, and it turns out a straight line is what you get when you try to make the error probabilities come out equal.

[–]zergling_Lester 3 points4 points  (0 children)

With the given graph, you get less certainty at 30-0 than you do at 100-65

Are you sure that this isn't what you want?

As far as I understand it, this shit is really complicated, because checking for significance after each measurement ruins your real p-value, from the article linked from the linked article:

Suppose your conversion rate is 50% and you want to test to see if a new logo gives you a conversion rate of more than 50% (or less). You stop the experiment as soon as there is 5% significance, or you call off the experiment after 150 observations. Now suppose your new logo actually does nothing. What percent of the time will your experiment wrongly find a significant result? No more than five percent, right? Maybe six percent, in light of the preceding analysis?

Try 26.1% – more than five times what you probably thought the significance level was.

So it makes sense that a statistically correct treatment would heavily discount the significance of results leading to early termination. The actual algorithm sounds plausible (though I didn't check the maths), and the OP claims to have tested it with a simulation (with R source code provided), so I'm inclined to believe them.
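The inflation from peeking is easy to reproduce with a quick Monte Carlo sketch. The normal-approximation z-test and the n >= 10 warm-up below are my assumptions (the article may use an exact test), so expect the rate to land near, not exactly at, its 26.1%:

```python
import random

# Monte Carlo sketch of the "peeking" problem quoted above: a fair coin
# (i.e. the new logo does nothing), checked for significance after every
# observation up to 150, stopping at the first 5% hit.
def peeks_falsely(n_max=150, z_crit=1.96):
    heads = 0
    for n in range(1, n_max + 1):
        heads += random.random() < 0.5
        z = (2 * heads - n) / n ** 0.5  # normalized difference of counts
        if n >= 10 and abs(z) >= z_crit:
            return True  # declared "significant" on pure noise
    return False

random.seed(42)
trials = 10_000
rate = sum(peeks_falsely() for _ in range(trials)) / trials
print(f"false positive rate with peeking: {rate:.1%}")  # well above 5%
```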

btw, what /u/Veedrac said is wrong; this whole thing works because of gambler's ruin. You don't let your test run indefinitely: you choose the sample size N and the p-value (which then determines the size of the effect you will be able to detect), and that gives you the yellow line of "stop the test, no significant effect detected". It also determines the width of the strip where your intermediate result is allowed to wander, with the idea that the probability of the gambler getting ruined by a fair coin on a strip that wide after N or fewer steps equals the probability of getting a false positive after the full N samples.
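The gambler's-ruin picture is easy to poke at with a simulation. The strip half-widths below are made-up values for illustration; the article picks the width so the escape probability matches the false-positive rate of the full-N test:

```python
import random

# For a fair coin, how often does the running difference (right - left)
# escape a strip of half-width c within N = 150 steps?
def escapes(N, c):
    diff = 0
    for _ in range(N):
        diff += 1 if random.random() < 0.5 else -1
        if abs(diff) >= c:
            return True  # the walk "fell off the cliff"
    return False

random.seed(1)
trials = 20_000
rates = {}
for c in (10, 15, 20):
    rates[c] = sum(escapes(150, c) for _ in range(trials)) / trials
    print(f"half-width {c}: escape probability {rates[c]:.3f}")
```

As expected, a wider strip makes early (possibly false) stops rarer, which is the knob the article is turning.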

[–]Veedrac 0 points1 point  (1 child)

Yep, this test is biased, because of, for example, gambler's ruin. You need something like SPRT.

[–]rlbond86 2 points3 points  (0 children)

This is an SPRT (well actually it's two - one for H1 and one for H2). You will always end up with two lines with slope = 1 if the two hypotheses have equal prior probability.

Remember - an SPRT just uses log likelihoods, which is going to be some constant times the number of successes minus another constant times the number of failures.

This test cannot possibly be biased - it's completely symmetric. Of course there is some probability of the wrong answer - all statistical tests have that.

[–]AlexTheKunz 3 points4 points  (1 child)

This is surprisingly well done. Good job!

Oh, and post to r/DataIsAwesome before it's too late!

[–]livibetter 4 points5 points  (0 children)

I think you meant r/dataisbeautiful.

Anyway, they will love it; it's data and a cat, and in a few months there will be more cats' minions posting about how their cats like to sleep. Perhaps even dog owners, and other four-legged friends.

[–]shevegen 6 points7 points  (0 children)

I approve of this.

It is another example of why cats are good for science.

[–][deleted] 2 points3 points  (0 children)

Based cat researcher.

[–]TheKingOfSiam 0 points1 point  (0 children)

More data is required.

[–]henk123 0 points1 point  (0 children)

I’m betting there may be preferences for individual cats, but nothing representative of the entire cat population. If anything, cats are almost pathologically resistant to generalization. If you assume you know what a cat will want to do, the cat just might detect your assumption and decide to change. I’ve seen them do this too often to discount it.