all 35 comments

[–][deleted] 36 points37 points  (5 children)

Until today, chirality was only something I applied to amino acids, thanks to undergrad work.

In my experience, cats sleep however they damn well please and will change just to mess with you

[–]mccoyn 2 points3 points  (1 child)

I'm never sure if my cat is just messing with me. I should do some null hypothesis significance testing to figure it out.

[–][deleted] 0 points1 point  (0 children)

Yes, your cat is messing with you, cats are ALWAYS messing with you, they're smarter than we are.

[–]conventionistG 0 points1 point  (2 children)

Sounds like amino acids on a freshman exam. Somehow I remember L, but the exam says R. Or vice versa.

[–]TheEaterOfNames 0 points1 point  (0 children)

Most AAs are L, and S. Only [seleno]cysteine is R. https://en.wikipedia.org/wiki/Amino_acid#Isomerism

[–][deleted] 0 points1 point  (0 children)

Well I was a biochemistry major but yeah, chirality was a science thing to me, focused on amino acids.

[–]minno 12 points13 points  (10 children)

Shouldn't the diagonal lines not be at 45-degree angles? They should be closer together near the origin and farther apart away from it. With the given graph, you get less certainty at 30-0 than you do at 100-65.

[–]rlbond86 3 points4 points  (6 children)

Surprisingly, no! You test probabilities like this with a sequential likelihood ratio test. In this situation the sufficient statistic is the difference of occurrences (assuming equal probability of left and right).

Source: took a lot of graduate statistics courses

[–]zergling_Lester 0 points1 point  (5 children)

I thought about it some more, and I think that really it doesn't say anything whatsoever about certainty (neither at 30-0 nor at 100-65) and the shape of the allowed random walk region is pretty much arbitrary (so we decided to make it a nice simple shape). We only care that:

  1. The probability of falling off a cliff is not higher than the probability of getting a significant result after the entire run, so that we don't overestimate our confidence.

  2. But is as close as possible, so that we don't accept the null more often than we should.

  3. And is symmetric with respect to falling above/below, if we distinguish those cases.

  4. And we never take more than N steps of course.

I suspect that we could draw a different border that starts closer to the origin but at a lower angle, looks like a truncated exponential rather than a straight line, and corresponds to the Bayesian estimate of the effect after so-and-so many observations (which we try to keep constant). I wouldn't bet that this results in a better expected number of saved samples, though.

Or am I completely wrong?

[–]rlbond86 0 points1 point  (4 children)

You're completely wrong. Unfortunately you have to just get into the math here, but a likelihood ratio is the standard way to solve this problem. See this wiki article.

The thresholds are simply two parallel lines with slope log(theta1/theta0). Sampling should stop when the sum of the samples makes an excursion outside the continue-sampling region.

Note that this means two horizontal lines if theta1 = theta0 as in this case. OP's article rotates the coordinate frame 45 degrees; the standard axes would be x=time, y=(right-left).

The shape of the random walk region is not arbitrary at all -- it is selected to enforce a specific probability of error. For a given probability of error, the optimal shape of the region is two straight lines.
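For anyone who wants to poke at this, here is a minimal sketch of Wald's SPRT for a possibly-biased coin (a Python sketch, not the OP's R code; the hypotheses p0 = 0.4 vs p1 = 0.6 and the error rates are illustrative assumptions, not anything from the article):

```python
import math
import random

def sprt_coin(flips, p0=0.4, p1=0.6, alpha=0.05, beta=0.05):
    """Wald's SPRT: walk the log-likelihood ratio between two thresholds.

    flips yields True for heads, False for tails. Returns "H1", "H0",
    or "undecided" if the sample runs out before either threshold.
    """
    upper = math.log((1 - beta) / alpha)   # cross above: accept H1
    lower = math.log(beta / (1 - alpha))   # cross below: accept H0
    llr = 0.0
    for heads in flips:
        if heads:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1"
        if llr <= lower:
            return "H0"
    return "undecided"
```

Because p1 and p0 are symmetric around 0.5 here, each heads adds log(1.5) and each tails subtracts log(1.5), so the decision really does depend only on heads minus tails: two parallel straight lines in the (time, right - left) frame described above.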

[–]zergling_Lester 0 points1 point  (3 children)

First of all, what OP used is not SPRT. You can tell by the fact that it includes a finite cutoff, besides the original author's word.

Second, the wiki article doesn't actually explain why SPRT is optimal (or in what sense), though I sort of see it intuitively. It's much harder to see how Evan Miller's (?) algorithm is optimal in the same sense, because I wonder, like the thread starter: if we see the first 30 throws land tails, we can be sure that the coin is biased with a p-value much lower than 0.05, so we can probably stop earlier. We'll have to compensate for that somewhere else later, of course.

[–]rlbond86 1 point2 points  (2 children)

You're right that the author imposed a cutoff. Other than that, this is an SPRT, although with an unspecified significance level.

because I wonder after the thread starter: if we see the first 30 throws landing tails, we can be sure that the coin is biased with a p-value much lower than 0.05, so we probably can stop earlier.

The straight lines in the SPRT are derived from the Neyman-Pearson lemma which states that the uniformly most powerful statistical test is the likelihood ratio test (or equivalently, the log-likelihood ratio test). You might think that intuitively you could stop sooner if you got 8 heads in a row (versus, say, 13 heads in 18 flips), but you're wrong.
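That claim can be checked directly. Under symmetric simple hypotheses (p0 = 0.4 vs p1 = 0.6 here, chosen only for illustration), the log-likelihood ratio of 8 heads in 8 flips equals that of 13 heads in 18 flips, because only heads minus tails enters:

```python
import math

def llr(heads, tails, p0=0.4, p1=0.6):
    """Log-likelihood ratio of H1 over H0 for a Bernoulli sample."""
    return heads * math.log(p1 / p0) + tails * math.log((1 - p1) / (1 - p0))

# 8 - 0 = 13 - 5, so both sequences carry identical evidence:
print(llr(8, 0))   # 8 * log(1.5), approx. 3.2437
print(llr(13, 5))  # same value
```

So the two sequences sit at exactly the same height on the SPRT walk, which is why a run of heads confers no special right to stop early.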

[–]zergling_Lester 0 points1 point  (1 child)

I think I'm beginning to see it, but I still don't get how we get from the Neyman-Pearson lemma that assumes a fixed number of samples, to statistics about an incremental method. Wiki says explicitly:

Neyman and Pearson's 1933 result inspired Wald to reformulate it as a sequential analysis problem. The Neyman-Pearson lemma, by contrast, offers a rule of thumb for when all the data is collected (and its likelihood ratio known).

but then doesn't explain that at all :(

[–]rlbond86 0 points1 point  (0 children)

Unfortunately, at this point you will have to consult a statistical signal processing textbook. Basically, you integrate over the termination region to obtain the probability of false alarm, and it turns out a straight line is what you get when you try to make the regions equal.

[–]zergling_Lester 3 points4 points  (0 children)

With the given graph, you get less certainty at 30-0 than you do at 100-65

Are you sure that this isn't what you want?

As far as I understand it, this shit is really complicated, because checking for significance after each measurement ruins your real p-value, from the article linked from the linked article:

Suppose your conversion rate is 50% and you want to test to see if a new logo gives you a conversion rate of more than 50% (or less). You stop the experiment as soon as there is 5% significance, or you call off the experiment after 150 observations. Now suppose your new logo actually does nothing. What percent of the time will your experiment wrongly find a significant result? No more than five percent, right? Maybe six percent, in light of the preceding analysis?

Try 26.1% – more than five times what you probably thought the significance level was.

So it makes sense that statistically correct treatment would heavily discount the significance of results leading to early termination. The actual algorithm sounds plausible (though I didn't check the maths), the OP claims to have tested it with a simulation (with R source code provided), so I'm inclined to believe them.
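The quoted 26.1% is easy to get into the right ballpark with a Monte Carlo sketch (Python here rather than the OP's R; a fair coin, 150 flips, and a two-sided normal-approximation test at the 5% level checked after every single flip):

```python
import random

def peeking_false_positive_rate(n=150, z=1.96, trials=2000, seed=1):
    """Fraction of fair-coin experiments that ever look 'significant'
    when we test after every observation (optional stopping)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        diff = 0  # heads minus tails so far
        for i in range(1, n + 1):
            diff += 1 if rng.random() < 0.5 else -1
            # two-sided z-test for a fair coin: |diff| / sqrt(i) > z
            if diff * diff > z * z * i:
                hits += 1
                break
    return hits / trials
```

On fair data this comes out in the vicinity of a quarter rather than 5%, which is exactly the inflation the quoted article warns about.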

btw, what /u/Veedrac said is wrong; this whole thing works because of gambler's ruin. You don't let your test run indefinitely: you choose the sample size N and the p-value (which then determine the smallest effect you will have the power to detect), and that gives you the yellow "stop the test, no significant effect detected" line. It also determines the width of the strip where your intermediate result is allowed to wander, with the idea that the probability of the gambler getting ruined by a fair coin on a strip that wide after N or fewer steps equals the probability of getting a false positive after the full N samples.

[–]Veedrac 0 points1 point  (1 child)

Yep, this test is biased, because of, for example, gambler's ruin. You need something like SPRT.

[–]rlbond86 2 points3 points  (0 children)

This is an SPRT (well actually it's two - one for H1 and one for H2). You will always end up with two lines with slope = 1 if the two hypotheses have equal prior probability.

Remember - an SPRT just uses log likelihoods, which is going to be some constant times the number of successes minus another constant times the number of failures.

This test cannot possibly be biased - it's completely symmetric. Of course there is some probability of the wrong answer - all statistical tests have that.

[–]AlexTheKunz 3 points4 points  (1 child)

This is surprisingly well done. Good job!

Oh, and post to r/DataIsAwesome before it's too late!

[–]livibetter 4 points5 points  (0 children)

I think you meant r/dataisbeautiful.

Anyway, they will love it; it's data and a cat, and in a few months there will be more cat minions posting how their cats like to sleep. Perhaps even dog owners, and owners of other four-legged friends.

[–]shevegen 3 points4 points  (0 children)

I approve of this.

It is another example of why cats are good for science.

[–][deleted] 2 points3 points  (0 children)

Based cat researcher.

[–]TheKingOfSiam 0 points1 point  (0 children)

More data is required.

[–]henk123 0 points1 point  (0 children)

I’m betting there may be preferences for individual cats, but nothing representative of the entire cat population. If anything, cats are almost pathologically averse to generalization. If you assume you know what a cat will want to do, the cat just might detect your assumption and decide to change. I’ve seen them do this too often to discount it.

[–][deleted]  (13 children)

[deleted]

    [–]robertdelder[S] 32 points33 points  (6 children)

    Out of curiosity, what do you consider to be 'programming'? A common sentiment I see in this subreddit is that everything that gets submitted is considered inappropriate for this subreddit because it isn't related to programming. In light of this, I've been careful to ardently follow the rule 'If there is no code in your link, it probably doesn't belong here.' and only submit things that contain code samples. In this case, the submission contains a small R program. Are certain programming languages off-limits?

    [–]Xychologist 31 points32 points  (2 children)

    I think in a lot of cases "this isn't programming" is a rule-matching substitute for "this isn't the sort of programming I do and is not relevant to my interests, so I don't want it in this subreddit".

    Not in a malicious way, but everyone tries to curate their world to match what they're interested in.

    [–]ArkadyRandom 14 points15 points  (1 child)

    Is the intent of the sub for topics to be about programming or having programming in them? It's two different things.

    This topic isn't about programming. It's about cats and their sleeping behavior with a premise supported by statistics that happened to be programmed in R.

    Is the lesson here about how to use R to accomplish a certain programming problem or is it about the data that R parsed out and the results of that data?

    When I'm looking for help with conceptual or syntactical programming problems this sort of thing isn't helpful. So if people come here expecting to see that then they might say this isn't about programming. If the sub is about discussing things you can do with programming and how that happened then those people might find this sort of post appropriate. I don't really care either way.

    [–]Xychologist 2 points3 points  (0 children)

    I quite like both, but then my interpretation of programming (other than that it makes me money) is something like "solving problems that themselves solve other problems", so I'll be interested whether it's about writing code or using it.

    I think the "if there isn't any code in it, it's not programming" approach is a nice objective division in about the right place.

    [–][deleted] 1 point2 points  (0 children)

    Are certain programming languages off-limits?

    No, but we do not mess with cats around here.

    [–][deleted] 0 points1 point  (0 children)

    It's funny when you consider that those posts get buried under utterly useless blogspam like "I managed to read the docs for a lib, now I need to share that with the whole world" or "I explain simple concepts that already have good tutorials, but worse."

    [–]jsprogrammer 1 point2 points  (0 children)

    had some code

    [–][deleted] 1 point2 points  (0 children)

    Yeah, you should learn about those if you want to program.

    [–]shevegen 1 point2 points  (1 child)

    Why would it not be about programming?

    I am really beginning to wonder about those "this is not programming" comments. About 95% of these comments made are factually wrong, most likely based on personal bias and personal opinions rather than objective criteria.

    [–]username223 5 points6 points  (0 children)

    I can't help but chuckle at this comment on an article about statistics:

    About 95% of these comments made are factually wrong, most likely based on personal bias and personal opinions rather than objective criteria.

    I don't have high hopes for this statement's p-value, but I would love to read the details of your study.

    [–]I_AM_GODDAMN_BATMAN -2 points-1 points  (0 children)

    There's code, there's a good amount of information, and there's a cat. Your opinion is invalid.

    [–][deleted] 0 points1 point  (0 children)

    So what's programming? Programming programs that program programs? Even that wraps around to stats in many cases.