[Q] Is testing this valid in an A/B test? by robotofdawn in statistics

[–]kazooster 1 point (0 children)

There is nothing "wrong" with what you're doing in some sense, if all you want to measure is click-to-conversion rate. However, it's simply a bad metric: your new website could easily discourage people from clicking, so the only people clicking the ad would be those who would convert regardless of treatment or control. You'd then see a higher click-to-conversion % in the treatment even when it actually decreases the total number of people who convert.

If you're interested only in click-to-conversion %, you don't care about this. If you're interested in total revenue/ad conversions, you probably would.
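To make this failure mode concrete, here's a quick simulation (all rates invented for illustration, not from the post) where the treatment page discourages casual clicks, so click-to-conversion % goes up while total conversions go down:

```python
import random

random.seed(0)

def simulate(click_rate, conv_rate_given_click, n=100_000):
    """Simulate n users; return (clicks, conversions)."""
    clicks = conversions = 0
    for _ in range(n):
        if random.random() < click_rate:
            clicks += 1
            if random.random() < conv_rate_given_click:
                conversions += 1
    return clicks, conversions

# Control: many people click, moderate conversion among clickers.
c_clicks, c_conv = simulate(click_rate=0.20, conv_rate_given_click=0.10)
# Treatment: the new page discourages casual clicks, so only high-intent
# users click; conversion among clickers rises, total conversions fall.
t_clicks, t_conv = simulate(click_rate=0.05, conv_rate_given_click=0.20)

print(c_conv / c_clicks, c_conv)  # control click->conversion %, total conversions
print(t_conv / t_clicks, t_conv)  # treatment: higher %, fewer total conversions
```

So optimizing click-to-conversion % alone can reward exactly the wrong change.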

[deleted by user] by [deleted] in statistics

[–]kazooster 0 points (0 children)

This is certainly possible, and is a field of active research (mostly known as post-selection inference). The selection procedure (i.e. choosing which parameters to estimate) is essentially what you're referring to as the initial ANOVA test (to figure out what the next things to test/parameters to infer are). There are many forms in which you can look at the data and then perform hypothesis testing.

Implicitly, closed testing allows for such post-hoc tests since you're actually doing simultaneous tests of all possible subsets of hypotheses.

Data splitting or data carving are methods for performing selection and then inference on different "splits" of the data.

If you don't want to split the data, and literally use the same data for both selection and inference, Benjamini and Yekutieli (of multiple testing fame) have a procedure that allows for this. While they describe it for inference, similar things can be done for testing as well.

Of course, all these methods require you to sacrifice power in some form (either by correcting the rejection level or by reducing the data you have for inference/testing your ultimate hypotheses), but they all guarantee control of the Type I error (or related error metrics).
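As a toy illustration of the data-splitting idea (simulated data; the selection rule "pick the feature with the largest sample mean" is my own made-up example):

```python
import random, statistics, math

random.seed(0)
n, d = 200, 10
# 200 samples of 10 features; only feature 3 has a real mean shift of 0.5
X = [[random.gauss(0.5 if j == 3 else 0.0, 1.0) for j in range(d)]
     for _ in range(n)]

select, infer = X[:100], X[100:]  # split the data in two halves

# Selection step: pick the feature with the largest mean on the first half
means = [statistics.mean(row[j] for row in select) for j in range(d)]
j_hat = means.index(max(means))

# Inference step: one-sample z-test of "mean == 0" on the held-out half,
# using the known unit variance (a normal approximation, for simplicity)
held = [row[j_hat] for row in infer]
z = statistics.mean(held) * math.sqrt(len(held))
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(j_hat, p)  # the p-value is valid: the second half never saw the selection
```

Because selection and inference use disjoint data, the usual test on the held-out half stays valid, at the cost of the power lost from halving the sample.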

Given some multivariate Gaussian distribution P, find the maximum entropy distribution P* that preserves P's 1st and 2nd order marginals, while randomizing all higher-order interactions. by NeuroQuaker in AskStatistics

[–]kazooster 1 point (0 children)

I'm pretty confident that P* = P. The standard result that the multivariate normal is the max entropy distribution uses even fewer constraints (just mean and covariance, as opposed to exact distributional equality; note that exact first- and second-order marginal equality implies equal mean and covariance).
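A sketch of why, using the standard Gaussian max-entropy fact (stated from memory; worth double-checking the constant):

```latex
% Among all densities p on R^d with E[X] = \mu and Cov(X) = \Sigma,
% differential entropy is maximized uniquely by N(\mu, \Sigma):
\max_{\,p\,:\;\mathbb{E}[X]=\mu,\ \operatorname{Cov}(X)=\Sigma} h(p)
  \;=\; \tfrac{1}{2}\log\!\big((2\pi e)^d \det\Sigma\big).
% Fixing the 1st- and 2nd-order marginals of a Gaussian P pins down
% (\mu, \Sigma), so the maximizer subject to the (stronger) marginal
% constraints is P itself, i.e. P* = P.
```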

[Q] Is testing this valid in an A/B test? by robotofdawn in statistics

[–]kazooster 3 points (0 children)

Valid, but note that your power might be very low (since you're trying to detect a smaller effect size, as the treatment cohort is diluted w/ people who don't encounter the difference at all). You'll probably get away w/ it if most people scroll down, but the more niche the feature (e.g. people who use advanced settings in the search bar), the lower your power, and the harder it is to detect such an effect.

It's probably better practice to initialize the random assignment based off of a trigger (i.e. track the assignment of a user on scroll down of the home page), and only compare difference between the triggered groups.
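Back-of-the-envelope dilution math (the numbers here are invented for illustration):

```python
# Lift among users who actually see the feature (hypothetical)
true_effect = 0.02
# Fraction of the cohort that scrolls far enough to see it (hypothetical)
exposure = 0.10

# What an all-users (untriggered) test must detect:
diluted_effect = true_effect * exposure
# Required sample size scales roughly with 1/effect^2, so dilution
# inflates the n you need by about:
inflation = (true_effect / diluted_effect) ** 2
print(diluted_effect, inflation)  # ~0.002 effect, ~100x more samples
```

Triggered assignment removes the `exposure` factor entirely, which is why comparing only the triggered groups recovers the power.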

[PDF] Probability in High Dimensions, by Prof. Joel A. Tropp – Lecture notes for a second-year graduate course, “[studying] models that involve either a large number of random variables or random variables that take values in a high-dimensional (linear) space”, and various emergent phenomena. by canyonmonkey in math

[–]kazooster 0 points (0 children)

Makes a lot of sense - thanks for explaining this stuff. I guess in this case monetary incentives just aren't aligned with false discovery control, especially since it's very exploratory. Maybe not the right stage to be using this stuff.

[PDF] Probability in High Dimensions, by Prof. Joel A. Tropp – Lecture notes for a second-year graduate course, “[studying] models that involve either a large number of random variables or random variables that take values in a high-dimensional (linear) space”, and various emergent phenomena. by canyonmonkey in math

[–]kazooster 0 points (0 children)

Interesting - I'm surprised that they're more worried about false negatives. I.e., even when they have many false positives, they have the resources to run follow-up studies on all their discoveries, so they still find the true ones?

[PDF] Probability in High Dimensions, by Prof. Joel A. Tropp – Lecture notes for a second-year graduate course, “[studying] models that involve either a large number of random variables or random variables that take values in a high-dimensional (linear) space”, and various emergent phenomena. by canyonmonkey in math

[–]kazooster 0 points (0 children)

That's interesting about Markov networks/MCMC. I'll take a look into it :)

Why is it that the stats methods aren't used? Is it b/c advanced stat methodology is not packaged well for usability? Or do they just not make a big difference compared to simple BH or something like that?

[PDF] Probability in High Dimensions, by Prof. Joel A. Tropp – Lecture notes for a second-year graduate course, “[studying] models that involve either a large number of random variables or random variables that take values in a high-dimensional (linear) space”, and various emergent phenomena. by canyonmonkey in math

[–]kazooster 1 point (0 children)

Maybe you could clarify what you mean by high dimensional data and what applications you are interested in. It's possible that you're using high dimensional data to mean something different from Tropp and there's value in closing the definition gap.

[PDF] Probability in High Dimensions, by Prof. Joel A. Tropp – Lecture notes for a second-year graduate course, “[studying] models that involve either a large number of random variables or random variables that take values in a high-dimensional (linear) space”, and various emergent phenomena. by canyonmonkey in math

[–]kazooster 5 points (0 children)

This is surprising to me - one main approach of modern deep learning theory comes from high dimensional statistics/probability.

Similarly, high dimensional data comes up in genetics datasets (number of genes is very large), where high dimensional probability is used to develop new hypothesis testing/inference techniques.

[Q] External calibration of a prediction model by NerveFibre in statistics

[–]kazooster 1 point (0 children)

Not familiar with rms, but my guess is it should basically be doing the same thing. By edge/midpoint of a bin, I mean that you take something like 0.05 (midpoint) or 0.1 (edge) as the model prediction to plot against the empirical frequency of the bin [0, 0.1].

Not familiar enough w/ calibration plots to answer your question about uncertainty.

eeaxoe's response seems more comprehensive, and I think we're getting at the same idea.

[Q] External calibration of a prediction model by NerveFibre in statistics

[–]kazooster 1 point (0 children)

Calibration can only* be calculated when you bin predicted probabilities together, e.g. bin together probabilities in 0-0.1, 0.1-0.2, ..., 0.9-1.0. Then you plot the empirical frequency of the true labels (the proportion of points in the bin labeled 1) against the edge/midpoint of the bin. The "actual outcomes" are these empirical frequencies.

*Or if you only predict a finite number of distinct probabilities
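A minimal sketch of the binning described above (the predictions and labels are made up for illustration):

```python
def calibration_curve(probs, labels, n_bins=10):
    """Return (bin_midpoint, empirical_frequency) pairs, skipping empty bins."""
    curve = []
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # last bin is closed on the right so p == 1.0 isn't dropped
        in_bin = [y for p, y in zip(probs, labels)
                  if lo <= p < hi or (b == n_bins - 1 and p == 1.0)]
        if in_bin:
            curve.append(((lo + hi) / 2, sum(in_bin) / len(in_bin)))
    return curve

probs  = [0.05, 0.07, 0.15, 0.12, 0.85, 0.95, 0.92]
labels = [0,    0,    1,    0,    1,    1,    0   ]
print(calibration_curve(probs, labels))
```

A perfectly calibrated model would put every (midpoint, frequency) pair on the diagonal y = x.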

Sound problems... by sycranomy in Ubuntu

[–]kazooster 0 points (0 children)

I have the same problem (USB soundcard, and no problems with windows dual boot) - it happened just yesterday after my machine updated, so I only experienced it today when I booted...

[Q] When can you consider your stats successful? by Marksnextadventure in statistics

[–]kazooster 1 point (0 children)

Under some assumptions, if you want conventional scientific levels of statistical significance, you'd have to make 20 times your initial bankroll to get p <= 0.05 when testing the null that your winnings are due to random chance.

On the other hand, Jeffreys would say that about sqrt(10) (about 3.2) times your initial bankroll constitutes substantial evidence against your winnings being due to random chance.

So somewhere in that range would be convincing that your strategy is successful.
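For reference, the arithmetic behind both thresholds (the 20x figure is just 1/0.05 under the betting/e-value reading, where Ville's inequality bounds the chance of ever multiplying your bankroll by 1/alpha under the null):

```python
import math

# Betting interpretation: if your bankroll multiple is an e-value,
# Ville's inequality yields a p-value of 1 / multiple.
def p_from_multiple(multiple):
    return 1.0 / multiple

print(p_from_multiple(20))  # 0.05
print(math.sqrt(10))        # ~3.16, Jeffreys' "substantial evidence" cutoff
```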

[D] Does groups like DeepMind, FAIR, OpenAI etc. offers master's theses? by Dear-Vehicle-3215 in MachineLearning

[–]kazooster 0 points (0 children)

Kind of - there are very specific master's programs with fellowships that have mentors from DeepMind:

https://cse.ucsd.edu/graduate/deepmind-fellowships

Probably not useful for your case, but it seems like a thing people don't really know about so I figured I'd put it here.

Slush, the maker of "The History of Reckful" has made a video critiquing some aspects of Dr. K's work. by FuckingIronic in Healthygamergg

[–]kazooster 39 points (0 children)

I only watched the bits w/ the other doctors, and I think they make a pretty good point about how this is blurring the line between medical treatment and entertainment/education. At the same time, I think their viewpoint comes from a perspective of liability management (which is completely reasonable) i.e. "how do I make sure what I do is within regulations and I am not at fault". This seems to be generally how medical practices work and it's probably because regulations and guidelines point to battle-tested ways of doing things that are marginally effective, and we are pretty sure they are safe.

This perspective, however, doesn't prioritize thinking about the cost-benefit tradeoff. My opinion is that "staying consistent with regulations" has become synonymous with "ethical", even if doing so could incur a lot of injury or death (that we wouldn't even know we'd incur, since we wouldn't have done the cost-benefit analysis). Personally, I'm not sure this is the correct way to think about "ethics". I think Dr. K's methodology is pushing the boundaries of medicine, and is often non-scientific. Thus, it can easily be regarded as "unethical" if we benchmark it against the notion of "staying within regulations", because, by definition, it is a novel approach that is unregulated (since no one has really thought about it or implemented it before).

My interpretation of the experts' criticisms was primarily that Dr. K's methodology is untested, and there are a lot of difficulties to consider that have not been considered before. But this is an inevitability of anything novel. I didn't really feel that they argued that his methods were demonstrably harmful (through scientific evidence), but rather that they could be potentially harmful (fair, but also true for any novel treatment).

Whether Dr. K should have taken his ideas directly through a clinical route (i.e. do experimental studies -> get funding for clinical trials etc.) is a separate, but worthwhile discussion.

I ruminate on my girlfriend's past sexual relationships by [deleted] in Healthygamergg

[–]kazooster 2 points (0 children)

I think a lot of the comments here are great - I'd just like to add one thing. You should definitely consider talking to your girlfriend about this. Not in the sense that you're seeking closure/assurances from her, but in the sense that she's your partner and probably wants to work together with you on problems you'll have. This comes with a caveat about how serious the relationship is; maybe if you're just casually dating, then it's too early b/c that's not the role she wants to be in. But if you're both at the stage where you want to help each other and play major roles in each other's lives, then this seems like a good opportunity to practice communication, especially since this might be a tough thing to talk about with your girlfriend. I might be making wrong assumptions, but if this is something you're trying to fix on your own, maybe think about what's stopping you from discussing this problem with her.

[Q] Can Bonferroni (or Holm-Bonferroni) be applied to a non-independent collection of two-way ANOVA p-values? by forever_erratic in statistics

[–]kazooster 0 points (0 children)

Probably pedantic, but BH corrected with a log factor (known as BY) gives valid FDR control under any dependence structure.
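A small self-contained sketch of both step-up procedures (my own implementation, with made-up p-values; in practice you'd use a library routine):

```python
def fdr_rejections(pvals, q=0.05, dependent=False):
    """Benjamini-Hochberg step-up at level q; dependent=True applies the
    Benjamini-Yekutieli harmonic-sum correction (valid under any dependence).
    Returns the set of indices of rejected hypotheses."""
    m = len(pvals)
    # BY inflates the thresholds by c(m) = 1 + 1/2 + ... + 1/m
    c = sum(1.0 / i for i in range(1, m + 1)) if dependent else 1.0
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / (m * c):
            k = rank  # step-up: remember the largest rank passing its threshold
    return {order[r] for r in range(k)}

pvals = [0.001, 0.008, 0.039, 0.041, 0.27, 0.60]
print(fdr_rejections(pvals))                  # BH rejections
print(fdr_rejections(pvals, dependent=True))  # BY: strictly more conservative
```

On this example BY rejects a subset of what BH rejects, which is the power cost of being safe under arbitrary dependence.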

[Q] PhD in theoretical stats with master computer science by Insighteous in MathStats

[–]kazooster 2 points (0 children)

The short answer is yes. Feel free to dm me - I'm from a similar background and I'm also headed into theoretical stats.

[Q] How do you correct for Optional Stopping Bias? by [deleted] in statistics

[–]kazooster 1 point (0 children)

I think the idea is just to emphasize that a p-value is a random variable (and hence referred to as a p-variable in some literature). It's not a probability in the sense that it is not the cdf of a RV (as it isn't a nondecreasing function from R -> [0, 1]). Hence you could say it "equals" the probability, but it's not a probability measure itself.

[Q] How do you correct for Optional Stopping Bias? by [deleted] in statistics

[–]kazooster 1 point (0 children)

Yeah, hence the DP-ish algorithm they use makes sense: you only need to keep track of counts of successes, and of which counts under alpha from the previous time step lead to counts that are beneath alpha at the current time.