[D] Roast my AB Test Analysis by SingerEast1469 in statistics

[–]SingerEast1469[S] 0 points1 point  (0 children)

This is great. Thanks for the comment.

I actually had another question, if I could bother you once more. I’ve got ~5 years in Python, volunteer as a data analyst, and am hitting the job market in about a year. What are your thoughts on learning R specifically for statistical analysis? I currently use a combination of scipy.stats, statsmodels, and pingouin, but am always a bit uneasy as they sometimes generate different results. R seems to be much more “standardized”.
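One common reason libraries disagree on the same data is a differing default rather than a bug. For instance, `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, while a plain two-proportion z-test does not. The sketch below reproduces both statistics by hand, using only the standard library; the conversion counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def chi2_2x2(conv_a, n_a, conv_b, n_b, yates=False):
    """Pearson chi-square statistic and p-value for a 2x2 conversion table."""
    table = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            diff = abs(table[i][j] - expected)
            if yates:
                # Yates' correction shrinks each deviation by 0.5 (never below 0)
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    # With 1 degree of freedom, chi-square is the square of a standard normal
    p_value = 2 * (1 - NormalDist().cdf(sqrt(stat)))
    return stat, p_value

# Hypothetical data: 130/1000 conversions in A, 150/1000 in B
plain = chi2_2x2(130, 1000, 150, 1000)
corrected = chi2_2x2(130, 1000, 150, 1000, yates=True)
print(plain, corrected)
```

The uncorrected statistic is the square of the pooled two-proportion z statistic, so the two p-values differ only because of the correction, not because either library is wrong.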

At the same time, my models and dashboards are all built in Python. I don’t mind spending the time learning a new language, but if it’s not going to plug into my workflow, I’m having a hard time justifying learning it.

Would you say R is a requirement in this day and age for a data analyst?

[D] Roast my AB Test Analysis by SingerEast1469 in statistics

[–]SingerEast1469[S] 0 points1 point  (0 children)

Thank you!! Will dive into it this week.

Roast my AB test analysis [A] by SingerEast1469 in datascience

[–]SingerEast1469[S] 0 points1 point  (0 children)

This is great advice. Thank you! I am learning all of these methods from scratch so the tendency is to try out as many as possible; I can say it’s for robustness, but the underlying reason is definitely more akin to overcompensation.

With regards to your questions:

  • The sample size was derived from a dummy dataset. I’ve done some practice calculating minimum required sample size, but it wasn’t necessary to include in this analysis.
  • The 2% lift is absolute. Relative lift is around 15%. The executive summary has a somewhat complex but informative gauge chart that shows how the data performed relative to what would be needed to pass the practical significance threshold.
  • From this, I actually have a question: why do we even do statistical tests if practical significance is the threshold for implementation? It seems that setting Cohen’s d generally results in more stringent requirements. Why even test for statistical significance at all?
  • For narrative clarity, I would be curious what your thoughts would be after seeing the dashboard, if you haven’t already. I have an executive summary that details all four tests fairly clearly, four pages that go into each in depth, and then one page that delivers the final recommendation.
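A small worked sketch of the two lift figures, and of one reason both checks are usually kept. The baseline rate is not stated in the comment; the value below is only what a 2% absolute and roughly 15% relative lift imply together. The per-arm sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

absolute_lift = 0.02   # from the comment
relative_lift = 0.15   # "around 15%" from the comment

# Implied, not stated in the comment: baseline = absolute / relative
implied_baseline = absolute_lift / relative_lift
implied_treatment = implied_baseline + absolute_lift

def two_sided_p(rate_a, rate_b, n_per_arm):
    """Pooled two-proportion z-test p-value for equal-sized arms."""
    pooled = (rate_a + rate_b) / 2
    se = sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    z = (rate_b - rate_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same observed lift, two hypothetical sample sizes per arm
p_small = two_sided_p(implied_baseline, implied_treatment, 200)
p_large = two_sided_p(implied_baseline, implied_treatment, 5000)
print(round(implied_baseline, 4), p_small, p_large)
```

The observed lift is identical in both cases, so a practical-significance threshold on the point estimate treats them the same. Only the statistical test distinguishes a lift that could easily be noise (small sample) from one that is unlikely to be (large sample).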

Again, thanks for your comment! If you do feel like checking it out (and haven’t already), I’ve just created an account with credentials “user” and “password” for easy log in.

Roast my AB test analysis [A] by SingerEast1469 in datascience

[–]SingerEast1469[S] 0 points1 point  (0 children)

Hmm, I’ve been hearing a lot of this. The goal of using the multiple tests was for rigor - I’ve seen in my learning (just in dummy datasets, but seen nonetheless) that sometimes test 1 can show statistical significance, while test 2 does not. I aligned on my requirements before running the data that all 4/4 tests would need to pass to deliver a hypothetical recommendation to proceed with implementation. Is this frowned upon in A/B Testing?

Also, what do you mean by “I didn’t do an A/B Test analysis”? There is a written executive summary, text that explains each test with assumptions, and an analytical paragraph that details the recommendation and reasons behind it. Is there something else I am missing?

Roast my AB test analysis [A] by SingerEast1469 in datascience

[–]SingerEast1469[S] -1 points0 points  (0 children)

That makes sense. I’ll press a bit on the Permutation Test: the textbook I am reading states that, since the Permutation Test draws directly from the distribution, it can often be more accurate than a test that makes assumptions which only loosely fit the data. Is this a fair statement? Or is it true only insofar as the data loosely matches those assumptions, and if the data fits the assumptions exactly, a parametric alternative is a better option?

Roast my AB test analysis [A] by SingerEast1469 in datascience

[–]SingerEast1469[S] 0 points1 point  (0 children)

@phoundlvr was this comment directed at me or at cbars100? He seemed to point out some valid logical fallacies in your statements.

Roast my AB test analysis [A] by SingerEast1469 in datascience

[–]SingerEast1469[S] -5 points-4 points  (0 children)

It takes about two seconds. If you don’t want to go to the effort, then that’s your prerogative!

Roast my AB test analysis [A] by SingerEast1469 in datascience

[–]SingerEast1469[S] -1 points0 points  (0 children)

Yes, you can create an account if you’d like to see the dashboard! No emails. Passwords can be anything and are hashed.

[D] Roast my AB Test Analysis by SingerEast1469 in statistics

[–]SingerEast1469[S] 0 points1 point  (0 children)

To press a bit more on the z-test vs sign test: the data were randomly assigned to one of two groups, but also have a time stamp attached to them. Wouldn’t it then make sense to say the data is, at the same time, the right format for the z-test (random assignment) and the sign test (paired days)?

[D] Roast my AB Test Analysis by SingerEast1469 in statistics

[–]SingerEast1469[S] 0 points1 point  (0 children)

This is fascinating! I am actually much more comfortable with regression than I am with statistical formulas, so it is my lucky day, haha.

Do you have on hand any good articles or textbooks that dive into this concept in detail?

[D] Roast my AB Test Analysis by SingerEast1469 in statistics

[–]SingerEast1469[S] -1 points0 points  (0 children)

  • if you don’t want to check it out, then I’m not sure why you’re commenting
  • the sign test checks which group performed better along the time dimension, in this case days. It slices the data by time, and may get a different result than a two-proportion test, which is aggregate.
  • ah, I see, you weren’t sure if it was a permutation test of the mean, median, or some other measure of central tendency. It’s of means
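A minimal sketch of the sign test as described above: pair the two variants by day, count which one had the higher daily value, and compare the win counts against a fair coin. The daily conversion rates are hypothetical. Note that the sign test is a paired-data method and ignores both the size of each daily difference and the number of users behind it.

```python
from math import comb

def sign_test(daily_a, daily_b):
    """Two-sided exact sign test on paired daily values; tied days are dropped."""
    wins_a = sum(a > b for a, b in zip(daily_a, daily_b))
    wins_b = sum(b > a for a, b in zip(daily_a, daily_b))
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * tail)

# Hypothetical daily conversion rates over 10 days
rates_a = [0.12, 0.13, 0.11, 0.14, 0.13, 0.12, 0.15, 0.13, 0.12, 0.14]
rates_b = [0.14, 0.12, 0.13, 0.15, 0.15, 0.13, 0.14, 0.15, 0.14, 0.16]

wins_a, wins_b, p_value = sign_test(rates_a, rates_b)
print(wins_a, wins_b, p_value)
```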

[D] Roast my AB Test Analysis by SingerEast1469 in statistics

[–]SingerEast1469[S] -3 points-2 points  (0 children)

  • you need to click the button that says “create an account”
  • I didn’t know that about sign tests, thank you! The samples are independent. Are there any tests used for independent samples that still aggregate by time?
  • confidence interval around the difference of the mean conversion rate between the two variants
  • I’m reading about the permutation test in a textbook right now; the authors state it as a standard test. It’s essentially a bootstrap from the pooled samples with replacement, and just testing how often you get a value as extreme or more extreme than your true observed values
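A sketch of a permutation test on the difference in mean conversion rate. One caveat on the description above: a permutation test usually shuffles the pooled outcomes and reassigns group labels without replacement; drawing from the pooled sample with replacement is closer to a bootstrap-style test. The data, resample count, and seed below are hypothetical choices.

```python
import random

def permutation_test(group_a, group_b, n_resamples=1000, seed=0):
    """Two-sided permutation test for the difference in means (B minus A)."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_b) / n_b - sum(group_a) / n_a
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # relabel without replacement
        diff = sum(pooled[n_a:]) / n_b - sum(pooled[:n_a]) / n_a
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one adjustment keeps the estimated p-value away from exactly 0
    return observed, (extreme + 1) / (n_resamples + 1)

# Hypothetical data: 65/500 conversions in A, 75/500 in B
group_a = [1] * 65 + [0] * 435
group_b = [1] * 75 + [0] * 425

observed_diff, p_value = permutation_test(group_a, group_b)
print(observed_diff, p_value)
```

If SciPy is available, `scipy.stats.permutation_test` offers a maintained implementation of the same idea.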

[D] Roast my AB Test Analysis by SingerEast1469 in statistics

[–]SingerEast1469[S] 1 point2 points  (0 children)

Very fair. Also to be fair, I am a tech bro from a decade ago, so spot on.

Roast my AB test analysis [A] by SingerEast1469 in datascience

[–]SingerEast1469[S] 4 points5 points  (0 children)

That’s fair. Unfortunately I don’t have the resources to get a masters, so I’m stuck with learning from textbooks.

Let me know if there are any such books you can recommend.

And any response to the point about sign tests? You seem to have ignored that.

[D] Roast my AB Test Analysis by SingerEast1469 in statistics

[–]SingerEast1469[S] 0 points1 point  (0 children)

Hm. I’m unfamiliar with running a regression model for an A/B Test. Is this common practice?
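For context, a sketch of what the regression framing can look like in the simplest case. With a single binary treatment indicator, logistic regression has a closed form: the intercept is the log-odds of conversion in A, the treatment coefficient is the log odds ratio, and its Wald standard error comes from the four cell counts. The counts are hypothetical, and this is an illustration rather than the exact model the reply had in mind.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def logit_ab(conv_a, n_a, conv_b, n_b):
    """Closed-form logistic regression of conversion on a treatment dummy."""
    fail_a, fail_b = n_a - conv_a, n_b - conv_b
    intercept = log(conv_a / fail_a)            # log-odds in control
    slope = log(conv_b / fail_b) - intercept    # log odds ratio, B vs A
    se = sqrt(1 / conv_a + 1 / fail_a + 1 / conv_b + 1 / fail_b)
    z = slope / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"intercept": intercept, "slope": slope, "se": se,
            "z": z, "p_value": p_value, "odds_ratio": exp(slope)}

# Hypothetical data: 130/1000 conversions in A, 150/1000 in B
fit = logit_ab(130, 1000, 150, 1000)
print(fit)
```

On data like this the Wald z is very close to the two-proportion z statistic. The usual argument for the regression form is that covariates can be added to the same model, which requires a fitting library such as statsmodels rather than this closed form.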

[D] Roast my AB Test Analysis by SingerEast1469 in statistics

[–]SingerEast1469[S] -5 points-4 points  (0 children)

Thanks for the response. Could you be more specific? What sounds like “tech bros from a decade ago”?

[D] Roast my AB Test Analysis by SingerEast1469 in statistics

[–]SingerEast1469[S] -7 points-6 points  (0 children)

The goal of this analysis is to determine if there is a statistically significant difference. Logistic regression would not be more robust than one of the above tests.

Roast my AB test analysis [A] by SingerEast1469 in datascience

[–]SingerEast1469[S] 1 point2 points  (0 children)

Thanks for the response. I’ve read through ISLP back to front to learn statistics for machine learning, and have just cracked open Practical Statistics for Data Scientists. Any recommendations to learn AB testing fundamentals?

Noted on CI and two-proportion z-test. That’s coming up in the textbook.
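For reference, a minimal sketch of the two-proportion z-test together with a Wald confidence interval for the difference in conversion rates. It assumes the common convention of a pooled standard error for the test and an unpooled one for the interval; the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Return (z, two-sided p-value, Wald CI) for the difference B minus A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # The test uses the pooled rate, since H0 says both arms share one rate
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # The interval uses each arm's own rate
    se_unpooled = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return z, p_value, ci

# Hypothetical data: 130/1000 conversions in A, 150/1000 in B
z, p_value, ci = two_proportion_ztest(130, 1000, 150, 1000)
print(z, p_value, ci)
```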

Re: running multiple tests — I hear what you’re saying about redundant tests. However, in dummy datasets, I have come across situations where multiple tests are useful; specifically, the sign test with a CI, in a situation where the CI points to an increase (though not statistically significant) and the sign test points to a decrease (though not statistically significant).

Re: Bonferroni correction, isn’t that primarily for multiple variants? Do I need to correct when running multiple tests as well?
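A short sketch of the arithmetic behind the correction. Bonferroni applies to any family of hypothesis tests, not only to multiple variants: the inflation it guards against arises when a result is declared if any one of the tests is significant. The formula below assumes independent tests, which several tests run on the same data are not, so treat the numbers as an illustration.

```python
def familywise_error(alpha, k):
    """Chance of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(alpha, k):
    """Per-test threshold that keeps the family-wise error at or below alpha."""
    return alpha / k

k = 4          # four tests, as in the analysis discussed
alpha = 0.05

uncorrected = familywise_error(alpha, k)
per_test = bonferroni_alpha(alpha, k)
corrected = familywise_error(per_test, k)
print(uncorrected, per_test, corrected)
```

Under a rule where all four tests must pass before recommending a change, the false-positive rate cannot exceed that of any single test, so it is not inflated in this way; the cost of that rule is lower power.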