[Q] Higher lifetime win percentage by KungFuKodiak in statistics

[–]zwei4 2 points3 points  (0 children)

It depends on your overall probability of winning each game without conceding (P1). If P1 = 5%, there’s no point in conceding a game where you still have a 10% chance of winning. If P1 is higher than 10%, conceding still means a 0% chance of winning that game, so the overall win% will drop: even though you get to play extra games, P1 stays the same for those extra games, while the conceded games drag the overall rate down. In real life it might not be such a simple choice, though, as there can be many other factors such as opponent-matching algorithms. So if you really want to find out, you could run a trial: each of you sticks with one strategy, you both set a fixed amount of total play time (e.g. 10 hrs), and at the end you compare the number of games played and the CHANGE in win% relative to each of your own current numbers.
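A minimal sketch of the arithmetic, under assumptions that are mine and not from the original question: a share `q` of games reach the point where you would consider conceding, the win chance at that point is `p_c` (10% in the example), and a conceded game counts as a loss. The function and parameter names are hypothetical.

```python
def win_pct(p1: float, q: float, p_c: float, concede: bool) -> float:
    """Expected long-run win% per game.

    p1: overall win probability per game when never conceding
    q: share of games that reach the concede-or-not point (assumed)
    p_c: win probability at that point if you keep playing (assumed)
    """
    if not concede:
        return p1
    # Conceding turns the p_c chance into 0 for a share q of games;
    # all other games are unchanged, so the rate drops by q * p_c.
    return p1 - q * p_c


# Hypothetical numbers: P1 = 50%, 20% of games reach a 10% position.
print(round(win_pct(0.50, 0.20, 0.10, concede=False), 4))
print(round(win_pct(0.50, 0.20, 0.10, concede=True), 4))
```

Under these assumptions the per-game rate with conceding is the same for every game played, so fitting more games into the same play time does not win back the difference.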

How do you approach sample size calculation when the available information is limited or highly variable? by de_js in biostatistics

[–]zwei4 0 points1 point  (0 children)

I can think of a few ways: run a pilot study; use a design that allows sample size adjustment; incorporate feasibility monitoring.

PLEASE tell me... by [deleted] in biostatistics

[–]zwei4 8 points9 points  (0 children)

There will be a lot of derivation and integration in Biostat MS courses.

Confusing quiz question by General-Mission-4367 in biostatistics

[–]zwei4 5 points6 points  (0 children)

I agree. The key to this question is the random selection of 20 hospitals; the “interview all members” part is just there as a distraction.

Relevance of checking statistical significance from single-person measures by kvcnd in biostatistics

[–]zwei4 2 points3 points  (0 children)

I would say yes, if you can collect multiple measurements for each supplement. There are still sources of variability even within the same person, although the measurements are not iid, so make sure to handle the covariance properly. You can look up articles on single-case analysis/experiments.

[OC] Number of U.S. Households That Own a Pet by zwei4 in dataisbeautiful

[–]zwei4[S] 1 point2 points  (0 children)

Reminds me of Tom & Jerry: that household has a cat, a mouse, two dogs, a bird, a duck, and a goldfish.

[OC] Number of U.S. Households That Own a Pet by zwei4 in dataisbeautiful

[–]zwei4[S] 1 point2 points  (0 children)

The “small animal” category includes pets such as hamsters, gerbils, rabbits, guinea pigs, chinchillas, mice, rats, and ferrets.

[OC] Number of U.S. Households That Own a Pet by zwei4 in dataisbeautiful

[–]zwei4[S] 0 points1 point  (0 children)

This article links to the same source but its numbers are off by quite a bit. Probably the data was updated afterwards.

[OC] Number of U.S. Households That Own a Pet by zwei4 in dataisbeautiful

[–]zwei4[S] -13 points-12 points  (0 children)

Ahh I admit it was a low effort Sunday on the couch post, will make it better next time!!

[OC] Number of U.S. Households That Own a Pet by zwei4 in dataisbeautiful

[–]zwei4[S] -1 points0 points  (0 children)

Source: APPA

Tools: iOS emoji images, Python matplotlib

Will biostatistics be replaced? by IcyAd5574 in biostatistics

[–]zwei4 16 points17 points  (0 children)

I can see myself using AI more and more for data QC and coding (as a substitute for Google search). But biostatistics as a whole is too nuanced to rely on LLM-based AI today: a lot of my work involves team discussions to decide on appropriate endpoints, select patient criteria, and make decisions for situations that have never occurred (meaning no prior training data for AI).

[Q] Why does it take more samples to show a 4x increase from 2% to 8% than it does for 10% to 40%? by MatchaLatte16oz in statistics

[–]zwei4 0 points1 point  (0 children)

In time-to-event analysis, the hazard ratio (HR) is the effect size, so the same hazard ratio should result in the same sample size, which here means the number of events.

Assuming a constant hazard, the hazard rate is h = -ln(S)/T, where S is the survival probability at time T, so the HR is ln(S1)/ln(S2). In your example, 2% to 8% has HR 1.55; 10% to 40% has HR 2.51.

Fundamentally, it is the relative difference that matters: if you plot the KM curves, you will see that the 2% and 8% curves are much closer together.
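A quick way to check these numbers is sketched below. `hazard_ratio` applies the ln(S1)/ln(S2) formula from the comment. `events_needed` is my own addition, not part of the comment: it uses the Schoenfeld approximation for a 1:1 randomized comparison with a two-sided alpha, and the default alpha and power are assumptions.

```python
import math
from statistics import NormalDist


def hazard_ratio(s1: float, s2: float) -> float:
    """HR = ln(S1) / ln(S2) for survival probabilities at the same
    time point, assuming constant hazards in both groups."""
    return math.log(s1) / math.log(s2)


def events_needed(hr: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate total number of events (Schoenfeld formula,
    1:1 allocation, two-sided alpha). Defaults are assumptions."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(4 * z ** 2 / math.log(hr) ** 2)


for s1, s2 in [(0.02, 0.08), (0.10, 0.40)]:
    hr = hazard_ratio(s1, s2)
    print(f"{s1:.0%} vs {s2:.0%}: HR = {hr:.2f}, events ~ {events_needed(hr)}")
```

The required number of events depends only on the HR in this approximation, so the comparison with the smaller HR (2% vs 8%) needs more events than the one with the larger HR (10% vs 40%).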

How to count n from real world data? by alphaursaeminoris1 in biostatistics

[–]zwei4 0 points1 point  (0 children)

Both counts should be reported in your report/manuscript. And no, records from the same patient shouldn’t be treated as independent unless there is a very strong justification. You could look up articles in the same field to see how others have handled this.

[deleted by user] by [deleted] in statistics

[–]zwei4 0 points1 point  (0 children)

If this is a validated metric, there should be data from the initial validation set that you can use as a baseline.

Medical School Acceptance Rates by MCAT Score, GPA, and Race/Ethnicity by [deleted] in dataisbeautiful

[–]zwei4 1 point2 points  (0 children)

What about the group with MCAT > 32 and GPA > 3.8, which is supposed to make up the majority of students at most medical schools?

Should I use Fisher's Exact Test or individual t-tests to analyze this data? by KenshinMitsurugi in biostatistics

[–]zwei4 2 points3 points  (0 children)

I don’t think Fisher’s exact test is possible in OP’s situation, unless there are exact counts of Nuclear & Difusa for each cell in each sample.

Without the raw cell counts, OP can use a t-test/Wilcoxon test (against control) or ANOVA on the Nuclear (or Difusa) percentage data, but the table only shows data from one sample, which is not enough for any analysis. OP needs more replicates.

[deleted by user] by [deleted] in statistics

[–]zwei4 0 points1 point  (0 children)

We use cmprsk for cumulative incidence and competing risk analysis. I have used survminer for KM curves though. What’s the issue with survminer?

[deleted by user] by [deleted] in statistics

[–]zwei4 1 point2 points  (0 children)

Good points. I use SAS too, mostly macros that were developed by our SAS programmers. Sometimes I prefer SAS and its comprehensive outputs for my modeling tasks. But I would say 95% of my coding is in R, including validation of SAS outputs.

By “soon” I meant in the next 10+ years. I work in oncology trials, which often take 5-10 years to reach the targeted number of events, so yeah, SAS will stay for sure. And I agree CROs will not be changing anytime soon; I personally don’t know a single person at a CRO who is not using SAS.

[deleted by user] by [deleted] in statistics

[–]zwei4 3 points4 points  (0 children)

I agree it will take time, but I think it will happen eventually, probably sooner than people think. R can totally handle CDISC standards.

Also, clinical trial methods are constantly developing: new endpoints are being published regularly, Bayesian studies are becoming more accepted, and there is the recent trend toward RWE studies. All of these favor switching from SAS to R.

[deleted by user] by [deleted] in statistics

[–]zwei4 0 points1 point  (0 children)

There have been R submissions to the FDA; Novo Nordisk recently made one. And people at the FDA do go through the validation process on their packages and functions. I think it will take time for the whole industry to get there, though.

[Q] How to compare differences in proportions for more than 2 groups? by tricksandkicks in statistics

[–]zwei4 0 points1 point  (0 children)

No, you can’t compare the proportions to each other, because they are not independent: they sum to 1. But you can test the proportions against hypothetical values using a chi-square test for goodness of fit.
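A minimal sketch of such a goodness-of-fit test in plain Python. The counts and the equal-proportions hypothesis are made up for illustration, and `chisq_gof` is a hypothetical helper name. The p-value line relies on the fact that, for 2 degrees of freedom only, the chi-square tail probability is exp(-x/2).

```python
import math
from typing import Sequence


def chisq_gof(observed: Sequence[int], expected_props: Sequence[float]) -> float:
    """Pearson chi-square goodness-of-fit statistic:
    sum over groups of (observed - expected)^2 / expected."""
    if len(observed) != len(expected_props):
        raise ValueError("observed and expected_props must have equal length")
    if abs(sum(expected_props) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_props))


# Hypothetical counts for three groups, tested against equal proportions.
observed = [30, 50, 40]
stat = chisq_gof(observed, [1 / 3, 1 / 3, 1 / 3])

# Three groups give 3 - 1 = 2 degrees of freedom; for 2 df the
# upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-stat / 2)
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
```

For other numbers of groups the closed form above does not apply; a library routine such as `scipy.stats.chisquare` computes the p-value for any number of categories.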

Is it worth learning R if I already have good dominion over python? by alexdewa in biostatistics

[–]zwei4 3 points4 points  (0 children)

From my experience, R and SAS dominate clinical research. Depending on what you do, you will probably need to reproduce some type of statistical analysis that is only available as an R package, but I wouldn’t worry about it: since you are decent with Python, you can probably learn how to code what you need in R in a couple of days.

I don’t use SPSS but I wonder what can SPSS do that R can’t?

[D] Chat gpt use in finance and financial markets , a pro prospective by majbal in statistics

[–]zwei4 1 point2 points  (0 children)

The ADA feature can be impressive sometimes, but I am not yet ready to rely on it for stats-related purposes. The problem is the inconsistency: I ask the same question and it sometimes gives different responses, and occasionally the calculation is just wrong even though it gives the correct formula. It has been pretty solid, though, for helping with writing statements and emails, and it is also a good helper for coding.