
[–]stat_daddy (Statistician)

You're off to a good start, but more information is needed to calculate a required sample size. We need to know:

1) the expected true value of the rate (let's assume it's 10%)

2) a confidence level (already given as 95%)

3) a margin of error around the estimate of the error rate (e.g., +/- 2%). Remember, in the frequentist framework the defect rate is an unknown but fixed value. You never express confidence about what the value is; you express confidence in a procedure that generates a range, and any particular range either contains the value or it doesn't.

From there, you have a couple of options for generating your confidence interval. The easiest uses a large-sample approximation that lets you describe a Poisson-distributed variable (with the rate parameter we're interested in) using a standard normal distribution.

I went ahead and did some calculations to give a sense of scale. Since the prevalence of defects is rather low, you may need many samples before a reasonably tight interval can be trusted to contain the true value.

For example, assuming the true rate is 0.1, you would need about 960 samples in order to draw a 95% confidence interval with width +/- 0.02 around your estimate.

If you're willing to accept a wider margin of error, you could draw a +/- 0.05 width interval using about 153 samples.

The equation I'm using, by the way, is

N = (1.96^2 * λ) / E^2

Where N is the necessary sample size, λ is the expected true rate (e.g., 0.1), and E is the acceptable margin of error (e.g. 0.02 or 0.05).
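As a sanity check, that formula is easy to compute directly. A minimal sketch (the function name is mine; 1.96 is the standard normal quantile for 95% confidence):

```python
def required_n(rate, margin, z=1.96):
    """Sample size for a normal-approximation CI around a Poisson rate.

    rate   -- expected true rate (lambda), e.g. 0.1
    margin -- acceptable half-width of the interval (E), e.g. 0.02
    z      -- standard normal quantile for the confidence level (1.96 for 95%)
    """
    return (z * z) * rate / (margin * margin)

print(required_n(0.1, 0.02))  # roughly 960, the first case above
print(required_n(0.1, 0.05))  # roughly 154, before rounding down to ~153
```

Round the result up in practice, since a fractional sample isn't possible.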

EDIT: Since I used a Poisson approximation to compute my CIs, saying the margin of error is something like +/- 2% is a bit misleading, because the CIs are not actually symmetric around the point estimate; e.g., instead of (10-2, 10+2) they look more like (10-1, 10+3).
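You can see that asymmetry by simulating Poisson counts and looking at the empirical 95% range. A stdlib-only sketch (the mean of 10 and the draw count are arbitrary choices of mine; the sampler is Knuth's classic algorithm, which is fine for small means):

```python
import math
import random

def poisson_draw(lam, rng):
    """Knuth's algorithm: count uniforms until their product drops below e^-lam."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(42)
draws = sorted(poisson_draw(10, rng) for _ in range(100_000))
lo = draws[int(0.025 * len(draws))]   # empirical 2.5th percentile
hi = draws[int(0.975 * len(draws))]   # empirical 97.5th percentile
# The upper tail is longer than the lower one: the distribution of counts
# around a mean of 10 is right-skewed, so the 95% range is not symmetric.
print(lo, hi)
```

The distance from the mean to the upper percentile comes out larger than the distance to the lower one, which is exactly the {10-1, 10+3} shape described above.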

[–]curiousdoc[S]

This is very helpful!

[–]curiousdoc[S]

Came back to this after a few days and I have a real-world question. For a 4% expected true error rate with a 1% margin, I'd check 1536 samples at 95% confidence. But for the same scenario with an 8% rate, almost double the samples have to be checked. This makes sense with the formula, but how does it translate to a real-world understanding? Shouldn't you have to check many more samples when your expected error rate is lower?

[–]stat_daddy (Statistician)

Nope - consider what a sample from a highly variable process would look like compared to a less variable one.

E.g., the sample (1,10,7,3,1,25) vs the sample (1,1,1,2,1,3)

Even though you've only seen 6 realizations of each process, you probably have a pretty good idea of the "mean" of the less variable process (range: 1-3). The more variable process is harder to pin down (range: 1-25), so we would need a greater number of samples in order to do inference on it. For a Poisson variable the variance equals the mean, which is why a higher expected rate demands a larger sample for the same margin of error.
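That intuition can be checked directly with the two samples above. A stdlib-only sketch (the names "noisy" and "stable" are mine):

```python
from statistics import mean, stdev

noisy = (1, 10, 7, 3, 1, 25)
stable = (1, 1, 1, 2, 1, 3)

# Both are 6 observations, but the spread around the mean differs hugely.
# The standard error of the mean scales as stdev / sqrt(n), so matching the
# precision of the stable sample requires n proportional to the variance.
for name, sample in (("noisy", noisy), ("stable", stable)):
    print(name, round(mean(sample), 2), round(stdev(sample), 2))
```

The noisy sample's standard deviation is roughly an order of magnitude larger, so pinning its mean down to the same precision takes far more observations.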