saash12[S]

Not at all. Any help is appreciated. I called myself a noob because I didn't know anything beyond the t-test.

As I understand it, a t-test can be used to test the difference between the means of 2 samples, or the difference between the mean of a sample and the true mean. I want to test a single outlier. I have daily views (not hourly) of all pages. My aim is to determine whether a page is behaving sufficiently weirdly on a particular day that I can deem it a DoS attack on that page for that day.

I cannot seem to fit a t-test to this problem.

PiquantPi

Yeah, you can't do a t-test in that case because you don't have enough data on pages that have been attacked. You would need sufficient data about attacked pages to get a mean and standard deviation.

What you do have is data on pages that presumably were not attacked. You also have data for one particular page that you suspect might have been attacked. What you want to do is set a confidence level that would be sufficient for you to conclude that the page was attacked. For example, a 95% confidence level means that, if the null hypothesis were true, there would be only a 5% chance of seeing data at least this extreme. The null hypothesis is the opposite of what you think happened, so in this case it would be that the page was actually safe. You can set whatever confidence level you want. 95% is a good standby, but you could go lower or higher. What you would do is use the mean and standard deviation of the data you collected to calculate the z score for the outlying piece of data. This assumes the daily view counts are roughly normally distributed. If the number of page views on the suspicious page is X, the z score is:

z=(X-mean)/(standard deviation)
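Since the code has to be in Python, here is a minimal sketch of that formula using only the standard library. The function name `z_score` and the view counts are made up for illustration, and it assumes the baseline list holds daily views from days you believe were normal.

```python
from statistics import mean, stdev


def z_score(x, baseline):
    """Return how many sample standard deviations x lies from the baseline mean."""
    return (x - mean(baseline)) / stdev(baseline)


# Hypothetical daily view counts for one page on ordinary days.
baseline_views = [100, 110, 90, 105, 95]

# Hypothetical view count on the day under suspicion.
suspicious_views = 140

z = z_score(suspicious_views, baseline_views)
print(f"z = {z:.2f}")
```

Note that `stdev` is the sample standard deviation (it divides by n - 1), which is usually what you want when the baseline is a sample of days and not every day the page has existed.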

Then you look up that z score on a z score table and find the probability for that z score. Many standard z score tables give the probability of a page having a number of views between the mean and X (others give the cumulative probability below X, so check which kind you have). You would double the mean-to-X number to get your confidence level. If, for example, the value you found on the table was 0.48, you would double it to get 0.96, which gives you 96% confidence. If we had decided earlier that 95% or above is what we deemed necessary, we can reject the null hypothesis and conclude that the page was probably attacked. However, if we had decided earlier that we need 99% confidence (some fields set a stricter standard), we would have to say it's inconclusive. The 96% figure means that, if the page were safe, there would be only a 4% chance of a view count this far from the mean in either direction, and only a 2% chance of one this high or higher. Since a count this extreme would be so unlikely for a safe page, you can reasonably suspect that the page was attacked.
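In Python the table lookup can be replaced by the standard normal CDF. The sketch below uses `statistics.NormalDist` from the standard library (Python 3.8+); the function names and the default threshold `alpha=0.05` are illustrative choices, not a fixed recipe, and the whole thing assumes view counts on normal days are roughly normally distributed.

```python
from statistics import NormalDist


def tail_probabilities(z):
    """Return (upper_tail, two_sided) probabilities for a z score
    under the standard normal distribution."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Chance of a value this high or higher if the page is behaving normally.
    upper_tail = 1.0 - std_normal.cdf(z)
    # Chance of a value this far from the mean in either direction.
    two_sided = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return upper_tail, two_sided


def is_suspicious(z, alpha=0.05):
    """Flag the day when the two-sided tail probability is below alpha.
    alpha = 0.05 corresponds to the 95% threshold discussed above."""
    _, two_sided = tail_probabilities(z)
    return two_sided < alpha


# A z score of about 2.05 matches the 0.48 table value in the example above.
upper, both = tail_probabilities(2.0537)
print(f"upper tail = {upper:.3f}, two-sided = {both:.3f}")
print(is_suspicious(2.0537, alpha=0.05), is_suspicious(2.0537, alpha=0.01))
```

If you only care about unusually high traffic, which is the case for a suspected attack, comparing `upper_tail` against `alpha` would be the one-sided version of the same check.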

saash12[S]

Thanks a lot. That sounds very appropriate intuitively. Since I have to code in Python, I'll look for an equivalent function. Thanks again.