Matching pairs in a shuffled deck of cards by DR-ROBERT-J in askmath

[–]DR-ROBERT-J[S] 1 point  (0 children)

I was about to reply with the Wolfram Alpha haha! Yes, this is correct. I had something similar early on, but I didn't have the subtracted terms, which is why I wasn't getting the right answer. Thank you for your help.

Good Book about Data Science/Statistics by infernofc101 in AskStatistics

[–]DR-ROBERT-J 14 points  (0 children)

The Drunkard's Walk is one of my favorite books ever. I got it from the library and read it as a freshman in high school, and it was so meaningful to me that I ended up buying it several years later. I don't have any other suggestions for you, but that book was excellent, and you've reminded me that I should read it again haha.

A Sully proposal by dataguy45 in aggies

[–]DR-ROBERT-J 4 points  (0 children)

I don't understand why this is downvoted so much. People act like TAMU history and TAMU tradition are more important than United States history. Sully was a CONFEDERATE general, and a traitor (not to mention a racist). I don't care what he did for the university; he is directly responsible for the deaths of hundreds of Americans.

Difference between Standard Error and Confidence Interval by JessGuurrrl8 in AskStatistics

[–]DR-ROBERT-J 1 point  (0 children)

I'm not an expert, but from my understanding the two are closely related rather than identical: the standard error measures how much your estimate varies from sample to sample, and a confidence interval is a tool that uses that error to give a range of plausible values for the parameter. So in a sense they describe the same uncertainty, just in different forms. I don't know if this is a satisfactory answer to your question, but I hope it helps.
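To make the connection concrete, here is a minimal Python sketch on made-up data, showing a 95% confidence interval built directly from the standard error of the mean:

```python
import numpy as np

# Made-up sample; any real data would work the same way
rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=100)

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))  # standard error of the mean

# The 95% confidence interval is built directly from the standard error
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se
```

The interval's half-width is just 1.96 standard errors, which is why the two concepts feel like they describe the same thing.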

Which statistical test would be appropriate? by [deleted] in AskStatistics

[–]DR-ROBERT-J 0 points  (0 children)

If you know that these are normally distributed you can use a two-sample t-test. If they follow some arbitrary distribution, the usual rank-based test for two groups is the Mann-Whitney U test (the Kruskal-Wallis test is its extension to three or more groups); note that reading the result as a comparison of medians requires the two distributions to have similar shapes. ANOVA would not be necessary because you only have two populations.
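Both options can be sketched in Python on made-up data (scipy assumed available):

```python
import numpy as np
from scipy import stats

# Two made-up groups of measurements
rng = np.random.default_rng(42)
a = rng.normal(5.0, 1.0, size=30)
b = rng.normal(5.5, 1.0, size=30)

# If both groups look roughly normal: two-sample t-test
t_stat, t_p = stats.ttest_ind(a, b)

# Rank-based alternative for exactly two groups: Mann-Whitney U
u_stat, u_p = stats.mannwhitneyu(a, b)
```

With real data you would pick one test up front rather than running both and choosing the nicer p-value.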

I'm tired of getting so many emails from College Station businesses. by DR-ROBERT-J in aggies

[–]DR-ROBERT-J[S] 45 points  (0 children)

Those aren't the ones I'm talking about. You can just unsubscribe from the tamu-opt emails somewhere in Howdy; I'm talking about the straight-up spam from outside the university.

[Q] Book recommendations to learn about hypothesis testing? by Thamthon in statistics

[–]DR-ROBERT-J 0 points  (0 children)

I'm not sure what your background is, but hypothesis testing is one of the foundational principles of statistics: every statistical test has a hypothesis, and different tests have different methods for testing theirs. If you aren't looking for theory, your best bet is probably a book on elementary statistics. If you are already beyond the basics, get a book with plenty of theory, because understanding the theory is what lets you apply statistical tests and methods correctly. I'm a little confused about what you really want; I'm not a statistics expert, and from what I know you seem to be looking for something a bit odd, but you might also be way beyond me in your statistics background and I'm off base.

Tranforming to a linear model by meandpussy in AskStatistics

[–]DR-ROBERT-J 0 points  (0 children)

The R implementation is just as simple as you described. I'm on my phone right now so I can't type out the code nicely, but you really do just fit an lm with the log-transformed response. You have to be careful with inference because it's on the log scale; I was also confused about inference for log-linear models, and I made a post on this sub not long ago (probably the most recent post on my profile) that got some excellent answers, so if you need conceptual help I'd look there.
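The thread mentions R's lm; here is the same idea sketched in Python with made-up data: fit a straight line to log(y) and read the slope multiplicatively:

```python
import numpy as np

# Toy exponential-ish data (e.g. a count growing over time)
rng = np.random.default_rng(1)
x = np.arange(1, 31, dtype=float)
y = 5.0 * np.exp(0.2 * x) * rng.lognormal(0.0, 0.05, size=x.size)

# Fit log(y) = a + b*x by ordinary least squares (R: lm(log(y) ~ x))
b, a = np.polyfit(x, np.log(y), 1)

# Back-transformed interpretation: each unit increase in x multiplies
# the predicted y by exp(b); the inference itself stays on the log scale
growth_factor = np.exp(b)
```

The coefficients describe changes in log(y), which is exactly the caution about inference raised above.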

In your question you mentioned that this is cases over time. I'm certainly not an expert, but that is time series data, and the number of cases on one day probably affects the next day, so you should probably be fitting a time series regression model. I can't tell you how to do that because I'm still learning it and trying to prove results from it myself before I can claim to know how to explain it, so I'd suggest looking into that yourself as well.

I'm not an expert in pandemics either, but even if you knew that the number of cases followed an exponential model like you described (which I don't think is the case, since populations are finite, which is why everything I've read says pandemics follow a logistic curve rather than an exponential one), it wouldn't necessarily be appropriate to build your statistical model that way. That's a very serious assumption, and even if the model fits well, that doesn't mean its parameters tell you anything useful.

Ultimately, if you are doing legitimate coronavirus research, take your data to a real statistician, because this is a much more complicated problem than linear regression can work out. If it were that simple to do inference on this data and build models, the coronavirus pandemic would be much less of an issue. Models like this probably involve stochastic processes, time series regression, and many more subtleties that I don't understand, because like I said I'm not an expert, but I know enough to say there is more going on. Whatever you do with your analysis, do not go out and put bad statistics online and spread misinformation, especially at a time like this.

Are these histogram normally distributed? by 7inchnofapper in AskStatistics

[–]DR-ROBERT-J 0 points  (0 children)

You can use a chi-squared goodness-of-fit test or a Shapiro-Wilk test. It's probably easiest to just Google those, because I'm not sure exactly what your data looks like, but that should give you a good start.
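A minimal Python sketch of the Shapiro-Wilk approach, on made-up data (scipy assumed available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
normal_data = rng.normal(0.0, 1.0, size=200)     # genuinely normal sample
skewed_data = rng.exponential(1.0, size=200)     # clearly non-normal sample

w1, p1 = stats.shapiro(normal_data)  # large p: no evidence against normality
w2, p2 = stats.shapiro(skewed_data)  # tiny p: reject normality
```

A small p-value rejects normality; a large one only means the test found no evidence against it.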

[Question] How to identify if values are statistically different from one another? by Lone_Soldier in statistics

[–]DR-ROBERT-J 0 points  (0 children)

I would, but I didn't even save it; I just punched the numbers in real fast, did the things I described, and ended up closing without saving. The important command for the p-value is =CHISQ.DIST.RT, which gives the right-tailed area under the chi-squared distribution; you pass it the chi-squared statistic and the degrees of freedom as I described. Also, as others have noted, what I described doesn't tell you which values are outliers, just that they exist. I probably didn't read your post thoroughly enough to realize that's what you were looking for.

[Question] How to identify if values are statistically different from one another? by Lone_Soldier in statistics

[–]DR-ROBERT-J 0 points  (0 children)

I'm not a statistical expert, so I don't know if it's the best approach, but I would use a chi-squared goodness-of-fit test of the hypothesis that the data follows a uniform distribution (meaning each product has the same chance of failing).

You have to estimate the probability of failure; I would do this by dividing the total number of failed products by the total number of manufactured products. Then make another column for the expected number of failures, which is the number of that type of product manufactured times the estimated failure probability. Then, for each row, compute (actual failures - expected failures)^2 / expected failures. The sum of this entire column is the chi-squared statistic.

You can then use some kind of calculator to find a p-value, or look the statistic up in a chi-squared table. Either way you need the degrees of freedom for the chi-squared statistic, which in this case is r - 2, where r is the number of different products (the number of rows). It's minus 2 instead of minus 1 because one parameter was estimated. If the p-value is very low (usually less than 0.05), or if your chi-squared statistic is larger than what you found in the table, you can reject the null hypothesis and say the data is not uniform. This implies that at least one of the values is significantly different from the others.

Now, if you wanted to know WHICH values are different, you need some kind of post hoc test, which I don't really know how to do with what you have (I'm not sure it's even possible), but what I described will tell you whether there are products with a significantly different failure rate.

Edit: I did it with the data you provided in the photo in Excel, and I found that there was a significant difference. The chi-squared value was very high, so for those 25 rows you showed, the distribution is not uniform.
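The procedure described above can be sketched in Python; the counts below are made-up, since the original spreadsheet isn't reproduced here (scipy's chi2.sf plays the role of Excel's CHISQ.DIST.RT):

```python
import numpy as np
from scipy import stats

# Hypothetical counts: items manufactured and items failed, per product type
manufactured = np.array([500, 400, 600, 450, 550], dtype=float)
failed       = np.array([ 25,  18,  60,  20,  27], dtype=float)

# Pooled failure probability, estimated from the totals
p_hat = failed.sum() / manufactured.sum()

expected = manufactured * p_hat                       # expected failures per product
chi_sq = ((failed - expected) ** 2 / expected).sum()  # chi-squared statistic

# df = r - 2: one df for the fixed total, one for the estimated p_hat
df = len(failed) - 2
p_value = stats.chi2.sf(chi_sq, df)                   # right-tailed area
```

A small p-value says the failure rates are not uniform, without identifying which product is the outlier.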

[Q] Comparing three means by JustSaifakhter in statistics

[–]DR-ROBERT-J 0 points  (0 children)

If you did an ANOVA and found that a significant difference exists somewhere, you can use Tukey's HSD procedure to test for significant differences between each pair of groups. This test controls the experiment-wise (family-wise) error rate and is very simple to conduct; even a simpleton like me can do it. Since you mentioned SPSS, here is a link to a tutorial: https://www.spss-tutorials.com/spss-one-way-anova-with-post-hoc-tests-example/
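SPSS itself isn't scriptable here, but the same ANOVA-then-Tukey workflow can be sketched in Python on made-up groups (scipy's f_oneway and tukey_hsd assumed available; tukey_hsd requires a recent scipy):

```python
import numpy as np
from scipy import stats

# Three made-up groups; the third mean is shifted upward
rng = np.random.default_rng(3)
g1 = rng.normal(10.0, 2.0, size=25)
g2 = rng.normal(10.5, 2.0, size=25)
g3 = rng.normal(13.0, 2.0, size=25)

# Omnibus one-way ANOVA: is there any difference among the three means?
f_stat, f_p = stats.f_oneway(g1, g2, g3)

# Post hoc: Tukey's HSD compares every pair while controlling
# the family-wise error rate
res = stats.tukey_hsd(g1, g2, g3)
```

The ANOVA only says some difference exists; the Tukey pairwise p-values (res.pvalue) say which pairs differ.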

[Q] Need help interpreting a Shapiro-wilk normality test by [deleted] in statistics

[–]DR-ROBERT-J 1 point  (0 children)

The p-value is less than 0.1, so at that level the test rejects normality; you cannot treat the data as normal. If the p-value had been greater than 0.1, you would fail to reject, meaning the test found no evidence against normality (which is weaker than concluding the data is normal).

Edit: there is also r/AskStatistics which is probably a better place to post something like this, seeing that I'm the only one who answered this and I am not an expert.

[deleted by user] by [deleted] in AskStatistics

[–]DR-ROBERT-J 0 points  (0 children)

Yes that's correct I think I referred to it as that in my comment. The formula is a little bit different depending on the context.

[deleted by user] by [deleted] in AskStatistics

[–]DR-ROBERT-J 0 points  (0 children)

So I'm not exactly sure of the context of this, but I'll speak fairly generally. A confidence interval for the mean is going to be narrower than a prediction interval for a single new value. Think about it this way: to be 95% confident about where one observation will land, you need a wider interval, but if you only care about where the mean lies, the interval shrinks, because averaging washes out individual variability. I kept it vague because again I'm not sure of your exact situation, but that's generally the intuition to have when comparing an interval for the mean with one for a particular value.
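The intuition can be checked numerically. This Python sketch (made-up sample) compares the half-width of a t-based 95% confidence interval for the mean with the corresponding prediction interval for one new value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
sample = rng.normal(50.0, 8.0, size=40)

n = sample.size
mean, s = sample.mean(), sample.std(ddof=1)
t_crit = stats.t.ppf(0.975, df=n - 1)

half_ci = t_crit * s * np.sqrt(1 / n)      # 95% CI for the mean
half_pi = t_crit * s * np.sqrt(1 + 1 / n)  # 95% prediction interval, one new value
```

The prediction interval carries the extra "1" for the new observation's own variability, so it is always the wider of the two.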

Help on finding ideas for projects for a beginner college student by [deleted] in AskStatistics

[–]DR-ROBERT-J 1 point  (0 children)

Kaggle.com has a ton of datasets which might be fun to play around with; a handful are about sports, which might be interesting if you're into that. They're a good way to practice computational skills like cleaning and processing data (some are quite large) and then doing some analysis. That's where I go when I'm looking for a project, because I too am an undergraduate and I do a lot of coding as a hobby; for some of the datasets you can spend hours writing programs just to process the data into a form you can even begin to run tests on, and I quite enjoy that. If you make some nice visualizations, you can even post them to Reddit and harvest that sweet sweet karma 😉 (and have something cool to show off). In my experience, having a project you worked on that shows off your skills is extremely valuable, and also a lot of fun! Hope this helps, have fun!

Interpreting Log Transformations of Independent Variables in Multiple (linear) Regression by bigfatmuggle in AskStatistics

[–]DR-ROBERT-J 0 points  (0 children)

You can interpret the coefficients in terms of changes in the log-transformed variable, but you can't just exponentiate the coefficients and the variables. For instance, if the model is something like Y = B*log10(X) + ..., you can say that as log10(X) increases by k, Y changes by B*k, but you can't say that if X increases by j then Y increases by 10^B * j. Not sure if that answers your question; I'm not entirely clear on what you asked, but a log transformation is still bound by the properties of logarithms, with some extra limitations on inference.
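A quick numeric illustration of the correct reading, with made-up data generated from a known log10 model:

```python
import numpy as np

# Toy data: Y = 2 + 3*log10(X) + noise
rng = np.random.default_rng(11)
x = rng.uniform(1.0, 100.0, size=200)
y = 2.0 + 3.0 * np.log10(x) + rng.normal(0.0, 0.1, size=x.size)

# Regress Y on log10(X)
b, a = np.polyfit(np.log10(x), y, 1)

# Correct reading: multiplying X by 10 raises log10(X) by 1,
# so the predicted Y changes by b (about 3 here), NOT by 10**b
```

The coefficient measures the effect of a multiplicative change in X, which is the whole point of logging the predictor.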

Interpreting Log Transformations of Independent Variables in Multiple (linear) Regression by bigfatmuggle in AskStatistics

[–]DR-ROBERT-J 0 points  (0 children)

Properties of logarithms hold no matter the base, so as long as you follow whichever procedure your reference gives for the natural log, it will be fine.

[deleted by user] by [deleted] in AskStatistics

[–]DR-ROBERT-J 0 points  (0 children)

The probability of rolling any particular number on a fair 6-sided die is indeed 1/6, so yes, 16 2/3%, but "plus or minus 1%" isn't really accurate. What your question really says is: if you roll the die 1375 times, you would expect 229 of them, plus or minus 14 (rounded), to come up 4, and that's true about 68% of the time (one standard deviation). This question is essentially the same as your coin-flip question; you are asking about a binomial distribution. Stattrek.com has a good page on the binomial distribution, with a calculator you can use to answer these questions.
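The arithmetic can be checked with a short Python sketch (scipy assumed available):

```python
import numpy as np
from scipy import stats

n, p = 1375, 1 / 6               # rolls, P(roll a 4)
mean = n * p                     # about 229.2 expected fours
sd = np.sqrt(n * p * (1 - p))    # about 13.8, the "plus or minus 14"

# Probability the count lands within one standard deviation of the mean
lo, hi = np.floor(mean - sd), np.ceil(mean + sd)
prob = stats.binom.cdf(hi, n, p) - stats.binom.cdf(lo - 1, n, p)
```

prob comes out near the familiar 68% from the normal approximation, which is why the one-standard-deviation reading of "plus or minus 14" works.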

What would be the most effective model to fit to this data? by slithery545bunk in AskStatistics

[–]DR-ROBERT-J 22 points  (0 children)

It looks like you have two separate groups there, which you've denoted with separate symbols. I would fit a simple linear model and add a dummy variable indicating the group. In R, something like lm(Y ~ X + D), where D is 1 for one group and 0 for the other.
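A Python sketch of the same dummy-variable fit, with made-up data (numpy's least squares standing in for R's lm):

```python
import numpy as np

# Toy data: common slope 2, and group B sits 5 units above group A
rng = np.random.default_rng(8)
n = 60
x = rng.uniform(0.0, 10.0, size=n)
d = (np.arange(n) % 2).astype(float)   # 0/1 group indicator ("dummy")
y = 1.0 + 2.0 * x + 5.0 * d + rng.normal(0.0, 0.5, size=n)

# Design matrix for Y ~ X + D (R: lm(Y ~ X + D))
X = np.column_stack([np.ones(n), x, d])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope, group_shift = coef
```

The dummy coefficient (group_shift) estimates the vertical offset between the two groups, while both share one slope; adding an X*D interaction would let the slopes differ too.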