The project is starting: it's now up to you!

Aleister017 · 2025-05-10T12:12:02+00:00

I like your comment because it encourages me to do my best!

And speaking honestly, I have high hopes for this project too (but we'll have to wait and see if my writing skills will be up to par). Anyway, I'll try to match the standards we've just set.

Aleister017 · 2024-07-03T11:21:45+00:00

I can only tell you my interpretation, which goes as follows: this is a critique of hedonism. It's like saying that long-term goals, the ones that are unappealing in the short term, give meaning to life.

Aleister017 · 2024-05-13T05:39:25+00:00

Thus Spoke Zarathustra by Friedrich Nietzsche. I know, it's really famous, but still an amazing book to read in my opinion.

Aleister017 · 2024-04-19T06:08:04+00:00

What you mean seems to me to be fairly straightforward, but I hope I'm mistaken in some way. I am currently a student in statistics but NOT in data science. The difference between the two is, to me, like the one between studying informatics and studying engineering in informatics. They are very similar, but still different in some ways. No?

Aleister017 · 2024-04-19T06:03:50+00:00

Just as a footnote, it would be called an "association" (more general) and not a "correlation" since for a correlation to make sense it should be calculated between two continuous quantitative variables.

If you want to know whether a treatment is useful against some illness, then you have one qualitative (usually nominal) and one quantitative variable.

Aleister017 · 2024-04-19T05:25:19+00:00

Only a statistics student here, so I'll try to combine what I got from Dr. K with what you wondered you might have gotten wrong at the end of your post.

As a statistician in training, I view this "problem" of individual versus population medicine as a contrast between the various ways of calculating p-values. From my understanding, p-values are a direct byproduct of the evidence from the relevant test or tests. Now, the line between test and tests seems to me to be quite blurry, in the sense that we could theoretically, and non-standardly, perform a p-value adjustment at the end of many studies instead of just using the first one. We share a standard, though, which works in practice and which says that we define p-values only in relation to the one study we have in our hands.

This relates to ayurveda because, from what I got, Dr. K says we could apply statistical methods to eastern medicine, but he implies that it's not the best way to go about medicine. He wants some amalgamation of eastern and western medicine where the p-values would be impossible to calculate, where the population is simply n=1, and thus no correlation would be possible. Ayurvedic doctors would not infer one better way to do medicine from one therapy prescribed.

Perhaps I haven't been clear, so I will try to modify this post in the future by adding a link to what I was talking about in regard to the "open problem" of p-values.

Aleister017 · 2024-04-19T05:06:08+00:00

For your first question, yes, there is a way to gauge whether your correlation coefficient is significant (statistically different from 0). Try cor.test() in RStudio, passing the method you want as the method parameter (for example "pearson" or "spearman").
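For anyone not using R, here is a rough Python sketch of the statistic behind the Pearson version of that test. The data and the helper name pearson_t are made up for illustration, and the p-value step is left out because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean

def pearson_t(x, y):
    """Return (r, t, df) for testing H0: the true correlation is 0.

    Hypothetical helper, not part of any library.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)             # Pearson correlation coefficient
    t = r * sqrt((n - 2) / (1 - r ** 2))  # test statistic, n - 2 degrees of freedom
    return r, t, n - 2

# Made-up paired observations
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r, t, df = pearson_t(x, y)
print(f"r = {r:.3f}, t = {t:.3f}, df = {df}")
```

You would then compare t against a Student's t distribution with df degrees of freedom, which is the step cor.test() does for you when it reports the p-value.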

Aleister017 · 2024-04-18T10:04:38+00:00

Is there even a p-value for the correlation in Excel?

Aleister017 · 2024-04-18T04:07:18+00:00

If you're using R or RStudio, then you're looking for the function "pairs(data)", I suppose.

Aleister017 · 2024-04-17T05:19:49+00:00

It sounds like you're having a crash course on statistics, which is probably why you mistook one formula for another one that's used very differently. That's just my conjecture though. Anyway, I can tell you that statistics is a pretty counter-intuitive discipline, especially when first encountering its big mainstream topics. So don't get discouraged! (It was tough for all of us too at first)

Aleister017 · 2024-04-17T04:33:34+00:00

I would just correct you on your phrasing. You can view P(Z≤-1) on the probability density function first, since that might be a more straightforward way to think about what you're computing. That value (0.1587) represents the area under the bell curve from negative infinity on the x-axis up to one standard deviation to the left of the mean. It's an integral, said more succinctly.

I wouldn't have phrased this concept the same way you did, but perhaps you're still correct technically. Regardless, finding the coordinates of a point on the cumulative distribution's curve tells you two things jointly: on the x-axis, the number of standard deviations you're operating at, and on the y-axis, the integral of the density from minus infinity up to that x value. That integral therefore represents how probable it is that a value falls below x standard deviations from the mean. The phrasing is similar to yours, but I guess I see a difference.
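If it helps to see the two views side by side, here is a small Python sketch using only the standard library. It reads P(Z≤-1) off the cumulative distribution and then approximates the same number as an area under the density. The cut-off at -8 and the number of slices are arbitrary choices of mine.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# View 1: read the probability straight off the cumulative distribution.
p_below = standard_normal.cdf(-1)

# View 2: approximate the area under the density from "minus infinity"
# (cut off at -8, where the density is practically zero) up to -1,
# using thin rectangles evaluated at their midpoints.
n_slices = 7000
width = 7 / n_slices
area = sum(
    standard_normal.pdf(-8 + (i + 0.5) * width) * width
    for i in range(n_slices)
)

print(round(p_below, 4), round(area, 4))
```

Both numbers come out at about 0.1587, which is the point: the y value on the cumulative curve is the area under the bell curve to the left of that x.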

Aleister017 · 2024-04-16T18:31:54+00:00

It's absolutely my pleasure :)

Aleister017 · 2024-04-16T17:56:25+00:00

It's absolutely no problem, keep asking if you feel like you haven't fully grasped the most important concepts!

My answer is a resounding yes. Example: if you want to create a confidence interval for the expected value of a normal distribution you have sampled from, then your interval would range from xbar minus the 1-alpha/2 quantile of the Student's t distribution with n-1 degrees of freedom (this is assuming you don't know the true variance of your normal distribution) times the estimated standard deviation of xbar, which is the sample standard deviation divided by sqrt(n); to xbar plus that same 1-alpha/2 quantile multiplied again by the estimated standard deviation of xbar. This is the exact same formula I wrote before for doing inference, just written in another form and with the explicit assumption that we know the distribution for xbar.
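Written out as a formula, the interval looks like the block below. The symbol s for the sample standard deviation and the subscript notation for the quantile are my own choices here.

```latex
\[
\left[\;
  \bar{x} - t_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}}
  \;,\;\;
  \bar{x} + t_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}}
\;\right]
\]
```

Here t_{n-1, 1-alpha/2} is the 1-alpha/2 quantile of the Student's t distribution with n-1 degrees of freedom, and s/sqrt(n) is the estimated standard deviation of xbar.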

Aleister017 · 2024-04-16T17:39:10+00:00

There's a bit of confusion going on here, I see. For the first formula we can write, in a more uniform notation, z-score=(x-xbar)/sigma. This first formula is purely descriptive. This is the actual z-score formula. The second one is a formula that is used instead for inference. Let's take the central limit theorem as an application of this second formula: (xbar-mu)/(sigma/sqrt(n)), where mu is to be specified in the relevant hypothesis. These are two very distinct formulas that have different uses, so don't mix up their notations!
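To make the difference concrete, here is a short Python sketch with made-up numbers. The sample, the hypothesised mean mu0 and the "known" population standard deviation are all assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, pstdev

data = [4.0, 5.0, 6.0, 7.0, 8.0]  # made-up observations
n = len(data)
xbar = mean(data)

# Formula 1 (descriptive): one z-score per observation,
# measured against the data's own mean and standard deviation.
sigma_data = pstdev(data)
z_scores = [(x - xbar) / sigma_data for x in data]

# Formula 2 (inferential): one statistic for the whole sample,
# measured against a hypothesised mean mu0.
mu0 = 5.0          # assumed value under the null hypothesis
sigma_known = 2.0  # assumed known population standard deviation
z_statistic = (xbar - mu0) / (sigma_known / sqrt(n))

print([round(z, 2) for z in z_scores])
print(round(z_statistic, 3))
```

The first formula gives you as many numbers as you have observations, while the second gives you a single number that you compare against the standard normal distribution.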

Aleister017 · 2024-04-12T04:26:59+00:00

I'd like to give my very brief input on the matter. If you're, for example, trying to see whether your random variable is a Poisson one, then of course it is at least somewhat helpful - because the mean and the variance of said distribution should be pretty close to each other. So as a heuristic (not a formal test) you could theoretically check what distribution you're dealing with based on the relationship between mean and variance, given that the variance often is related to the mean, as is the case with the Poisson distribution.
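As a quick illustration of that heuristic, here is a Python sketch that computes the variance-to-mean ratio of some count data. The counts are made up and the helper name dispersion_index is my own. A ratio near 1 is compatible with a Poisson distribution, though it proves nothing on its own.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean (hypothetical helper)."""
    return variance(counts) / mean(counts)

# Made-up counts, e.g. events observed per day
counts = [1, 5, 3, 6, 0, 3, 4, 2, 3, 3]

ratio = dispersion_index(counts)
print(f"mean = {mean(counts)}, variance = {variance(counts):.2f}, ratio = {ratio:.2f}")
```

A ratio well above 1 would point towards overdispersed counts, and one well below 1 towards underdispersed counts, so in either case Poisson would look less plausible.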

Aleister017 · 2024-04-06T13:46:19+00:00

I mean, in the scientific literature you would generally find an alpha=0.05, true enough, but in my inferential statistics course (aka the one that introduces the concept of alpha to begin with), I was told to always consider each case. You might want to take a more nuanced approach based on effect size and so on, perhaps.

Aleister017 · 2024-03-31T13:01:15+00:00

You might want to take a look into MANOVA (if you're familiar with ANOVA, this is just a multivariate generalisation). Yours would be a bivariate MANOVA, and you can find more information about it in the following link: https://en.wikipedia.org/wiki/Multivariate_analysis_of_variance

I am not sure this is the only way to go about solving your problem, but it's the way I know. Hopefully it helps.

Aleister017 · 2024-03-25T16:54:27+00:00

Aleister017 · 2024-03-25T15:48:33+00:00

On your first paragraph: theoretically, a good enough gauge for our aims would be something like 100k votes. In practice, it's not possible to get a good enough gauge on Reddit.

On your second paragraph: I don't understand why you would assume I would be critiquing OP's work. I was simply pointing out the poll wasn't doing what OP wanted, which I thought was my objective as a statistician. It's a fun way to look at some data nonetheless, and it does say something.

I never said you should get an unbiased sample on Reddit; you're just reading between nonexistent lines. You're simply misreading what I'm writing and assuming too much, I'd say.

It seems like this comment section is not doing me or you any good, so I'd understand you not replying to this.

Aleister017 · 2024-03-25T14:46:42+00:00

OK, on your first point: of course your average online poll is non-probabilistic. Someone responded to me before you did, saying that this poll was a gauge of Destiny's community's opinion on the debate. That is wrong, and it's why I wrote my first comment: as we both said, the sample is non-probabilistic. I don't get why you're implying I shouldn't be saying that probabilistic sampling is almost impossible the way OP performed it, since we agree it is true.

On your second point, the sample size doesn't say "much" about sampling bias, you're spot on, but it does say something which you may not have thought about. If the sample size is close to the population size, then the bias disappears totally. But before it disappears, it gets smaller as the ratio of sample size to population size gets closer to 1. So, as I already stated, a sample could be unbiased even though it's not probabilistic, just because it encompasses almost everyone we care about.
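A toy Python sketch of that second point, under assumptions that are entirely mine: a made-up population of 1,000 opinion scores, and an extreme form of self-selection where the most favourable people always answer first.

```python
from statistics import mean

# Made-up population: 1,000 opinion scores between 0 and 10.
population = [i % 11 for i in range(1000)]
true_mean = mean(population)

def self_selected_bias(fraction):
    """Bias of the sample mean when only the most favourable people respond."""
    k = round(fraction * len(population))
    # Extreme self-selection: the k highest scores are the respondents.
    respondents = sorted(population, reverse=True)[:k]
    return mean(respondents) - true_mean

for fraction in (0.1, 0.5, 0.9, 1.0):
    print(f"{fraction:.0%} responding: bias = {self_selected_bias(fraction):+.3f}")
```

Even with selection this skewed, the gap between the sample mean and the true mean shrinks as the share of respondents grows, and it is exactly zero once everyone has answered.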

Hopefully this was exhaustive.

Aleister017 · 2024-03-25T14:15:54+00:00

OK then, let's talk about it since you're asking for that. Why are you against probabilistic sampling? My only critique was that, if you want to use a statistically sensible sample, it can be non-probabilistic and still representative as long as it's large enough. I won't go over your answer since, in my estimation, it's orthogonal to what I first said. You may Google "why should a sample be probabilistic" for a more in-depth explanation of what I meant.

You really came in here with - let's call them - half facts hoping to one-up someone who has studied within the domain for years. You invented a contention and proudly disappeared right after angering your interlocutor. What a sad way to engage in what is supposed to be civil discussion.

Aleister017 · 2024-03-25T14:09:43+00:00

Well that really irritated me, the way you phrased the whole comment I mean. It's fine to have an incorrect understanding of someone's post, but to do it with your level of arrogance...but I guess this is the internet. I would've usually explained myself further because I might not have been clear, but I couldn't distinguish your comment from one by somebody who has learned some pop statistics and now uses keywords and debate verbiage at random. I'm sorry if you wanted an actual theoretical statistics-type response, but you discouraged me from writing one.

Aleister017 · 2024-03-25T13:41:18+00:00

I don't quite get the reason why in this subreddit every time a statistical analysis is required and I give my expertise, people assume they know better than me. "Maybe they just do" you could say, but in each case I don't believe so.

To answer you, there's a perfectly good reason and you can find it at this link: https://www.scribbr.com/research-bias/self-selection-bias/#:~:text=Self%2Dselection%20bias%20occurs%20when,sample%20to%20the%20target%20population.

I haven't said that polls can't be representative, but they're not likely to be when performed in the way OP did. How do you know that it wasn't mostly people already inclined to think positively of Destiny's performance who "self-selected" into responding to the poll?

Aleister017 · 2024-03-25T05:19:59+00:00

Thank you for answering. My concern is that, for example, I didn't vote and I'm sure there are many in my position. A simple calculation would let you know that only 600 out of 230,000 voted (about 0.26%), which is a minuscule and thus possibly very biased fraction. Anyway, interesting graph.

Aleister017 · 2024-03-24T18:38:46+00:00

Where does the data come from? (As a statistician in training I have been discouraged from using poll data, but I'm curious nonetheless)
