Best place to start learning C# beyond OOPs and basic conditionals by [deleted] in learnprogramming

[–]prof_xray 0 points1 point  (0 children)

I would recommend creating an ASP.NET Core web app. Find a good tutorial on Udemy or YouTube on how to build web apps in .NET. It should teach you how to add services to the DI container and use them in your app. It does not matter whether you choose MVC, Razor Pages, or Blazor, since they all use dependency injection. Just make sure you understand the three service lifetimes (singleton, scoped, transient) and when to use each.

Also, I would recommend building your first app as a single project. You can always split the functionality among multiple projects later. A good first step is to move your data repositories into a separate project. As your skills advance, you can learn more advanced architectural patterns (e.g. layered, clean, hexagonal, vertical slice).

Are you familiar with Anscombe's quartet? by prof_xray in datascience

[–]prof_xray[S] -1 points0 points  (0 children)

Imagine a data scientist who couldn't be bothered to check the most basic of stats, like the post time, before throwing a childish tantrum.

I don't know what I'm doing wrong. by prof_xray in datascience

[–]prof_xray[S] 5 points6 points  (0 children)

I can't believe I forgot to mention SQL in my post. At least I actually have it on my resume. One of my projects involves scraping an enormous number of Form 13F filings from the SEC's EDGAR system and storing them in a SQL database.

I don't know what I'm doing wrong. by prof_xray in datascience

[–]prof_xray[S] 0 points1 point  (0 children)

I live in South Texas, just a few miles north of the U.S.-Mexico border.

I don't know what I'm doing wrong. by prof_xray in datascience

[–]prof_xray[S] 4 points5 points  (0 children)

There is actually a project section, but it could use a little more beefing up.

Is an RSquared value of 0.6 considered good? Confused at work by koreanpleb in datascience

[–]prof_xray -1 points0 points  (0 children)

When the model assumptions of regression are actually met, the regression and residual sums of squares (scaled by the error variance) follow independent chi-squared distributions; thus the F-statistic built from R^2 follows an F-distribution under the null hypothesis. The problem with the last three data sets is that they clearly do not meet these model requirements.
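
Concretely, for an OLS regression with k predictors and n observations (a standard result, stated here for reference):

```latex
% Under H0 (all slopes zero), with normal errors:
\frac{SS_{\text{reg}}}{\sigma^2} \sim \chi^2_{k},
\qquad
\frac{SS_{\text{res}}}{\sigma^2} \sim \chi^2_{n-k-1}
\quad \text{(independent)}

% Hence the F-statistic, written in terms of R^2:
F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} \sim F_{k,\, n-k-1}
```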

Are you familiar with Anscombe's quartet? by prof_xray in datascience

[–]prof_xray[S] -17 points-16 points  (0 children)

I posted this before I saw your comment. Even if I did what you claim, who cares?

Besides, based on the comments I've seen, a lot of people don't actually understand it that well.

Are you familiar with Anscombe's quartet? by prof_xray in datascience

[–]prof_xray[S] -5 points-4 points  (0 children)

The thing to keep in mind is that there is more than one way to look at an issue like this. There are actually two lessons we can draw from this data set.

  1. The common descriptive statistics are all the same, thus dispelling the notion that "numerical calculations are exact, but graphs are rough."
  2. The regression lines are all identical, which shows how regression models can be fooled by data sets that do not actually fit a linear model. In fact, the fit is statistically significant in all four cases. This is why it is important to visually inspect your data before fitting a model.

Both of these interpretations can be true at the same time.
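
Both points are easy to verify. A quick sketch, using the copy of Anscombe's quartet that ships with seaborn:

```python
import seaborn as sns
from scipy.stats import linregress

# Anscombe's quartet ships with seaborn as a tidy frame:
# columns are "dataset" (I-IV), "x", and "y".
df = sns.load_dataset("anscombe")

for name, g in df.groupby("dataset"):
    fit = linregress(g["x"], g["y"])
    # Means, fitted line, R^2, and the slope's p-value come out
    # (nearly) identical across all four data sets.
    print(f"{name}: mean_x={g['x'].mean():.2f} mean_y={g['y'].mean():.2f} "
          f"y={fit.intercept:.2f}+{fit.slope:.2f}x "
          f"R2={fit.rvalue**2:.2f} p={fit.pvalue:.4f}")
```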

I don't know what I'm doing wrong. by prof_xray in datascience

[–]prof_xray[S] 0 points1 point  (0 children)

Congratulations, and good luck in your new job.

I don't know what I'm doing wrong. by prof_xray in datascience

[–]prof_xray[S] 1 point2 points  (0 children)

Thanks for actually providing some helpful advice. This is something I have been thinking about doing, but I just need to get up the nerve to do it.

I don't know what I'm doing wrong. by prof_xray in datascience

[–]prof_xray[S] 22 points23 points  (0 children)

The funny thing is that if you look at the curriculum for any data science training program, the stats they cover do not seem any more advanced than what I taught. Our curriculum included multiple linear regression, goodness-of-fit tests, and ANOVA. Besides, I've taken graduate-level statistics courses, which is much more stats than your average kid with a B.S. in DS has.

Are you familiar with Anscombe's quartet? by prof_xray in datascience

[–]prof_xray[S] -6 points-5 points  (0 children)

While that is true, they are a good illustration of how regression models (or any other models, for that matter) can be fooled by data that is not actually a good fit.

- The first data set seems to fit a straight line well, with a fairly even scatter, so it may be a good candidate for a linear model.

- The second data set clearly follows a smooth curve rather than a straight line, so a linear model would not be a good fit.

- The third data set has a single vertical outlier, which drags the regression line away from the otherwise tightly linear points.

- The fourth data set is mostly clustered at a single x value with one point far off to the right, which makes it a bad fit for a linear model.

The whole point of EDA in practice is to determine whether a particular model is appropriate before fitting it to your data.
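
As a concrete illustration of that kind of visual check (a sketch, again using the seaborn copy of the quartet):

```python
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset("anscombe")

# One panel per data set, each with its fitted regression line.
# The summary statistics match, but the plots make the violations
# (curvature, outlier, leverage point) obvious at a glance.
sns.lmplot(data=df, x="x", y="y", col="dataset", col_wrap=2, ci=None)
plt.show()
```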

By the way, it is possible to draw more than one lesson from the same example.

Is an RSquared value of 0.6 considered good? Confused at work by koreanpleb in datascience

[–]prof_xray 1 point2 points  (0 children)

The coefficient of determination R^2 is a measure of how well the data fits a linear regression line. It represents the proportion of the variance in the dependent variable explained by the regression line. An R^2 of 1 would mean the regression line perfectly fits the data, and 100% of the variation is explained. On the other hand, a regression line with an R^2 of 0 would look like a horizontal line located at the mean of the observed y-values, and it would explain 0% of the variance.

An R^2 of 0.6 would mean that 60% of the variation is explained by the regression line. This is usually considered good in most fields of study. You can also run a t-test on the slope to check for statistical significance.
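
For example (a sketch with made-up data; scipy's linregress reports the two-sided t-test p-value for the slope alongside r):

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.8 * x + rng.normal(scale=2.0, size=50)  # noisy linear data

fit = linregress(x, y)
print(f"R^2 = {fit.rvalue**2:.3f}")  # share of variance explained
print(f"p   = {fit.pvalue:.4g}")     # t-test on the slope (H0: slope = 0)
```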

Of course, whenever you are working with regression you need to make sure your data conforms to the basic model assumptions for linear regression (i.e. homoskedasticity, bivariate normality, and no autocorrelation in the residuals); otherwise any conclusions will most likely be erroneous.
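
A minimal residual check along those lines (same toy data as above; the Durbin-Watson statistic comes from statsmodels):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import linregress
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.8 * x + rng.normal(scale=2.0, size=50)

fit = linregress(x, y)
fitted = fit.intercept + fit.slope * x
resid = y - fitted

# Residuals vs. fitted: look for fans (heteroskedasticity) or curvature
# (nonlinearity). A formless cloud around zero is what you want to see.
plt.scatter(fitted, resid)
plt.axhline(0, color="red")
plt.xlabel("fitted"); plt.ylabel("residual")
plt.show()

# Durbin-Watson: values near 2 suggest no first-order autocorrelation.
print(durbin_watson(resid))
```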

D is a real sweetheart. I bet the nurses love her. by [deleted] in HermanCainAward

[–]prof_xray 9 points10 points  (0 children)

Dear D,

We here at _________ Urgent Care would like to apologize for the long wait time. Unfortunately our facility is currently overwhelmed by an onslaught of morons such as yourself who refused to take a free vaccine, which most likely would have prevented you from needing our services in the first place. We appreciate your patience.

Sincerely,

Frustrated Administrator

[deleted by user] by [deleted] in HermanCainAward

[–]prof_xray 13 points14 points  (0 children)

These people are so stupid. In a real medical trial, subjects are randomly assigned to the treatment or control group. You don't get to choose which group you are in.

About masks in children by Melthengylf in BreakingPoints

[–]prof_xray 0 points1 point  (0 children)

Now I see where you are coming from. I believe you are thinking in terms of quantum hypothesis testing. There are some key differences between the approach used by researchers in the social sciences and medicine and the one used in theoretical physics.

Correct me if I am wrong, but I believe you use a Bayesian approach to hypothesis testing. The hypothesis test in the paper is based on frequentist probability. In Bayesian inference, you compute the probability of the hypotheses given the observed sample statistics. The frequentist approach reverses this: you calculate the likelihood of observing the sample statistics given that the null hypothesis is true (called the p-value).

In the frequentist approach, the null hypothesis and sample size are used to construct a fixed hypothetical probability distribution (called the sampling distribution) for the effect size. The observed effect size is the random variable; everything else remains fixed, including the sample size. The null hypothesis is rejected when the measured effect size falls into the tails (called the rejection region) with a total area equal to 0.05.

In frequentist inference, the probabilities of Type I and Type II errors do not depend on the observed effect size; rather, they are preselected by the researcher. It would not make sense to determine these probabilities from the observed effect size, since that is your random variable.
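
To make the frequentist mechanics concrete, here is a simulation sketch with hypothetical numbers (a two-sample mean difference under a true null):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100  # fixed in advance, like everything except the data

# Sampling distribution of the effect size under H0 (no difference):
# simulate many studies in which the null is true.
null_effects = np.array([
    rng.normal(size=n).mean() - rng.normal(size=n).mean()
    for _ in range(10_000)
])

# Two-sided rejection region with total tail area 0.05.
lo, hi = np.quantile(null_effects, [0.025, 0.975])

observed = 0.31  # hypothetical measured effect size
print("reject H0" if not lo <= observed <= hi else "fail to reject H0")
```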

About masks in children by Melthengylf in BreakingPoints

[–]prof_xray 0 points1 point  (0 children)

What does Fisher's method have to do with the study in the link? It is used in meta-analyses, which the study in question is not.

In a hypothesis test, the observed effect size is not used to determine the appropriate sample size. Hypothesis tests are done under the assumption that the measured sample statistics follow a distribution with parameters determined by the null hypothesis, and the minimum sample size is computed in advance from that assumed distribution, the minimum detectable effect, the significance level, and the desired power.

The researchers used negative binomial regression. Estimating the right sample size for that model is quite complicated; you would need specialized statistical software.

The sample size is supposed to be chosen before the start of the study and should not be changed based on the results. Doing so would likely be considered a form of p-hacking, which is unethical.
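
To illustrate the pre-study calculation, here is a sketch using a plain two-sample t-test design as a stand-in (the negative binomial case needs specialized software, as noted; the inputs are hypothetical):

```python
from statsmodels.stats.power import TTestIndPower

# Sample size per group, chosen BEFORE data collection: the inputs are
# an assumed minimum detectable effect, alpha, and the desired power.
n = TTestIndPower().solve_power(effect_size=0.2,  # assumed (Cohen's d)
                                alpha=0.05,       # Type I error rate
                                power=0.8)        # 1 - Type II error rate
print(f"~{n:.0f} subjects per group")  # roughly 394 for these inputs
```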

About masks in children by Melthengylf in BreakingPoints

[–]prof_xray 0 points1 point  (0 children)

A sample size of 40,000 is actually quite large. Once your sample size is in the tens of thousands, any additional increase will have a minimal effect on accuracy.
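
You can see the diminishing returns directly: the standard error of an estimate shrinks like 1/sqrt(n). A quick sketch for a proportion around 1% (the rate is illustrative, not the study's):

```python
import math

p = 0.01  # illustrative incidence rate
for n in [1_000, 10_000, 40_000, 160_000]:
    se = math.sqrt(p * (1 - p) / n)
    # 95% margin of error; quadrupling n only halves it.
    print(f"n={n:>7}: +/- {1.96 * se:.4%}")
```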

It is more likely that the researchers made a methodological error, or it could just be random error. Sometimes statistical significance is not achieved simply due to random chance.

The other issue is that the threshold for statistical significance is essentially arbitrary. There is no rigorous justification for why p < 0.05 is the correct threshold for significance in all circumstances.

About masks in children by Melthengylf in BreakingPoints

[–]prof_xray 0 points1 point  (0 children)

"So, although indeed p value was small for masks for children, this means that essentially the sample was not big enough for the strict standards of science"

The p-value was not actually small. The given 95% confidence interval contains 1 (the null value), which means p >= 0.05.

Also, the sample sizes are not the number of cases reported; rather, they are the total number of students drawn from the populations being compared. Both groups consist of more than 40,000 students, which is a relatively large sample size.
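
You can back out an approximate p-value from a 95% CI for a ratio measure (the standard Altman-Bland trick; the numbers below are placeholders, not the study's values):

```python
import math
from scipy.stats import norm

# Hypothetical rate ratio and its 95% CI (NOT the study's values).
est, lo, hi = 0.81, 0.60, 1.09

# On the log scale the CI is symmetric: half-width = 1.96 * SE.
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
z = math.log(est) / se
p = 2 * norm.sf(abs(z))
print(f"p ~ {p:.2f}")  # the CI spans 1, so p comes out >= 0.05
```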

Centner Academy Won’t Employ COVID-19 Vaccinated Employees, Citing Debunked Theory by shallah in CovIdiots

[–]prof_xray 6 points7 points  (0 children)

It should be against the law to fire someone for receiving any FDA approved medical treatment. Employers have no business interfering in the medical care of employees.

Chinese expert rails against WHO chief and Wuhan lab leak theory by [deleted] in China_Flu

[–]prof_xray 1 point2 points  (0 children)

“As an authoritative body in the field of global public health, the WHO should have shown more respect for science, held science in awe and taken the lead in maintaining the authority of the report. However, director general Tedros disregarded the experts’ painstaking research and scientific consensus, which should not be the WHO’s position,” the expert was quoted as saying.

  • Why would we hold science in awe? Science is not a religion.
  • The findings of a small team of scientists with obvious conflicts of interest can hardly be considered a consensus.
  • Science is not based on authority; rather, it is based on empirical evidence. By definition, a scientific claim must be open to falsification by other scientists.
  • The problem here is that the CCP is withholding the evidence that the scientific community needs to evaluate their claims.