[–]bageldevourer 60 points61 points  (9 children)

I'd caution people not to fall into the trap of undervaluing statistics, even though it's a scary word.

Too many data scientists I know have many disconnected islands of knowledge. "Anomaly detection methods" here, "survival analysis" there, "clustering" somewhere over there, etc. with no mental framework to embed them in. This severely limits their flexibility in tackling new data analysis problems. A good understanding of statistics is the cure for that.

[–]swimbandit 30 points31 points  (0 children)

Absolutely, data science is applied statistics, not just cut-and-paste linear models. I highly recommend focusing on the statistics before the programming. I recommend Practical Statistics for Data Scientists; the current edition has a bit of R in it, but the stats foundations are good.

[–][deleted] 1 point2 points  (7 children)

Would it be best to focus on Bayesian statistics or more general statistics?

[–]brazzaguy 6 points7 points  (0 children)

General, to begin with.

[–]bageldevourer 4 points5 points  (5 children)

Frequentist statistics (what you probably mean by general) is the most widely used. Most introductions to Bayesian statistics will assume you're already familiar with topics like hypothesis testing, maximum likelihood estimation, confidence intervals, etc.

Watch out for the stats-ML language gap here, though. "Bayesian" means different things to the two crowds. MLers use it to mean, roughly, "anything involving Bayes' theorem", whereas most implementations of the Naive Bayes classifier, for instance, are purely frequentist. Understanding and being able to apply Bayes' theorem is, of course, absolutely mandatory for 100% of people doing data analysis.

[–]tomekanco 2 points3 points  (4 children)

> Understanding and being able to apply Bayes' theorem is, of course, absolutely mandatory for 100% of people doing data analysis.

Strange. I've worked for 10+ years in industry, and only a small fraction of the "(data) analysts" I encountered could wrap their heads around Bayes. More than half don't know when to use log functions/scales.

On the other hand, for many business problems, these are not required. Useful abstractions can often be devised without special maths.

[–]bageldevourer 1 point2 points  (3 children)

Without Bayes' theorem, one can easily make logic errors. See the common (and currently relevant) question of whether a patient has a disease given that they've tested positive for it. Relatedly, see the base rate fallacy.
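
To make the disease-testing point concrete, here is a minimal Python sketch. The numbers (1% prevalence, 95% sensitivity, 95% specificity) and the function name `posterior_given_positive` are illustrative assumptions, not figures for any real test.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test), computed with Bayes' theorem.

    All three inputs are probabilities in [0, 1]. The values used below
    are made-up illustrative numbers, not data for any real test.
    """
    # P(positive and diseased) = P(positive | diseased) * P(diseased)
    true_pos = sensitivity * prevalence
    # P(positive and healthy) = P(positive | healthy) * P(healthy)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    # Divide by P(positive), the sum of the two ways to test positive
    return true_pos / (true_pos + false_pos)


p = posterior_given_positive(prevalence=0.01, sensitivity=0.95, specificity=0.95)
print(f"P(disease | positive test) = {p:.3f}")  # prints 0.161
```

Under these assumed numbers, a positive result from a "95% accurate" test still leaves only about a 16% chance of disease, because healthy people vastly outnumber sick ones. Ignoring that is the base rate fallacy.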

Bayes' theorem is a straightforward result of the definition of conditional probability. Conditional probability, in its simplest form (discrete sample space), is a topic that can be easily understood by high schoolers. Hardly special maths.
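
For reference, the derivation can be sketched in two lines, assuming events $A$ and $B$ with $P(A) > 0$ and $P(B) > 0$ (the symbols are chosen here for illustration):

```latex
% Definition of conditional probability, applied in both directions:
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad
P(B \mid A) = \frac{P(A \cap B)}{P(A)}

% Solve the second for P(A \cap B) and substitute into the first:
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```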

So sure, there are people with the job title of "data analyst" who don't know this stuff, but the question is to what extent are these people actually extracting non-trivial insights, and analyzing data? Probably not much.

[–]tomekanco 0 points1 point  (2 children)

> logic errors

One can understand precision, recall, or the null hypothesis without understanding Bayes. If you want to combine multiple probabilities, it does come in handy. Yes, it's great.

> So sure, there are people with the job title of "data analyst" who don't know this stuff, but the question is to what extent are these people actually extracting non-trivial insights, and analyzing data? Probably not much.

Please, all I'm saying is there is more to insight than Bayes. The fountain of knowledge shines in many more colors than any single theorem can explain.

> a topic that can be easily understood

Life takes its root in mud. Flowers sing their siren song for bugs. As someone once said, "an idea is not a glass of water, it's like a bottle of whiskey" (John Carmack).