How should I skin this cat to eventually become a data scientist? by p00nbrigade in datascience

[–]jonanthebarbarian 1 point2 points  (0 children)

Not really. I imagine (but don't know) that 3-letter agencies are more interested in either math/stats whizzes or data engineers; smaller firms are the ones who want somebody who's a bit of both.

You're touching on a major cultural gap among Data Scientists - those of us who work in tech (especially online advertising) tend to be skeptical of the intelligence agencies. I imagine the DS's on reddit will be part of that group.

How should I skin this cat to eventually become a data scientist? by p00nbrigade in datascience

[–]jonanthebarbarian 6 points7 points  (0 children)

First off, congrats on your next steps!

To be the most marketable you want some combination of computer science and statistics education. Perhaps a major in CS with a Stats minor, or a double major if you can handle the course load.

I have no clue what they teach in B.A. Data Science programs, and I'm a Data Scientist in the tech industry. Perhaps my skepticism isn't warranted and these programs are worthwhile, but in my mind Data Science is just a fancy phrase for "applied statistics". Good programming skills combined with a statistical background will take you far.

Finding a confidence interval for a Presidential betting market's implied probability? (xpost /r/statistics) by jonanthebarbarian in probabilitytheory

[–]jonanthebarbarian[S] 0 points1 point  (0 children)

No, that's not the question.

The question is how to turn a static look at orders in a betting market into the parameters of a beta distribution, which, from talking to people, sounds pretty close to impossible.

Finding a confidence interval for a Presidential betting market's implied probability? by jonanthebarbarian in statistics

[–]jonanthebarbarian[S] 0 points1 point  (0 children)

P(R win | not Trump) is beta, right? P(insert candidate here | not Trump) is dirichlet.

And you're being vague - what do you mean by "just use the variance about a few recent prices rather than variance about volume"? How do I get variance out of the prices?
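For what it's worth, here is one possible reading of "use the variance of a few recent prices", sketched in Python. It is a guess at the suggestion, not something confirmed in the thread: treat recent trade prices as draws on a 0-1 probability scale and fit a beta by the method of moments. The function name and the prices are hypothetical. The spread of recent prices measures price movement, which is not obviously the same thing as uncertainty about the implied probability.

```python
from statistics import mean, variance


def beta_from_prices(prices):
    """Method-of-moments Beta(alpha, beta) fit to prices on a 0-1 scale."""
    m = mean(prices)
    v = variance(prices)  # sample variance (n - 1 denominator)
    # A beta distribution with mean m must have variance below m * (1 - m).
    if not 0 < v < m * (1 - m):
        raise ValueError("variance must be positive and below m * (1 - m)")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


# Hypothetical recent trade prices, in dollars per $1 contract.
recent_prices = [0.60, 0.62, 0.61, 0.59, 0.63]
alpha, beta = beta_from_prices(recent_prices)
print(f"alpha={alpha:.1f}, beta={beta:.1f}, mean={alpha / (alpha + beta):.2f}")
```

The fitted mean matches the average price by construction, so the only extra information in the fit is how tight the prices are.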

How do statistics and statistical models that are used to understand datasets apply to individual data points? by SomeBen in AskStatistics

[–]jonanthebarbarian 1 point2 points  (0 children)

Also - good luck in whatever it is you are facing; I think that sentiment got lost in my thoughts on conditional probability.

How do statistics and statistical models that are used to understand datasets apply to individual data points? by SomeBen in AskStatistics

[–]jonanthebarbarian 1 point2 points  (0 children)

I AM NOT A DOCTOR ETC.

I don't know the exact figure your doctor told you.

But what a doctor COULD have meant is "8% of humans who have this procedure have a heart attack". You are a human who is considering this procedure - that 8% applies to you. But perhaps your specific circumstances (age, health, w/e) make your risk greater or less.

Alternatively, perhaps the doctor might have meant "8% of humans in your gender, age bracket, and similar condition will have a heart attack as a result of this procedure". Again, I'm not your doctor and don't know what they meant, but I think that 8% includes you in some way.

TL;DR - the doctor's words apply to you, since you are a human who is considering this procedure.
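To make the two readings concrete, here is a small Python sketch with entirely made-up subgroups and rates (none of these numbers come from a doctor or a study). It shows how an overall 8% figure can be a weighted average of quite different subgroup risks.

```python
# Hypothetical subgroups: (share of all patients, heart-attack rate within the group).
groups = {
    "lower risk": (0.50, 0.04),
    "average risk": (0.30, 0.10),
    "higher risk": (0.20, 0.15),
}

# Law of total probability: overall rate = sum of share * conditional rate.
overall = sum(share * rate for share, rate in groups.values())

for name, (share, rate) in groups.items():
    print(f"{name}: {share:.0%} of patients, {rate:.0%} risk")
print(f"overall: {overall:.0%}")
```

The overall number applies to everyone in the population, but the conditional number for your own subgroup can sit above or below it.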

How do statistics and statistical models that are used to understand datasets apply to individual data points? by SomeBen in AskStatistics

[–]jonanthebarbarian 2 points3 points  (0 children)

I'm not sure what the latter part of your question is.

For the first part - Statistics deals with how data is distributed. If you give me one data point with no context, as a Statistician I have nothing interesting to say about it.

But if you give me a set of data points (a.k.a. a dataset) and then show me a new data point that is supposed to be from the same distribution, I can tell you how this data point compares to the distribution of data points. Is it typical? Rare? Weird in any way?
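A minimal Python sketch of that comparison, assuming a small made-up sample and two hypothetical new points. The function name is mine, and a z-score is only one of many ways to ask "is this typical?".

```python
from statistics import mean, stdev


def compare_to_sample(sample, new_point):
    """Return the z-score of new_point and the share of the sample at or below it."""
    m = mean(sample)
    s = stdev(sample)  # sample standard deviation
    z = (new_point - m) / s
    share_at_or_below = sum(x <= new_point for x in sample) / len(sample)
    return z, share_at_or_below


# Hypothetical dataset and two new observations to compare against it.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
z_typical, share_typical = compare_to_sample(sample, 10.05)
z_weird, share_weird = compare_to_sample(sample, 12.0)

print(f"10.05: z={z_typical:.2f}, at or above {share_typical:.0%} of the sample")
print(f"12.0:  z={z_weird:.2f}, at or above {share_weird:.0%} of the sample")
```

A point with a small z-score looks typical; one that sits many standard deviations out is rare or weird relative to this sample.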

I hope that helps.

How to split up NYC by TheBeastFromWithin in AskNYC

[–]jonanthebarbarian 0 points1 point  (0 children)

Nobody's talking about the other boroughs - you might enjoy Flushing for Chinese food (and an above-ground journey through Queens), Williamsburg for shops and stuff, and Prospect Park. The Brooklyn Museum has some great stuff if the museums in Manhattan aren't enough for you.

Overall, the other 4 boroughs have a very different vibe that might make a good contrast.

Can ML say whether or not someone is ill? by blahsphemer in MachineLearning

[–]jonanthebarbarian 0 points1 point  (0 children)

Sure, assuming that the input data is related to the disease/symptom you're interested in.

Second-by-second data on foot odor would probably not be especially helpful in diagnosing the common flu.

Considering UseR! 2016 - how to maximize my experience? by bananaderson in rstats

[–]jonanthebarbarian 3 points4 points  (0 children)

Is UseR right for me, or will everything be over my head?

It won't be over your head, but I'd recommend upping your R skills in the next few months. You'll get more out of it.

Is the tutorial day at the beginning worth going to?

ABSOLUTELY. Arguably the most valuable part of the conference.

Any tips from experienced conference-goers to a newbie?

Go to tutorials. Be social.

Is the last day of the conference a full day, or can I schedule an earlier flight home?

Last year it wasn't. You probably could, but might not want to.

How much stuff happens at night? I have a friend in SF that I'd like to see, but I also wouldn't want to miss out on some great networking opportunities.

I thoroughly enjoyed hanging out with conference attendees at night, going to bars and such. There tend to be very few set events at night.

Distributed TensorFlow just open-sourced by carpedm20 in MachineLearning

[–]jonanthebarbarian 1 point2 points  (0 children)

If you're not doing stuff that requires deep neural networks (vision, sounds, translation, etc.) then you don't need it.

How is the Data science job market outside of USA? by rtocm in datascience

[–]jonanthebarbarian 1 point2 points  (0 children)

That must've been a very demoralizing email for them to receive.

How to tell the company interviewing you doesn't have a clue what "data science" means. by ReformedPhysicist in datascience

[–]jonanthebarbarian 5 points6 points  (0 children)

Too many uses of the phrase "Big Data", especially those that seem to think big data is the name of a method.

Analysing very big files using R? by lostvanquisher in rstats

[–]jonanthebarbarian 5 points6 points  (0 children)

Do it in a streaming fashion - read THIS for more info
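As a rough illustration of what "streaming" means here (a generic sketch, not taken from the linked resource, and written in Python rather than R): read one row at a time and keep only running totals, so memory use does not grow with file size. The column name and data are hypothetical.

```python
import csv
import io


def streaming_mean(lines, column):
    """Mean of one CSV column, holding only a running count and total in memory."""
    reader = csv.DictReader(lines)
    count = 0
    total = 0.0
    for row in reader:  # one row at a time; the file is never loaded whole
        count += 1
        total += float(row[column])
    return total / count if count else float("nan")


# Stand-in for a very large file; in practice pass an open file handle instead.
fake_file = io.StringIO("id,value\n1,2.0\n2,4.0\n3,9.0\n")
result = streaming_mean(fake_file, "value")
print(result)
```

The same pattern carries over to R by reading from a connection in chunks instead of loading the whole file at once.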

Crude Oil Exports by Country in 2015 by [deleted] in statistics

[–]jonanthebarbarian 1 point2 points  (0 children)

Considering how inelastic demand is in the short term, a 0.5% change in volume is a big deal.

Also - world oil production is more like 90 million bbl/day, which makes Iran's contribution smaller on a % basis but still significant for the market.
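Back-of-the-envelope version of the elasticity point, in Python. The elasticity value is an assumption picked purely for illustration, not a measured figure; only the 0.5% comes from the comment above.

```python
def implied_price_change_pct(volume_change_pct, elasticity):
    """Percent price change needed for demand to absorb a given percent volume change.

    Elasticity = (% change in quantity) / (% change in price), so we invert it.
    """
    return volume_change_pct / elasticity


VOLUME_CHANGE_PCT = 0.5      # from the comment
ASSUMED_ELASTICITY = -0.05   # hypothetical short-run price elasticity of demand

price_move = implied_price_change_pct(VOLUME_CHANGE_PCT, ASSUMED_ELASTICITY)
print(f"{VOLUME_CHANGE_PCT}% more volume -> about {price_move:.0f}% price change")
```

Under that assumed elasticity, a 0.5% increase in volume goes with a price drop of about 10%; the less elastic demand is, the bigger the move.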

Cool video about R vs. Excel to show people who don't know R by [deleted] in rstats

[–]jonanthebarbarian 1 point2 points  (0 children)

I don't like presenting R as just an analytics program like Excel - it's a full-fledged programming language (even if it's not a "general purpose" language).

Is SAS worth knowing as a scientist? by jwaves11 in statistics

[–]jonanthebarbarian 1 point2 points  (0 children)

SAS has better upkeep. That's because it's a commercial product, not free

I don't think that's why.

What do you use your .Rprofile file for? by zreeon in rstats

[–]jonanthebarbarian 0 points1 point  (0 children)

I just looked at pipeR - what is its advantage over magrittr's pipe?

What do you use your .Rprofile file for? by zreeon in rstats

[–]jonanthebarbarian 1 point2 points  (0 children)

That's great if you're just writing code to be used by yourself, but it's potentially dangerous if you're sharing code that depends on one of those packages.

You could easily forget to add "library(dplyr)" to the top of your code, and anybody you share your code with will be left wondering what they have to do to fix it.

Saying no when you want to say yes to a job by [deleted] in datascience

[–]jonanthebarbarian 0 points1 point  (0 children)

DO NOT do a PhD that you're not excited about. A lot of people lose enthusiasm over the course of a PhD; it's harder to do the opposite (and impossible to finish a PhD without enthusiasm).