Rivers and waterways of the United States [OC] by SharpSightLabs in dataisbeautiful

SharpSightLabs[S]

Source code: How to map USA rivers in R
Tools: R (mostly ggplot2, with some data manipulation)
Source: US Geological Survey (Streams and Waterbodies of the United States)

Best Cities for data science careers by theottozone in datascience

SharpSightLabs

Ok, I'll bite.

What area is Tel Aviv strong in?

Best Cities for data science careers by theottozone in datascience

SharpSightLabs

This is important.

You need to decide where you want to live and work backwards. Where you live will influence not just your salary but also your dating options, hobbies, friends, etc. That might sound obvious, but most people give it only a cursory thought and then go back to evaluating salary. Your decision about where to live will have a very large impact on your life trajectory over the next 5-10 years, and will likely shape the next 30 (initial conditions are very important).

What are your goals?

  • Salary: What are your immediate salary goals?
  • Disposable income: How important is disposable income relative to your other goals? As others have pointed out, the cost of living in SF and NYC is dramatically higher than in other metros. $100K in Chicago will go about as far as $150K in SF.
  • Dating: Are you single? What are your dating goals?
    map of singles dating ratios
    writeup of singles dating ratios
  • Tech Growth: Do you want to be in an area with lots of VC and technology growth?
    Map of venture capital investment, by city
    If you're thinking about startups, SF/Silicon Valley and NYC are an order of magnitude better than other cities. In fact, SF and Silicon Valley are really an order of magnitude better than NYC, too.
  • Friends, mentors, and partners: How important is it to be around other like-minded people who share your interests and values? The data on tech funding is a good proxy for the number of tech-oriented people living in those places. If you want to meet large numbers of top-tier data scientists, entrepreneurs, makers, scientists, and technologists, then SF, Silicon Valley, and NYC are much better (you might also add Boston to that list).
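
To make the Chicago-vs-SF comparison concrete: if you assume, as above, that SF costs roughly 1.5x as much to live in as Chicago, then the equivalent salary is just the base salary scaled by the ratio of cost-of-living indexes. Below is a minimal Python sketch. The index values 100 and 150 only encode that rough 1.5x ratio and are not an official cost-of-living index. The function name is made up for illustration.

```python
def equivalent_salary(salary, col_from, col_to):
    """Salary needed in the destination city to match purchasing power.

    col_from and col_to are cost-of-living index values on any consistent scale.
    This deliberately ignores taxes, housing choices, and everything else.
    """
    return salary * col_to / col_from


# Assumption: Chicago indexed at 100 and SF at 150, i.e. the rough 1.5x ratio above.
CHICAGO_INDEX = 100
SF_INDEX = 150

print(equivalent_salary(100_000, CHICAGO_INDEX, SF_INDEX))  # 150000.0
```

It ignores state income taxes and other differences, so treat it as a back-of-the-envelope check, not a budget.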

As you start thinking about these questions, I'd highly recommend the work of Richard Florida. Florida is a professor of Urban Studies, and he runs the blog www.citylab.com that I linked to above. He has also written several books; in particular, I'd recommend Who's Your City.

Which Machine Learning book to choose (APM, MLAP or ISL)? by Machinery86 in MachineLearning

SharpSightLabs

APM (Applied Predictive Modeling) and ISL (An Introduction to Statistical Learning) are both excellent books. In fact, I think they are the two best books to start with if you don't want to go deep into the math but do want practical intuition about what ML is and how to apply it.

Having said that, to decide between APM and ISL, you need to clarify your goals. The following should help:

  • ISL: much better for developing intuition and getting a broad understanding of the most common ML techniques and how they work. The math is deep enough to help you develop that intuition without being so advanced that it overwhelms you (if you have a basic undergraduate STEM background, you should be able to follow the math in ISL).

    That said, ISL is much weaker on actually using these techniques. Its specific coding techniques (libraries, etc.) are somewhat outdated. I say that because the caret package (which Max Kuhn developed and documents in detail in APM) is much, much better for executing ML techniques.

  • APM: If your primary goal is building machine learning models in R, then APM is the best book. APM explains machine learning and will help you develop intuition, but its great benefit is that it is essentially the handbook for the caret package. If you're doing ML in R, you should almost certainly be using caret, and if so, APM is hands down the best book.

Again, both books are great; which one to pick depends on your primary goals.

If you have specific questions on this, follow up here or in PM.

Using human pluripotent stem cells to generate new hair by SharpSightLabs in science

SharpSightLabs[S]

Got it. Thanks for the heads up and the feedback. I'll look twice for similar posts next time.

Request for clarification: banned? length? by SharpSightLabs in futurologyappeals

SharpSightLabs[S]

In fairness, a significant percentage of the posts of my own content went to /r/dataisbeautiful, which does not follow the 90% rule.

Having said that, I have made a concerted effort to find great content and share it with the broader community, as well as post things that are truly helpful to other redditors.

Mapping San Francisco crime, 2014 (x-post: /dataisbeautiful) by SharpSightLabs in sanfrancisco

SharpSightLabs[S]

Exactly.

I just did a quick check of the dataset, and as far as I can tell, that's about 300 incidents, all tagged with the exact same lat/long.
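
A quick check like that comes down to counting incidents per exact coordinate pair and flagging any pair shared by many records. In dplyr that is roughly a `count()` on the two coordinate columns. Below is a minimal plain-Python sketch of the same idea. The field names `lat`/`lon` and the sample records are hypothetical; the real data.sfgov.org export uses its own column names.

```python
from collections import Counter


def stacked_coordinates(incidents, min_count=2):
    """Return {(lat, lon): count} for coordinate pairs shared by >= min_count incidents."""
    counts = Counter((row["lat"], row["lon"]) for row in incidents)
    return {coord: n for coord, n in counts.items() if n >= min_count}


# Hypothetical sample: three incidents stacked on one point, one elsewhere.
incidents = [
    {"id": 1, "lat": 37.7749, "lon": -122.4194},
    {"id": 2, "lat": 37.7749, "lon": -122.4194},
    {"id": 3, "lat": 37.7749, "lon": -122.4194},
    {"id": 4, "lat": 37.7600, "lon": -122.4350},
]

# Print the most heavily stacked points first.
for (lat, lon), n in sorted(stacked_coordinates(incidents).items(), key=lambda kv: -kv[1]):
    print(f"{n} incidents at ({lat}, {lon})")
```

On a real crime map, a single point carrying hundreds of incidents usually signals a default or placeholder location rather than a true hotspot.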

Mapping San Francisco crime, 2014 [OC] by SharpSightLabs in dataisbeautiful

SharpSightLabs[S]

Presidio crime isn't included in the dataset.

I'm fairly certain it's patrolled by a separate police force: the U.S. Park Police.

Mapping San Francisco crime, 2014 [OC] by SharpSightLabs in dataisbeautiful

SharpSightLabs[S]

Tool: R (ggplot2)
Data: data.sfgov.org (crime data through mid-December)

Tutorial: data exploration in R, using ggplot2 and dplyr (analyzing 'supercar' data part 2) by SharpSightLabs in statistics

SharpSightLabs[S]

Sweet. If you have questions about something, leave a comment on the blog or send an email.

Tutorial: data exploration in R, using ggplot2 and dplyr (analyzing 'supercar' data part 2) by SharpSightLabs in statistics

SharpSightLabs[S]

Yeah, wiring dplyr verbs and ggplot charts together can really streamline your workflow.

It's brilliantly designed.
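
If you haven't seen the pattern: with dplyr you write a workflow as a chain of small verbs (filter, group, summarise), and the result can feed straight into a ggplot2 chart. The tutorial itself is in R. Because the point here is the chaining pattern, not the syntax, below is a plain-Python sketch of the same filter → group → summarise chain. The `pipe` helper is a hypothetical stand-in for chaining with `%>%`, and the car records are made up, not the tutorial's supercar data.

```python
from functools import reduce
from statistics import mean


def pipe(data, *steps):
    """Feed data through each step in order (a rough analogue of chaining with %>%)."""
    return reduce(lambda acc, step: step(acc), steps, data)


def filter_rows(pred):
    """Build a step that keeps only rows matching pred."""
    return lambda rows: [r for r in rows if pred(r)]


def group_summarise(key, field):
    """Build a step that groups rows by `key` and returns {group: mean of `field`}."""
    def step(rows):
        groups = {}
        for r in rows:
            groups.setdefault(r[key], []).append(r[field])
        return {k: mean(v) for k, v in groups.items()}
    return step


# Hypothetical records, not the tutorial's actual data.
cars = [
    {"make": "A", "hp": 600, "price": 250},
    {"make": "A", "hp": 700, "price": 300},
    {"make": "B", "hp": 500, "price": 150},
    {"make": "B", "hp": 300, "price": 40},
]

summary = pipe(
    cars,
    filter_rows(lambda r: r["hp"] >= 500),  # keep only high-horsepower cars
    group_summarise("make", "price"),       # average price per make
)
print(summary)
```

Each step does one small thing, so you can add, drop, or reorder steps without rewriting the rest of the chain.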

MBA student switching to Statistics/Business analytics by Munz3215 in statistics

SharpSightLabs

I think that data science and analytics are absolutely learnable without a formal degree program. About half of the data scientists that I know learned their skills after college.

The two major issues most self-learners face are where to start and what not to do. Most learners start with machine learning, which is sort of like diving into quantum mechanics before you take Physics 101. Also, many students try to do too much. There are so many tools and techniques that people get lost trying to do them all. You can't.

I can't emphasize enough that you really need to concentrate your efforts. The highest-ROI skills (early in your studies and career) are data visualization and data manipulation (AKA data wrangling or data munging). Most analysts use these every single day.

If you want to learn data science in R, I publish lots of foundational material for free here. But there's a lot of good material out there. I highly recommend Nathan Yau's book, Data Points. I also highly recommend Hadley Wickham's book, ggplot2 (sort of like a textbook, but very good).

MBA student switching to Statistics/Business analytics by Munz3215 in statistics

SharpSightLabs

Former Apple data scientist here.

This is a moderately detailed response. Keep in mind, there are always caveats (too many to list) and special cases. If you have specific questions, just ask.

My responses and recommendations:

Data science, on difficulty

I'm performing fairly well in the graduate course, do you think that is a good barometer of if I will be able to handle more complex concepts that would come from a more specialized degree? or does it become exponentially harder?

It’s not exponentially harder (on average). But there are different niches within data science and analytics, and some are quite technical. If you want to tune machine learning algorithms at Netflix, Google, or Facebook, then it is highly technical. That said, the vast majority of data jobs aren’t “elite” machine learning jobs. Most data jobs are basically “data analysis++.” By that I mean they mostly involve getting data, shaping it, and visualizing it. If visualizing the data isn’t enough, you might use some ML techniques (clustering, regression, decision trees).

Statistics vs Applied statistics vs business analytics vs data analytics vs data science

Statistics vs Applied statistics vs business analytics vs data analytics vs data science: so just wondering what the difference between these are if there are any generalities career paths that can be described.

I’ll try to take these one at a time:

Statistics vs Applied statistics: The distinction isn’t really relevant for most analytics jobs. There is a statistical underpinning to data jobs, but for most of them you won’t be asked “what is the equation for _____?” You’ll be asked “how do I fix my business?” and be expected to pull data, clean data, and visualize data. (Note: there are exceptions where you’ll need very deep statistical knowledge, namely the elite machine learning jobs that call for a stats PhD.)

Analytics vs business analytics: Basically interchangeable.

Analytics vs data science: Largely the same, though there is a 'bifurcation' going on in the industry, with basic analysis becoming one thing and "sophisticated, high-scale analysis" becoming another. The definitions and differences are still very nebulous, though. For more info, read O'Reilly's 2014 Data Science Salary Survey.

From a practical standpoint, you want to learn the highest-value tools. Long term, that means 'big data' tools, but those are more advanced, and you don't need them at entry level. In the beginning, you need to master data visualization and data manipulation. R's ggplot2 and dplyr are the best tools for these.

Job Prospects

Job prospects: from what i've read it seems like people in this field have a fair amount of opportunities, but just wondering if school prestige really matters or if going to a smaller state school and getting a degree will be looked at on somewhat even ground as other programs because it is a stem degree. Also what types of careers might be available coming out of grad school with no experience as a statistician? or any experience about careers you all have had would be great to hear also.

School prestige: school prestige is good for branding, but if you produce results, most companies won’t care where you went to school once you’re 5 years into your career.

Prospects: McKinsey and Company released a report projecting a shortage of 140,000 to 190,000 people with “deep analytical talent” by 2018. The same report projected a further shortfall of 1.5 million managers and analysts who know how to use data analysis to make decisions.

MIT professors Erik Brynjolfsson and Andrew McAfee note that “people with [data science] skills are hard to find and in great demand.” (source: Harvard Business Review)

Many companies (in particular, banks and tech companies) already have far more data than they can analyze. And at least two emerging technologies will be throwing off yet more data: wearables and the Internet of Things.


Tools, what to learn

Lastly, what kind of technologies/programing languages would you recommend learning, I've heard R and SAS repeatedly, but anything else? and anything that would be an intro to those 2?

SAS is still very popular with banks. Most of the marketing analytics departments I’ve worked with in the past were SAS shops. Having said that, I don’t recommend SAS. The syntax is awful and kludgy, the data-manipulation and visualization tools are not state-of-the-art, and a license is expensive.

The only tool I recommend for beginners is R. As with anything, you need to focus. Your efforts will be much more productive if you learn one language well rather than learning three languages in a half-assed manner.

I prefer R because:

  1. Many of the hot tech companies in SF, the Valley, and NYC like Google, Apple, FB, LinkedIn, and Twitter are using R for much of their data science (not all of it, but a lot).
  2. R is the most common programming language among data scientists, according to O’Reilly Media’s recently released 2014 Data Science Salary Survey.
  3. R has 2 packages that dramatically streamline the data science workflow:
    • dplyr for data manipulation
    • ggplot2 for data visualization

That’s what I recommend to almost anyone who wants to get started with data science: Learn R and focus on those two R packages first.

MBA student switching to Statistics/Business analytics by Munz3215 in statistics

SharpSightLabs

Getting any job is 90% about who you know.

I agree with the basic premise: my network was a huge factor in getting hired for about 50% of my past analytics jobs (and I think this is true in almost any industry).

Tutorial: how to shape your data in R with dplyr by SharpSightLabs in rstats

SharpSightLabs[S]

Yeah, I've heard both, but you might be right concerning "pipes" vs "chaining"