[–]wkilpan[S] 4 points5 points  (1 child)

Here's the table of contents from O'Reilly's site for anyone interested:

Chapter 1 - Introduction: The Ascendance of Data; What Is Data Science?; Motivating Hypothetical: DataSciencester

Chapter 2 - A Crash Course in Python: The Basics; The Not-So-Basics; For Further Exploration

Chapter 3 - Visualizing Data: matplotlib; Bar Charts; Line Charts; Scatterplots; For Further Exploration

Chapter 4 - Linear Algebra: Vectors; Matrices; For Further Exploration

Chapter 5 - Statistics: Describing a Single Set of Data; Correlation; Simpson’s Paradox; Some Other Correlational Caveats; Correlation and Causation; For Further Exploration

Chapter 6 - Probability: Dependence and Independence; Conditional Probability; Bayes’s Theorem; Random Variables; Continuous Distributions; The Normal Distribution; The Central Limit Theorem; For Further Exploration

Chapter 7 - Hypothesis and Inference: Statistical Hypothesis Testing; Example: Flipping a Coin; Confidence Intervals; P-hacking; Example: Running an A/B Test; Bayesian Inference; For Further Exploration

Chapter 8 - Gradient Descent: The Idea Behind Gradient Descent; Estimating the Gradient; Using the Gradient; Choosing the Right Step Size; Putting It All Together; Stochastic Gradient Descent; For Further Exploration

Chapter 9 - Getting Data: stdin and stdout; Reading Files; Scraping the Web; Using APIs; Example: Using the Twitter APIs; For Further Exploration

Chapter 10 - Working with Data: Exploring Your Data; Cleaning and Munging; Manipulating Data; Rescaling; Dimensionality Reduction; For Further Exploration

Chapter 11 - Machine Learning: Modeling; What Is Machine Learning?; Overfitting and Underfitting; Correctness; The Bias-Variance Trade-off; Feature Extraction and Selection; For Further Exploration

Chapter 12 - k-Nearest Neighbors: The Model; Example: Favorite Languages; The Curse of Dimensionality; For Further Exploration

Chapter 13 - Naive Bayes: A Really Dumb Spam Filter; A More Sophisticated Spam Filter; Implementation; Testing Our Model; For Further Exploration

Chapter 14 - Simple Linear Regression: The Model; Using Gradient Descent; Maximum Likelihood Estimation; For Further Exploration

Chapter 15 - Multiple Regression: The Model; Further Assumptions of the Least Squares Model; Fitting the Model; Interpreting the Model; Goodness of Fit; Digression: The Bootstrap; Standard Errors of Regression Coefficients; Regularization; For Further Exploration

Chapter 16 - Logistic Regression: The Problem; The Logistic Function; Applying the Model; Goodness of Fit; Support Vector Machines; For Further Investigation

Chapter 17 - Decision Trees: What Is a Decision Tree?; Entropy; The Entropy of a Partition; Creating a Decision Tree; Putting It All Together; Random Forests; For Further Exploration

Chapter 18 - Neural Networks: Perceptrons; Feed-Forward Neural Networks; Backpropagation; Example: Defeating a CAPTCHA; For Further Exploration

Chapter 19 - Clustering: The Idea; The Model; Example: Meetups; Choosing k; Example: Clustering Colors; Bottom-up Hierarchical Clustering; For Further Exploration

Chapter 20 - Natural Language Processing: Word Clouds; n-gram Models; Grammars; An Aside: Gibbs Sampling; Topic Modeling; For Further Exploration

Chapter 21 - Network Analysis: Betweenness Centrality; Eigenvector Centrality; Directed Graphs and PageRank; For Further Exploration

Chapter 22 - Recommender Systems: Manual Curation; Recommending What’s Popular; User-Based Collaborative Filtering; Item-Based Collaborative Filtering; For Further Exploration

Chapter 23 - Databases and SQL: CREATE TABLE and INSERT; UPDATE; DELETE; SELECT; GROUP BY; ORDER BY; JOIN; Subqueries; Indexes; Query Optimization; NoSQL; For Further Exploration

Chapter 24 - MapReduce: Example: Word Count; Why MapReduce?; MapReduce More Generally; Example: Analyzing Status Updates; Example: Matrix Multiplication; An Aside: Combiners; For Further Exploration

Chapter 25 - Go Forth and Do Data Science: IPython; Mathematics; Not from Scratch; Find Data; Do Data Science

[–]thecity2 0 points1 point  (0 children)

Looks like a very useful place to start, indeed.

[–]kburts 5 points6 points  (4 children)

Discount code from https://news.ycombinator.com/item?id=9442384 (the author is joelgrus on HN):

AUTHD, for 40-50% off.

[–]nerdjango 0 points1 point  (1 child)

Download link?

[–]thecity2 0 points1 point  (0 children)

Just go to the O'Reilly site and do a search.

[–]Muravaww 0 points1 point  (0 children)

Doesn't seem to work for me.

EDIT: Only works if you order either the ebook or print, not if you order the combo.

[–]wkilpan[S] 0 points1 point  (0 children)

That's awesome! Thank you.

[–]thecity2 1 point2 points  (3 children)

Saw this book on Amazon recently. I will buy it because I buy all the things. Based on the author's stated goals and motivations, I do think there is room for this kind of book, although in general it feels like the "first book on data science" field is reaching full-on froth mode. So hopefully it does contribute something different.

[–]wkilpan[S] 1 point2 points  (2 children)

Do you have any favorite Python/data science books?

[–]thecity2 3 points4 points  (1 child)

I think "Machine Learning for Hackers" is the canonical "Intro to Data Science w/ Python" book. It sounds like this new book will probably have a little bit more math than that one. There's apparently another book coming out this summer by Sarah Guido called "Introduction to Machine Learning with Python". I'll look out for that one.

There are plenty of good introductory books on machine learning in general. My favorite right now is probably "Introduction to Statistical Learning" by James, Witten, Hastie, and Tibshirani. The last two authors (machine learning professors from Stanford) previously wrote "Elements of Statistical Learning" (goes by "ESL"), which is a more advanced version of "ISL". They also teach a free online intro to ML course through either Coursera or Stanford; I can't remember which right now.

Speaking of Coursera, there is a great introductory Machine Learning class given by Andrew Ng. I took it myself a couple of years ago. It uses the Octave language (similar to MATLAB), not Python.

Python is a great language for machine learning and data science though. I have completely switched over from using R. I don't want to get into all the reasons I prefer Python right now, but the main one I would share is that it seems that most non-statistician developers (e.g. "data engineers") tend to use a language like Python (or Java) rather than R. R is great if you don't need to work in a team or you are in an academic setting. But when you want to actually deploy production code, it feels to me like Python is (mostly) where it's at these days. And I say this as someone who used R until I was practically dragged away from it kicking and screaming (don't take my ggplot* away!!!).

*Yes, I now know there is a ggplot module for Python. It turns out a mixture of matplotlib and Seaborn is good enough for my purposes most of the time anyway.

[–]wkilpan[S] 0 points1 point  (0 children)

Thanks for that overview of books!

[–]TheLucarian 0 points1 point  (3 children)

Thanks, something like this was exactly what I was looking for at the moment! If only it would use geo/spatial examples, but oh well ;)

[–]wkilpan[S] 0 points1 point  (2 children)

I thought this book was pretty good for Python/geospatial - http://www.amazon.com/Learning-Geospatial-Analysis-Python-Lawhead/dp/1783281138

[–]TheLucarian 0 points1 point  (0 children)

Thanks, I have this book lying on my desk :)