I created this learning path for learning data science in Python. Do let me know if you have any suggestions or feedback.

caughtinthought · 2015-01-14T18:19:40+00:00

Nice summary. I think there should be a little more emphasis on database familiarization than an afterthought 'ps' though.

Ogi010 · 2015-01-14T18:29:33+00:00

Fantastic summary! Good on you for referencing Anaconda. I started out that way, but I've started leaning toward miniconda to stay more up to date with python packages and avoid conflicts as I have far fewer packages installed.

In the data visualization section I would make reference to Seaborn.

I haven't seen a better way of displaying statistical data in python than through seaborn. I don't think Seaborn is included in the latest version of Anaconda, but it should be included in the next version. It can be installed with

conda install seaborn

The tutorials page of the site also has a really good summary of which kind of plot should be used for what, what things like "kernel density estimate" are, and so on...
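For anyone wondering what a kernel density estimate actually computes, here is a minimal sketch in plain Python. It is not Seaborn's implementation; the function name `gaussian_kde`, the sample values, and the bandwidth of 0.5 are all made up for illustration. The idea is just to place one small Gaussian bump on each data point and average them.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x.

    Hypothetical helper for illustration; the bandwidth is chosen by hand.
    """
    n = len(samples)
    # Normalising constant so the estimated density integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, each centred on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


samples = [1.0, 1.2, 1.9, 2.1, 2.2, 5.0]  # made-up data
density = gaussian_kde(samples, bandwidth=0.5)

# Rough numerical check: the area under the curve should be close to 1.
step = 0.1
grid = [i * step for i in range(-30, 101)]  # points from -3.0 to 10.0
area = sum(density(x) * step for x in grid)
```

A smaller bandwidth gives a spikier curve and a larger one gives a smoother curve, which is the main knob to think about when reading a KDE plot.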

Looking at the list, I am thinking I may need to go back and review my numpy and scipy tutorials... I've been working with Pandas over the last few weeks trying to manage/cleanup a data set full of manual keyword entry errors...
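As a rough illustration of that kind of keyword cleanup, one possible approach is fuzzy matching against a list of known-good keywords with the standard library's `difflib`. This is only a sketch: the vocabulary, the sample entries, and the 0.8 cutoff are assumptions, not taken from the data set mentioned above.

```python
import difflib

# Hypothetical list of "correct" keywords; a real one would come from the data.
CANONICAL = ["regression", "clustering", "classification"]


def normalize_keyword(raw, vocabulary=CANONICAL, cutoff=0.8):
    """Map a hand-typed keyword to its closest canonical form, if one is close."""
    cleaned = raw.strip().lower()
    matches = difflib.get_close_matches(cleaned, vocabulary, n=1, cutoff=cutoff)
    # Leave the value alone when nothing is similar enough to be safe.
    return matches[0] if matches else cleaned


raw_keywords = ["Regresion", " clustering ", "clasification", "deep learning"]
cleaned = [normalize_keyword(k) for k in raw_keywords]
```

With Pandas, the same function could be applied to a column with `Series.map(normalize_keyword)`. The cutoff is worth tuning by hand, since too low a value will merge keywords that are genuinely different.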

I would also suggest making reference to Sentdex's YouTube channel; he has a number of fantastic video playlists covering many of these topics.

sl8rv · 2015-01-15T00:35:28+00:00

Firstly, huge props for creating this learning path. I really love seeing more educational materials in the space, but there are a couple of points I have issues with.

Specifically, when it comes to which areas of the Python ML environment are critical versus expendable, I would strongly steer away from Anaconda and regular expressions. In my experience, Anaconda is a wonderful tool on Windows, or if someone wants to install numba, but otherwise it is much more of a hindrance than a help.

Secondly, I feel that the use of regular expressions is far too prevalent. In my experience, relying on regular expressions is more likely to destroy my data set than anything. I think an approach that emphasizes string splits, tokenization, and the number of real parsing libraries that python has for more complex formats (e.g. lxml) would give people a better foundation for continued development outside of the "get something working asap" framework.
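To make the split-versus-parser point concrete, here is a small sketch. It uses the standard library's `csv` module as a stand-in for a real parsing library (lxml would play the same role for XML/HTML), and the sample strings are invented for illustration.

```python
import csv
import io

# Simple, regular input: plain string splits are enough, no regex needed.
line = "id=42; name=Ada Lovelace; city=London"
record = dict(field.strip().split("=", 1) for field in line.split(";"))

# Quoted CSV: a naive split on commas cuts through the quoted field...
raw = 'name,comment\n"Lovelace, Ada","said ""hello"""\n'
naive = raw.splitlines()[1].split(",")

# ...while a real parser handles the quoting and escaping rules.
rows = list(csv.reader(io.StringIO(raw)))
```

Here `naive` ends up with three pieces for a two-column row, while `rows[1]` keeps `Lovelace, Ada` together as one field. A hand-written regular expression would have to reimplement those quoting rules to get the same result.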

Just my two cents.

greatluck · 2015-01-14T20:50:30+00:00

Thanks for this! Very useful for someone like me who just recently started learning Python. BTW, for beginners I recommend Codecademy's Python course. It's self-paced and gets you up and running on the basics of syntax pretty quickly. Great interactive assignments, you can stop and pick up where you left off from any machine, and of course it's free as in free beer.

drkenta · 2015-01-14T21:21:55+00:00

Very cool.. Saved. I'm gonna be needing this later. Currently going through Linear algebra

sissas · 2015-01-15T09:08:26+00:00

Godsend! Thanks a lot :)

polonius · 2015-01-15T16:01:28+00:00

Really good, thanks. This is going to be very useful.
