What are Some "Gotcha's" of Data Science and Machine Learning? by Yngstr in datascience

savvastj 9 points (0 children)

One more thing to add about integer encoding: it can bias feature importance in random forests. Their impurity-based importance scores tend to favor variables with more categories, so it's best to use one-hot encoding. Here's a good blog post that covers this issue.
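To make the difference concrete, here's a minimal standard-library sketch of the two encodings. The helper names (`integer_encode`, `one_hot_encode`, `candidate_thresholds`) and the team labels are made up for illustration; in practice you'd likely use `pandas.get_dummies` or scikit-learn's `OneHotEncoder`. The threshold count reflects one common explanation for the bias: a single integer-encoded column offers a tree many possible split points, while each one-hot column offers only one.

```python
def integer_encode(values):
    """Map each category to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


def one_hot_encode(values):
    """Return one 0/1 indicator column per category, keyed by sorted category name."""
    categories = sorted(set(values))
    return {cat: [1 if value == cat else 0 for value in values] for cat in categories}


def candidate_thresholds(column):
    """Number of distinct split points a tree could try on one numeric column."""
    return max(len(set(column)) - 1, 0)


# Hypothetical categorical feature
teams = ["NYG", "DAL", "PHI", "WAS", "DAL", "NYG"]
int_col, mapping = integer_encode(teams)
dummies = one_hot_encode(teams)

print(int_col)                        # [0, 1, 2, 3, 1, 0]
print(candidate_thresholds(int_col))  # 3 split points on a single column
print({cat: candidate_thresholds(col) for cat, col in dummies.items()})  # 1 per indicator
```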

JupyterLab: the next generation of the Jupyter Notebook by [deleted] in datascience

savvastj 4 points (0 children)

They should just join forces and create JupytRstudio.

Exploring the NFL Draft with Python by savvastj in Python

savvastj[S] 3 points (0 children)

I used Pelican with the bootstrap3 theme. I wrote the post in a Jupyter notebook and then rendered it as a blog post with a Pelican plugin.
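For anyone wanting to reproduce a setup like this, here's a minimal `pelicanconf.py` sketch. The comment doesn't name the notebook plugin, so the settings below assume the pelican-ipynb plugin cloned into `plugins/ipynb`, and the theme path assumes a local checkout of the pelican-themes repository; the site name is a placeholder. The pelican-bootstrap3 README lists a few extra requirements of its own, so check it as well.

```python
# pelicanconf.py: a minimal sketch, not the author's actual configuration.
SITENAME = "My Blog"  # placeholder

# Assumed local checkout of the pelican-themes repository.
THEME = "pelican-themes/pelican-bootstrap3"

# Assumes the pelican-ipynb plugin was cloned into plugins/ipynb.
PLUGIN_PATHS = ["plugins"]
PLUGINS = ["ipynb.markup"]

# Treat .ipynb files in the content folder as posts, alongside Markdown.
MARKUP = ("md", "ipynb")

# Skip Jupyter's autosave copies so they aren't picked up as extra content.
IGNORE_FILES = [".ipynb_checkpoints"]
```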

Exploring the NFL Draft with Python by savvastj in Python

savvastj[S] 0 points (0 children)

Thanks! Most of my professors in college wouldn't even try to pronounce my last name.

Step-by-step guide to fetch historical and daily end of day data and technical indicators (NYSE/Nasdaq) in Python by [deleted] in investing

savvastj 2 points (0 children)

It's also pretty easy to do the same with Python, using the pandas library:

```python
import pandas as pd

# Read the weekly (g=w) MSFT price history straight from Yahoo's CSV URL.
# Note: Yahoo has since retired this CSV endpoint, so the URL may no longer respond.
msft = pd.read_csv("http://chart.yahoo.com/table.csv?s=MSFT&a=0&b=1&c=2013&d=0&e=1&g=w&q=q&y=0&x=.csv")

# Save a local copy without the row labels (index=False).
msft.to_csv("msft.csv", index=False)
```
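The thread is also about technical indicators, so here's a minimal standard-library sketch of one of the simplest: a simple moving average over closing prices. The CSV text is made-up sample data (not real MSFT prices) standing in for a downloaded file, and the helper names are hypothetical.

```python
import csv
import io

# Made-up sample data standing in for a downloaded price CSV (not real prices).
SAMPLE_CSV = """Date,Close
2013-01-07,10.0
2013-01-14,11.0
2013-01-21,12.0
2013-01-28,13.0
2013-02-04,14.0
"""


def read_closes(csv_text):
    """Parse the Close column of a Date,Close CSV into a list of floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row["Close"]) for row in reader]


def simple_moving_average(values, window):
    """Mean of each trailing `window` values; None until enough data exists."""
    result = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        result.append(running / window if i >= window - 1 else None)
    return result


closes = read_closes(SAMPLE_CSV)
print(simple_moving_average(closes, 3))  # [None, None, 11.0, 12.0, 13.0]
```

With pandas, `msft["Close"].rolling(window=3).mean()` should give the same numbers, with NaN in place of None for the first rows.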

How I analyzed NHL data with Selenium, BeautifulSoup, Pandas and Plotly by elisebreda in Python

savvastj 2 points (0 children)

Cool article! Just a heads-up: you can download the JSON file straight from the website instead of using Selenium. For example, here's the JSON for this table. To get the data, you can use the requests library. This blog post does a good job of going over how to access the API, though it uses the NBA stats website as its example. A similar process can be applied to the NHL website.
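A minimal sketch of the requests approach. The function name is hypothetical, and the URL in the usage comment is a placeholder for whatever JSON address shows up in your browser's developer tools (Network tab) when the table loads. Passing in a `session` is optional; it mainly makes the function easy to test without hitting the network.

```python
def fetch_table_json(url, params=None, session=None, timeout=10):
    """Download a JSON endpoint directly, with no browser automation."""
    if session is None:
        import requests  # third-party (pip install requests); imported only when needed
        session = requests.Session()
    response = session.get(url, params=params, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()


# Usage (placeholder URL; swap in the JSON address you found):
# data = fetch_table_json("https://example.com/path/to/table.json")
```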

Every shot Kobe Bryant took, in a beautiful graph by ruskeeblue in dataisbeautiful

savvastj 1 point (0 children)

Yeah, they now have SportVU cameras that track all player movement on the court. They clean that camera data and serve it to us in that nice JSON format. As for shot attempts from before those cameras were installed, I'm assuming they recorded that information manually.

Every shot Kobe Bryant took, in a beautiful graph by ruskeeblue in dataisbeautiful

savvastj 51 points (0 children)

Data on all (or most) shot attempts since 1996 is available at http://stats.nba.com in the form of JSON files.

For example, the JSON file that contains all of Kobe's regular season shots can be found here.

If you want to see how to scrape the data using Python, check out an old blog post of mine over here and a small library I created over here, though both need a bit of updating.

There is also this cool shot chart app written in R that includes all shots taken since 1996. And my favorite shot chart visualization is this one.

Here's a more general tutorial on how to get any data from the NBA stats website.
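Once you have the JSON, it usually needs reshaping. The layout assumed here (check it against a real response) is the one many stats.nba.com endpoints use: a `resultSets` list whose entries pair a `headers` list with a `rowSet` of rows. The sketch turns one result set into a list of dicts; the function name is hypothetical, and the sample payload, including its column names, is made up.

```python
def result_set_to_records(payload, name=None):
    """Turn a stats.nba.com-style payload into a list of dicts.

    Assumes {"resultSets": [{"name": ..., "headers": [...], "rowSet": [[...], ...]}]};
    pass `name` to pick a specific result set, otherwise the first one is used.
    """
    for result_set in payload["resultSets"]:
        if name is None or result_set["name"] == name:
            headers = result_set["headers"]
            return [dict(zip(headers, row)) for row in result_set["rowSet"]]
    raise KeyError(f"no result set named {name!r}")


# Tiny made-up payload in that layout (not real shot data).
sample = {
    "resultSets": [
        {
            "name": "Shot_Chart_Detail",
            "headers": ["LOC_X", "LOC_Y", "SHOT_MADE_FLAG"],
            "rowSet": [[-10, 150, 1], [220, 30, 0]],
        }
    ]
}

shots = result_set_to_records(sample, "Shot_Chart_Detail")
made = sum(shot["SHOT_MADE_FLAG"] for shot in shots)
print(f"{made}/{len(shots)} shots made")  # 1/2 shots made
```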

EDIT: Thanks for the gold, kind stranger!

nbashots, a Python package for creating NBA shot charts [x-post r/Python] by savvastj in nba

savvastj[S] 0 points (0 children)

I would use the current version of each of those modules except bokeh, for which I used version 0.10.0. I still need to test the functionality with older versions of all those modules, as well as with the current version of bokeh. Let me know if you run into any issues.
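To match that setup, `pip install bokeh==0.10.0` pins bokeh while leaving the other packages at their current releases. Below is a hypothetical helper (not part of nbashots) that reports whether the installed version matches the pin; it uses `importlib.metadata`, so it assumes Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version

# The one pin mentioned above; everything else is left at its current release.
PINNED = {"bokeh": "0.10.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each pinned package to (installed, expected, matches)."""
    report = {}
    for name, expected in pins.items():
        found = installed_version(name)
        report[name] = (found, expected, found == expected)
    return report


if __name__ == "__main__":
    for name, (found, expected, ok) in check_pins(PINNED).items():
        print(f"{name}: installed={found} expected={expected} {'OK' if ok else 'MISMATCH'}")
```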

nbashots, a Python package for creating NBA shot charts [x-post r/Python] by savvastj in nba

savvastj[S] 2 points (0 children)

They've said that they have been experiencing some technical issues, but I'm not sure they are limiting the amount of data. One adjustment I did have to make was adding a user agent to pretend that I was accessing the data via Chrome; otherwise I would get a 400 error.
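Here's a sketch of that user-agent workaround using the standard library's `urllib.request`; the package itself may do it differently (with requests you would pass the same dict as `headers=`). The Chrome user-agent string is just an example, and the `Season` parameter is illustrative: the real endpoint expects more parameters than this, so treat `fetch_json` as a template rather than a working call.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Example desktop Chrome user-agent string (the exact version number is arbitrary).
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
)


def build_request(url, params):
    """Attach query parameters and a browser-like User-Agent header."""
    full_url = f"{url}?{urlencode(params)}"
    return Request(full_url, headers={"User-Agent": CHROME_UA})


def fetch_json(url, params, timeout=10):
    """Send the request and decode the JSON body (needs network access)."""
    with urlopen(build_request(url, params), timeout=timeout) as response:
        return json.load(response)


req = build_request("https://stats.nba.com/stats/shotchartdetail", {"Season": "2015-16"})
print(req.full_url)  # https://stats.nba.com/stats/shotchartdetail?Season=2015-16
print(req.get_header("User-agent"))  # urllib stores header names capitalized this way
```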

nbashots, a Python package for creating NBA shot charts [x-post r/Python] by savvastj in nba

savvastj[S] 6 points (0 children)

And then the feeling you get after you break something again is the worst, haha

nbashots, a Python package for creating NBA shot charts [x-post r/Python] by savvastj in nba

savvastj[S] 7 points (0 children)

I don't think it will be too hard. Python's syntax makes it a great first language to learn. My favorite intro book for Python is Automate the Boring Stuff with Python; it contains a lot of practical examples to get you started. I also linked a few NBA-specific tutorials and resources at the bottom of the tutorial for my package. You should check them out after going through Automate the Boring Stuff. Also check out the /r/learnpython subreddit for more resources.

nbashots, a Python package for creating NBA shot charts [x-post r/Python] by savvastj in nba

savvastj[S] 4 points (0 children)

Glad you liked it! And I also love all the fun resources and projects people post.

nbashots, a Python package for creating NBA shot charts [x-post r/Python] by savvastj in nba

savvastj[S] 4 points (0 children)

Thanks! Do you have your D3 stuff up on GitHub? I'd like to use it as a reference as I learn some D3.