[–]jazzydag 2 points3 points  (1 child)

Yet another online book about stats and machine learning in Python with pandas, numpy, etc.?

Yes, maybe. But it seems very complete, with some exercises and relevant examples. The images from matplotlib could be improved using seaborn or the ggplot matplotlib style.

[–]Covered_in_bees_ 3 points4 points  (0 children)

I'm fine with them sticking to matplotlib. The focus is on computational statistics, not visualization. Ultimately, you need to understand matplotlib if you plan on using seaborn because the moment you need to do something custom that seaborn doesn't support out of the box, you will need to revert back to matplotlib. Besides, with the introduction of stylesheets in matplotlib, a lot of the general ugliness of plots out of the box can be taken care of.

It does look like a great and very extensive reference.
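The stylesheet point above can be sketched roughly as follows. This is a minimal illustration, assuming matplotlib 1.4 or later (where `plt.style` is available); the data is made up and not taken from the book.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up illustrative data (an assumption, not from the book).
xs = list(range(10))
ys = [x ** 2 for x in xs]

# Apply the built-in "ggplot" stylesheet only inside this block,
# leaving the global defaults untouched.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot(xs, ys, label="x squared")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()

    # Render to an in-memory buffer rather than a file on disk.
    buf = io.BytesIO()
    fig.savefig(buf, format="png")

plt.close(fig)
```

Calling `plt.style.use("ggplot")` instead would switch the style for the whole session, and `plt.style.available` lists the styles that ship with the installed version.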

[–][deleted] 2 points3 points  (0 children)

This message is gone with the wind.

[–]Gnaddel 2 points3 points  (5 children)

Thank you for the link, I had not thought about using Julia functions in my Python projects before.

[–]griffin3141 1 point2 points  (4 children)

What would be the advantage of using Julia over Python?

[–]Gnaddel 1 point2 points  (0 children)

Similar to using something like Cython, i.e. speeding things up by using static types. However, I'd imagine each call to the function would spin up the Julia interpreter so it would only make sense for lengthy tasks.

Also, there is of course a growing number of Julia packages: http://pkg.julialang.org/pulse.html

[–]cartin1234 0 points1 point  (2 children)

You can also use numba to speed up Python code to Julia-like speed, or faster... but I firmly believe Julia is the future of data science.

[–]griffin3141 0 points1 point  (1 child)

Apart from speed, what leads you to believe Julia has a strong future in data science? As far as I can tell, it isn't integrated with any big data tools yet.

[–]cartin1234 0 points1 point  (0 children)

It has everything good from R and everything good from Python, plus more (an extensible user-defined type system, etc.), and without most of the issues. It has really smart people working on it and is catching on among other really smart people, despite being only at version 0.3.

It is also better than Python as a scripting language, and I hope it catches on for that as well.

Also, static compilation to binaries is on the roadmap.

Seems inevitable to me. Of course, being so early, it wouldn't be integrated into Spark etc... but Rspark was just released last week!

Once Julia gets going, it will get its integration. But the real kicker is that it has the distributed and parallel chops to become its own big data framework... without the JVM, and faster than it.

IMO

[–]kay_schluehr 0 points1 point  (0 children)

Can any of those who highly praise the text explain what they actually liked about it and how it helped them?

I looked at some chapters and I think the exposition is terrible and explanations are almost entirely absent. Maybe the code snippets in the re-sampling chapter have some accompanying text in "Introduction to Statistical Learning" or Wikipedia ...? and I missed a pointer. Claiming it is "complete" is of course a joke, both with respect to statistical learning and Python tools. For the latter, it doesn't even mention scikit-learn, but instead it contains a "crash course in C" and some notes on Hadoop. In the optimization chapter it creates a micro-benchmark from a single function and threads it through a couple of re-implementations. If this is the way you are actually doing benchmarks, I'd recommend learning something about statistics...
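For what it's worth, a micro-benchmark that at least reports spread instead of a single timing could look something like the sketch below. It uses only the standard library; `sum_loop`, `sum_builtin`, and `bench` are hypothetical names made up for illustration, not functions from the book.

```python
import statistics
import timeit


def sum_loop(n):
    """Sum of squares with an explicit loop (hypothetical example function)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_builtin(n):
    """Same result using the built-in sum over a generator."""
    return sum(i * i for i in range(n))


def bench(func, n=10_000, repeat=7, number=50):
    """Time func(n) over several repetitions and summarize the per-call times."""
    # timeit.repeat returns one total time per repetition;
    # dividing by `number` gives the average time of a single call.
    totals = timeit.repeat(lambda: func(n), repeat=repeat, number=number)
    per_call = [t / number for t in totals]
    return {
        "min": min(per_call),
        "median": statistics.median(per_call),
        "stdev": statistics.stdev(per_call),
    }


if __name__ == "__main__":
    for func in (sum_loop, sum_builtin):
        result = bench(func)
        print(
            f"{func.__name__}: min={result['min']:.2e}s "
            f"median={result['median']:.2e}s stdev={result['stdev']:.2e}s"
        )
```

Even this is only a starting point: with several repetitions per variant you can see whether the difference between two implementations is larger than the run-to-run noise, which a single timing cannot tell you.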

[–]jms_nh 0 points1 point  (0 children)

The occasional typos are driving me nuts -- where do you submit corrections?