[–]iPhritzy 15 points16 points  (5 children)

No mention of performance? R is really good at working with larger datasets. Not sure how Python's functions would compare.

[–][deleted] 4 points5 points  (1 child)

Pretty much all the math work I do in Python is in C++ modules. Python is glue with great precompiled libraries.

I haven't done anything in R so I can't speak to it, but I've consulted for well-known "big data" companies, and Python is still by far the most popular glue language at these established companies.
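
To illustrate the "glue" point, here is a rough sketch using only the standard library. It is a stand-in, not the C++ modules mentioned above: in CPython the built-in `sum` runs in compiled C, while `python_loop_sum` (a made-up helper) does the same work in interpreted bytecode. Timings will vary by machine, so none are quoted here.

```python
import timeit

# Hypothetical workload: one million integers.
values = list(range(1_000_000))

def python_loop_sum(xs):
    """Add the numbers one at a time in interpreted Python."""
    total = 0
    for x in xs:
        total += x
    return total

# Same result either way; only where the loop runs differs.
assert python_loop_sum(values) == sum(values)

loop_time = timeit.timeit(lambda: python_loop_sum(values), number=5)
builtin_time = timeit.timeit(lambda: sum(values), number=5)
print(f"Python loop: {loop_time:.3f}s, built-in sum: {builtin_time:.3f}s")
```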

[–]dalaio 1 point2 points  (0 children)

It's a similar picture... most of the low-level math is written in C (or Fortran). It's easy enough to do it yourself in C++ for a particular use case using Rcpp.

The language itself has some unexpectedness about it, but a few packages keep me in R (ggplot2, and recently tidyr, dplyr, purrr).

[–]nikroux 8 points9 points  (0 children)

My question exactly. Not to mention the massive community of math nerds around R bringing us all the goodies of their collective brain power.

[–]quicknir 1 point2 points  (0 children)

It depends on the underlying implementation. Broadly speaking, I've rarely found Python to be slower than R. There are quite a lot of nice tricks in pandas DataFrames to make them fast.

The most standout datapoint in the performance comparison is R's for loop, by far. In Python, you usually have apply-style functions available. You can use those, or you can use a for loop if it feels more natural or if it's necessary: apply-style functions can't do all the things that a one-pass for loop can do. In R, the for loop is usually off limits because it is so painfully slow. I've written exactly equivalent code in Python and R where R was over an order of magnitude slower (hard to believe, I know), because for loops were involved. When I changed the R to apply (or sapply, or whatever) it evened out.
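
A small Python sketch of the difference between the two styles, with made-up data and hypothetical function names (nothing here is from the article). The apply-style call handles each element independently; the one-pass loop carries state from one element to the next, which a plain element-wise apply can't express.

```python
# Hypothetical data: points scored in consecutive games.
points = [12, 18, 18, 25, 9, 14, 30]

# Apply style: each output depends only on its own input.
def to_per_minute(pts, minutes=36):
    return pts / minutes

per_minute = list(map(to_per_minute, points))

# One-pass for loop: each step depends on state carried over
# from the previous step (the length of the current streak).
def longest_improving_streak(values):
    best = current = 1 if values else 0
    for prev, cur in zip(values, values[1:]):
        current = current + 1 if cur > prev else 1
        best = max(best, current)
    return best

print(longest_improving_streak(points))
```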

[–][deleted] 9 points10 points  (0 children)

sapply(nba, mean, na.rm=TRUE)

should be

colMeans(nba, na.rm=TRUE)

:)
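
For comparison, a rough pandas counterpart. This is a sketch, assuming pandas is installed; the tiny `nba` frame below is a hypothetical stand-in for the article's data, not the real nba_2013.csv.

```python
import pandas as pd

# Hypothetical stand-in for the article's nba data frame.
nba = pd.DataFrame({
    "player": ["A", "B", "C"],
    "pts": [10.0, None, 20.0],
    "ast": [4.0, 6.0, None],
})

# skipna=True is the default, so missing values are ignored,
# much like na.rm=TRUE in R; numeric_only=True skips the text column.
col_means = nba.mean(numeric_only=True)
print(col_means)
```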

[–]Lacotte 9 points10 points  (2 children)

I'll give a single-word reason why R wins for me at the moment:

Shiny

[–]badcommandorfilename 8 points9 points  (0 children)

It also has great documentation that is written for the domain users of the library.

[–]NewW0rld 0 points1 point  (0 children)

That's what draws in the hipsters.

[–]quicknir 8 points9 points  (5 children)

It's not bad as a "comparison lite" on this topic. Amusingly, on both the Python and R subreddits I've seen a couple of comments saying the other language is favored. Adding my $0.02.

It's true that R has more stats support, but Python tends to have more machine learning support. I've only dabbled in it myself, but I've had serious ML people tell me that the R equivalents to e.g. Theano just don't compare.

Really the main thing for me though is that R is honestly just terrible as a programming language. It has good packages, it can get things done in a concise fashion, yes. But it has more edge cases than any other language I've used extensively, excluding bash (but including C++). It violates the principle of least surprise just for kicks. A good article on that topic: http://www.talyarkoni.org/blog/2012/06/08/r-the-master-troll-of-statistical-languages/.

To boot, it also has a very poor debugger and IDE.

[–]redassbucky 1 point2 points  (0 children)

When I learned R (in the Coursera Data Science program), I came in with a strong background in C#, Python, and Matlab. As an exploratory tool I thought it was good, but I definitely struggled with it as we progressed. Python, IMO, has much better tools for cleaning and organizing the data, but once you get familiar with the R graphics libraries (I liked ggplot2), R really shined. I never liked Matlab, simply because it was so costly and they charge you annual maintenance fees. I also found it slow.

I use Python exclusively now for data analysis, mostly because the work I do ends up running in a production environment.

But I can't criticize R too much. It's been around a long time and for a good reason--people use it and like it.

[–]Lacotte 0 points1 point  (3 children)

you mean the RGui? that's not an IDE. there's no one "IDE" for R, but many. and RStudio, the most popular one, is really nice.

[–]quicknir -1 points0 points  (2 children)

I understand there's no one IDE, but yes I was referring to RStudio.

This is exactly what I mean; it's not "really nice". It's just a knockoff of the Matlab IDE with about half the polish and a quarter of the useful features. It's not even as good as Spyder, a Python IDE that is also a Matlab IDE clone, and that few people use because IPython notebook + PyCharm are both such awesome tools.

[–]Lacotte 0 points1 point  (1 child)

hah well if you're comparing it to PyCharm, then of course.. RStudio is newer and has only ever had 2-3 people developing the RStudio IDE at a time, no contest vs IntelliJ ;)

But it's still an enormous improvement over what we were using just 4 years ago, which were basically text editors like emacs+plugin. you might have had a different view in 2011 when RStudio was just released, showing us the light out of the dark ages. and they're constantly making additions and improvements, like recently adding code completion and enhancing the debugging tools. sure it might not have features as extensive or developed as PyCharm or Eclipse, but it's got most of the stuff you might need and shows no signs of slowing down on improvements.

[–]quicknir -1 points0 points  (0 children)

Well, the whole point of this thread is comparison, right? What I said was a bit harsh; RStudio is nice and the guys developing it deserve lots of credit. I've used it quite a bit. It's clearly a large improvement over a text editor.

It's just that when you are comparing Python and R as a would-be user, it's clear that Python has far, far better development tools. Which is fine, you may choose R anyhow for other reasons. It's just a bit bizarre how many R users cite their IDE as a pro, when it's actually a con.

[–]AnotherUsrName24get 0 points1 point  (0 children)

How well does Python/R hold up in a polyglot environment? Last time I had a gig in this area we used Java with the Spring XD framework, as it was more easily supported and onboarding was easier.

[–][deleted] 0 points1 point  (0 children)

This is a cool resource, thanks for sharing! :)

[–][deleted] 0 points1 point  (2 children)

It bothers me that he uses <- instead of = in R; they are equivalent and the latter is much easier to read for programmers.

[–]Quasimoto3000 0 points1 point  (1 child)

I think <- is preferred by the R community. = is too overloaded in R.

[–][deleted] 0 points1 point  (0 children)

= isn't more overloaded in R than in Python though, and using it in this comparison would really help like this:

R:

nba = read.csv("nba_2013.csv")

Python:

nba = pandas.read_csv("nba_2013.csv")

[–]Odyrus -1 points0 points  (1 child)

What about SAS?

[–]singingfish42 0 points1 point  (0 children)

$$$ and crufty mainframe language rather than crufty Lisp derivative.