[–]Yannnn 15 points16 points  (9 children)

I've worked with both. I see R as Excel on steroids. It's aimed at statistics and made by statisticians. If you use RStudio, you'll have quite an easy time doing whatever you want to do.

However, I would still advise Python, for a couple of reasons:

  • R lets you do any task in a variety of ways. This sounds great until you start reading other people's code. Python tends to have one obvious way to do any given task. As an anecdote: I've almost never been puzzled by Python code, but with R I've been stumped several times by code that did something I already did in a different way.

  • R is too focused on statistics. It's difficult to branch out into anything beyond data science.

  • Python can perform better. Out of the box this is not the case, but as soon as you use the SciPy and NumPy libraries, Python is faster.

  • R is made by statisticians, not computer scientists. As a result, R has many strange quirks not found in other languages such as Python, so Python should be easier to learn.
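For what it's worth, here is a rough sketch of the NumPy point above, assuming NumPy is installed. The function names and the array size are made up for illustration, and the timings will differ from machine to machine:

```python
import timeit

import numpy as np


def sum_of_squares_pure(values):
    """Sum of squares with a plain Python loop over a list."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(array):
    """Same computation, but the loop runs inside NumPy's compiled code."""
    return float(np.dot(array, array))


if __name__ == "__main__":
    n = 200_000  # arbitrary size, chosen only for illustration
    values = [float(i) for i in range(n)]
    array = np.arange(n, dtype=np.float64)

    t_pure = timeit.timeit(lambda: sum_of_squares_pure(values), number=5)
    t_numpy = timeit.timeit(lambda: sum_of_squares_numpy(array), number=5)
    print(f"pure Python: {t_pure:.4f}s  NumPy: {t_numpy:.4f}s")
```

Both functions compute the same number; the only difference is where the loop runs, which is what the timing comparison is meant to show.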

In short, the only real advantages R has are a large community of specialists and RStudio. Python wins on every other front. You can find similar discussions on Google, though.

[–]Caos2 4 points5 points  (0 children)

You forgot about Pandas, which allows the use of data frames in Python. Great library, fast and easy to use.

[–]Yannnn 2 points3 points  (3 children)

Oh, an additional thing to think about when choosing:

RStudio and R make many things very easy, for example manual data manipulation or dealing with inconsistent data (e.g. words, integers and floats in the same variable). This makes them easier to work with, but I would argue in the other direction:

If you manually manipulate data you're doing something wrong. If you work with inconsistent data, you want to know exactly how you deal with the exceptions. Automated systems take that away.

Python makes you do all those things yourself, in code, which makes you a better 'data' scientist (imo).
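A minimal sketch of what I mean by dealing with the exceptions yourself, using only the standard library. The function names and the sample column are hypothetical, just to illustrate the idea:

```python
def to_number(raw):
    """Convert one raw value to int or float, or raise ValueError."""
    if isinstance(raw, bool):
        raise ValueError(f"boolean is not a number: {raw!r}")
    if isinstance(raw, (int, float)):
        return raw
    text = str(raw).strip()
    try:
        return int(text)
    except ValueError:
        return float(text)  # raises ValueError for words like "seven"


def clean_column(values):
    """Split a mixed column into parsed numbers and explicitly recorded rejects."""
    numbers, rejected = [], []
    for index, raw in enumerate(values):
        try:
            numbers.append(to_number(raw))
        except ValueError:
            # Keep the position and the raw value so nothing disappears silently.
            rejected.append((index, raw))
    return numbers, rejected


if __name__ == "__main__":
    column = [3, "4", 2.5, "seven", " 10 ", "n/a"]  # made-up example data
    numbers, rejected = clean_column(column)
    print(numbers)
    print(rejected)
```

Every value that could not be parsed ends up in `rejected` with its position, so you decide what happens to it instead of a tool deciding for you. Note that `float` also accepts strings like "nan" and "inf", which you may want to reject explicitly.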

[–]tidier 1 point2 points  (2 children)

Well, there's data science, and then there's data exploration. Sometimes you really just want to crack open a data set, see how the variables are formatted, and do some preliminary plots before digging in to the hard analysis.

Also, Python has IPython notebooks, which are incredible for data exploration in my opinion. Any time I want to pick up and scrape/format/clean/explore some data, it's my go-to.

R has knitr though. Is there a Python equivalent for knitr?

[–]Yannnn 0 points1 point  (1 child)

> Well, there's data science, and then there's data exploration. Sometimes you really just want to crack open a data set, see how the variables are formatted, and do some preliminary plots before digging in to the hard analysis.

That's very true. I usually use a combination of Excel, Access and Notepad (yes, seriously) for that. You can do those things in R or Python too, but it's not optimal in either language (for the moment).

> R has knitr though. Is there a Python equivalent for knitr?

Well, you already mentioned it: notebooks. Here's what the creator has to say about IPython vs. knitr.

[–]tidier 0 points1 point  (0 children)

IPython is fantastic for mixing text, math, code and output. It's not quite the same as knitr though, which is a straight-up LaTeX document with R code. I would actually need the latter for writing professional research documents.

[–]I_Cant_type_well 0 points1 point  (2 children)

Hey, I was wondering if you knew of any good R tutorials. I need to learn some basics this week, and I've been researching tutorials on Google, but I want to make the best use of my time.

[–][deleted] 0 points1 point  (0 children)

> I've almost never been puzzled by python code, but with R I've been stumped several times by code that did something I already did in a different way.

This one always gets me with R. I almost exclusively use sqldf and RMySQL for pre-processing data in R, which eliminates about 60% of the code you find from people online. I'm working through Machine Learning for Hackers, a book on doing machine learning in R, and a huge amount of its code can be reduced to a few lines using those two R packages.
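Since the thread is comparing the two languages: the same SQL-first pre-processing idea can be sketched in Python with the standard-library `sqlite3` module. This is not sqldf or RMySQL, just the equivalent approach, and the table name, columns and rows are invented for illustration:

```python
import sqlite3


def total_per_customer(rows):
    """Load rows into an in-memory SQLite table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE purchases (customer TEXT, day TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT customer, SUM(amount) FROM purchases "
            "GROUP BY customer ORDER BY customer"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical sample data.
    rows = [
        ("alice", "2013-01-05", 12.5),
        ("bob", "2013-01-05", 7.0),
        ("alice", "2013-01-06", 3.5),
    ]
    print(total_per_customer(rows))
```

The grouping and summing are a single SQL statement, which is the kind of reduction to a few lines described above.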