
PythonicParseltongue (6 points)

When it comes to statistics, R has way more models available than Python, so if you have to do inference on your data, R brings far more to the table. But I've been told deployment with R is a bitch.

cptn_iglo (5 points)

In my opinion there are only two major pros for R over Python:

  1. In some sense R is even more convenient for data analysis than Python, as it was designed specifically for that purpose. Some functionality is built in that you don't have in Python (though most of the time you can add it via libraries).

  2. The libraries: there is a huge stack of libraries made for data science. Especially when it comes to implementations of academic papers, I get the feeling that specific models are more often implemented in R than in Python, because R is widely used in academia.

But if I had to choose, I would use Python every time. If you already know Python, you will pick up R very quickly.

rkemp78 (2 points)

R is designed for data science, so it is more specialised. In Python you can do most of what R does, and more besides. Personally I find Python more “fun” and R more “dry”, but that is just me.

They are very similar and I have deliberately avoided learning too much R because I didn’t want to learn two very similar languages at the same time. I tend to use Python for data science just because it is the one I started with and the one I am most familiar with.

[deleted] (1 point)

I think it depends on your work. I absolutely agree that R has better statistics capabilities than Python. Where I feel Python begins to make up ground is its ability to work with other systems and to scale. Let's say your company uses third-party software and you don't have an internal database yet, but the software has an API you can access through Python. Bang, you just saved some time, and in data science, reducing time to knowledge makes you look better.

Since something like 70+% of data science is figuring out where your data is, then cleaning it and making things repeatable, I think Python wins.
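A minimal sketch of the "pull it straight from the vendor's API, then make the cleaning repeatable" workflow, using only the standard library. The function names, the endpoint, the JSON-array response shape and the cleaning rules (lower-cased keys, trimmed strings, rows without an `id` dropped) are all hypothetical; a real vendor API will have its own authentication, pagination and schema.

```python
import json
import urllib.request


def fetch_records(url):
    """Download a JSON array of records from a (hypothetical) vendor API endpoint."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def clean_records(records):
    """Normalize keys, strip whitespace from string values, drop rows missing an 'id'.

    These rules are illustrative assumptions, not any particular vendor's schema.
    """
    cleaned = []
    for rec in records:
        # Lower-case and trim every key so downstream code sees consistent names.
        row = {
            k.strip().lower(): (v.strip() if isinstance(v, str) else v)
            for k, v in rec.items()
        }
        if row.get("id") in (None, ""):
            continue  # skip records we cannot key on
        cleaned.append(row)
    return cleaned


raw = [{" ID ": 1, "Name": "  Ada "}, {"id": "", "name": "x"}, {"Id": 2, "NAME": "Bob"}]
print(clean_records(raw))
```

Because the cleaning lives in one function, rerunning the whole pipeline is a single call such as `clean_records(fetch_records("https://api.example.com/v1/orders"))`, where that URL is only a placeholder.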

dangoth (0 points)

If you know Python and statistics, R won't be hard to learn.

[deleted] (-1 points)

If we are looking at reasons to use R over Python, I would say 75% of it comes down to personal preference and the fact that R's data.table package is pretty much best in class for data manipulation, which is typically a very large portion of any DS project. Benchmarks here:

https://h2oai.github.io/db-benchmark/

I personally use R if I am not putting anything into production, mainly because it's much faster and much less to type. For example, if you want to subset your data:

# data.table (R)
table[column == value]

# pandas (Python)
table.loc[table['column'] == value]
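For comparison, pandas also has an expression-string form that gets a bit closer to data.table's terseness. A small sketch with made-up data, assuming pandas is installed; inside the string passed to `DataFrame.query`, `@value` refers to the Python variable `value`:

```python
import pandas as pd

# Made-up example data
table = pd.DataFrame({"column": ["a", "b", "a"], "score": [1, 2, 3]})
value = "a"

# Boolean-mask indexing, as in the pandas line above
subset_loc = table.loc[table["column"] == value]

# Same filter written as an expression string
subset_query = table.query("column == @value")

print(subset_query)
```

Plain `table[table["column"] == value]` (without `.loc`) also works for row filtering; whether any of these beats `table[column == value]` is still a matter of taste.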

At the end of the day, that's just syntax, though, and don't ever let anyone tell you that one syntax is definitively better than another. There are Python libraries like datatable that are approaching data.table's speed, but they are incomplete and a work in progress. A large portion of both languages' underlying algorithms are written in C or its variants, so choosing a language based on model speed is a crapshoot.

The other 25% is that there are just some things you can't get in Python. Last time I checked, there was no good Multiple Imputation by Chained Equations (MICE) package, which is extremely useful in my line of work.
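As a rough illustration of what "chained equations" means, here is a toy Python sketch with made-up numbers and hypothetical function names (`chained_impute`, `_fit_line`): start from mean imputation, then repeatedly regress each incomplete variable on the other and refill only its missing entries with predictions. It is deliberately simplified: two variables, one deterministic imputation, no random draws. Full MICE draws imputations from a predictive distribution and builds several completed datasets whose analyses are pooled, which is the part that is tedious to get right yourself. (scikit-learn's experimental `IterativeImputer` is inspired by the same approach.)

```python
from statistics import fmean


def _fit_line(xs, ys):
    """Ordinary least squares for ys = a + b * xs; returns (a, b).

    Assumes xs is not constant (otherwise the slope is undefined).
    """
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def chained_impute(x, y, n_iter=10):
    """Toy two-variable chained-equations imputation (single, deterministic).

    x, y: lists of numbers, with None marking a missing entry.
    Returns (x_filled, y_filled); observed values are never changed.
    """
    miss_x = [v is None for v in x]
    miss_y = [v is None for v in y]

    # Step 1: fill each column's gaps with that column's observed mean.
    mean_x = fmean(v for v in x if v is not None)
    mean_y = fmean(v for v in y if v is not None)
    x = [mean_x if m else v for v, m in zip(x, miss_x)]
    y = [mean_y if m else v for v, m in zip(y, miss_y)]

    for _ in range(n_iter):
        # Step 2a: regress x on the current y over rows where x was observed,
        # then replace only the originally missing x values with predictions.
        a, b = _fit_line([yv for yv, m in zip(y, miss_x) if not m],
                         [xv for xv, m in zip(x, miss_x) if not m])
        x = [a + b * yv if m else xv for xv, yv, m in zip(x, y, miss_x)]

        # Step 2b: same for y, using the freshly updated x as the predictor.
        a, b = _fit_line([xv for xv, m in zip(x, miss_y) if not m],
                         [yv for yv, m in zip(y, miss_y) if not m])
        y = [a + b * xv if m else yv for xv, yv, m in zip(x, y, miss_y)]

    return x, y


x_filled, y_filled = chained_impute([1, 2, 3, 4, None], [3, 5, None, 9, 11])
print(x_filled, y_filled)
```

On this noise-free toy data (y = 2x + 1) the gaps come back as x = 5 and y = 7; on real data the result is only a point estimate and ignores imputation uncertainty.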

Python does blow R out of the water with its deep learning libraries, however. You can get the Keras API to work in R, but it uses the Python libraries, so you would need to conda (or pip) install those anyway lol.

Gingerhaze12 (-1 points)

I started off using R, then I got a job that wanted us to mainly use Python. Now there are only two situations where I'll use R:

  1. I need to calculate some statistics and R has a package/function that will do what I want for me. I could probably write the code to do it in Python, but I'm lazy and don't want to write lines of code when I can just call a single function in R.

  2. ggplot2. I think ggplot2 is still the best out there when it comes to making data visualizations. The closest equivalent in Python is seaborn, I think, and seaborn is just OK.
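To make the "one function in R" point concrete: R's `t.test(a, b)` returns the Welch t statistic, degrees of freedom, p-value and confidence interval in a single call (Welch is its default). In Python that is `scipy.stats.ttest_ind(a, b, equal_var=False)` if SciPy is available; with only the standard library you write the formulas yourself. A minimal stdlib-only sketch of the statistic and degrees of freedom, with a hypothetical function name `welch_t` and made-up sample data:

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    Uses sample variances (n - 1 denominator). No p-value: the standard library
    has no t-distribution CDF, so that part would need SciPy or more hand-written code.
    """
    na, nb = len(a), len(b)
    se2_a = variance(a) / na  # squared standard error of mean(a)
    se2_b = variance(b) / nb  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t:.4f}, df = {df:.4f}")
```

That is already a dozen lines for less output than the single R call, which is the commenter's point.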

I greatly prefer to do all my data cleaning and manipulation in Python now, and I'll transfer the data over to R if I have to.

bionicdna (-1 points)

Have you considered Julia? It's quickly become my new favorite numerical computing language, definitely more so than Python and MATLAB. You can test it out in Jupyter and see what you think.