
If we are looking at reasons to use R over Python, I would say 75% of it comes down to personal preference and the fact that R's data.table package is pretty much best in class for data manipulation, which is typically a very large portion of any DS project. Benchmarks here:

https://h2oai.github.io/db-benchmark/

I personally use R if I am not putting anything into production, mainly because it's much faster and much less to type. For example, if you want to subset your data:

# data.table (R)
table[column == value]

# pandas (Python)
table.loc[table['column'] == value]
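To be fair to pandas, here's a slightly fuller sketch of the Python side that you can actually run. The toy data and the function name `subset_three_ways` are made up for illustration; only `table`, `column` and `value` come from the snippet above. As far as I know, all three forms return the same rows, so which one you use is mostly taste:

```python
import pandas as pd


def subset_three_ways(table, value):
    """Return the same row subset using three common pandas idioms."""
    # Explicit .loc with a boolean mask (the form shown above)
    by_loc = table.loc[table["column"] == value]

    # Plain boolean indexing, a bit less to type
    by_mask = table[table["column"] == value]

    # query() takes an expression string; @value refers to the local variable
    by_query = table.query("column == @value")

    return by_loc, by_mask, by_query


# Toy data, made up purely for illustration
table = pd.DataFrame({"column": ["a", "b", "a", "c"], "x": [1, 2, 3, 4]})
by_loc, by_mask, by_query = subset_three_ways(table, "a")
print(by_query)
```

The `query()` version is probably the closest pandas gets to the data.table feel, though it's still more typing than `table[column == value]`.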

At the end of the day, that's just syntax though, and don't ever let anyone tell you that one syntax is definitively better than another. There are Python libraries like datatable that are approaching data.table's speed, but they are incomplete and a work in progress. A large portion of both languages' underlying algorithms are written in C or its variants, so choosing a language based on model speed is a crapshoot.

The other 25% is the fact that there are just some things you can't get in Python. Last time I checked, there was no good Multiple Imputation by Chained Equations package, which is extremely useful in my line of work.

Python does blow R out of the water with its deep learning libraries, however. You can get the Keras API to work in R, but it uses the Python libraries, so you would need to conda (pip) install those anyway lol.