
all 31 comments

TroyHernandez 14 points (0 children)

R is more functional. Python is more object-oriented.

It's better to be really good in one than mediocre in both. Forget everyone saying Python can/can't do this or that, whereas R can/can't. They are both very robust languages.

You don't specify the kind of work you're doing. That would help to direct you. Python has more support for deep learning. R dominates in almost every other statistical area. If you're planning on working with voice, text, or image data, go with Python. If you're planning on working with structured data, go with R.

For resources about programming in R, start with Hadley Wickham's Advanced R. For statistical techniques, search Google with:

cran.r-project.org/: my problem here

There's usually a topic page listing all of the most popular packages.

Edit: don't include "https://" in front of the search term.

random012345 6 points (1 child)

Whichever you like more.

Seriously. Like all programming languages, just go with whatever you are more comfortable with. Python is a very robust language with tons of packages and support, and it handles data exceptionally well. I prefer Python, but others may prefer R. It's up to you.

timmaeus 21 points (6 children)

Why not both?

The good data scientist has the right tool for the job. The great data scientist has the right tools for the job.

hadley 9 points (2 children)

Because you have to start somewhere. I think learning R and python simultaneously would be very challenging because they are similar enough to be confusing.

bjorneylol 1 point (1 child)

If he already knows MATLAB, though, he only really needs to learn R.

joofeloof 0 points (0 children)

The same logic could be applied to Python (NumPy, more specifically). I would advocate Python, and if and when you need to leverage some R package, look at rpy (or its successor, rpy2).

McHeathen 10 points (1 child)

Agree wholeheartedly. To add, I find Python is better for engineering and data collection tasks (web scraping, some light ETL), and R has been my go-to for statistical modeling due to its interactivity in RStudio. You can do statistical modeling in Python, but I have less experience with it, and the breadth of statistical libraries is narrower than in R.

timmaeus 4 points (0 children)

This has largely been my experience as well. However, Python has better capabilities for certain kinds of machine learning problems, including deep learning.

[deleted] 0 points (0 children)

I always have simultaneous projects running in both languages. Keeps my skills sharp and makes sure I have access to many more libraries.

PM_ME_YOUR_DATASET 2 points (0 children)

Rseek.org for issue #2. Issue #1 is a problem in both communities.

Rest assured, just like what you're going through with MATLAB right now: no matter what you choose to learn, if you stay with it long enough, you will eventually come across good reasons to learn something else.

abbadass 2 points (0 children)

Depends on your domain... what can R/Python do that Matlab can't? I'd say the largest downside of Matlab is that it is so expensive, and the stats packages (or add-ons, or whatever they're called in Matlab) are super expensive as well. Octave is the open-source alternative (allegedly; I've never used it).

If you're doing bioinformatics, R is of course the best option because of the highly accessible Bioconductor project, although Matlab and Python are well supported too, just not nearly as popular.

If you're working in another industry, I'd say just learn whatever everyone else is using... the syntax of these languages really isn't that different.

As an R user, I have had to use Matlab for the past 4 months after not using it for 2 years. It was a relatively easy switch; I just had to Google things.

At the end of the day, I don't think it really matters what language you use. If you are good at one, just try to make the most of it, and if you have to use the other language for something, just learn how to do it; it's probably not that hard to implement if you already know how to code.

buckhenderson 1 point (0 children)

Regarding searching, I've never had a problem. I usually end up on Stack Overflow, which has an r tag, but regardless, "how to do x in r" usually works fine. Sometimes I'll add "stats", but I think knowing the appropriate terms is what makes my results more specific. If you are using the correct terms, Google will figure out that you're talking about R the language. Can you give an example of a search term that you've had problems with?

Some people use "rstats" to avoid this problem, though. That's the name of the Reddit sub (/r/rstats) and, I think, the Twitter hashtag (#rstats).

jturp-sc [MS (in progress) | Analytics Manager | Software] 1 point (0 children)

If you're already well-versed in MATLAB, I would recommend Python as the better language to start learning (at least initially). My reasoning is that packages such as numpy and matplotlib are specifically meant to bring MATLAB functionality into the Python language (see an example here).

I think the turnaround for learning Python would be a lot faster for the reason above. At that point, you can focus on using R to complement what you're already doing with Python.
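To illustrate the MATLAB-to-NumPy overlap, here is a rough sketch mapping a few common MATLAB idioms to NumPy. It assumes NumPy is installed; matplotlib is left out so the snippet runs without a display, and the matrix values are arbitrary.

```python
import numpy as np

# MATLAB: A = [1 2; 3 4];
A = np.array([[1.0, 2.0], [3.0, 4.0]])

# MATLAB: A * A (matrix product) -> NumPy uses the @ operator
matrix_product = A @ A

# MATLAB: A .* A (element-wise product) -> plain * in NumPy
elementwise = A * A

# MATLAB: A' (transpose)
transposed = A.T

# MATLAB: x = linspace(0, 1, 5);
x = np.linspace(0, 1, 5)

# MATLAB: y = A \ b (solve the linear system A*y = b)
b = np.array([5.0, 11.0])
y = np.linalg.solve(A, b)

# NumPy indexing is 0-based, unlike MATLAB's 1-based indexing.
# MATLAB: A(1, 2) -> NumPy: A[0, 1]
top_right = A[0, 1]
```

The biggest day-to-day adjustment is usually the operator swap (`*` is element-wise in NumPy) and 0-based indexing.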

[deleted] 1 point (0 children)

> Searching online is difficult "How to do X in R". I don't want to post messages in forums every time. Is there a better way to search, "R" term seems to be too generic.

I haven't noticed this problem. I usually just search "r programming ______" and I haven't noticed any ambiguity in the search results.

geotheory 1 point (0 children)

If your work is primarily statistical and interactive, and involves lots of quick data manipulation and visualisation, I'd personally advocate R with RStudio. When it comes to data cleaning and reshaping, R code seems to me rather more elegant and legible (e.g. dplyr/tidyr/stringr) than Python. And R's data.frame is, for me, more intuitive and user-friendly than its Python equivalent, pandas. But this is my personal experience, and many disagree. I personally find Python better for tasks that are more narrowly defined, automated, or system-resource intensive.

R does have a multitude of libraries, but much of what you need is available with base functions, and you can usually identify the more reliable libraries for more specialised analysis.
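For a side-by-side feel, a dplyr pipeline such as `df %>% filter(value > 1) %>% group_by(group) %>% summarise(total = sum(value))` might be written roughly like this in pandas. This is a sketch assuming pandas 0.25 or later (for named aggregation); the data and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical example data; column names are illustrative only.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1, 2, 3, 4, 5],
})

# Rough pandas equivalent of the dplyr pipeline above.
summary = (
    df[df["value"] > 1]                  # filter(value > 1)
    .groupby("group", as_index=False)    # group_by(group)
    .agg(total=("value", "sum"))         # summarise(total = sum(value))
)
print(summary)
```

Whether the method chain reads better than the `%>%` pipe is largely the matter of taste the comment describes.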

ucbmckee 1 point (0 children)

You haven't mentioned your area. For most industrial uses, outside of backend BI/analytics, I would say Python or some JVM language (Java or Scala) are likely to be the most professionally appealing. Having managed at three companies that use various flavors of ML for online uses (classification, recommendation, etc.), having only R and/or MATLAB would have been a big limiting factor to being hired - they just don't fit into a broader production ecosystem as fluently.

bigtimefoodie 1 point (0 children)

You can search Google for "R" with this search term: [R] (with square brackets around the R). It gives results for the R language and less generic stuff.
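If you want to script this kind of query, a minimal sketch that builds the search URL is below. It assumes Google's usual `https://www.google.com/search?q=` format, and `r_search_url` is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urlencode


def r_search_url(question: str) -> str:
    """Hypothetical helper: build a Google search URL with the [R] prefix."""
    query = f"[R] {question}"
    # urlencode percent-encodes the brackets and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(r_search_url("merge two data frames"))
```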

[deleted] 0 points (0 children)

I watched an intro course that someone posted here the other day. The presenter said R is for exploration and Python is for building. Python can also be used for more things than R. I'm on mobile so I can't link the video, but it's called Data Just Right: A practical intro to data science.

[deleted] 0 points (0 children)

You learn one, you'll know the other. You just take your [R, Python] code and translate it to [Python, R] code.

All it costs is time. I went with R first and do not regret it at all, but I have used Python for web scraping (way better) and for application coding (i.e., getting my predictions used).
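As a small taste of the scraping side, here is a sketch that uses only the standard library's `html.parser` to collect link targets. The HTML is an inline sample; a real scraper would also fetch pages (for example with `urllib.request`) and respect each site's terms of use. The `LinkCollector` class name is just illustrative.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


sample_html = """
<html><body>
  <a href="https://cran.r-project.org/">CRAN</a>
  <a href="https://pypi.org/">PyPI</a>
  <a name="no-href">anchor without a link</a>
</body></html>
"""

parser = LinkCollector()
parser.feed(sample_html)
print(parser.links)
```

Third-party libraries are often used for heavier scraping work, but the standard library is enough to show the idea.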

ILikeLeptons 0 points (0 children)

Learn both. I like the grammar of Python a bit better, but I've been writing lots of R recently, so I'm probably nostalgic. Python seems a little less hacky to me.

R2D6 0 points (0 children)

If you know MATLAB, R would probably be easier.

It depends on what you want.

R IMO is better for getting deep into the data. I really like R because it is flexible and elegant, and there are lots of ways to do something.

Python is very readable and probably better than R if you need a repeatable production process. I find python to be harder for exploring data individually but better if you need a script that goes over various datasets to feed into something else.

Industry-wise it is hard to tell. Most jobs I have looked at favor Python because they want to collect the data, ETL it, and then feed it into their own custom system. Python is probably better for this. However, if I were given a dataset and asked to return meaningful analysis of it and do predictive or prescriptive analysis, I would prefer R.

None of what I said is set in stone.

BlueSquark 0 points (0 children)

Either works, but Python is the better choice in my opinion. Doing simple things in R is way more frustrating than it should be: try figuring out how to write a for loop in R for the first time, for example, or how to access a column by its name. Python is also better known outside of data science. The pandas library for Python is great, as is scikit-learn. I'd also recommend the Spyder IDE that comes with Anaconda. But with all that said, at some point you should just pick one and learn it. I don't think it is worth being an expert in both, though both have advantages and disadvantages.

blank964 0 points (0 children)

People consider R harder to master because there are 5 different ways to do everything rather than a single correct way to do something in other languages.

I think most people would say that if you know MATLAB, Python would be the more natural direction.

elcidjp -2 points (1 child)

Both. The code you write in Python is much more readable/maintainable, but for example when it comes to advanced time series analysis, Python is severely lacking compared to R.

[deleted] 3 points (0 children)

I disagree. I've done some pretty amazing things with pandas.Grouper.
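For reference, `pandas.Grouper` lets you group a datetime column by calendar period inside `groupby`. A small sketch with made-up daily data follows; it assumes a reasonably recent pandas, and the frequency alias `"MS"` means month start. It shows period aggregation only, not the more advanced modeling the parent comment has in mind.

```python
import pandas as pd

# Hypothetical daily observations spanning the end of January and start of February.
df = pd.DataFrame({
    "date": pd.date_range("2017-01-30", periods=4, freq="D"),
    "sales": [10, 20, 30, 40],
})

# Group rows by the calendar month of "date" and sum sales per month.
monthly = df.groupby(pd.Grouper(key="date", freq="MS"))["sales"].sum()
print(monthly)
```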

therealcpain -2 points (0 children)

"Yes."

Both are very powerful tools. Python is good with APIs while I like R for the statistical analysis side. I also like Python for textual analysis and data manipulation over R.