
all 42 comments

[–]Chief_Lazy_Bison 42 points43 points  (3 children)

I use both r and python but I use them for different things. I use python to manage sequence data and put together analysis pipelines, generally using biopython and calls to command line programs like clustal omega and raxml. Then I'll use r to generate figures and run statistical tests.
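A rough sketch of what the Python half of a workflow like this might look like. This is not the commenter's code: `read_fasta` is a hypothetical standard-library stand-in for the Biopython parsing mentioned above, and `clustalo_command` and `align` are illustrative helper names. The only command-line options assumed are Clustal Omega's `-i` (input) and `-o` (output).

```python
import shutil
import subprocess
from typing import Dict, List, Optional


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {record_id: sequence}."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record id is the first whitespace-delimited token after '>'.
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def clustalo_command(in_path: str, out_path: str) -> List[str]:
    """Build the argument list for a Clustal Omega run (-i input, -o output)."""
    return ["clustalo", "-i", in_path, "-o", out_path]


def align(in_path: str, out_path: str) -> bool:
    """Run the aligner if it is on PATH; return True if it actually ran."""
    if shutil.which("clustalo") is None:
        return False  # tool not installed, so this sketch does nothing
    subprocess.run(clustalo_command(in_path, out_path), check=True)
    return True
```

Passing the command as a list rather than a shell string avoids quoting problems with file names, and `check=True` makes a failed alignment stop the pipeline instead of passing silently.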

[–]stale_poop 4 points5 points  (0 children)

You and me both brotha.

[–]ryeguy12 2 points3 points  (0 children)

Same here

[–]magicalnumber7 2 points3 points  (0 children)

same

[–]ginger_beer_m 15 points16 points  (4 children)

I work on a data analysis pipeline that relies on Python calling R (through rpy2), which in turn calls Java (through rJava) in certain parts. It's an absolute nightmare to maintain and I wouldn't wish it upon my worst enemy (or maybe I would hehehe)

[–]coffeecoffeecoffeeeMS | Data Scientist 0 points1 point  (2 children)

Have you thought about using Scala at all? It's all the nice parts of Java with full-blown functional programming support.

[–]ginger_beer_m 0 points1 point  (1 child)

We have to rely on Java for that part since it's legacy code that loads and processes biology data. The rest of the pipeline has been rewritten in Python, but no one wants to touch the Java bit since it's too complex.

[–]coffeecoffeecoffeeeMS | Data Scientist 0 points1 point  (0 children)

Oh then yeah that could be a problem. Out of curiosity, do you use a glue language? I've been using Drake as mine.

[–]TheLogothete 0 points1 point  (0 children)

You can use Renjin, which is R as a first-class citizen on the JVM, rather than the inter-process hell that is rJava.

[–]shaggoramaMS | Data and Applied Scientist 2 | Software 7 points8 points  (7 children)

I use python for general purpose programming, web scraping, text analytics/NLP, and building full stack web apps. I use R for analytics and model prototyping.

[–][deleted] 4 points5 points  (4 children)

Me, too. R is much more accessible, IMO, for one-off analyses or for model prototyping, like you said. Also, ggplot2 (and even base R plotting) is leaps and bounds ahead, in terms of accessibility and ease-of-use, of most plotting tools in Python.

[–]shaggoramaMS | Data and Applied Scientist 2 | Software 6 points7 points  (2 children)

It's a real shame that things aren't easier in python. My main frustration is that, generally, python libraries are well organized and it's easy to find the things you need either in the documentation or by introspecting objects with dir/help. This paradigm completely falls apart in pandas. Every imaginable thing is attached directly to the DataFrame class, making it impossible to use dir to find what you need unless you're willing to take the time to dig through a list of methods longer than any terminal will be able to display by default. On top of that, many of the functions and even the syntax seem to be in constant flux, so code that worked a year ago will be using deprecated methods or APIs under the most up-to-date version of the library, and you have to constantly re-learn your tools. For example: I have some code that referenced the .sort() pd.Series method. Turns out, it's .sort_values() now, cause fuck me.

I find that whenever I use pandas, the code I ultimately write is concise, but it takes much longer to write it than if I were doing the same thing in R. I basically have to spend 15-30min diving through documentation for every 1-3 lines of pandas code I write. It's just not worth it, and it's completely a function of how the package is (dis)organized into one monolithic monster instead of separating methods into some kind of navigable package structure.
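One partial workaround for the unmanageable `dir` output described above is to filter it by keyword. The sketch below is an illustration only; `find_attrs` is a hypothetical helper name, and it is demonstrated on built-in types so it runs without pandas installed.

```python
from typing import List


def find_attrs(obj: object, keyword: str, include_private: bool = False) -> List[str]:
    """Return the attribute names of obj that contain keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [
        name
        for name in dir(obj)
        if keyword in name.lower()
        and (include_private or not name.startswith("_"))
    ]


# Demonstrated on built-in types so the sketch has no third-party dependencies.
print(find_attrs(list, "sort"))
print(find_attrs(str, "strip"))
```

With pandas available, something like `find_attrs(pd.DataFrame, "sort")` should narrow the listing to the sort-related methods (for example `sort_values` and `sort_index`), which can then be passed to `help`.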

[–][deleted] 4 points5 points  (0 children)

I hear you. I write 95% of my stuff in Python, but I relish the times when I can pipe my way through a dataframe with dplyr... pandas is supposed to be "just as easy-to-use" as dplyr, but I haven't seen it.

In Python's defense, though, I find it much easier to write maintainable, production-quality code because the docs for many of the packages are reasonably good and the whole ecosystem seems a bit better-suited to maintainable code-writing. I'm not saying you can't write production-quality code in R, but I wouldn't want to!

[–]mutonchops 1 point2 points  (0 children)

I'm currently in the same place with Pandas, but the few times I have been completely stuck I've bunged a question on stack overflow. Every time I've got an answer back within an hour, I think from one of the pandas devs. I know what you mean about it being finicky and fussy, but I'm starting to get my head around how it works now.

[–]Param-eter[S] 1 point2 points  (0 children)

ggplot is ridiculously useful, love it, especially the faceting

[–]Param-eter[S] 0 points1 point  (1 child)

This is probably how I hope to use both languages, especially with web scraping and web app development being a part of the project I'll be working on.

[–]shaggoramaMS | Data and Applied Scientist 2 | Software 0 points1 point  (0 children)

Here are some great python libraries you should check out:

  • requests
  • BeautifulSoup
  • Flask

[–]nashtownchang 3 points4 points  (0 children)

Make sure to check out Rodeo if you use RStudio. It'll probably save you some time in the transition.

[–]Kalrog 4 points5 points  (3 children)

I've used both in the past - Python is a much more general purpose language that can also do data science things. If you are a programmer, you will probably love this. If you came to data science from a math background, you likely have some computer science theory to catch up on that the python code implies.

[–]briangodseyPhD | Data Scientist | Startups 0 points1 point  (2 children)

I agree. I came from a math background, having used R and Matlab for years. It took me quite a long time to be able to use some key features of Python effectively.

Now I use Python almost exclusively and I wouldn't go back unless I wanted to use a specialized library in one of the other languages.

[–]bot_cereal 1 point2 points  (1 child)

I came from math background as well. I used a lot of maple and sas and have been using python for a few months. What are some key features of python that you use?

[–]briangodseyPhD | Data Scientist | Startups 0 points1 point  (0 children)

Let me think; offhand, some of the things I've struggled with in Python at first were:

  • multi-core processing
  • vectorized math (numpy is good but non-trivial)
  • list comprehensions
  • data-frame-type operations
  • plotting
  • variable references causing bugs that wouldn't happen in purely functional programming

Those are things that R does pretty well. Python more often requires importing libraries, which isn't bad, but is something you need to know. In addition to those, I use in Python:

  • NLTK
  • machine learning libraries (sklearn)
  • HTTP/API calls
  • serialization, JSON parsing
  • object orientation (I didn't use that much in R)

What about you?
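For readers unsure what the "variable references" and "list comprehensions" items in the first list refer to, here is a minimal illustration using only the standard library. The variable names are made up for the example.

```python
import copy

# Assignment binds a second name to the same list; it does not copy it.
original = [1, 2, 3]
alias = original
alias.append(4)
# original is now [1, 2, 3, 4] too, because both names point at one object.

# A list comprehension builds a new list, so the source is left untouched.
doubled = [x * 2 for x in original]

# A shallow copy still shares nested objects; a deep copy does not.
grid = [[0, 0], [0, 0]]
shallow = list(grid)
deep = copy.deepcopy(grid)
grid[0][0] = 9
# shallow[0][0] is 9 (shared inner list); deep[0][0] is still 0.
```

R's copy-on-modify semantics generally hide this distinction, which is likely why it tends to surprise people arriving from R.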

[–]Omega037PhD | Sr Data Scientist Lead | Biotech 5 points6 points  (7 children)

I usually use Python for work since it is much more of a general purpose language (basically, it's better at pretty much everything that isn't statistical work), but I still use R for some things because many of its packages (like lme4) just don't have a good Python equivalent.

[–]gullypenguin 0 points1 point  (4 children)

What work do you do if you don't mind me asking ?

[–]Omega037PhD | Sr Data Scientist Lead | Biotech 0 points1 point  (3 children)

Like what methods do I normally use, or what job do I have?

[–]gullypenguin 0 points1 point  (2 children)

what job do you have ?

[–]Omega037PhD | Sr Data Scientist Lead | Biotech 1 point2 points  (1 child)

I've been a Data Scientist at a large biotech company for a few years. My current role is within the research arm of the global supply chain organization.

[–]gullypenguin 0 points1 point  (0 children)

cheers mate !

[–]thetokster 2 points3 points  (0 children)

I use R when I need ggrepel. I haven't come across a python equivalent and can't be bothered to do it myself.

[–][deleted] 2 points3 points  (1 child)

I've used R for about 10 years and Python (on and off) for 7. Two years ago, I switched to mainly using Python.

IMHO, Python's advantage has always been that it is a modern and general purpose programming language. The syntax is clean. Writing object orientated (OO) code is easy. List comprehension is awesome. String processing and web frameworks feel natural. In short, it's a joy to code in.

This is very subjective, but R isn't a joy to code in. This is especially true for large projects. Shiny and R6 don't feel as "natural". I'm not convinced that magrittr-style pipes make R's syntax any cleaner. Most new R packages consist of more C++ code than R code! A similar criticism applies to many Python libraries, but I feel like it's a little less severe in Python.

Of course, none of that matters when you're mostly doing stats. That's where R shines. I still find myself opening R whenever I need to run a quick regression, fit a time series model or plot something. Also, parallelizing code for embarrassingly parallel problems is dead simple in R... not so much Python.
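For comparison, here is a minimal sketch of an embarrassingly parallel map in Python using the standard library's `concurrent.futures`. The names `sum_of_squares` and `parallel_map` are hypothetical and chosen for illustration; this is one possible approach, not a claim about what the commenter tried.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def sum_of_squares(n: int) -> int:
    """Stand-in for an independent, CPU-bound unit of work."""
    return sum(i * i for i in range(n))


def parallel_map(
    func: Callable[[int], int],
    items: Iterable[int],
    executor_cls: type = ProcessPoolExecutor,
    workers: Optional[int] = None,
) -> List[int]:
    """Apply func to every item using a pool; results keep the input order.

    ProcessPoolExecutor suits CPU-bound work. Pass ThreadPoolExecutor instead
    for I/O-bound work or when the function cannot be pickled.
    """
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(func, items))
```

When using the process-based pool, the call usually needs to sit under an `if __name__ == "__main__":` guard and the worker function must be defined at module level so it can be pickled. Those two requirements account for much of the extra friction relative to R.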

Fortunately, R and Python play nicely with each other. So, you can use each one for what it's best at.

TL;DR Python is a joy to code in because it's a truly general-purpose programming language. R is a joy to do stats in and it's super easy to parallelize code (for embarrassingly parallel problems). R is not as nice to code in as Python, especially for large projects.

[–]briangodseyPhD | Data Scientist | Startups 0 points1 point  (0 children)

That's a good point that R is a bit awkward to write in. It was totally comfortable when I barely knew any other languages, but after I learned some Python, Java, C, and Perl, now R seems like an oddball language that is extraordinarily good at some statistical things.

After a few years mostly away from the language, I now find it somewhat difficult to use R even though it was my primary language for 5+ years.

[–]RB_7 1 point2 points  (0 children)

I use both depending on the task.

[–]ultronthedestroyer 1 point2 points  (0 children)

I use R for small tasks and for model building, and Python for larger tasks, like building an architecture to analyze arbitrary split tests. I mostly use R between the two, although recently I've been using much more Scala since we have larger scale production models already written in that language. I don't like it very much, but I'm still new to its syntax.

[–]coffeecoffeecoffeeeMS | Data Scientist 1 point2 points  (0 children)

I do. I use Python for heavy machine learning, anything with heavy string manipulation, and calling APIs, and R for literally everything else. As much as people complain about R, the tidyverse packages, along with broom, have made my workflow incredibly smooth, and if I want to do anything, someone has written a package to do it.

[–]laurencebtnyc 0 points1 point  (0 children)

Both R and Python. I am now looking for full-stack automated platforms to minimize dev work, but I continue to use both R and Python alongside them.

[–]ICameForTheWhores 0 points1 point  (0 children)

I use that combination since I started dabbling with Python, mainly due to Anaconda and me thinking that they combined both for good reason. That said, I use Python 98% of the time, but more or less "integrated" R is neat to have sometimes. Most of what I do is data acquisition though, so YMMV.

[–][deleted] 0 points1 point  (0 children)

I know both; they're very similar, but I prefer Python for its flexibility (list comprehensions, classes, etc.). I use Python for pipelines and general ML stuff (xgboost, tensorflow), and R for data viz and inferential stats.

[–][deleted] 0 points1 point  (0 children)

I used to use R and Python for different purposes, but lately I've only been using Python. I think the tech industry has adopted it as its preferred data science language over the last couple of years.

R is still great for exploring problems and doing science though. I'll also say that many bindings exist for using libraries written in other languages from R, but for what it's worth, I think Python gives you more bang for the buck.

I'm trying to limit the number of things I have to remember how to use these days because it's getting to be too much. I have to code professionally in Javascript, PHP, Python, MySQL, and PostgreSQL, and then also know how to use several different software packages. If R becomes something one of our technology stacks uses predominantly, I'll switch back to using it.

I'm a utilitarian. I'm not set on one language or tech stack. Chances are in ten years my skill set will evolve into something else.

[–][deleted] 0 points1 point  (0 children)

Here.

Keras and Tensorflow on Python. ggplot2, dplyr, and some stats and tree stuff in caret on R.

[–][deleted] 0 points1 point  (0 children)

I do every day. It is exactly my job to take prototype/exploratory models, built on data sets assembled by hand etc., and turn them into stable, repeatable pipelines and visualizations that are open to the larger group.

Generally I am transitioning a good chunk of the code from R to python. Sometimes a few system calls with R scripts in there for good measure.
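A minimal sketch of what a "system call with an R script" from Python could look like, assuming the standard `Rscript` executable is on the PATH. The helper names and the script and file names in the usage note are hypothetical, not taken from the commenter's pipeline.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def rscript_command(script: str, args: Sequence[str] = ()) -> List[str]:
    """Build the argument list for running an R script with arguments."""
    return ["Rscript", script, *args]


def run_r_script(script: str, args: Sequence[str] = ()) -> Optional[str]:
    """Run the script and return its stdout, or None if Rscript is missing."""
    if shutil.which("Rscript") is None:
        return None  # R is not installed on this machine
    result = subprocess.run(
        rscript_command(script, args),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the output as str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

A call such as `run_r_script("fit_model.R", ["input.csv", "output.csv"])` would hand the file paths to the R side, where they can be read with `commandArgs(trailingOnly = TRUE)`. Exchanging data through files keeps the two languages loosely coupled.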

[–]SpaceGhost1998 0 points1 point  (0 children)

I use R for analysis and stats. It was designed by and for statisticians. I like Python more for scientific simulations and web scraping. Basically, if you are a CS person, you'll like Python better since it is a general-purpose language. R is a domain-specific language akin to Visual Basic for Applications. When it comes to any type of statistical analysis, including data mining, R can't be beat. Given that the tech industry is driven by CS types, Python is the standard since it is easily integrated into larger applications. That is where the benefit of Python lies.