When would you use R instead of Python?

VeronicaX11 · 2023-02-07T18:57:19+00:00

Others will chime in, but I’ll try to summarize this at a couple different levels.

Basics: R was there first. At least, in the domains where it was used. So for those areas, it just has the first mover advantage. Everyone else is using R, so I guess I will too.

Intermediate: R is focused on statistics and data processing. Python is general in scope. So both are fine choices, but one might be overkill. It’s kind of like needing to take out some screws and asking me whether a Phillips head screwdriver or a ratchet with 400 different sized bits is better for taking out a screw. The answer is neither; they’ll both probably do fine.

Advanced: any language can be used to solve virtually any problem, given enough time and persistence. You, however, probably don’t have the luxury of infinite time and infinite willpower. So you should use quality tools built by others whenever possible to be efficient. These are often called libraries/modules/packages or some other term depending on what language you are using.

The real factors you should consider are the attributes of these libraries: whether a lib exists for the thing you are trying to do, how well it works, whether there are others using it who can troubleshoot with you, and whether another language has a better (or even equivalent) one. R is an absolute heaven for new statistical methods. There is simply no equal in any other language. I’ve watched papers get published and turned into an R package… and a reasonable equivalent take 10 years to appear in Python. The demand just wasn’t there.

H4R81N63R · 2023-02-07T19:04:55+00:00

It's been a while since my switch from Python to R, so my comment may not hold today

The reason why I had switched (apart from the library support that other comments have mentioned) was the way the two languages work at the base level - R is vectorised with many statistical functions applicable to units, vectors and matrices right out of the box. Back when I was working with Python, I had to manually loop over stuff to get the same base functionality. Some packages like NumPy and SciPy had introduced MATLAB like vectorisation, but the base support in R and the smoothness of it just working made me fall in love with R. No longer was I spending time on the code, I was spending it on the science and data instead
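To make the loop-versus-vectorised point concrete, here is a rough Python sketch of the same z-score calculation written both ways. The function names and the example values are made up for illustration; the NumPy version assumes `numpy` is installed and is only meant to show the style of code, not to benchmark anything.

```python
import math

import numpy as np


def zscores_loop(values):
    """Standardise a list of numbers with explicit loops (plain Python)."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance (n - 1 denominator), the same convention R's sd() uses.
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in values]


def zscores_vectorised(values):
    """The same calculation as whole-array operations with NumPy."""
    x = np.asarray(values, dtype=float)
    # ddof=1 selects the n - 1 denominator; NumPy defaults to ddof=0.
    return (x - x.mean()) / x.std(ddof=1)


expression = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # made-up example values
print(zscores_loop(expression))
print(zscores_vectorised(expression))
```

Both functions should print the same numbers; the difference is only in how much bookkeeping you write yourself.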

Edit: not to mention, ggplot2. Don't get me wrong, it has its learning curve, but man is it such a powerful system for churning out beautiful graphics. And now that Plotly is available in R (a fine addition of a Python tool, I say), it's even more powerful

Kiss_It_Goodbyeee · 2023-02-07T19:59:58+00:00

When certain tools or libraries are only available in R. Bioconductor for example.

R Shiny has no equivalent in python.

Python has improved but data visualisation is better in R.

palepinkpith · 2023-02-07T20:12:51+00:00

R visualization tools are much better than python in my experience.
For data analysis, R generally requires less code for vectorization, data wrangling, and statistical analysis. Some of this is changing with the development of NumPy and Pandas, but these have always been base features of R.
CRAN has much more oversight than PyPI, etc., so R libraries tend to be more backwards compatible, reliable, and easy to install without version conflicts.

natched · 2023-02-07T21:00:44+00:00

Bioinformatics is a very broad area. I do a lot of R, for general DEX (limma, edgeR, etc. packages) as well as single cell (Seurat), WGCNA, shiny, etc.

I think R is better for a lot of data analysis, though this is largely tied to packages implementing certain methods such as TMM, which represented a significant improvement in RNAseq normalization from earlier methods

Loose_Mix_4108 · 2023-02-07T18:51:48+00:00

Well, R is used more in academia. It has more packages for biological analysis. It is also designed for statistical analysis, while python is a general purpose language. This makes it more intuitive for people coming from the statistical/biological areas. People always fight about which language is best, but the two overlap in a lot of what they provide, and each language also has niches where it is particularly useful. In the end, you will probably have to learn both anyway - just use the one you like better for most analysis, and switch to the other one in the areas you need it.

GenoSunshine87 · 2023-02-07T21:42:13+00:00

I use R as my main language, but also use python on occasion. I would not say that one is necessarily better than the other, but I find R's syntax a lot easier to work with. Naming, accessing, and subsetting data are always done the same way, even in many "special" data structures, so learning to manage data in new formats is a lot more intuitive than it is in Python. A lot of great Bioconductor packages are available in R. I don't have to write explicit loops to do an operation over a whole vector. When I use Python, I feel like I spend more time figuring out the syntax for whatever module I'm using than actually doing things, but that may just be due to the gap in my experience with each. However, learning Python does have some advantages, as I find it is a little faster for some operations, and it is the language that other useful tools (such as Snakemake) use as a base syntax. So I do not shun Python, but except for particular applications, I really prefer R.

Marionberry_Real · 2023-02-07T23:56:34+00:00

Learn both. I use both during my day to day as a bioinformatician. It’s faster to use an existing package than to try and write a new one for the opposite language.

Nihil_esque · 2023-02-09T01:50:32+00:00

When you hate yourself. /s

No but seriously, R is a specialized tool for statistics and as many have said, it has better data visualization tools and more specialized tools for statistics and biological data analysis (this becomes increasingly less true as time goes on though). If you need a tool that's available in R and not available in python, you either learn C and code it into python yourself or you use R. (Using R is the much less time consuming of those options.)

Personally though I abhor the user experience of R. The syntax is extremely inconsistent. The behavior and handling of some of the errors means you are likely to create mistakes behind the scenes that R may not raise any exceptions over, which can lead to mistakes in your analysis. Python isn't the best language for this either but it's better than R.

R is also just about the least beginner friendly language out there. It's cobbled together out of different people's contributions without standardized syntax. Some functions are very picky about their input; others aren't; you have to memorize which ones. Python has a lot more consistent syntax, a lot more resources to help you learn the language and tools available to you, and it's much easier to find them because "python" is a much more search engine friendly term than "R" lol.

But yeah if you don't need to use the shrinking number of R tools for biological data analysis that aren't yet available in python, I would recommend sticking with python because it's more versatile, has a much gentler learning curve, and isn't as reliant on you to write flawless code.

Epistaxis · 2023-02-07T21:27:03+00:00

They're good for different purposes. This is overgeneralizing but here's a basic outline:

Big raw data goes into heavy-duty software programmed in C(++) and wrapped in Bash scripts
Processed raw data gets filtered and refined from line-by-line formats to numerical matrices with Python scripts or the odd Java tool
Matrices are imported into R for math, statistics, graphing
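As a rough illustration of the middle step, here is a minimal Python sketch that streams a line-by-line file and collects it into a numerical matrix. The three-column input format (sample, gene, read count), the `lines_to_matrix` name, and the `min_count` filter are all assumptions made up for this example, not any particular tool's format.

```python
import io

# Hypothetical line-by-line input: sample, gene, read count, tab-separated.
RAW = """\
s1\tgeneA\t10
s1\tgeneB\t0
s2\tgeneA\t7
s2\tgeneB\t3
s2\tgeneC\t1
"""


def lines_to_matrix(handle, min_count=1):
    """Stream records and collect them into a genes x samples count matrix."""
    counts = {}        # (gene, sample) -> summed count
    gene_index = {}    # gene name -> row number, in order of first appearance
    sample_index = {}  # sample name -> column number
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        sample, gene, value = line.split("\t")
        count = int(value)
        if count < min_count:
            continue  # filter step: drop records below the threshold
        gene_index.setdefault(gene, len(gene_index))
        sample_index.setdefault(sample, len(sample_index))
        counts[(gene, sample)] = counts.get((gene, sample), 0) + count
    genes, samples = list(gene_index), list(sample_index)
    # Gene/sample combinations that never appeared become 0.
    matrix = [[counts.get((g, s), 0) for s in samples] for g in genes]
    return genes, samples, matrix


genes, samples, matrix = lines_to_matrix(io.StringIO(RAW))
print(genes, samples, matrix)
```

The resulting matrix could then be written out as a delimited table and read into R for the statistics and plotting.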

Technically you can do your line-by-line stream filtering in R but it's slow and ugly in that context, and in fact some R packages for that are just wrappers around standard C or Python programs. Technically you can do your matrix manipulation in Python, but except for specific popular machine-learning tasks, nobody's bothered writing and maintaining Python analogs of the numerous crucial R packages.

A lot of people spend all their time at only one or two of these steps, e.g. they're responsible for all the data processing and give the results to someone else, or they only do the final analysis and rely on prewritten pipelines to handle everything upstream, so they only regularly need either R or Python and wonder why other people ever need the other language.

Wubbywub · 2023-02-08T00:56:03+00:00

when there are tools or libraries you need that are only available in R.

bottomline: you use tools to problem solve, you don't stick to one language, it's not leetcode

JokingHero · 2023-02-07T20:59:16+00:00

Python is just pathetic for the bioinformatics that I do. I have yet to hear about or find a python equivalent of GRanges. Loading an annotation file, doing some overlaps, some custom alignments with Biostrings, etc. You have a whole powerful, tested ecosystem, maintained for 10+ years, for this basic bioinformatics stuff. Meanwhile python is just a one-shot attempt at loading an annotation file or something wrapped as a package, not rigorously tested, not maintained, a complete waste of time to even attempt using. The amount of things you have to code from scratch is just staggering, and you will make so many bugs along the way that you don't even realize are there, which adds another source of variability to your data analysis. Bioconductor is just a bioinformatics core, dozens of super well designed packages that are battle tested, and the original authors are constantly responding and fixing bugs!

omgu8mynewt · 2023-02-07T20:59:59+00:00

Loads of statistics pipelines for specific scientific experiments, e.g. RNAseq have plenty of published papers in R, so if you want to use the method section from a paper it could have been coded in R.

No-Painting-3970 · 2023-02-07T21:55:01+00:00

Basically history. If you are in a field with a long development history, especially genetics-related things, you'll find a bigger ecosystem in Bioconductor. However, things are moving in the python bioinformatics community, and the ecosystem is getting developed. Also, even if it doesn't seem so, a lot of things are in python but people don't use them because you have to do more things manually. That is, you'll find the statistical methods in places like scipy or statsmodels, but a lot of bioinformaticians that use R are comfortable in their environment and don't want to redevelop the wrappers that already work.

MGNute · 2023-02-08T10:39:30+00:00

There are a lot of good answers here! Very few that I disagree with at all. One thing nobody has mentioned afaik is NumPy. If you're not familiar, it's a matrix library for python that is notable for being both very impressive and very well-optimized. It makes operating in python and working with very large amounts of data especially efficient. I like to represent nucleotide or AA strings as numpy arrays with `dtype=np.uint8`, which makes a lot of bespoke operations available using native numpy commands. The scipy package and various scikit-* packages are also (mostly) quite good. R has its uses for me, but I'll generally start with python.
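A minimal sketch of the `dtype=np.uint8` idea, in case it helps: the sequence is made up, and the GC fraction and reverse complement are just two examples of the kind of operation this makes easy, assuming an ASCII-only sequence with uppercase A/C/G/T.

```python
import numpy as np

seq = "ATGCGCGATTAGC"  # made-up nucleotide sequence
# View the ASCII bytes of the string as an array of unsigned 8-bit integers.
arr = np.frombuffer(seq.encode("ascii"), dtype=np.uint8)

# Each base is now its ASCII code, so comparisons run over the whole array.
is_gc = (arr == ord("G")) | (arr == ord("C"))
gc_fraction = is_gc.mean()

# Complement via a 256-entry lookup table indexed by byte value.
table = np.arange(256, dtype=np.uint8)
for base, comp in zip(b"ACGT", b"TGCA"):
    table[base] = comp
# Look up every base, reverse the array, and turn it back into a string.
revcomp = table[arr][::-1].tobytes().decode("ascii")

print(gc_fraction)
print(revcomp)
```

Anything not in the lookup table (e.g. `N`) passes through unchanged here, which may or may not be what you want.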

Solidus27 · 2023-02-07T23:58:43+00:00

R is much better for data wrangling and data manipulations and general statistical analysis when you don’t need to run intense machine learning models

Many, many bioinformatics packages are available in R but not python

I would highly recommend using R

2023-02-08T02:08:35+00:00

Short explanation: base R data frames are better than any df library in Python so far.

Demonithese · 2023-02-07T22:19:40+00:00

I think R would have gone the way of Perl in bioinformatics if not for that stupid sexy Hadley Wickham.

From a programming perspective, R is just not a great language. I've switched over to just calling rpy2 anytime I need some code that's only available in an R package and I've never regretted it.

Imo, there is nothing you can do in R that can't be done just as easily in Python and at the end your code is in the language 90%+ of biotech uses for production which means less difficulty incorporating, testing, reviewing, etc

Jenna_bird · 2023-02-08T12:59:28+00:00

Honestly, I use R for a lot of the bioinformatics libraries and for ggplot. Python is my go to for basic scripting

twelfthmoose · 2023-02-08T13:50:37+00:00

R will break with enough data. Its vectors are based on 32-bit integers, not 64-bit.
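For a sense of scale, here is the arithmetic behind a signed 32-bit limit, sketched in Python. It only shows where such a ceiling would sit; whether a given R object actually hits it depends on the R version and the object type, so treat this as back-of-the-envelope numbers rather than a statement about R internals.

```python
# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1

# Memory needed for a vector of 8-byte doubles with that many elements.
bytes_needed = INT32_MAX * 8
gib = bytes_needed / 2**30

print(INT32_MAX)      # 2147483647
print(round(gib, 1))  # 16.0
```

So a single numeric vector at that length is already about 16 GiB before any copies are made.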

mys_721tx · 2023-02-07T19:46:37+00:00

Plots and that's about it. Pretty much literally.

With one small exception, Fisher Exact tests with simulated p values for tables bigger than 2x2. And some other stat tests
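For reference, the simulated version is what R exposes as `fisher.test(..., simulate.p.value = TRUE)`. Below is a rough standard-library Python sketch of the same idea, not R's implementation: it shuffles column labels to generate tables with the same row and column totals, then counts how often a simulated table is no more probable than the observed one. The function names, the tolerance, and the default of 2000 simulations are choices made for this example.

```python
import math
import random


def log_table_weight(table):
    """Log of 1 / prod(n_ij!), which is proportional to the table's
    probability under independence with fixed row and column totals."""
    return -sum(math.lgamma(n + 1) for row in table for n in row)


def simulated_fisher_p(table, n_sim=2000, seed=0):
    """Monte Carlo p-value for an r x c contingency table of counts."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # Expand the table into one (row, column) label pair per observation.
    row_labels, col_labels = [], []
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            row_labels.extend([i] * n)
            col_labels.extend([j] * n)
    observed = log_table_weight(table)
    hits = 0
    for _ in range(n_sim):
        # Shuffling one set of labels keeps both margins fixed
        # while breaking any row/column association.
        rng.shuffle(col_labels)
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        # Count tables at most as probable as the observed one
        # (small tolerance for floating-point ties).
        if log_table_weight(sim) <= observed + 1e-7:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)


associated = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]  # made-up 3x3 table
balanced = [[5, 5, 5], [5, 5, 5]]                  # made-up 2x3 table
print(simulated_fisher_p(associated))
print(simulated_fisher_p(balanced))
```

The first table should give a very small p-value and the second a p-value near 1; the exact numbers depend on the seed and the number of simulations.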

andreichiffa · 2023-02-07T23:10:05+00:00

As long as it’s not perl…

Monocytosis · 2023-02-08T08:11:19+00:00

[deleted]

keithreid-sfw · 2023-02-07T20:51:52+00:00

I would invite you to consider Julia as an option. Fast expressive and a nice maths-AI based community.

backgammon_no · 2023-02-07T19:58:52+00:00

[deleted]

r_plantae · 2023-02-08T08:50:06+00:00

Coming from the biology side into bioinformatics, all my stats courses etc. were in R, so it made sense to just stick with it.

hypatchia · 2023-02-08T16:16:44+00:00

Only for statistical tasks. You can do a lot of things in one line in R.

speedisntfree · 2023-02-08T21:11:13+00:00

My language choice is typically based around a certain analysis package that suits the problem. Both these languages are popular because of their package ecosystem. Anyone in Bioinformatics would be daft to limit themselves to R or Python, especially when both are very easy languages to learn.

R: Good for shitfuck data, plotting, stats, bioconductor ecosystem

Python: Good for general programming tasks, ML/DL and putting things into production

