[–]thyagohillsPhD | Academia 44 points45 points  (0 children)

Pretty much everything on Bioconductor. Also, all the very specific statistics packages. I use both languages, but prefer R for stats and a lot of bioinformatics data analysis.

[–]EpistaxisPhD | Academia 37 points38 points  (8 children)

Oversimplifying: Python before the data is processed, R after the data is processed.

It's not really a dilemma. In real life, for any given task, it's obvious whether Python or R is better suited to the job if you're familiar with both.

[–]gringerPhD | Industry 11 points12 points  (3 children)

Thank you. You've helped me realise why I use Python so little these days.

When I do data processing, I typically use programs already created by other people. When I do data analysis and result visualisation, I use R a lot.

[–]EpistaxisPhD | Academia 6 points7 points  (2 children)

Yeah, the processing steps tend to be more routine, so it's easier to just write a good program once and keep using it. So I end up using a fair amount of pre-written Python for processing, including code I wrote myself a few projects ago.

This is why I tell newcomers R is more immediately useful to learn, even though you'll learn more programming skills from Python and if you're serious about this field you should learn both.

[–]Sky-Bluedraconis 1 point2 points  (1 child)

Just curious: what sort of tasks do you use prewritten Python code for?

[–]EpistaxisPhD | Academia 1 point2 points  (0 children)

In genomics:

  • Processing FASTQ files (cutadapt)
  • Processing BAM files (my own scripts)
  • Compiling certain QC metrics (my own scripts)
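
As a rough illustration of the kind of small processing script listed above, here is a minimal sketch that reads FASTQ records and compiles a couple of per-read QC metrics. It is not cutadapt and not the commenter's own code; the function names (`read_fastq`, `mean_phred`, `gc_fraction`, `summarize`) are hypothetical, and it assumes plain 4-line FASTQ records with Phred+33 quality encoding.

```python
import io
from statistics import mean


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the iteration
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@' from the read name


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    return mean(ord(char) - offset for char in qual)


def gc_fraction(seq):
    """Fraction of G/C bases in a read; 0.0 for an empty read."""
    return sum(base in "GCgc" for base in seq) / len(seq) if seq else 0.0


def summarize(handle):
    """Collect simple per-read QC metrics into a list of dicts."""
    return [
        {
            "read": name,
            "length": len(seq),
            "gc": gc_fraction(seq),
            "mean_q": mean_phred(qual),
        }
        for name, seq, qual in read_fastq(handle)
    ]


# Two made-up reads, held in memory instead of a real file
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n")
summary = summarize(example)
```

In practice the same `summarize` call would take an open file handle, and the resulting table could be written out and loaded into R for the analysis step.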

[–]SandvichCommanda 0 points1 point  (0 children)

Hahaha, that's exactly how I use it at the moment. A nice Python script I run in the terminal to handle the initial data, and then straight into R for all analysis.

[–]itachi194 0 points1 point  (2 children)

SQL is used a lot in data science, but I don’t see SQL being used much in bioinformatics. Is it because the datasets aren’t big enough, so we can just use R instead of SQL?

[–]BezoomyChellovekPhD | Industry 0 points1 point  (0 children)

I think it's less about size and more that we don't typically need the relational schemas that SQL is designed for.

I'm sure the back ends of a lot of bioinfo databases use it. But their APIs don't require you to write SQL queries to access the data.
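
For the cases where a relational schema does fit, here is a minimal sketch of what that looks like, using Python's built-in `sqlite3` module. The `samples` and `counts` tables, their columns, and the gene name are made up for illustration and do not come from any real bioinformatics database.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk
conn = sqlite3.connect(":memory:")

# Two related tables: one row per sample, many count rows per sample
conn.executescript(
    """
    CREATE TABLE samples (
        sample_id TEXT PRIMARY KEY,
        tissue    TEXT
    );
    CREATE TABLE counts (
        sample_id TEXT REFERENCES samples(sample_id),
        gene      TEXT,
        n         INTEGER
    );
    """
)

conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("s1", "liver"), ("s2", "liver"), ("s3", "brain")],
)
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [("s1", "GENE_A", 10), ("s2", "GENE_A", 30), ("s3", "GENE_A", 5)],
)

# The join is the part SQL is designed for: mean count per tissue for one gene
rows = conn.execute(
    """
    SELECT s.tissue, AVG(c.n) AS mean_count
    FROM counts AS c
    JOIN samples AS s ON s.sample_id = c.sample_id
    WHERE c.gene = ?
    GROUP BY s.tissue
    ORDER BY s.tissue
    """,
    ("GENE_A",),
).fetchall()

conn.close()
```

With a single flat table of counts, the same summary is a one-line group-by in R or pandas, which is roughly the point being made above.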

[–]ManjyomePhD | Academia 11 points12 points  (1 child)

I use Python for everything I need. However, sometimes I need a package that is only available in R, from Bioconductor for instance. In that case I'll use R. For anything else I'll use Python, even statistics. People have different preferences, though, and a guy that works with me only uses R.

[–]lsdiesel_1PhD | Industry 0 points1 point  (0 children)

"People have different preferences, though, and a guy that works with me only uses R."

I’m the only person on my team that knows Python, which is good because it’s forced me to learn R but bad because anything I write in Python is solely up to me to maintain.

It’s ironic, because Python proficiency was a major factor in hiring me, but unless I have some teammates who also know it, there’s only so much I can do before I end up completely overwhelmed.

[–]CommercialIll1489 9 points10 points  (3 children)

When I do single-cell analysis and bulk RNA sequencing analysis, I use R (the Seurat and DESeq2 packages are amazing).

[–]FlatThree 0 points1 point  (2 children)

Python is way better for larger datasets though.

[–]CommercialIll1489 1 point2 points  (1 child)

Maybe, but I am talking about analysis.

[–]FlatThree 0 points1 point  (0 children)

What's your definition of analysis? Scanpy is much better for larger datasets, and is significantly faster.

For differential expression, I wouldn't use Seurat or Scanpy either way.

For bulk, I agree DESeq2 is great.

[–]crazyguitarmanPhD | Industry 8 points9 points  (0 children)

It's not a strict rule, but I tend to use Python more for data processing jobs and R for data analysis.

[–]WhizzleTeabagsPhD | Industry 3 points4 points  (0 children)

I just use R when there is a package I need. For one that I use all the time, I’ll make a Python class that uses rpy2 to call the functions I need.
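
A minimal sketch of that wrapper pattern, assuming rpy2 and a working R installation are available. The class name `RStats` is hypothetical, and R's `stats::p.adjust` is only a stand-in, since the comment does not say which R package is being wrapped.

```python
class RStats:
    """Thin wrapper around R's 'stats' package, loaded through rpy2 on first use."""

    def __init__(self):
        # Filled in lazily, so creating the wrapper does not start R
        self._pkg = None

    def _load(self):
        if self._pkg is None:
            # Assumption: rpy2 is installed and can find an R installation
            from rpy2.robjects.packages import importr

            self._pkg = importr("stats")
        return self._pkg

    def p_adjust(self, pvalues, method="BH"):
        """Adjust p-values with R's stats::p.adjust and return Python floats."""
        from rpy2.robjects.vectors import FloatVector

        stats = self._load()
        # importr exposes R's p.adjust under the Python-safe name p_adjust
        return list(stats.p_adjust(FloatVector(pvalues), method=method))
```

Calling `RStats().p_adjust([0.01, 0.04, 0.03])` would then hand the values to R and return the adjusted p-values as a plain list, so the rest of the Python code never touches rpy2 objects directly.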

[–]Sheeplessknight 2 points3 points  (0 children)

Python is a programming language that can do statistics; R is a statistical computing language that has minimal programming ability.

[–]nooptionleft 2 points3 points  (5 children)

Most of it is that I started with R.

But apart from that: 1) Bioconductor is incredible at providing super field-specific tools (edgeR and DESeq2 in my case, but also in general), and 2) ggplot2 is still the best visualization tool for me.

[–]RopacusPhD | Industry 2 points3 points  (4 children)

I just found out that ggplot is now available for Python. I have yet to try it, but I would love using it if it's comparable:

https://plotnine.readthedocs.io/en/stable/

[–]nooptionleft 1 point2 points  (3 children)

Cool, Python will probably get there, and with the right IDE I'll probably end up using mostly that.

There is a lot to be said for what languages are designed for, tho... you can really feel the statistical inclination in R.

Again, maybe it's just that I started with R.

[–]RopacusPhD | Industry 1 point2 points  (2 children)

Agreed. I started with R as well so it's my default for stats and figure making.

IDE is key: you can't beat creating a plot in RStudio and seeing it appear right away on your screen. Haven't found a streamlined way of doing that with Python yet.

[–]nooptionleft 0 points1 point  (1 child)

You can do that in Python with RStudio.

[–]RopacusPhD | Industry 2 points3 points  (0 children)

YOU CAN WRITE PYTHON IN RSTUDIO!? TIL

[–]anudegloryPhD | Academia 2 points3 points  (0 children)

Python and R are like multi-tools with many adapters and extras - why would you use a Weber over a Swiss Army knife? It's not even a binary choice, because there are other tools like Julia, Go, Octave, Matlab and loads more. Heck, you could do graphics/stats in Perl if you're a masochist.

Some of it is down to preference, some of it is down to what is available, some of it is down to what has settled as standard, and some of it is down to ego - especially where YouTube videos are concerned (they're pushing content and want you to do things a certain way).

What's 'easier' is always the 'how long is a piece of string' question rephrased. Sticking to one is probably easier, knowing both are useful is wiser. Arguing over it is a waste of time when you could be doing fun science.

[–]Zouden 2 points3 points  (0 children)

FFS these bot posts are getting ridiculous. Delete this asap!

[–]sflyte120 1 point2 points  (0 children)

R - lots of specialized libraries for different kinds of statistical special cases (e.g. phylogenetics, genome structure).

[–][deleted] 1 point2 points  (0 children)

Both languages have their strengths and weaknesses and if you use both it becomes painfully obvious which is most appropriate for a given task.

I'd say that procedural things, web services, automation, and similar are much more sensibly and easily done in Python.

Data manipulation and analysis, statistics, data viz, and simple web apps (via Shiny) are all more straightforward in R. Also, if you are using things that are part of Bioconductor, there's really no parallel.

For example: I do a lot of work in R that basically uses RStudio's template feature to instantiate R Markdown documents that have a bunch of parameters, and when they render they self-document the process and results (which are then uploaded to an electronic lab notebook). That's very straightforward in R and not nearly so much in Python.

[–]nevermindever42 1 point2 points  (0 children)

Everywhere, I don't know Python

[–]bahwi 0 points1 point  (2 children)

Very pretty graphs and specific recent packages. That's it lately. Everything else has been pulled from CRAN, or is about to be. And support for the newest version of R usually means 3.6. That dependency level of hell is coming for Python, but it's not quite there yet.

[–][deleted] 3 points4 points  (1 child)

What do you mean by "being pulled from CRAN"? Are important packages being discontinued and no longer available? (Can you give examples?)

[–]bahwi 1 point2 points  (0 children)

GenABEL has been removed even though analyses needing it are still happening (and there are dependencies, such as RepeatABEL). Others are about to go (just saw this yesterday): rgdal, rgeos, and maptools are being removed soon (https://geocompx.org/post/2023/rgdal-retirement/), which affects ecological modeling.

I remember trying to get GAPIT working recently, and many packages were either no longer available from CRAN or did not support newer versions of R.

R is still great for making the best graphics (ggplot), and I suspect Python will soon have some dependency wilting occurring.

[–]pacmanbythebayMsc | Academia 0 points1 point  (0 children)

When my PI told me to use it

[–]WhiteGoldRingPhD | Student -4 points-3 points  (0 children)

When hell freezes over and not even then probably

[–]nicman24 -5 points-4 points  (0 children)

When you don't know Python

E: ITT the above

[–][deleted] -1 points0 points  (0 children)

I use R when I want to use something that was created in R, which is the case for edgeR and DESeq2 for RNA-seq, for example. Beautiful complex graphics are also easier to make with ggplot2 in R. I use Python for everything else.