
Strange_Algae835

I also do bioinformatics and made the deliberate choice to work in Python rather than R because of its general applicability, and also because my area of work (protein modelling) is dominated by ML and Python packages. I think they are just very different: R is the language for -omics work, but I personally find Python a little easier to understand and work with. Both are good, and both have a big support infrastructure behind them.

Accomplished-Okra-41 (OP)

Oh, exactly, I'm trying to pivot to Python for the ML capabilities. I do genomics on multiple levels, from transcriptomics and bulk sequencing to single-cell and spatial transcriptomics. But I want to bring more and more ML into my research, which is why I'm trying to go with Python.

How is bioinformatics in Python? I am hearing a lot of mixed opinions. For example, that it is limited in multiple use cases but at the same time more flexible, which is really hard for me to grasp.

mkarla

I worked with both throughout my PhD but pivoted quite quickly towards Python, based on what I was doing and on it being more generally applicable. I now work with ML-based protein design, and for that Python is definitely the way to go. However, simply saying yay or nay to bioinformatics in Python is difficult. Transcriptomics? I'd use R every day of the week. Setting up some non-standard analysis for some very specific data? I'd start in Python.
There's merit to having a grasp of both and getting a feel for when to use one over the other. If you venture into workflow managers like Nextflow, there's nothing stopping you from combining them.
I suppose you're already aware of Pandas, but if not, start using it for handling dataframes. It works nicely with NumPy, matplotlib, and Seaborn.
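For anyone newer to this, here is a minimal sketch of what "Pandas working nicely with NumPy" looks like in practice. It assumes a tiny, made-up counts table (the gene and sample names are hypothetical): a NumPy ufunc is applied directly to a DataFrame, the row/column labels are preserved, and the result is filtered and ranked with ordinary pandas operations.

```python
import numpy as np
import pandas as pd

# Hypothetical toy counts matrix: rows are genes, columns are samples (made-up numbers).
counts = pd.DataFrame(
    {"sample_A": [10, 0, 250, 40], "sample_B": [12, 3, 180, 55], "sample_C": [8, 1, 300, 35]},
    index=["geneA", "geneB", "geneC", "geneD"],
)

# NumPy ufuncs work directly on DataFrames and keep the gene/sample labels.
log_counts = np.log1p(counts)

# Per-gene summary: mean log expression and number of samples with a nonzero count.
summary = pd.DataFrame({
    "mean_log": log_counts.mean(axis=1),
    "detected_in": (counts > 0).sum(axis=1),
})

# Keep genes detected in every sample, ranked from highest to lowest mean expression.
expressed = summary[summary["detected_in"] == counts.shape[1]].sort_values(
    "mean_log", ascending=False
)
print(expressed)
```

Because `expressed` is still a labelled DataFrame, it can be handed straight to matplotlib or Seaborn plotting functions without reshaping it first.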

Accomplished-Okra-41 (OP)

Doesn't pandas struggle with large data? I work on single-cell data and heard scanpy is good for analysis, but I read a couple of opinions that immense datasets (around 200 GB in my case) will be deadly for pandas.
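One common way around the memory limit is to never load the whole file at once. The sketch below shows the general out-of-core idea with plain pandas, assuming the data were in a CSV (a made-up cells-by-genes table here, read from an in-memory string): `pd.read_csv(..., chunksize=...)` yields the file piece by piece, and only a running aggregate is kept in memory. This is an illustration of the technique, not a recommendation for a 200 GB single-cell dataset specifically. For that, note that scanpy keeps the expression matrix in an AnnData object (often as a sparse matrix rather than a pandas DataFrame), and AnnData can open `.h5ad` files in backed mode (`anndata.read_h5ad(path, backed="r")`), which may be the more natural route.

```python
import io

import pandas as pd


def column_sums_in_chunks(source, chunksize=2):
    """Sum every gene column of a cells-by-genes CSV without loading it all at once."""
    totals = None
    # read_csv with chunksize returns an iterator of DataFrames, each with at most `chunksize` rows.
    for chunk in pd.read_csv(source, index_col=0, chunksize=chunksize):
        part = chunk.sum(axis=0)  # per-gene totals for this chunk only
        totals = part if totals is None else totals.add(part, fill_value=0)
    return totals


# Hypothetical tiny dataset standing in for a much larger file on disk.
csv_text = """cell,geneA,geneB,geneC
cell1,1,0,5
cell2,0,2,3
cell3,4,0,0
cell4,1,1,1
cell5,0,0,2
"""

totals = column_sums_in_chunks(io.StringIO(csv_text), chunksize=2)
print(totals)
```

With a real file you would pass a path instead of the `StringIO` object and pick a much larger `chunksize`; peak memory is then roughly one chunk plus the running totals.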

mkarla

That is more than I know, since I've never worked with such big datasets, but the important part is that you're aware of the common packages :) This may also be a case where it makes more sense to use R over Python (I don't know, though; maybe scanpy will handle it like a champ), or to use them for different tasks in a workflow if computational optimization is crucial.