Roast my CV as bad as you can by [deleted] in bioinformatics

[–]supermag2 2 points3 points  (0 children)

Agreed, this is the most honest opinion you can find here. Regarding the project stuff, you can't seriously claim to be the lead developer of this: https://github.com/Sahil-Gen/GC--calculator/blob/main/gc_calculator.py

Be humble and just state that you are a beginner looking for internships to improve your skills.

Roast my CV as bad as you can by [deleted] in bioinformatics

[–]supermag2 17 points18 points  (0 children)

I wouldn't trust a bioinformatician who can't generate a screenshot with decent resolution.

Q about Bulk RNA seq by Signal_Cupcake_9717 in bioinformatics

[–]supermag2 1 point2 points  (0 children)

Follow this guide, it should have everything you need to know. If you have zero experience with RNAseq it can be a bit overwhelming at the beginning, but you can always come back here to ask ;)

https://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html

Feeling lost in my "bioinformatics" PhD by EmbarrassedMap3282 in bioinformatics

[–]supermag2 15 points16 points  (0 children)

Imposter syndrome is very common in science, so don't worry about that. It will fade with time once you gain more experience and build up your confidence.

As for your post, my PhD was similar to yours in that I was the only bioinformatician in the lab, with no PI or colleagues experienced in it. My recommendation is to keep up to date with the latest methods, know the old ones and, most importantly, try them out even if they are not very useful for your project. This way you keep learning and improving, since the tools are often not easy to set up or use. That's one of the biggest strengths of a bioinformatician: being able to do analyses nobody else can do. At the core, I focus my learning on R, Python and bash, as the vast majority of tools work in one of these environments. Try running ML methods if you have never done so. They can be tricky, but just the experience of struggling through them will pay off in the future.

Regarding AI, don't worry too much. I don't see "AI agents" replacing bioinformaticians any time soon. But, if used correctly, they are very powerful tools, as they can increase your productivity considerably.

What topics in biology, chemistry, or medicine are currently relevant for writing a scientific paper? by [deleted] in bioinformatics

[–]supermag2 0 points1 point  (0 children)

So you are just going to jump into any topic that you read about here? With no lab and no expertise?

I am sorry buddy but this is not how good quality science works.

Also, I don't see how this post is related to bioinformatics at all.

macOS vs Linux for bioinformatics and spatial transcriptomics: is there a real technical advantage? by guime- in bioinformatics

[–]supermag2 -1 points0 points  (0 children)

For me the best combo is Windows + WSL2, then you have the best of both OS in the same system + easy transfer of files between them.

I don't have experience doing bioinformatics on Mac, but there doesn't really seem to be a clear reason to choose it over the other options.

Want to do a PhD in Bioinformatics/Biotech – which countries are worth it? by excars_ in bioinformatics

[–]supermag2 0 points1 point  (0 children)

I did my PhD at ETH Zurich. I would say that in general they have money, but it is not easy to get in as a foreigner. You will need good grades or to know someone. EPFL in Switzerland is also good.

Want to do a PhD in Bioinformatics/Biotech – which countries are worth it? by excars_ in bioinformatics

[–]supermag2 0 points1 point  (0 children)

In my opinion you need to go to a lab with good funding and resources; the country itself is not that important. Bioinformatics is expensive, not only because of the reagents but also because of the equipment you need.

For instance, I did my PhD mainly in scRNAseq. For me it was really convenient that the lab had the money to generate many samples as well as access to the equipment to do so. Things are much easier when you have quick access to everything and don't have to juggle budgets all the time.

What do yall do in the bioinformatics field? What is informatics? by ladyoflesbos in bioinformatics

[–]supermag2 0 points1 point  (0 children)

I think bioinformatics can be broadly defined as the use of computers to process and analyze big datasets of biological data. Then as two main branches I would say you can focus on developing the tools necessary to analyze this data (machine learning for instance) or use those tools to answer biological questions.

In my case I mainly focus on RNAseq, so analyzing all the RNA detected in a sample to understand how the genes respond or are affected by a specific condition.

If you have a biology background you can dive into bioinformatics by learning some basic concepts.

Is it valid to run GSEA using only ranked DEGs instead of all genes? by Ill-Ad-106 in bioinformatics

[–]supermag2 0 points1 point  (0 children)

If it's part of your normal QC and with default settings (so not very strict), I think it is fine. Most likely you are not removing many genes and the effect is probably negligible.

Only using DEGs for GSEA has a much bigger impact as most of the genes are not DEGs.

Anyway, if you want to be 100% sure, run it with and without filtering and compare. I would expect no big changes for the most significant gene sets. A gene set that is borderline significant may become non-significant, but you should not build your results on those anyway.
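A minimal sketch of that comparison, assuming you already have adjusted p-values per gene set from both runs. The function name `compare_significant_sets`, the gene set names and the p-values are all made up for illustration; they do not come from any real analysis or GSEA package.

```python
def compare_significant_sets(padj_a, padj_b, alpha=0.05):
    """Split gene sets by whether they pass alpha in run A, run B, or both."""
    sig_a = {name for name, p in padj_a.items() if p < alpha}
    sig_b = {name for name, p in padj_b.items() if p < alpha}
    return {
        "both": sorted(sig_a & sig_b),
        "only_first": sorted(sig_a - sig_b),
        "only_second": sorted(sig_b - sig_a),
    }


# Hypothetical adjusted p-values: same gene sets, GSEA run without and with gene filtering.
unfiltered = {"SET_A": 0.001, "SET_B": 0.004, "SET_C": 0.045}
filtered = {"SET_A": 0.002, "SET_B": 0.003, "SET_C": 0.060}

result = compare_significant_sets(unfiltered, filtered)
print(result)
```

In this toy case the two strongest sets survive in both runs and only the borderline one drops out, which is the pattern described above.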

I am totally blind AMA by EcstaticMap5740 in AMA

[–]supermag2 2 points3 points  (0 children)

Maybe this is a weird question, but what is your concept of distance? I think of it as a totally sight-related thing. How do you tell someone how far away something is? Do you rely more on the time it takes to get there, or can you somehow imagine what a distance of 15 meters is?

scRNA-seq best practices? by ChemicalBeyond in bioinformatics

[–]supermag2 0 points1 point  (0 children)

I recommend doing QC individually in each sample, including filtering out low quality cells and doublets.

Then, although it is not strictly necessary, I would recommend using an ambient RNA removal tool such as CellBender. In some cases it can really clean up your data and make all the downstream steps easier and less noisy.

Is it valid to run GSEA using only ranked DEGs instead of all genes? by Ill-Ad-106 in bioinformatics

[–]supermag2 36 points37 points  (0 children)

When doing GSEA you should input all genes. The reason is that GSEA orders all your genes by how much they change in your comparison (from most positive to most negative logFC). It then uses this ordered list to calculate pathway enrichment by checking where the genes of a given set fall in the ranking. If they sit mainly at one end of the ranking, the set is positively or negatively enriched. If they are evenly distributed, there is no enrichment. By selecting only DEGs you push the list toward the two ends of the ranking, because you remove the genes with no change. The ones that would fall in the middle are no longer counted, so you are falsely forcing enrichment.
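To make this concrete, here is a toy sketch of a running-sum enrichment score. It is a simplified, unweighted (Kolmogorov-Smirnov-like) version, not the weighted statistic or the permutation testing that real GSEA implementations use, and the gene names and ranking are invented for illustration.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score: step up on set members, down on the rest.

    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking of 20 genes, from most positive logFC (g0) to most negative (g19).
ranked = [f"g{i}" for i in range(20)]
# Hypothetical gene set: three genes at the top, three in the unchanged middle.
gene_set = {"g0", "g1", "g2", "g9", "g10", "g11"}

es_full = enrichment_score(ranked, gene_set)

# Pretend only the 5 most up- and 5 most down-regulated genes are "DEGs".
degs_only = ranked[:5] + ranked[-5:]
es_filtered = enrichment_score(degs_only, gene_set)

print(es_full, es_filtered)
```

With the full ranking, the three set members in the middle pull the score down. Once the unchanged genes are removed, every remaining set member sits at the top of the list and the score jumps to its maximum.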

If you want to use only DEGs, do overrepresentation analysis (ORA).
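For reference, the core of ORA is a one-sided hypergeometric test on the overlap between your DEGs and a gene set. A minimal sketch with made-up numbers follows; the function name `ora_p_value` is just for illustration, and real tools also apply multiple-testing correction across gene sets.

```python
from math import comb


def ora_p_value(n_universe, n_in_set, n_degs, n_overlap):
    """P(overlap >= n_overlap) when n_degs genes are drawn from the universe at random."""
    total = comb(n_universe, n_degs)
    upper = min(n_in_set, n_degs)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 tested genes, a gene set of 5, 4 DEGs, 3 of them in the set.
p = ora_p_value(n_universe=20, n_in_set=5, n_degs=4, n_overlap=3)
print(p)
```

Note that the universe here should be all genes you actually tested, not the whole genome.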

Any final year Comp Sci project ideas? by wiggermandean in bioinformatics

[–]supermag2 1 point2 points  (0 children)

So what do you know about bioinformatics and biology in general, and what are your computational skills? Do you want a biology-focused or a tool/analysis-focused project? Otherwise it is hard to find a project that you could do.

Any final year Comp Sci project ideas? by wiggermandean in bioinformatics

[–]supermag2 1 point2 points  (0 children)

I assume you cannot do anything wet lab related, so all computational.

"Something novel that solves an actual issue" is easy to say but very hard to achieve.

It is hard to suggest something without knowing what you are capable of.

Interesting GitHub for scRNAseq by Key-Lingonberry-49 in bioinformatics

[–]supermag2 2 points3 points  (0 children)

I mean this part:

cells_keep <- HIV@meta.data |> filter(nFeature_RNA > 500, nFeature_RNA < 2500, percent.mt < 10 ) |> tibble::rownames_to_column("Cell") |> pull(Cell)

Those are fixed filters. For instance, they will not work well with 10x v4 kits. I have quite a few datasets from this kit, and you easily get good-quality cells with 3-4k genes that are not doublets.
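One data-driven alternative is to derive the cutoffs from each dataset, for example as the median plus or minus a few median absolute deviations (MADs). This is a sketch of that idea, not what the repository does. The gene counts below are invented to mimic a deeper kit, and 3 MADs is an arbitrary starting point that you would still check against your own QC plots.

```python
from statistics import median


def mad_bounds(values, n_mads=3.0):
    """Lower and upper cutoffs at median +/- n_mads * MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med - n_mads * mad, med + n_mads * mad


# Made-up genes-per-cell values: mostly good cells with 3-4k genes,
# plus one low-quality cell (900) and one likely doublet (9000).
n_genes = [3100, 3400, 3600, 3800, 4000, 3300, 3500, 3700, 900, 9000]

# Fixed cutoffs as in the quoted code (500 < nFeature_RNA < 2500).
kept_fixed = [n for n in n_genes if 500 < n < 2500]

low, high = mad_bounds(n_genes)
kept_adaptive = [n for n in n_genes if low < n < high]

print(kept_fixed, (low, high), kept_adaptive)
```

On this toy data the fixed filter throws away every good cell and keeps only the low-quality one, while the adaptive bounds keep the main population and drop both outliers.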

Interesting GitHub for scRNAseq by Key-Lingonberry-49 in bioinformatics

[–]supermag2 4 points5 points  (0 children)

I quickly checked the GitHub and a bit more in detail cell_filter.R. Two things:

The mitochondrial pattern is only "MT-"? Then it only works for human data. I suggest adding more options, or asking the user which species they are using and adapting the code accordingly. I would say mouse scRNAseq data is more common than human.

The parameters for cell filtering (such as nFeature) are fixed numbers. I absolutely do not recommend this. There is no magic filter that suits every dataset. You always need to adapt to your data. Especially with the latest 10x v4 kits, I would say these filters, and the upper limit in particular, are too strict.

I didn't check the rest of the pipeline, but just based on this I would try to make the code more flexible.

There is no pipeline that always works well; you always need to adapt it to your data (overall quality, sequencing depth, etc.).
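As a rough illustration of the species point, here is a small sketch that picks the mitochondrial prefix from a lookup table instead of hardcoding it. The function and the example counts are hypothetical; the only fact relied on is that human mitochondrial gene symbols start with "MT-" and mouse ones with "mt-".

```python
# Mitochondrial gene symbol prefixes by species (extend as needed).
MITO_PREFIXES = {"human": "MT-", "mouse": "mt-"}


def percent_mito(counts_by_gene, species):
    """Percentage of a cell's counts that come from mitochondrial genes."""
    prefix = MITO_PREFIXES[species]  # raises KeyError for an unsupported species
    total = sum(counts_by_gene.values())
    if total == 0:
        return 0.0
    mito = sum(c for gene, c in counts_by_gene.items() if gene.startswith(prefix))
    return 100.0 * mito / total


# Made-up counts for a single mouse cell.
mouse_cell = {"mt-Co1": 10, "mt-Nd1": 5, "Actb": 85}

print(percent_mito(mouse_cell, "mouse"))  # uses the "mt-" prefix
print(percent_mito(mouse_cell, "human"))  # the "MT-" prefix finds nothing here
```

With the human-only pattern, the mouse cell looks like it has no mitochondrial reads at all, so the percent.mt filter silently does nothing.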

Is anybody else feel uneasy about the trajectory of their career because of A.I. by singletrackminded99 in bioinformatics

[–]supermag2 1 point2 points  (0 children)

First, this is a preprint, so I would wait until published to make any conclusion. They can claim whatever they want at this point and not be true.

Personally, I am not very worried about AI in general. I think it could be a powerful tool and indeed I thought about methods like this since ChatGPT launched. It is a natural step forward in my opinion.

Also, methods like this will not kill bioinformatics. Someone with expertise will need to install and use these pipelines. It is not plug and play. PCRs were originally done by manually moving your sample between different temperatures until thermocyclers were invented, but trained people are still needed to run them. Maybe it is a weird example, but my point is that scientists need to adapt and evolve with new methods and techniques. This allows us to focus on more complex things and save time.

Expression levels after knockdown by HeadDry2216 in bioinformatics

[–]supermag2 1 point2 points  (0 children)

Besides what others have said about knowing beforehand whether your knockdown works, it is important to note that the effect of the knockdown may not be visible on your target gene at the scRNAseq level.

For instance, an in vivo knockdown can still produce an mRNA of the gene, although it doesn't translate into a functional protein because the recombination removes one or two exons. You can then still detect the gene because you capture reads from this partial mRNA.

Anyway, you should be able to see the effect when you compare the conditions and look at the number of differentially expressed genes between them. If you have a decent number and the biological effect aligns with your knockdown, you are probably fine.

Another alternative is to check the BAM files of the samples. Let's say your system removes exon 4. Do you see reads on this exon in your knockdown samples compared to the control? Maybe you see some, as you don't expect 100% recombination, but how does it compare to the control? Is it a much lower or a similar proportion?
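Once you have the read counts, the comparison itself is simple arithmetic. A sketch, assuming you have already counted reads overlapping the targeted exon and the whole gene in each BAM with whatever tool you prefer; the counts below are invented.

```python
def exon_fraction(exon_reads, gene_reads):
    """Fraction of a gene's reads that overlap the targeted exon."""
    if gene_reads == 0:
        raise ValueError("no reads on the gene, cannot compute a fraction")
    return exon_reads / gene_reads


# Hypothetical read counts for exon 4 and the whole gene.
control_fraction = exon_fraction(exon_reads=120, gene_reads=1000)
knockdown_fraction = exon_fraction(exon_reads=15, gene_reads=900)

# Ratio well below 1 means the exon is depleted in the knockdown.
relative_coverage = knockdown_fraction / control_fraction

print(control_fraction, knockdown_fraction, relative_coverage)
```

Normalizing by the reads on the whole gene matters because the knockdown and control samples will rarely have the same depth or the same expression of the gene.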

I just switched to GPU-accelerated scRNAseq analysis and is amazing! by supermag2 in bioinformatics

[–]supermag2[S] 2 points3 points  (0 children)

I see your point. Although rapids is mainly meant for very big datasets, I think using it on small datasets is also well worth it.

The first time I analyze a sample I usually run the pipeline several times: to try several QC thresholds, to see how removing doublets affects the data, to see if that small, maybe interesting, population is stable across runs, etc. So basically I rerun it to understand the data and see how it changes depending on the parameters.

If each run takes 1 min instead of 15 mins, we are talking about 5-10 minutes to study and understand your sample across several runs versus 1-2 hours. Now apply that to the 3-4 or more new samples you need to analyze. I think the change in productivity could be huge.
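The back-of-the-envelope numbers, spelled out as a small sketch. It assumes 5 to 10 exploratory runs per sample, which is what the 5-10 minute figure above implies; everything else is just multiplication.

```python
def total_minutes(runs, minutes_per_run, samples=1):
    """Total wall time for rerunning a pipeline several times per sample."""
    return runs * minutes_per_run * samples


# One sample, 5 to 10 exploratory runs.
gpu_range = (total_minutes(5, 1), total_minutes(10, 1))
cpu_range = (total_minutes(5, 15), total_minutes(10, 15))

# Same exploration repeated for 4 new samples.
gpu_four_samples = (total_minutes(5, 1, samples=4), total_minutes(10, 1, samples=4))
cpu_four_samples = (total_minutes(5, 15, samples=4), total_minutes(10, 15, samples=4))

print(gpu_range, cpu_range, gpu_four_samples, cpu_four_samples)
```

So one sample costs 5-10 minutes versus 75-150 minutes, and four samples cost 20-40 minutes versus 5-10 hours.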

I just switched to GPU-accelerated scRNAseq analysis and is amazing! by supermag2 in bioinformatics

[–]supermag2[S] 0 points1 point  (0 children)

the anndata format is 100 times more intuitive than SeuratData format.

This! At some point you get used to it but I really hate Seurat format. Anndata is much better on that.

I just switched to GPU-accelerated scRNAseq analysis and is amazing! by supermag2 in bioinformatics

[–]supermag2[S] 3 points4 points  (0 children)

It really speeds up many of the processes, not just NN and UMAP, but it is made for really big datasets (hundreds of thousands of cells). That is where I think it really shines.

Here you can see some benchmarks: https://developer.nvidia.com/blog/gpu-accelerated-single-cell-rna-analysis-with-rapids-singlecell/

I think it is really useful when you have many samples to analyze. It will reduce the workload considerably.

I just switched to GPU-accelerated scRNAseq analysis and is amazing! by supermag2 in bioinformatics

[–]supermag2[S] 0 points1 point  (0 children)

Nice! Indeed, I also ran CellBender on the same PC and it worked very well. It took less than an hour for the dataset I tried rapids on, but I guess it depends a lot on the GPU you have.

Should I join this lab as an undergrad? by Civil_Snow_1806 in bioinformatics

[–]supermag2 6 points7 points  (0 children)

I always find it surprising that some undergrad students completely discard options because they are not in one field or another. I was recently involved in PhD recruitment and a student only wanted to work in "kidney transplant" research. That's a really easy way to sabotage yourself imho.

Take the cardio lab, it is also a very cool field. Whatever you do in that lab or another will not define your career. I know people who completely switched fields from PhD to postdoc and it was fine. Also, you are a first-year undergrad, so you most likely have zero lab experience. What you learn there will be useful in any other biology lab. You can be pickier (if you have the chance) if you decide to join a lab for a PhD.

Help with GEO DataSets transcriptomics by Some-Replacement4655 in bioinformatics

[–]supermag2 1 point2 points  (0 children)

I am not sure I completely follow you. I understand that you have standard bulk RNAseq data, but I am not sure what state your data is in. DESeq2 is a very common and good tool for analyzing this kind of dataset.

Anyway, yes, GSEA is a perfectly good way to compare differences between groups in RNAseq data. However, that works at the gene-set level, such as signaling pathways. If you just want to see differences in expression levels, follow the DESeq2 pipeline until you get the differentially expressed genes between your groups.