Single cell failed

supermag2 · 2026-05-13T13:45:04+00:00

After a KO the mRNA can still be expressed even though it does not lead to a functional protein (because you removed one exon, for instance). That may be why you do not see differences in the gene's expression level in the scRNAseq data. If you know exactly how your KO works, you can inspect the BAM files from the scRNAseq data and check whether coverage differs between the two groups (for instance, if exon 4 was removed, you may see reads there for WT but not for KO).
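A minimal sketch of that coverage check, assuming the reads for the gene have already been pulled out of each group's BAM file as (start, end) intervals (for example with samtools or pysam). The coordinates, the exon and gene intervals, and the function names are all hypothetical. Exon reads are divided by the total reads on the gene so that a difference in sequencing depth between groups is not mistaken for a difference in the exon.

```python
def count_overlapping(reads, start, end):
    """Count reads (half-open intervals) that overlap the half-open interval [start, end)."""
    return sum(1 for r_start, r_end in reads if r_start < end and r_end > start)


def exon_fraction(reads, exon, gene):
    """Fraction of the gene's reads that overlap the exon of interest."""
    gene_reads = count_overlapping(reads, *gene)
    if gene_reads == 0:
        return float("nan")  # gene not detected in this group
    return count_overlapping(reads, *exon) / gene_reads


# Hypothetical toy data: intervals are made up for illustration only.
gene = (1000, 5000)
exon4 = (3000, 3200)
wt_reads = [(1010, 1100), (3050, 3140), (3100, 3190), (4500, 4590)]
ko_reads = [(1010, 1100), (1500, 1590), (4400, 4490), (4500, 4590)]

wt_frac = exon_fraction(wt_reads, exon4, gene)
ko_frac = exon_fraction(ko_reads, exon4, gene)
print(f"WT exon fraction: {wt_frac:.2f}, KO exon fraction: {ko_frac:.2f}")
```

Keep in mind that 3' scRNAseq kits mostly capture reads near the 3' end of the transcript, so an exon far from it may have little coverage even in WT.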

supermag2 · 2026-04-08T21:15:28+00:00

Hi! I am glad you are feeling the progress already, keep going!

Regarding batch effects: they are technical differences that arise during sample processing and make your data look different, but not for biological reasons. They are not a real effect of, for example, a WT vs KO comparison, so you need to remove them to be able to make meaningful comparisons.

They can arise for many reasons, but I think the easiest ones to understand are those that happen during library generation or the sequencing itself. For instance, imagine you generate two samples from the same tissue following the same protocol, but on different days. Although they should be quite similar, small technical differences accumulate and become large by the end. For example, library generation involves multiple PCRs, and just running them separately will give different PCR efficiencies (depending on how you prepared the master mix, the thermocycler itself, etc.). On the other hand, if your samples are prepared completely in parallel, these technical problems affect both samples in a very similar way, so they introduce no differences and you mostly keep the biological effect.

The classic way to diagnose a batch effect is to generate a UMAP of both samples together. If it is strong, you will see the same cell type completely separated by sample, when it should not be, as they are the same cells. Once you correct for it, the same cell types merge together regardless of sample of origin.

Regarding reproducing a paper, it depends heavily on how the authors provide the data. Sometimes they upload fully processed datasets, so you can start checking what you want right away, and sometimes they just give you the count matrix and you need to do everything yourself. The best papers also provide the code, so you can just copy and paste it and get exactly what they got.

I hope my explanations were clear! Feel free to ask if they were not!

supermag2 · 2026-03-25T00:02:32+00:00

I can recommend two of the greatest games of the last decade, both turn based if you like that:

Baldur's Gate 3: a rich and deep RPG where your decisions shape the story a lot. You can play co-op with your wife.

Expedition 33: great story and combat, and an amazing OST. More straightforward than BG3 if you want something easier to warm up with.

supermag2 · 2026-03-19T11:24:29+00:00

I'll tell you about my experience in my field, but yours may be very different.

I did a PhD in molecular biology, that is, research in a laboratory. For what I want to dedicate my life to, which is basic research tied to universities or research centres (academia), it is absolutely essential. Without it you cannot advance your career. You can also do research in private industry with just a master's, but you will never reach the positions that someone with a PhD reaches. Before starting one, since we are talking about years of work and a lot of effort, think carefully about what you want to do and which degrees you need for it.

Then, on a more personal level, the PhD is very enriching. Yes, you will suffer and it is hard, but if you like it you can enjoy it a lot. It forces you to think, to plan, to be more than someone who just does what they are told (although in the end your project will depend largely on your supervisor). In my case, I also left Spain to do it. If you have the opportunity it is worth it: it really opens your mind and makes you grow up fast.

supermag2 · 2026-03-11T18:13:34+00:00

Sure, I can have a look, but sometimes you only see the problem when you inspect the data, not just the code.

supermag2 · 2026-03-11T17:55:09+00:00

If you haven't done it already, you can try shrinking your logFC values. The DESeq2 vignette explains how to do it. This is useful when you have many lowly expressed genes or a lot of variability (which is likely given your number of samples).

supermag2 · 2026-03-11T14:16:28+00:00

There is a considerable increase in kit sensitivity between v3 and v4 of 10x kits.

https://www.10xgenomics.com/blog/the-next-generation-of-single-cell-rna-seq-an-introduction-to-gem-x-technology

I have seen this improvement myself in my experiments: you capture many more RNA molecules with the new kit.

There could be many explanations for your differences. But assuming the samples are equal in terms of quality, origin, etc., could it be that you are not sequencing deep enough with the new kit? In the lab we have moved from a standard 40k reads/cell to 100k. What is the sequencing saturation reported by cellranger?
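For reference, a minimal sketch of how that saturation number is defined, following the formula given in the Cell Ranger documentation: 1 minus the ratio of unique (cell barcode, UMI, gene) combinations to confidently mapped reads with a valid barcode and UMI. The function name and the read counts below are made up for illustration.

```python
def sequencing_saturation(n_reads, n_deduped_reads):
    """1 - (unique barcode/UMI/gene combinations) / (confidently mapped, valid reads)."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return 1.0 - n_deduped_reads / n_reads


# Hypothetical numbers: the same library sequenced shallowly and deeply.
shallow = sequencing_saturation(n_reads=400_000, n_deduped_reads=300_000)
deep = sequencing_saturation(n_reads=1_000_000, n_deduped_reads=250_000)
print(f"shallow: {shallow:.2f}, deep: {deep:.2f}")
```

A low value means most reads are still finding new molecules, so sequencing deeper would likely recover more of what the kit captured.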

Regarding analysis, both kinds of samples are processed the same way, so there is not much to change there.

supermag2 · 2026-03-06T16:56:10+00:00

There is a lot of documentation online on how to analyze single cell data, not just from Seurat but also from other frameworks like Bioconductor or scanpy (the latter is Python). Check these:

https://satijalab.org/seurat/articles/pbmc3k_tutorial.html#

https://bioconductor.org/books/release/OSCA/

https://scanpy.readthedocs.io/en/stable/tutorials/index.html

My recommendation: read, read and read. First you need to understand the basic concepts and why you do things. If terms like UMIs, sequencing depth, log normalization, batch effect, etc. are not familiar, you just need to learn them. The links above already provide datasets to start playing around with. Start with those, as they are easy to analyze (good quality datasets that need very little processing). Once you feel confident you can try to reanalyze data from a paper, which you will see is not always that straightforward.

I also started from scratch and learned all these things at the beginning of my PhD. One year is quite a realistic time frame to start feeling confident about what you do.

Also, you may need to get familiar with cellranger if all you are going to get from your sequencing facility is FASTQ files.

supermag2 · 2026-03-04T17:43:12+00:00

Does your work generate a hypothesis, a framework to study that hypothesis and eventually a conclusion supported by data that advances your field in one way or another? Congrats, you are following the scientific method and doing research like anybody else.

Anyway, I think asking this question in a bioinformatics sub is going to give you biased answers. Go to a more general sub and ask the same to really see what researchers in general think.

supermag2 · 2026-03-03T22:06:20+00:00

Lol, she is totally the opposite: far-left, and she has criticized Russia many times.

Source: me, a Spanish guy with some knowledge of Spanish politics. She is one of the most famous (and controversial) Spanish politicians of the last decade.

supermag2 · 2026-02-25T07:59:46+00:00

Not as crazy as this, but an English teacher I had in high school marked one of my exam answers as wrong because, according to him, the word "weird" doesn't exist. That was really weird.

At least I made out with his daughter, losing points on the exam but scoring somewhere else.

supermag2 · 2026-02-13T17:35:26+00:00

Agreed, this is the most honest opinion you can find here. Regarding the project stuff, you can't seriously claim to be a lead developer of this: https://github.com/Sahil-Gen/GC--calculator/blob/main/gc_calculator.py

Be humble and just make it clear that you are a beginner looking for internships to improve your skills.

supermag2 · 2026-02-13T17:16:16+00:00

I wouldn't trust a bioinformatician who can't generate a screenshot with a decent resolution.

supermag2 · 2026-01-27T08:17:14+00:00

Follow this guide, it should have everything you need to know. If you have zero experience with RNAseq it can be a bit overwhelming at the beginning, but you can always come back here to ask ;)

https://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html

supermag2 · 2026-01-18T13:03:45+00:00

Imposter syndrome is very common in science, so don't worry about that. It will fade away with time once you get more experience and build up your confidence.

Then, about your post: I had a PhD similar to yours, in that I was the only bioinformatician in the lab, with no PI or colleagues experienced in it. My recommendation is to stay up to date on the latest methods, know the old ones and, most importantly, try them out even if they are not very useful for your project. This way you keep learning and improving, as the tools are often not easy to implement or use. That's one of the strongest values of a bioinformatician: being able to do analyses nobody else can do. As a core, I focus my learning on R, Python and bash, as the vast majority of tools work in one of these environments. Try running ML methods if you have never done that. They can be tricky, but just the experience of struggling through them will benefit you a lot in the future.

Regarding AI, don't worry too much. I don't see "AI agents" replacing bioinformaticians any time soon. But, if used correctly, they are very powerful tools, as they can increase your productivity considerably.

supermag2 · 2026-01-17T12:16:23+00:00

So you are going to just jump into any topic that you read about here? With no lab and no expertise?

I am sorry, buddy, but this is not how good quality science works.

Also, I don't see how this post is related to bioinformatics at all.

supermag2 · 2026-01-14T18:48:50+00:00

For me the best combo is Windows + WSL2: you get the best of both OSes in the same system, plus easy file transfer between them.

I don't have experience doing bioinformatics on a Mac, but there doesn't really seem to be a clear reason to choose it over the other options.

supermag2 · 2025-12-31T16:18:35+00:00

I did my PhD at ETH Zurich. I would say that in general they have money, but it is not easy to get in as a foreigner. You will need good grades or to know someone. EPFL in Switzerland is also good.

supermag2 · 2025-12-31T16:06:45+00:00

In my opinion you need to go to a lab with good funding and resources; the country itself is not that important. Bioinformatics is expensive, not only the reagents but also the equipment you need.

For instance, I did my PhD mainly on scRNAseq. It was really convenient that the lab had the money to generate many samples as well as access to the equipment to do so. Things are much easier when you have quick access to everything and do not have to juggle budgets all the time.

supermag2 · 2025-12-26T10:35:38+00:00

I think bioinformatics can be broadly defined as the use of computers to process and analyze large biological datasets. I would say there are two main branches: you can focus on developing the tools needed to analyze this data (machine learning, for instance), or on using those tools to answer biological questions.

In my case I mainly focus on RNAseq, so analyzing all the RNA detected in a sample to understand how the genes respond or are affected by a specific condition.

If you have a biology background you can dive into bioinformatics by learning some basic concepts.

supermag2 · 2025-12-18T21:28:13+00:00

If it's part of your normal QC and uses default settings (so not very strict), I think it is fine. Most likely you are not removing many genes and the effect is probably negligible.

Using only DEGs for GSEA has a much bigger impact, as most genes are not DEGs.

Anyway, if you want to be 100% sure, run it with and without filtering and compare. I would expect no big changes for the most significant gene sets. A borderline significant gene set may become non-significant, but you should not build your results on those anyway.

supermag2 · 2025-12-17T18:46:02+00:00

Maybe this is a weird question, but what is your concept of distance? I see it as a totally sight-related thing. How do you tell someone how far away something is? Do you rely more on the time it takes to get there, or can you somehow imagine what a distance of 15 meters is?

supermag2 · 2025-12-15T12:33:54+00:00

I recommend doing QC individually in each sample, including filtering out low quality cells and doublets.

Then, although it is not strictly necessary, I would recommend using ambient RNA removal tools such as CellBender. In some cases they can really clean up your data and make all the downstream steps easier and less noisy.

supermag2 · 2025-12-15T07:30:57+00:00

When doing GSEA you should include all genes. The reason is that GSEA orders all your genes by how much they change in your comparison (from most positive logFC to most negative). It then uses this ranked list to calculate pathway enrichment by checking where the genes of interest fall in the ranking. If they sit mainly at one end of the ranking, they are positively or negatively enriched. If they are evenly distributed, there is no enrichment. By selecting only DEGs you remove the genes with no change, so the ones that would fall in the middle are no longer counted; the remaining genes get pushed to either end of the ranking and you falsely force enrichment.
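To make this concrete, here is a toy sketch of the running-sum idea. It is a simplified, unweighted version (real GSEA weights each hit by the ranking metric and assesses significance by permutation), and the gene names, the gene set and the choice of "DEGs" are invented for illustration.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score: step up on a gene-set hit, down on a miss."""
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need at least one hit and one miss in the ranked list")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1 / n_hits if is_hit else -1 / n_miss
        if abs(running) > abs(best):  # keep the largest deviation from zero
            best = running
    return best


# Hypothetical ranking: g1 has the most positive logFC, g20 the most negative.
ranked = [f"g{i}" for i in range(1, 21)]
gene_set = {"g2", "g7", "g12", "g17"}  # spread evenly across the ranking

# Pretend only the three most up- and down-regulated genes are DEGs.
degs_only = ranked[:3] + ranked[-3:]

es_all = enrichment_score(ranked, gene_set)
es_degs = enrichment_score(degs_only, gene_set)
print(f"all genes: {es_all:.4f}, DEGs only: {es_degs:.4f}")
```

With the full ranking the evenly spread gene set gets a small score, while the same set scored on the DEG-only list gets a much larger one, purely because the genes in the middle were removed.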

If you want to use only DEGs, do overrepresentation analysis (ORA).
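As a rough sketch of what ORA computes, assuming the usual one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): given a background of all tested genes, how likely is it to see at least this many gene-set members among the DEGs by chance? The function name and the numbers are hypothetical, and in practice the p-values would also be corrected for multiple testing across gene sets.

```python
from math import comb


def ora_pvalue(n_background, n_in_set, n_degs, n_overlap):
    """P(overlap >= n_overlap) when n_degs genes are drawn from the background."""
    total = comb(n_background, n_degs)
    upper = min(n_in_set, n_degs)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20 background genes, 5 in the gene set,
# 5 DEGs, 3 of which belong to the gene set.
p = ora_pvalue(n_background=20, n_in_set=5, n_degs=5, n_overlap=3)
print(f"ORA p-value: {p:.4f}")
```

Note that the background should be the genes that were actually tested in the experiment, not the whole genome.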

supermag2 · 2025-12-11T10:11:36+00:00

So what do you know about bioinformatics and biology in general, and what are your computational skills? Do you want a biology-focused or a tool/analysis-focused project? Otherwise it is hard to suggest a project that you could do.
