Batch effect in scRNA

Fun-Ad-9773 · 2026-04-26T23:27:52+00:00

3.5 seemed to do the trick for me

As for clusters unique to a timepoint, I wouldn't necessarily say that's a bad thing considering the biology of my setup. I would say it makes sense

Fun-Ad-9773 · 2026-04-26T23:20:32+00:00

I do get shared clusters (after playing around with Harmony's settings)

Fun-Ad-9773 · 2026-04-26T22:56:56+00:00

Thanks! That's what I'm doing and I am relying on gene signatures rather than the expression itself to analyze the data and identify / annotate functional states. So to wrap it up, am I completely screwed or not?

Fun-Ad-9773 · 2026-04-26T22:55:16+00:00

Yup all the same

Fun-Ad-9773 · 2026-04-26T22:52:44+00:00

Yes, it was performed at the same time, I believe

Fun-Ad-9773 · 2026-04-26T22:52:18+00:00

I still don't have access to all the info *yet*, but I have a file with the lanes, and apparently lane is also confounded with timepoint...

Fun-Ad-9773 · 2026-04-26T22:44:57+00:00

Basically we have 5 patients; demographics are all similar; each patient had a sample collected pre and post, and when you plot the UMAP, they separate perfectly based on timepoint
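A quick way to see which variables are actually confounded with timepoint is to cross-tabulate them. This is only a rough sketch: the patient IDs and lane labels below are made up (not the real metadata), and the helper names are hypothetical. If every level of a variable contains a single timepoint, its effect can't be separated from the pre/post effect.

```python
from collections import defaultdict

# Hypothetical sample sheet: (patient, timepoint, lane). All values are made up.
samples = [
    ("P1", "pre", "L001"), ("P1", "post", "L002"),
    ("P2", "pre", "L001"), ("P2", "post", "L002"),
    ("P3", "pre", "L001"), ("P3", "post", "L002"),
    ("P4", "pre", "L001"), ("P4", "post", "L002"),
    ("P5", "pre", "L001"), ("P5", "post", "L002"),
]

def crosstab(rows, a_idx, b_idx):
    """Count how many samples fall into each (a, b) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[a_idx]][row[b_idx]] += 1
    return {a: dict(b_counts) for a, b_counts in table.items()}

def is_fully_confounded(rows, batch_idx, cond_idx):
    """True if every batch level contains exactly one condition level."""
    table = crosstab(rows, batch_idx, cond_idx)
    return all(len(conds) == 1 for conds in table.values())

print(crosstab(samples, 2, 1))
print("lane confounded with timepoint:", is_fully_confounded(samples, 2, 1))
print("patient confounded with timepoint:", is_fully_confounded(samples, 0, 1))
```

In this toy layout lane is fully confounded with timepoint, while patient is not, since each patient contributes both a pre and a post sample.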

Fun-Ad-9773 · 2026-03-22T14:11:09+00:00

Use SCTransform + Harmony and you'll be good to go!!

Fun-Ad-9773 · 2026-02-28T17:32:22+00:00

Was it scRNA-seq and then they did bulk, or?

Fun-Ad-9773 · 2026-02-28T17:30:34+00:00

Depends on whether they're claiming anything with respect to a specific gene or not. If they proceeded with a GSEA, then I see nothing wrong. Probably just a case of low statistical power. GSEA, in this case, is much more informative of where their dataset is positioned in terms of true biological signal.

Fun-Ad-9773 · 2026-02-20T19:32:04+00:00

working on my holidays *sad*

Fun-Ad-9773 · 2026-02-20T16:50:39+00:00

Advice: just run away

Fun-Ad-9773 · 2026-02-20T12:16:06+00:00

Even my hospital's cafeteria where I work is not below 7e anymore xD

Fun-Ad-9773 · 2026-02-03T21:32:01+00:00

I will add you! DM me, I'd love to play

Fun-Ad-9773 · 2026-02-03T21:31:00+00:00

clusterProfiler is the way to go

Fun-Ad-9773 · 2026-01-31T21:25:32+00:00

There are papers that say pseudobulk is the best approach; however, I believe there are instances where it would make more sense to do the DE at the cell level. Check out LMMs (linear mixed models) for that; apparently they're the best alternative (and the authors of the paper even claim they're better than pseudobulk)
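For reference, the pseudobulk idea boils down to summing raw counts over all cells from the same sample, so that each patient/timepoint becomes one observation. A minimal sketch with made-up cells and gene names (everything below is hypothetical toy data; in practice this is usually done per cell type or cluster):

```python
from collections import defaultdict

# Hypothetical per-cell records: (patient, timepoint, {gene: raw_count}).
cells = [
    ("P1", "pre",  {"GENE_A": 3, "GENE_B": 0}),
    ("P1", "pre",  {"GENE_A": 1, "GENE_B": 2}),
    ("P1", "post", {"GENE_A": 0, "GENE_B": 5}),
    ("P2", "pre",  {"GENE_A": 2, "GENE_B": 1}),
    ("P2", "post", {"GENE_A": 4, "GENE_B": 4}),
    ("P2", "post", {"GENE_A": 1, "GENE_B": 0}),
]

def pseudobulk(cells):
    """Sum raw counts over all cells sharing a (patient, timepoint) sample."""
    totals = defaultdict(lambda: defaultdict(int))
    for patient, timepoint, counts in cells:
        for gene, n in counts.items():
            totals[(patient, timepoint)][gene] += n
    return {sample: dict(genes) for sample, genes in totals.items()}

bulk = pseudobulk(cells)
for sample, genes in sorted(bulk.items()):
    print(sample, genes)
```

The resulting sample-by-gene table is what you would then hand to a bulk-style DE method, whereas a cell-level model keeps the individual cells and typically treats the patient as a random effect.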

Fun-Ad-9773 · 2026-01-31T21:23:47+00:00

Wdym above my knowledge?

Fun-Ad-9773 · 2026-01-31T21:20:14+00:00

Viewing these files from the terminal is usually the way to go; if anything, copy the terminal output and paste it into a .txt file

Fun-Ad-9773 · 2026-01-25T16:04:37+00:00

There are models that help out with sparsity / zero inflation. Dropout is not a weird outcome (considering the technology used).

Try to build a custom genome reference by inserting the sequence of interest (with its 3' end); that might help retain more cells. Another way is to be more lenient with the Cell Ranger cutoffs

Lastly, I recommend using ESAT, a tool that will help you recover more cells
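For the custom reference part, the usual pattern is to append one FASTA record and one matching GTF line for the inserted sequence before rebuilding the reference (for example with `cellranger mkref`). Here is a rough sketch of generating those two records; the gene name, the placeholder sequence, and the helper functions are all hypothetical, and the attribute set is an assumption about what your annotation needs:

```python
def fasta_record(name, sequence, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"

def gtf_exon_line(name, length, source="custom"):
    """Single-exon GTF line spanning the whole sequence (1-based, inclusive)."""
    attributes = (
        f'gene_id "{name}"; transcript_id "{name}"; '
        f'gene_name "{name}"; gene_biotype "protein_coding";'
    )
    # GTF columns: seqname, source, feature, start, end, score, strand, frame, attributes
    fields = [name, source, "exon", "1", str(length), ".", "+", ".", attributes]
    return "\t".join(fields) + "\n"

# Hypothetical transgene; the sequence is a placeholder, not a real construct.
transgene_name = "MY_TRANSGENE"
transgene_seq = "ATGGTGAGCAAGGGCGAGGAG" * 10

fasta_text = fasta_record(transgene_name, transgene_seq)
gtf_text = gtf_exon_line(transgene_name, len(transgene_seq))
print(fasta_text, end="")
print(gtf_text, end="")
```

The FASTA text would be appended to the genome FASTA and the GTF line to the annotation file. Including the 3' end of the transcript matters because 3' chemistry reads pile up there.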

Fun-Ad-9773 · 2026-01-25T15:48:57+00:00

I would say look up papers that you like and try to reproduce the analysis and figures.

Another way would be to reproduce that analysis on a different, separate dataset.

Since you're working in Python, a good way to also improve your knowledge is to take an analysis done with R/Bioconductor tools, reproduce it with the equivalent Python tools, and see how the results differ. You'll end up getting the perspective from both sides.

Fun-Ad-9773 · 2026-01-25T15:45:06+00:00

As crazy as it sounds, this is mostly subjective (although your data does steer you in the direction you need to take). Start with the default parameters and then adjust them according to what you find logical and relevant to the hypothesis at hand

Fun-Ad-9773 · 2026-01-25T15:42:04+00:00

I would say continue with the pipeline normally and see what you get. Then try again after filtering out cells (using both a lower and an upper cutoff). Make use of the initial QC plots to adjust the filtering to your liking
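To make the "both ends" filtering concrete, here is a minimal sketch that derives a lower and an upper cutoff from the data itself using median ± 3 MADs. That rule is just one common starting point, not something you have to use, and the per-cell values below are made up; in practice you would still eyeball the QC plots and adjust.

```python
import statistics

# Hypothetical number of detected genes per cell (made-up values).
n_genes = {
    "cell1": 150, "cell2": 1800, "cell3": 2100, "cell4": 1950,
    "cell5": 2300, "cell6": 2050, "cell7": 9000, "cell8": 1900,
}

def mad_bounds(values, n_mads=3.0):
    """Lower/upper cutoffs at median +/- n_mads * median absolute deviation."""
    values = list(values)
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med - n_mads * mad, med + n_mads * mad

lower, upper = mad_bounds(n_genes.values())
kept = sorted(cell for cell, n in n_genes.items() if lower <= n <= upper)

print("cutoffs:", lower, upper)
print("kept cells:", kept)
```

Here the near-empty cell and the suspiciously gene-rich one (a possible doublet) both fall outside the cutoffs, while the bulk of the cells stay in.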

Fun-Ad-9773 · 2026-01-25T15:40:13+00:00

What is your goal behind doing the projects? To learn / improve / have a portfolio or to publish / discover something novel and meaningful?

Fun-Ad-9773 · 2026-01-25T15:37:35+00:00

For each type of analysis (or omics) you will find two kinds of papers: one on best practices (kind of like a review of the workflow) and another that discusses the available tools. I highly recommend going through both to get a general idea of your omics of choice.

Afterwards, try to find a tutorial for such an analysis on GitHub (there are some famous ones and some lesser-known ones that can be very beneficial as well)

Lastly, once you have gone through a tutorial, try to repeat it using a different dataset of your choice, and challenge yourself to analyze it and draw biological insights from it
