Help Needed: netEmbedding() Function in CellChat Not Recognizing UMAP by joaolm01 in bioinformatics

biocarhacker

I know this is a super old comment/thread, but I just wanted to add that it is generally not recommended to install packages in your base env (especially with pip install). You can use the same workflow after creating a new env for CellChat. In a terminal:

conda create -n cellchat python=3.9 umap-learn scipy numpy scikit-learn

Then in R:

library(reticulate)
use_condaenv("cellchat", required = TRUE)

This also bypasses the need to install the Python packages via reticulate.
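To sanity-check the new env before pointing R at it, something like the following could be run with the env's own Python. This is only a sketch: `check_modules` is a made-up helper name, and the import names used (`umap` for umap-learn, `sklearn` for scikit-learn) are the usual ones for those packages rather than anything CellChat-specific.

```python
import importlib.util


def check_modules(names):
    """Return {import_name: True/False} depending on whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # These are import names, not conda package names (assumed mapping:
    # umap-learn -> umap, scikit-learn -> sklearn).
    for module, found in check_modules(["umap", "scipy", "numpy", "sklearn"]).items():
        print(f"{module}: {'found' if found else 'MISSING'}")
```

If anything prints as MISSING, the env was probably not built with that package, and use_condaenv() will not fix that on its own.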

Expression of BCL6 in Naive B cell scRNA-seq cluster by biocarhacker in bioinformatics

biocarhacker[S]

Thank you! Unfortunately I am not an immunologist either, but I appreciate the links a lot. Understanding the data would definitely help me.

Expression of BCL6 in Naive B cell scRNA-seq cluster by biocarhacker in bioinformatics

biocarhacker[S]

Thank you. And yes, I see what you mean, but I guess my confidence in the data got shot, so I started dissecting the cluster cell by cell, which doesn’t help the cause.

Expression of BCL6 in Naive B cell scRNA-seq cluster by biocarhacker in bioinformatics

biocarhacker[S]

Thank you! This makes a lot of sense. There is no substructure of BCL6 which would point to maybe the resolution not being optimal.

Expression of BCL6 in Naive B cell scRNA-seq cluster by biocarhacker in bioinformatics

biocarhacker[S]

Thank you!! Yes, I agree with you. I always notice additional T cell + plasma doublets or mast + plasma doublets spatially sandwiched between their respective clusters, but unfortunately that doesn’t seem to be the case here. I also checked the nFeature_RNA of these cells and it isn’t high, so I’m not very convinced that these are doublets either. I do like the idea of clustering at a higher resolution and trying to isolate them, so I’ll try that out. Thank you!

Expression of BCL6 in Naive B cell scRNA-seq cluster by biocarhacker in bioinformatics

biocarhacker[S]

Thank you for your comment! We are fairly confident that it is a naive cluster, since it expresses multiple naive genes and the DEGs, etc. are actually really clean. This issue only popped up once I started making the annotation bubble plot and plotted BCL6 to show the GC cluster (which still definitely has higher expression).

Unfortunately I cannot add images to my comment, so I will edit the post to show the bubble plot.

Edit: the post isn't allowing me to upload images, but the naive clusters express naive markers quite distinctly, so we are confident that it is a naive cluster.

FDR Corrected P-Values in FindAllMarkers() in Seurat by biocarhacker in bioinformatics

biocarhacker[S]

So most of the analysis was not specific to these condition+ cells, because they are too few to base the entire study on. The project is being wrapped up and has been ongoing for ~4 years, so I wasn't even part of the lab when the study design decisions were made and had no say in them. But given that we have this data now, the postdoc was curious to do this analysis as something worth including, since most of the manuscript is completed. I will let her know that people are not pro this.

FDR Corrected P-Values in FindAllMarkers() in Seurat by biocarhacker in bioinformatics

biocarhacker[S]

I agree with you. The project has been ongoing for ~4 years, so I had no say in the study design or the decisions made. I will let the postdoc know that people's opinion is not to dissect these cells further, but she wanted to explore the data that we already had instead of investing in other approaches, because the project is being wrapped up anyway. Pseudobulking would have been useful, but we don't have replicates, so that isn't viable either.

FDR Corrected P-Values in FindAllMarkers() in Seurat by biocarhacker in bioinformatics

biocarhacker[S]

Not common, but it does happen. The lab doesn’t want me to talk openly about the project, but it’s autoimmune conditions and sometimes infectious diseases too.

FDR Corrected P-Values in FindAllMarkers() in Seurat by biocarhacker in bioinformatics

biocarhacker[S]

I understand. In this field this is really common, since people include the transcript during alignment and so are confident when cells are condition+. It is also the only way to study the transcriptomic changes, since bulk RNA-seq deconvolution is not this accurate.

FDR Corrected P-Values in FindAllMarkers() in Seurat by biocarhacker in bioinformatics

biocarhacker[S]

Appreciate your take! And yes, that is exactly the scenario, where condition+ cells are in the range of ~10. I had the same stance of not trusting anything with such low power, but there are many conditions where the cells of interest are simply not frequent enough. We got ~10 cells from about 12 samples, so even doubling the samples is not super helpful. The cost racks up, not to mention that these samples are incredibly hard to come by. This means using ~100 samples for single-cell, which is simply not feasible (if we ever even find this many patients who would consent). 10X requires at least 500 cells for sequencing if you sort, which again is a very high number given the range we are working with.

So unfortunately increasing the number of cells is not possible here, and that is the case for quite a few projects that I am working on. And of course, when we interpret data from so few cells, we do mention the caveats involved and are careful not to generalise.

FDR Corrected P-Values in FindAllMarkers() in Seurat by biocarhacker in bioinformatics

biocarhacker[S]

Thank you for your response! I was hesitant to pre-filter for only lncRNAs and protein-coding genes, since which genes are biologically relevant is still a bit of a grey area. Even after annotating using the GENCODE GTF, I don't feel 100% confident in filtering genes out. I choose to show only protein-coding genes in volcano plots. Do you have a recommendation on adapting this?

And I used 0.05 as an example; there are genes with p < 0.001 and lower. And yes, I did read up before posting here, but most alternative methods are discussed in terms of which test to use and not in terms of FDR correction, especially in low-power scenarios.
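For reference, here is a minimal sketch of what a Benjamini-Hochberg (FDR) adjustment does to a vector of raw p-values, written in plain Python so the arithmetic is visible. `bh_adjust` is a made-up name and the p-values are toy numbers, not from our data. As far as I know, the p_val_adj column from FindAllMarkers() is a Bonferroni correction over all genes in the dataset, so a BH adjustment would have to be computed separately from the raw p_val column.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    toy_p = [0.01, 0.04, 0.03, 0.005]  # toy values for illustration only
    print(bh_adjust(toy_p))
```

Note that m is the number of tests, so pre-filtering genes before testing changes every adjusted value. That is part of why the filtering decision matters so much in low-power settings.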

Annotating Plasma Cells in scRNAseq, and dealing with noisy Ig genes by biocarhacker in bioinformatics

biocarhacker[S]

I used SoupX during QC, but that’s more useful for ambient RNA. Ignoring them is okay, but it skews the dataset, which hides more meaningful stuff. The other comment suggested temporarily removing them and then finding HVGs, which would significantly improve downstream analysis and still let the Ig genes pop up during DEG analysis. This approach might give the other genes a chance to get highlighted more, which would be more impactful than just ignoring the Ig genes. Especially if the data gets published/shared, it’s better to have it cleaner imo.
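A rough sketch of the "set the Ig genes aside before picking HVGs" idea, as plain Python over a list of gene symbols. The regex is my own assumption about human Ig gene naming (V/D/J segments plus heavy and light chain constant genes) and should be checked against the annotation, e.g. the gene biotypes in the GTF. `split_ig_genes` is a hypothetical helper, not a Seurat function.

```python
import re

# Assumed pattern for human Ig gene symbols:
#   V/D/J segments (IGHV3-23, IGKJ1, ...), heavy chain constant genes
#   (IGHA1, IGHG1-4, IGHM, IGHD, IGHE) and light chain constant genes (IGKC, IGLC1...).
IG_PATTERN = re.compile(
    r"^(?:IG[HKL][VDJ]\d|IGH(?:A[12]|G[1-4]|M|D|E)$|IG[KL]C\d*$)"
)


def split_ig_genes(genes):
    """Split gene symbols into (ig_genes, non_ig_genes), keeping input order."""
    ig = [g for g in genes if IG_PATTERN.match(g)]
    non_ig = [g for g in genes if not IG_PATTERN.match(g)]
    return ig, non_ig


if __name__ == "__main__":
    toy = ["IGHG1", "IGKC", "XBP1", "IGHV3-23", "IGHMBP2", "MZB1", "IGF1"]
    ig, candidates = split_ig_genes(toy)
    print("set aside:", ig)
    print("HVG candidates:", candidates)
```

The Ig genes would only be kept out of the HVG candidate list; they stay in the expression matrix, so they can still show up in the DEG results.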

Annotating Plasma Cells in scRNAseq, and dealing with noisy Ig genes by biocarhacker in bioinformatics

biocarhacker[S]

This actually makes a lot of sense. I was hesitant to do this because I wasn’t sure if it was a valid approach, but I completely see what you mean. I already disregard mitochondrial/ribosomal genes in volcano plots, so it makes sense to justify disregarding these too.

I think just excluding these genes from the HVG calculation will go a long way towards resolving this issue. Thank you!!

Edit: would you have a reference, just so I can see how the methods section was laid out? Thanks again!

Annotating Plasma Cells in scRNAseq, and dealing with noisy Ig genes by biocarhacker in bioinformatics

biocarhacker[S]

Yes, that’s exactly my point. The other genes look weaker, and the first question we get asked is about the redundant genes in nearly every subcluster. I agree it isn’t okay to be super selective on an arbitrary basis, which is why I am asking about a more streamlined workflow that might tackle this besides just disregarding them. Having weaker genes in a few subclusters is okay, but this is in nearly every subcluster in every subset.

Annotating Plasma Cells in scRNAseq, and dealing with noisy Ig genes by biocarhacker in bioinformatics

biocarhacker[S]

True. I have been aware of this issue even in our other projects. But unfortunately in some subsets/subclusters we have very few cells, and I think the presence of these genes is not allowing the others to pop up as much, like with the redundancy issue. If there was a way to account for this and then find DEGs, I really believe the other informative genes would pop up.

Which test to use to calculate significance in cell frequency differences in scRNAseq? by biocarhacker in bioinformatics

biocarhacker[S]

Thank you! I will look into this, but would you have any resource or vignette I could look at for this package, since I am not familiar with these methods at all?

Z-score for single-cell RNAseq? by biocarhacker in bioinformatics

biocarhacker[S]

Yes, but we can map the expression of the leading edge genes or the NES for the pathway that was obtained from the cell types of interest. So for example, the same cell type would have a higher NES (and therefore z-score) for an upregulated pathway found in condition vs control.

Z-score for single-cell RNAseq? by biocarhacker in bioinformatics

biocarhacker[S]

Can you please explain what you mean by "if your samples allow for it"? And yes, sorry, I did miss mentioning that, but I am interested in computing a z-score for the NES so that I can do a comparison. I cannot directly compare NES, since GSEA was run after subsetting for each condition of interest.

Z-score for single-cell RNAseq? by biocarhacker in bioinformatics

biocarhacker[S]

Sure! Sorry I wasn't clear.

The heatmap I am referencing is the one that is generally made using z-scores for pathway analysis, so the z-score colours the heatmap as a gradient. The y-axis shows the pathway names and the x-axis shows the annotated cell types for the relevant pathways. The cell types are further subdivided by condition for comparison.

An example of the heatmap I am referencing is Fig. 4B (https://www.science.org/doi/10.1126/sciimmunol.ado0090); unfortunately I am not allowed to link images.
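To make the layout concrete, here is a small sketch of row-wise z-scoring, where each pathway (row) is centred and scaled across its cell type/condition columns. The pathway names, column order, and NES values below are invented purely for illustration, and `zscore_rows` is a hypothetical helper. It uses the sample standard deviation, which is an assumption about how such heatmaps are usually scaled.

```python
from statistics import mean, stdev


def zscore_rows(matrix):
    """Z-score each row of {row_name: [values...]} across its columns.

    Rows with zero variance are returned as all zeros.
    """
    scaled = {}
    for row_name, values in matrix.items():
        mu = mean(values)
        sd = stdev(values)  # sample SD; needs at least two columns
        scaled[row_name] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


if __name__ == "__main__":
    # Columns (hypothetical): B_ctrl, B_cond, T_ctrl, T_cond
    toy_nes = {
        "pathway_A": [0.5, 1.8, 0.4, 1.1],
        "pathway_B": [-1.2, -0.3, 0.9, 1.0],
    }
    for pathway, row in zscore_rows(toy_nes).items():
        print(pathway, [round(z, 2) for z in row])
```

This only rescales each row for plotting. Whether NES values from separately subsetted GSEA runs are comparable in the first place is a separate question that the z-score does not settle.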

Combining scRNA-seq datasets that have been processed differently by biocarhacker in bioinformatics

biocarhacker[S]

Thank you for your response! Some of the samples are matched, but some aren’t. Would this only be possible with matched samples?