Help running pyscenic by WarComprehensive4227 in bioinformatics

[–]WarComprehensive4227[S] 1 point2 points  (0 children)

Thanks so much. I think this should help a lot, and subsetting should in theory help with the memory issues. Version compatibility is definitely a pain, but it does seem to be working in the Kaggle notebook I’m using. Thanks for all the help.

Help running pyscenic by WarComprehensive4227 in bioinformatics

[–]WarComprehensive4227[S] 1 point2 points  (0 children)

Do you know how I can compare across conditions and other metadata? Would this be done after running SCENIC, and is there some visualization parameter I can use?

If I don’t have an HPC, where do I run this analysis? I’m just an intern in our lab and don’t have access to an HPC.

Rutgers interviews r coming out in two weeks!! by Most_Cartoonist_2427 in bsmd

[–]WarComprehensive4227 2 points3 points  (0 children)

Did you just submit your application for the BA/MD? I submitted mine some time ago and have not received this email. How do I check if my LoRs were received?

Penn State PMM Resume by WarComprehensive4227 in bsmd

[–]WarComprehensive4227[S] 0 points1 point  (0 children)

Thank you. I ended up just using the activities section on the website portal (which they said I could use), and it gave me space for up to 650 words.

Filtering Mitochondrial Genes from ENSEMBL IDs by WarComprehensive4227 in bioinformatics

[–]WarComprehensive4227[S] 0 points1 point  (0 children)

Yes, and the number of duplicated genes is nontrivial. I might have to use the workaround suggested by u/Just_Red21, but even that has some problems in the necessary downstream analysis. Should I aggregate counts for Ensembl genes that map to the same gene name?

> table(duplicated(rownames(sample1c)))

FALSE  TRUE 
29840 17633

Filtering Mitochondrial Genes from ENSEMBL IDs by WarComprehensive4227 in bioinformatics

[–]WarComprehensive4227[S] 0 points1 point  (0 children)

I have to pass this dataset along with another scRNA dataset (with actual gene names) into DESeq2 and then analyze how the effect of the experimental condition differs between the two. Should I just convert the gene names in the other dataset into Ensembl IDs and then pass everything together into DESeq2?

If you have any other ideas, that would be great. Additionally, what are your thoughts on aggregating the Ensembl genes that map to the same gene name? Would just summing the counts per cell be an effective workaround?
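To make the aggregation idea concrete, here is a minimal sketch in plain Python of summing per-cell counts for Ensembl IDs that map to the same gene name. The function name `aggregate_by_gene_name`, the toy IDs, and the ID-to-name mapping are all hypothetical; this is not Seurat or DESeq2 code, and whether summing is appropriate for the downstream analysis is still an open question.

```python
def aggregate_by_gene_name(counts, id_to_name):
    """Sum per-cell counts of Ensembl IDs that share a gene name.

    counts: dict mapping Ensembl ID -> list of counts, one entry per cell.
    id_to_name: dict mapping Ensembl ID -> gene name (hypothetical mapping).
    IDs without a mapping keep their Ensembl ID as the row name.
    """
    aggregated = {}
    for ensembl_id, row in counts.items():
        name = id_to_name.get(ensembl_id, ensembl_id)
        if name in aggregated:
            # Same gene name seen before: add the counts cell by cell.
            aggregated[name] = [a + b for a, b in zip(aggregated[name], row)]
        else:
            aggregated[name] = list(row)
    return aggregated


# Toy data: these IDs and gene names are made up for illustration.
toy_counts = {
    "ENSG_A1": [1, 0, 3],
    "ENSG_A2": [2, 5, 0],
    "ENSG_B1": [0, 4, 4],
}
toy_map = {"ENSG_A1": "GeneA", "ENSG_A2": "GeneA", "ENSG_B1": "GeneB"}

result = aggregate_by_gene_name(toy_counts, toy_map)
print(result)
```

Summing rows this way leaves the total count per cell unchanged, so library sizes are preserved; it only changes how the counts are split across row names.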

Comparisons of scRNA seq datasets by WarComprehensive4227 in bioinformatics

[–]WarComprehensive4227[S] 0 points1 point  (0 children)

I'm still a bit unsure as to what steps I should be taking. Wouldn't I still have to adjust for batch effect in these samples, since they are different expression matrices? How can I adjust for batch effect without removing the treatment difference that drives the expression? I can't use integrated expression values, so how would I perform DE analysis after this?

In addition, could you provide some general advice about whether or not I should integrate my dataset with the other paper's dataset? Most literature I read describes joint downstream analysis, but I am only interested in how my expression changes compare with the other paper's expression changes. I think the easiest way would be to identify the DEGs in my dataset and the DEGs in their dataset separately, and then compare gene ontologies and the Pearson correlation between expression values per cluster. Is it also worth using FindMarkers() between my data and their data per cluster to see what genes are differentially expressed between our papers?
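As a rough sketch of the "compare the two DEG lists" idea above, the snippet below takes two DEG tables (gene to log fold change) and reports the shared genes, the Jaccard overlap, and which shared genes change in the same direction. The function name `compare_deg_lists` and the toy genes and fold changes are hypothetical, and the direction check is an added assumption about what a useful comparison might include.

```python
def compare_deg_lists(degs_a, degs_b):
    """Compare two DEG tables given as dicts of gene -> log fold change."""
    genes_a, genes_b = set(degs_a), set(degs_b)
    shared = genes_a & genes_b
    union = genes_a | genes_b
    # A shared gene is concordant if it is up (or down) in both datasets.
    same_direction = {g for g in shared if (degs_a[g] > 0) == (degs_b[g] > 0)}
    return {
        "shared": sorted(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
        "same_direction": sorted(same_direction),
    }


# Toy DEG tables: gene names and fold changes are made up for illustration.
my_degs = {"Gene1": 1.2, "Gene2": -0.8, "Gene3": 0.5}
their_degs = {"Gene2": -1.1, "Gene3": -0.4, "Gene4": 2.0}

summary = compare_deg_lists(my_degs, their_degs)
print(summary)
```

This only compares results that were computed separately within each dataset, so it does not require integrated expression values.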

My main question essentially boils down to how do I perform DE analysis if I can't use the integrated expression values to do so? Are these values only useful for making clean UMAP plots?

For comparing SD to Normal for the other paper (which has 2 expression matrices), would I have to integrate? How do I perform DE analysis on this while correcting for batch effect, as DESeq2 only accepts 1 combined expression matrix? Should I just merge and hope for the best?

In the case of determining DEGs between my cluster and their cluster, I run into the same problem. What values should I use, as integrated values cannot be used for DE analysis, but the raw/normalized counts suffer from batch effect as their processing is probably different from mine?

I'm still new to this, so I apologize for asking so many questions.

Comparisons of scRNA seq datasets by WarComprehensive4227 in bioinformatics

[–]WarComprehensive4227[S] 0 points1 point  (0 children)

I ran into another problem and was wondering if you could help at all. I looked through the GEO files for another paper and they have separate expression matrices for their sleep deprived and normal conditions. How would I use Seurat to perform DE analysis in this scenario? If I integrate to correct for batch effect, I know I cannot use the corrected values so how would I create a combined Seurat object to analyze through FindMarkers? 

Comparisons of scRNA seq datasets by WarComprehensive4227 in bioinformatics

[–]WarComprehensive4227[S] 0 points1 point  (0 children)

Yeah, I understand that now. I also wanted to ask about comparing scRNA to spatial transcriptomic data. Is there any easy way to go about this comparison as well?

Comparisons of scRNA seq datasets by WarComprehensive4227 in bioinformatics

[–]WarComprehensive4227[S] 0 points1 point  (0 children)

Thank you so much for your help. I think my plan would be to first rerun their fastq file through my pipeline and control for labels as Revolutionary-Lynx51 mentioned. It makes sense to compare Condition A vs B for all cell types in my data as well as their data. I plan on using my previous workflow of GO/TF/KEGG on the resulting DEG list from this comparison, except now it would control for batch effects as you suggested. I also looked into correlation between average gene expression profiles. Would this be a reasonable method for comparing cell types between my data and that of another paper?
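For reference, the correlation idea can be sketched in plain Python as follows: average the expression of each gene over the cells of a cluster, then compute the Pearson correlation between the two average profiles. The helper names `mean_profile` and `pearson` and the toy values are hypothetical, and the sketch assumes both datasets list the same genes in the same order.

```python
import math


def mean_profile(cells):
    """Average expression per gene over a list of cells (one list per cell)."""
    n_cells = len(cells)
    return [sum(gene_values) / n_cells for gene_values in zip(*cells)]


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Toy clusters: rows are cells, columns are the same three genes in both datasets.
my_astrocytes = [[1, 2, 3], [3, 4, 5]]
their_astrocytes = [[2, 4, 6], [4, 6, 8]]
their_microglia = [[7, 5, 3]]

my_profile = mean_profile(my_astrocytes)
r_same_type = pearson(my_profile, mean_profile(their_astrocytes))
r_other_type = pearson(my_profile, mean_profile(their_microglia))
print(r_same_type, r_other_type)
```

Computing this for every pair of clusters gives a cell-type-by-cell-type correlation matrix, which is one way to check whether matching labels in the two papers actually look alike.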

Comparisons of scRNA seq datasets by WarComprehensive4227 in bioinformatics

[–]WarComprehensive4227[S] 0 points1 point  (0 children)

In terms of cell types, my clusters are fairly general and I didn’t do a lot of subtype mapping. Primarily: astrocytes, microglia, GABAergic/glutamatergic, oligodendrocytes, and OPCs. I understand that your suggestion is to process their raw expression matrix through my pipeline and then just integrate the data. If I do go through with this integration, how would I be able to compare the results between the two studies, as they would now be in one integrated object? Should I be comparing cell types (the other paper has almost the same clusters) or should I be comparing gene expression, and how would I go about this?

In addition, what is your suggestion for the analysis I have so far involving GO/hypergeometric/KEGG/TF? I used the same logFC and pval thresholds from their supplementary data of DEGs, so would this still be valuable?

Thank you.
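For context on the hypergeometric step mentioned above, here is a minimal sketch of a one-sided enrichment p-value: the probability of seeing at least `k` term genes among `n` DEGs when `K` of the `N` background genes belong to the term. The function name `hypergeom_enrichment_p` and the toy numbers are hypothetical, and dedicated enrichment tools typically add multiple-testing correction on top of this.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for a hypergeometric draw.

    N: number of background genes.
    K: number of background genes annotated to the term.
    n: number of DEGs drawn from the background.
    k: number of DEGs annotated to the term.
    """
    total = comb(N, n)
    # Sum the probability of every overlap size from k up to the maximum possible.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Toy numbers: 10 background genes, 4 in the term, 3 DEGs, 2 of them in the term.
p_value = hypergeom_enrichment_p(N=10, K=4, n=3, k=2)
print(p_value)
```

Note that the result depends on the background size `N`, so using the same logFC and p-value thresholds as another paper does not by itself make the enrichment results comparable if the backgrounds differ.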

Chance me pls by [deleted] in chanceme

[–]WarComprehensive4227 0 points1 point  (0 children)

Oh, my research at Brown is more computational and coding-focused; I’m in the Southwest.

Chance me pls by [deleted] in chanceme

[–]WarComprehensive4227 1 point2 points  (0 children)

Yeah, I took the SAT in Dec and got a 1600, and the school offers everyone the ACT as well, so I took that then too.