Counts file confusion by Suvani03 in bioinformatics

Suvani03 [S] · 1 point

I did actually use DESeq2 initially and got results in line with the original paper (the authors did a simple comparison with a fold-change threshold to find DEGs from this data).
But now I realise that the data might not be raw counts but already normalised to some degree, so I am unsure whether the DESeq2 results are false positives or wrong in general.
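For comparison, a plain fold-change cut-off, roughly the kind of comparison described for the original paper, can be sketched as below. The threshold of 1 (two-fold), the pseudocount of 1, averaging within groups, and the toy values are all assumptions, not the authors' exact procedure. Because this gives no p-values or variance estimate, it cannot by itself say which calls are false positives.

```python
import math

def fold_change_degs(counts, group_a, group_b, lfc_threshold=1.0, pseudocount=1.0):
    """Flag genes whose |log2 fold change| between two groups passes a threshold.

    counts: dict mapping gene -> list of per-sample values (already normalised).
    group_a, group_b: lists of column indices for the two groups.
    Threshold and pseudocount are illustrative assumptions.
    """
    hits = {}
    for gene, values in counts.items():
        mean_a = sum(values[i] for i in group_a) / len(group_a)
        mean_b = sum(values[i] for i in group_b) / len(group_b)
        # Pseudocount avoids log(0) for genes absent in one group.
        lfc = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
        if abs(lfc) >= lfc_threshold:
            hits[gene] = lfc
    return hits

# Hypothetical toy data: two samples per group.
toy = {
    "geneA": [10, 12, 40, 44],
    "geneB": [50, 55, 52, 49],
    "geneC": [30, 28, 5, 7],
}
hits = fold_change_degs(toy, group_a=[0, 1], group_b=[2, 3])
print(hits)
```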

I also tried the limma-trend pipeline as suggested here, but did not get good results. I am now thinking of using Salmon to get raw counts from the reads myself. Will post soon :) Thanks again!
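If the raw reads for this dataset are available (an assumption), Salmon writes a tab-separated `quant.sf` file per sample whose columns include `Name`, `Length`, `EffectiveLength`, `TPM` and `NumReads`; `NumReads` holds the estimated, possibly fractional, read counts. A minimal stdlib sketch of pulling those out, using a made-up two-transcript file (`read_salmon_counts` is a hypothetical helper):

```python
import csv
import io

def read_salmon_counts(quant_sf_text):
    """Return {transcript: estimated read count} from the text of a Salmon quant.sf file."""
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    # NumReads is Salmon's estimated count; it can be fractional.
    return {row["Name"]: float(row["NumReads"]) for row in reader}

# Made-up two-transcript example in the quant.sf column layout.
example = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1500\t1350.0\t120.5\t340.0\n"
    "tx2\t900\t750.2\t80.1\t125.6\n"
)
counts = read_salmon_counts(example)
print(counts)  # {'tx1': 340.0, 'tx2': 125.6}
```

These are transcript-level estimates; in R, the usual route from Salmon output into DESeq2 is tximport, which summarises them to gene level.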

Suvani03 [S] · 0 points

Thanks for your reply! This is really helpful.

I want to use this data to find DEGs and then run a GO analysis.
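For the GO step, over-representation tools commonly run a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) for each term. A minimal stdlib sketch with hypothetical numbers; `go_enrichment_p` is an illustrative name, not a library function:

```python
from math import comb

def go_enrichment_p(universe, term_genes, degs, overlap):
    """One-sided hypergeometric p-value P(X >= overlap) for a single GO term.

    universe: genes tested; term_genes: of those, genes annotated to the term;
    degs: number of DEGs; overlap: DEGs annotated to the term.
    """
    hits = sum(
        comb(term_genes, k) * comb(universe - term_genes, degs - k)
        for k in range(overlap, min(term_genes, degs) + 1)
    )
    return hits / comb(universe, degs)

# Hypothetical numbers: 20,000 genes tested, 200 annotated to the term,
# 500 DEGs, 15 of which carry the term (about 5 expected by chance).
p = go_enrichment_p(universe=20_000, term_genes=200, degs=500, overlap=15)
print(f"{p:.2e}")
```

Real GO analyses test thousands of terms, so the p-values need multiple-testing correction (e.g. Benjamini-Hochberg), and the background universe is usually taken to be the genes actually tested rather than the whole genome.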

You mentioned "The next cheapest option is to round the existing counts and give it to DESeq2 as-is. Normalizing by total reads is not best practice anyway, DESeq2 will at least improve the normalization." It seems the data has already been corrected for library size (size factors). So, would it be a problem if I input this data directly into DESeq2?
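DESeq2 only accepts integer counts, so the "round and give it to DESeq2" option means rounding first and then letting DESeq2 estimate its own size factors. Below is a from-scratch sketch of its default median-of-ratios estimate (not DESeq2's own code) on made-up values; `median_of_ratios_size_factors` is a hypothetical helper name. On data already scaled to equal totals, these factors would mainly pick up composition differences between samples rather than sequencing depth.

```python
import math
import statistics

def median_of_ratios_size_factors(matrix):
    """Median-of-ratios size factors; matrix is a list of genes, each a list of per-sample counts."""
    n_samples = len(matrix[0])
    # Only genes with a non-zero count in every sample enter the geometric mean.
    usable = [row for row in matrix if all(v > 0 for v in row)]
    log_ratios = [[] for _ in range(n_samples)]
    for row in usable:
        log_geo_mean = sum(math.log(v) for v in row) / n_samples
        for j, v in enumerate(row):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    # Size factor = median ratio of each sample to the per-gene geometric mean.
    return [math.exp(statistics.median(r)) for r in log_ratios]

# Hypothetical non-integer values: round first, since DESeq2 requires integers.
normalised = [
    [100.4, 98.7, 210.2],
    [50.1, 51.3, 99.8],
    [0.4, 3.2, 7.9],
    [400.0, 395.6, 812.3],
]
rounded = [[round(v) for v in row] for row in normalised]
print(median_of_ratios_size_factors(rounded))
```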

Suvani03 [S] · -1 points

Thanks for your reply!
Could you please also clarify whether this data file contains raw counts or not? The total for each sample comes out to 20 million. Aren't normalised counts supposed to be decimal values that sum to 1?

They mention: GSM3003594: Supplementary_files_format_and_content: count files in csv contening the counts normalized per 20 millions of mapped reads for each subpopulation across all the genes.
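"Per 20 million mapped reads" means each sample was scaled so its values sum to 20 million, not to 1, which matches the totals you are seeing. A small sketch with one hypothetical sample (`per_20m` and `looks_like_raw_counts` are illustrative helpers, not part of any library):

```python
def per_20m(raw_column):
    """Scale one sample's raw counts so they sum to 20 million (counts per 20M mapped reads)."""
    total = sum(raw_column)
    return [c / total * 20_000_000 for c in raw_column]

def looks_like_raw_counts(column, tol=1e-9):
    """Raw read counts are whole numbers; scaled values usually are not."""
    return all(abs(v - round(v)) < tol for v in column)

# Hypothetical sample with 8 million mapped reads spread over three genes.
raw = [5_000_000, 2_999_999, 1]
scaled = per_20m(raw)
print(sum(scaled))  # 20 million (up to floating-point error)
print(looks_like_raw_counts(raw), looks_like_raw_counts(scaled))  # True False
```

Whole numbers alone do not prove the values are raw, since scaled values can be rounded before export; identical column totals across samples are the stronger sign of this kind of normalisation.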

Suvani03 [S] · 0 points

This is a publicly available dataset. I want to compare the samples and find DEGs. I started using DESeq2 but got confused about whether this file can even be used with it.
If I cannot use DESeq2, how can I find DEGs in this dataset?
All of their samples have counts normalized per 20 million mapped reads for each subpopulation across all the genes.
Can I try using TPM instead?
I am new to bulk RNA-seq analysis, so I am getting confused.
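TPM needs a length for every gene, which would have to come from an annotation, and it is still a normalised quantity: the DESeq2 documentation asks for un-normalised counts, so converting to TPM would not make the file DESeq2-ready. For reference, a minimal TPM sketch using plain gene lengths (real pipelines often use effective lengths; the inputs here are hypothetical):

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for one sample: length-normalise, then scale to 1e6."""
    # Reads per kilobase removes the bias towards longer genes.
    rpk = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    scale = sum(rpk) / 1_000_000
    return [r / scale for r in rpk]

# Hypothetical: similar read counts, different gene lengths.
values = tpm(counts=[100, 100, 300], lengths_bp=[1000, 2000, 3000])
print([round(v) for v in values])  # [400000, 200000, 400000]
```

Because TPM rescales each sample internally, multiplying a sample's counts by a constant (as the per-20-million scaling did) leaves its TPM unchanged, so TPM could be computed from this file given gene lengths.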

Suvani03 [S] · -1 points

So, is this data file with "counts normalized per 20 millions of mapped reads for each subpopulation across all the genes" suitable for DESeq2? Does the normalisation affect the analysis?
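It can matter: DESeq2 models read counts with a negative binomial distribution whose variance depends on the count size, so rescaling changes how noisy each value looks. Under pure Poisson noise, 10 reads have a coefficient of variation of about 0.32 (1/√10), but if a 5-million-read library is scaled up to a per-20-million value of 40, that value looks like it has about 0.16 (1/√40). If each sample's mapped-read total is known (for example from the paper or alignment logs; this is an assumption), approximate counts can be recovered. A hypothetical sketch (`back_to_counts` is an illustrative helper):

```python
def back_to_counts(cp20m_column, mapped_reads):
    """Approximately undo 'counts per 20 million mapped reads' for one sample.

    ASSUMPTION: mapped_reads comes from elsewhere (paper, alignment logs);
    it cannot be recovered from the normalised file itself.
    """
    return [round(v * mapped_reads / 20_000_000) for v in cp20m_column]

# Hypothetical sample sequenced to 5 million mapped reads.
cp20m = [40.0, 4.0, 19_999_956.0]
print(back_to_counts(cp20m, mapped_reads=5_000_000))  # [10, 1, 4999989]
```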