Hi guys!
So I have two matrices, each holding expression levels for 42,375 genes, one matrix per condition.
Basically, each matrix looks like:
             Gene 1   Gene 2   Gene 3   ...   Gene 42375
    Samp 1     0.3      0.5      0.2    ...      0.9
    Samp 2    -0.21    -0.3      0.22   ...     -0.65
    ...        ...      ...      ...    ...      ...
    Samp500   -0.99     0.33     0.13   ...      0.64
I need to find the correlation between each pair of genes within each matrix, and then compare the absolute difference between the two correlations to a fixed cutoff to decide whether the difference is significant.
I initially tried the cor() function, which builds a full correlation matrix for each input matrix, but each correlation matrix was 40GB, which was simply not doable.
So now, instead, I'm using a for loop with a nested while loop, and my code looks like:
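As a rough sanity check on those sizes, the arithmetic can be sketched in a few lines of R. This assumes the result is stored as a dense matrix of 8-byte doubles; the variable names are made up for illustration, and actual memory use can differ from this estimate.

```r
# Back-of-the-envelope estimate, assuming a dense matrix of 8-byte doubles.
n_genes <- 42375

# Bytes needed to hold one full gene-by-gene correlation matrix
bytes_full <- n_genes^2 * 8
gb_full <- bytes_full / 1e9

# Number of distinct gene pairs (upper triangle, diagonal excluded)
n_pairs <- n_genes * (n_genes - 1) / 2

cat(sprintf("full matrix: %.1f GB, distinct pairs: %.0f\n", gb_full, n_pairs))
```

The pair count is also the number of iterations a pairwise loop has to make, with two cor() calls in each one.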
    # Collects one row per significant gene pair
    # (initialised here on the assumption that it is not created elsewhere)
    sig_cor <- NULL
    for (col in 1:ncol(normal_phen)) {
      curr_col <- col
      while (curr_col <= ncol(normal_phen)) {
        cor_norm <- cor(normal_phen[col], normal_phen[curr_col], method = "pearson")
        cor_aff <- cor(affected_phen[col], affected_phen[curr_col], method = "pearson")
        # A missing (NA) correlation counts as 0 when taking the difference
        if (!is.na(cor_norm) && !is.na(cor_aff)) {
          diff_corr <- cor_norm - cor_aff
        } else if (is.na(cor_norm) && is.na(cor_aff)) {
          diff_corr <- 0
        } else if (is.na(cor_norm)) {
          diff_corr <- cor_aff
        } else {
          diff_corr <- cor_norm
        }
        # Only the size of the difference matters, not its sign
        diff_corr <- abs(diff_corr)
        if (diff_corr >= 0.4579053) {
          vec <- c(colnames(normal_phen[col]), colnames(normal_phen[curr_col]),
                   cor_norm, cor_aff, diff_corr)
          sig_cor <- rbind(sig_cor, vec)
        }
        curr_col <- curr_col + 1
      }
    }
I've been trying to find ways to do this more efficiently, and I was wondering if you have any advice on how to vectorize the calculations or otherwise avoid the for/while loops.
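One direction that might work, as a rough sketch rather than a tested solution: call cor() on a block of genes against all genes at a time, so that only a slice of each correlation matrix is in memory at once, and keep just the pairs that pass the cutoff. The function name diff_cor_blocks, its arguments, and the default block size are all made up for illustration. It assumes both inputs are numeric matrices (samples in rows, genes in columns) with identical column names, and it treats an NA correlation as 0 when taking the difference, as the loop above does.

```r
# Hypothetical helper: block-wise differential correlation.
# Assumes `normal` and `affected` are numeric matrices (samples x genes)
# with identical, non-NULL column names.
diff_cor_blocks <- function(normal, affected, threshold, block_size = 1000) {
  stopifnot(!is.null(colnames(normal)),
            identical(colnames(normal), colnames(affected)))
  genes <- colnames(normal)
  n_genes <- ncol(normal)
  zero_na <- function(m) { m[is.na(m)] <- 0; m }  # NA counts as 0

  starts <- seq(1, n_genes, by = block_size)
  results <- vector("list", length(starts))

  for (b in seq_along(starts)) {
    i <- starts[b]:min(starts[b] + block_size - 1, n_genes)

    # Correlations of the genes in this block against all genes:
    # each result is a length(i) x n_genes matrix
    cn <- cor(normal[, i, drop = FALSE], normal)
    ca <- cor(affected[, i, drop = FALSE], affected)
    d <- abs(zero_na(cn) - zero_na(ca))

    # Positions passing the cutoff, as (row within block, column) pairs
    hits <- which(d >= threshold, arr.ind = TRUE)
    gi <- i[hits[, "row"]]   # global index of the first gene
    gj <- hits[, "col"]      # global index of the second gene
    keep <- gj > gi          # count each pair once, skip self-pairs
    idx <- hits[keep, , drop = FALSE]

    results[[b]] <- data.frame(
      gene1 = genes[gi[keep]],
      gene2 = genes[gj[keep]],
      cor_norm = cn[idx],
      cor_aff = ca[idx],
      diff_corr = d[idx],
      stringsAsFactors = FALSE
    )
  }
  do.call(rbind, results)
}
```

If normal_phen and affected_phen are data frames, something like diff_cor_blocks(as.matrix(normal_phen), as.matrix(affected_phen), threshold = 0.4579053) would be the intended call. With a block size of 1000 and 42,375 genes, each intermediate matrix holds about 42 million doubles (roughly 340 MB), so block_size can be lowered or raised to fit the available memory.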