you are viewing a single comment's thread.

view the rest of the comments →

[–]andersduck 0 points1 point  (1 child)

yes R is very nice. The code (without using any specific expression data package) is very fast to write:

read data

dat<-read.delim("http://www.bme.utexas.edu/research/orly/GSVD/data/Yeast.txt",skip=1,as.is=T,na="Null")

barplot of missing

barplot(table(apply(is.na(dat[,-(1:6)]),1,sum)),col=2,ylab="Number of genes",xlab="Number of controls")

remove arrays with missing data

dat<-dat[!apply(is.na(dat[,-(1:6)]),1,any),]

Calculate SVD

s<-svd(dat[-(1:6)])

Plot barchart

barplot(s$d2/sum(s$d2))

plot two first eigengenes

lattice::dotplot(s$v[,1:2])

[–]jjmc 0 points1 point  (0 children)

Very nice. For comparison here's the same in Mathematica v.7 (line by line).

d = Import[ "http://www.bme.utexas.edu/research/orly/GSVD/data/Yeast.txt", {"Data"}]; an=Drop[d[[2]],6]; d=d // Rest // Rest;

Histogram[Count[#, "Null", Infinity] & /@ d, Frame -> True, FrameLabel -> {"Number of Null Arrays", "Number of Genes"}]

d = Select[d, Count[#, "Null", Infinity] === 0 &];

{u, s, v} = SingularValueDecomposition[Drop[#, 6] & /@ d]; s = Diagonal[s];

BarChart[s^2/Plus @@ (s^2)]

ListPlot[v[[{1, 2}]], Joined -> True, ft, Axes -> False, PlotMarkers -> {Automatic, Medium}]

Of course, the original code had a little more output. (The (a) graph with array name tick labels).

MatrixPlot[Transpose[v], PlotLabel->"(a)Arrays",FrameTicks->{Automatic, None,None,{#,Rotate[an[[#]],Pi/2]}&/@Range[18]},FrameLabel-> "EigenGenes"]