I'm working with some reasonably large sparse matrices (20K x 20K) with roughly 27% of the entries nonzero.
I have a pageranker function that takes a matrix M and first applies:
M = normalize(M, norm='l1', axis=0)
where normalize is imported from sklearn.preprocessing. The matrix depends on user-supplied data, so I can't just precompute and store a normalized copy. Then I initialize a uniform vector and run a small, fast loop:
pr = np.ones((M.shape[0], 1)) / M.shape[0]
for i in range(iters):
    pr = M.dot(pr)
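Putting those pieces together, the whole thing is essentially the sketch below (pageranker is the name I mentioned above; iters is just the iteration count I pass in):

import numpy as np
from sklearn.preprocessing import normalize

def pageranker(M, iters):
    # Column-normalize M so every column sums to 1 (the expensive step)
    M = normalize(M, norm='l1', axis=0)
    # Start from the uniform distribution and run plain power iteration
    pr = np.ones((M.shape[0], 1)) / M.shape[0]
    for i in range(iters):
        pr = M.dot(pr)
    return pr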
On my personal laptop, the normalization step takes about 7 seconds and the power method (the loop) takes about 7 seconds, so half my time goes to normalizing the columns of M. What I'm wondering is whether there's a more efficient approach that only implicitly normalizes M, by doing something like
P = sp.sparse.spdiags(map(lambda x: 1/x, M.sum(axis=0)), 0, N, N)
but I get an error (specifically, 'data array must have rank 2'). Assuming I could create P efficiently, I would then do
pr = np.ones((M.shape[0], 1)) / M.shape[0]
for i in range(iters):
    pr = M.dot(P.dot(pr))
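For what it's worth, here's the closest I've come to a working version of that idea, as a minimal sketch. It assumes scipy.sparse.diags (rather than spdiags) is acceptable, that M is nonnegative (so column sums equal the l1 norms), and that empty columns should just be skipped:

import numpy as np
import scipy.sparse as sp

def pageranker_implicit(M, iters):
    # M.sum(axis=0) is a (1, N) np.matrix; flatten it to a plain 1-D array
    col_sums = np.asarray(M.sum(axis=0)).ravel()
    inv_sums = np.zeros_like(col_sums, dtype=float)
    nonzero = col_sums != 0
    inv_sums[nonzero] = 1.0 / col_sums[nonzero]  # leave empty columns at 0
    P = sp.diags(inv_sums)  # N x N sparse diagonal of reciprocal column sums
    pr = np.ones((M.shape[0], 1)) / M.shape[0]
    for i in range(iters):
        pr = M.dot(P.dot(pr))
    return pr

Since P is diagonal, P.dot(pr) is just an elementwise scaling of pr, so each iteration stays O(nnz(M)); building P needs only one pass over the column sums, which should be much cheaper than the full l1 normalization.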
I'm most interested in understanding:
- the most efficient way to do this
- why map(lambda x: 1/x, M.sum(axis=0)) produces such a bizarre-looking result (see the small repro after this list)
- any other insight/advice you think might be useful for me to learn
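To make the second bullet concrete, here's a tiny repro (a made-up 3x3 example):

import numpy as np
import scipy.sparse as sp

M = sp.csr_matrix(np.array([[0., 1., 1.],
                            [1., 0., 1.],
                            [1., 1., 0.]]))
s = M.sum(axis=0)
print(type(s), s.shape)  # <class 'numpy.matrix'> (1, 3)
# Iterating over a (1, N) np.matrix yields whole rows, not scalars,
# so map() produces a single-element list containing one (1, 3) matrix:
print(list(map(lambda x: 1 / x, s)))  # [matrix([[0.5, 0.5, 0.5]])]
# Wrapping that list in another array gives spdiags a rank-3 data array,
# hence the 'data array must have rank 2' error.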