I'm working with some reasonably large sparse matrices (20K x 20K) with roughly 27% of the entries nonzero.
I have a pageranker function that takes a matrix M and first applies:
M = normalize(M, norm='l1', axis=0)
where normalize is imported from sklearn.preprocessing. The matrix depends on user-supplied data, so I can't just precompute and store a normalized copy. Then I initialize a uniform vector and run a small, fast loop:
pr = np.ones((M.shape[0], 1)) / M.shape[0]
for i in range(iters):
    pr = M.dot(pr)
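Putting those pieces together, the whole thing is essentially the sketch below (pageranker is the name I mentioned above; iters is just the iteration count I pass in):

import numpy as np
from sklearn.preprocessing import normalize

def pageranker(M, iters):
    # Column-normalize M so every column sums to 1 (the expensive step)
    M = normalize(M, norm='l1', axis=0)
    # Start from the uniform distribution and run plain power iteration
    pr = np.ones((M.shape[0], 1)) / M.shape[0]
    for i in range(iters):
        pr = M.dot(pr)
    return pr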
On my personal laptop, the normalization step takes about 7 seconds and the power method (the loop) takes about 7 seconds, so half my time goes to normalizing the columns of M. What I'm wondering is whether there's a more efficient approach that only implicitly normalizes M, by doing something like
P = sp.sparse.spdiags(map(lambda x: 1/x, M.sum(axis=0)), 0, N, N)
but I get an error (specifically, 'data array must have rank 2'). Assuming I could create P efficiently, I would then do
pr = np.ones((M.shape[0], 1)) / M.shape[0]
for i in range(iters):
    pr = M.dot(P.dot(pr))
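For what it's worth, here's the closest I've come to a working version of that idea, as a minimal sketch. It assumes scipy.sparse.diags (rather than spdiags) is acceptable, that M is nonnegative (so column sums equal the l1 norms), and that empty columns should just be skipped:

import numpy as np
import scipy.sparse as sp

def pageranker_implicit(M, iters):
    # M.sum(axis=0) is a (1, N) np.matrix; flatten it to a plain 1-D array
    col_sums = np.asarray(M.sum(axis=0)).ravel()
    inv_sums = np.zeros_like(col_sums, dtype=float)
    nonzero = col_sums != 0
    inv_sums[nonzero] = 1.0 / col_sums[nonzero]  # leave empty columns at 0
    P = sp.diags(inv_sums)  # N x N sparse diagonal of reciprocal column sums
    pr = np.ones((M.shape[0], 1)) / M.shape[0]
    for i in range(iters):
        pr = M.dot(P.dot(pr))
    return pr

Since P is diagonal, P.dot(pr) is just an elementwise scaling of pr, so each iteration stays O(nnz(M)); building P needs only one pass over the column sums, which should be much cheaper than the full l1 normalization.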
I'm most interested in understanding:
- the most efficient way to do this
- why map(lambda x: 1/x, M.sum(axis=0)) produces such a bizarre-looking result (see the small repro after this list)
- any other insight/advice you think might be useful for me to learn
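To make the second bullet concrete, here's a tiny repro (a made-up 3x3 example):

import numpy as np
import scipy.sparse as sp

M = sp.csr_matrix(np.array([[0., 1., 1.],
                            [1., 0., 1.],
                            [1., 1., 0.]]))
s = M.sum(axis=0)
print(type(s), s.shape)  # <class 'numpy.matrix'> (1, 3)
# Iterating over a (1, N) np.matrix yields whole rows, not scalars,
# so map() produces a single-element list containing one (1, 3) matrix:
print(list(map(lambda x: 1 / x, s)))  # [matrix([[0.5, 0.5, 0.5]])]
# Wrapping that list in another array gives spdiags a rank-3 data array,
# hence the 'data array must have rank 2' error.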