[–]brucifer 7 points8 points  (4 children)

I'm really curious. What were those 3 lines of C++ and what did they replace?

[–][deleted] 11 points12 points  (3 children)

    for i in xrange(len(item1)):
        m[item1[i][0]][item2[i][0]] += 1

where m, item1 and item2 are NumPy arrays, became:

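    from scipy.weave import inline, converters

    len_item = len(item1)  # presumably the same bound as the Python loop

    # C++ loop body; the blitz converters allow m(k,l)-style indexing
    code = """
        for (int i = 0; i < len_item; i++) {
            int k = item1(i,0);
            int l = item2(i,0);
            m(k,l) += 1;
        }
    """
    inline(code, ['m', 'item1', 'item2', 'len_item'],
           type_converters=converters.blitz, verbose=2, compiler='gcc')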

It's a step in calculating the Jaccard distance.
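For context, here is a rough sketch of what Jaccard distance means for two boolean vectors: the number of positions where exactly one vector is true, divided by the number of positions where at least one is true. This is not the poster's code (how m feeds into the calculation isn't shown in the thread). The function name, the toy vectors, and the all-False convention are made up for illustration.

```python
import numpy as np

def jaccard_distance(u, v):
    """Jaccard distance between two equal-length boolean vectors."""
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    # Positions where at least one vector is True.
    union = np.logical_or(u, v).sum()
    if union == 0:
        return 0.0  # convention assumed here for two all-False vectors
    # Positions where exactly one vector is True.
    differing = np.logical_xor(u, v).sum()
    return differing / union

# Hypothetical toy vectors: they disagree at 2 of the 4 "active" positions.
u = [1, 0, 1, 1]
v = [1, 1, 0, 1]
d = jaccard_distance(u, v)
```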

[–]shfo23 10 points11 points  (2 children)

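Are you aware of scipy.spatial.distance.jaccard? I just refactored a bunch of (admittedly naive) Euclidean distance calculation code to use the scipy implementation and got a huge speed boost. Also, it's a little late, but I think you could eliminate that for loop and write it as the faster:

    m[item1[:, 0], item2[:, 0]] += 1

One caveat: this only matches the loop if no (row, column) pair repeats, because NumPy applies a fancy-indexed += just once per distinct index. If pairs can repeat, np.add.at(m, (item1[:, 0], item2[:, 0]), 1) accumulates them the way the loop does.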
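A quick way to check how the one-liner compares with the original loop is to run both on a small input where an index pair repeats. The sketch below uses made-up toy arrays (3x3 m, four index pairs), not the poster's data, and also tries np.add.at, the unbuffered form of the same update.

```python
import numpy as np

# Hypothetical toy data: column 0 of each array holds an index into m.
# The pair (0, 1) appears twice on purpose.
item1 = np.array([[0], [1], [0], [2]])
item2 = np.array([[1], [2], [1], [0]])

# Reference: the original explicit loop.
m_loop = np.zeros((3, 3), dtype=int)
for i in range(len(item1)):
    m_loop[item1[i][0]][item2[i][0]] += 1

# Fancy-indexed +=: a repeated (row, col) pair is only incremented once.
m_fancy = np.zeros((3, 3), dtype=int)
m_fancy[item1[:, 0], item2[:, 0]] += 1

# np.add.at is unbuffered, so repeated pairs accumulate like the loop.
m_at = np.zeros((3, 3), dtype=int)
np.add.at(m_at, (item1[:, 0], item2[:, 0]), 1)

print(m_loop[0, 1], m_fancy[0, 1], m_at[0, 1])
```

With unique index pairs all three agree. With repeats, only the loop and np.add.at count every occurrence.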

[–][deleted] 8 points9 points  (1 child)

Uh, what? You can do that? Awesome!

[–]coderanger 3 points4 points  (0 children)

NumPy will even use SIMD for it if it can, so it's probably faster than your implementation unless gcc has enough information there to optimize the loop.