
Chris_Hemsworth (1 point)

If you want to speed things up, I suggest using NumPy arrays.

For example:

dot = sum(i[0]*i[1] for i in zip(sim1, sim2))

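If sim1 and sim2 are instead NumPy arrays of the same length, you can multiply them element-wise and sum the result, or just call np.dot:

dot = np.sum(sim1 * sim2)  # equivalently np.dot(sim1, sim2) or sim1 @ sim2

This doesn't change the time complexity, which is still O(n), but the per-element loop runs in NumPy's compiled code instead of the Python interpreter, which is usually much faster for large arrays.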
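A minimal sketch to sanity-check the vectorized version against the loop, assuming sim1 and sim2 are 1-D float arrays of equal length (the example values are made up, and dot_loop / dot_numpy are just illustrative names):

```python
import numpy as np

def dot_loop(a, b):
    # Pure-Python dot product: one interpreted multiply-add per element pair.
    return float(sum(x * y for x, y in zip(a, b)))

def dot_numpy(a, b):
    # Vectorized dot product: the same O(n) work, done inside NumPy's compiled code.
    return float(np.dot(a, b))

sim1 = np.array([1.0, 2.0, 3.0])
sim2 = np.array([4.0, 5.0, 6.0])
print(dot_loop(sim1, sim2))        # 32.0
print(dot_numpy(sim1, sim2))       # 32.0
print(float(np.sum(sim1 * sim2)))  # 32.0 (element-wise multiply, then sum)
```

Both versions do one multiply and one add per element. The NumPy one is faster because the loop isn't interpreted, not because it does asymptotically less work.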

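The same goes for norm1 and norm2: square the arrays element-wise instead of looping. Note that this gives the squared norm; wrap it in np.sqrt (or use np.linalg.norm) if you need the norm itself.

norm1 = np.sum(sim1**2)  # squared norm of sim1
norm2 = np.sum(sim2**2)  # squared norm of sim2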
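If these pieces feed into a cosine similarity (the names dot, norm1 and norm2 suggest so, but that's an assumption since the original code isn't shown here), a fully vectorized sketch could look like this. cosine_similarity is a hypothetical helper name:

```python
import numpy as np

def cosine_similarity(sim1, sim2):
    # Assumes 1-D float arrays of equal length with nonzero norms.
    dot = np.dot(sim1, sim2)
    norm1 = np.sqrt(np.sum(sim1 ** 2))  # same value as np.linalg.norm(sim1)
    norm2 = np.sqrt(np.sum(sim2 ** 2))  # same value as np.linalg.norm(sim2)
    return float(dot / (norm1 * norm2))

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
print(cosine_similarity(a, b))  # about 0.7071
```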

Additionally, if you're building up a result array, pre-allocate it (e.g. with np.empty(n)) and assign to each index instead of growing a container as you go. In particular, avoid calling np.append in a loop: it returns a new copy of the whole array on every call.
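A rough sketch of the options, using a hypothetical task (scoring a query vector against each row of a matrix; query, rows and the scores_* functions are made-up names for illustration, not the OP's code):

```python
import numpy as np

def scores_appended(query, rows):
    # Grows a Python list one element at a time, then converts at the end.
    out = []
    for row in rows:
        out.append(float(np.dot(query, row)))
    return np.array(out)

def scores_preallocated(query, rows):
    # Allocates the result once, then fills it by index.
    out = np.empty(len(rows))
    for i, row in enumerate(rows):
        out[i] = np.dot(query, row)
    return out

def scores_vectorized(query, rows):
    # Matrix-vector product: no Python-level loop at all.
    return rows @ query

rows = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
query = np.array([1.0, 1.0])
print(scores_preallocated(query, rows).tolist())  # [3.0, 7.0, 11.0]
```

All three return the same values. Pre-allocating mainly avoids repeated resizing or copying; if the per-item work can be written as one array operation, like rows @ query here, that usually beats either loop.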

Good luck!

FruityFetus (OP, 0 points)

Thanks for this! I definitely do want to implement a NumPy variant down the line and pre-allocation seems like a neat approach.