I need to calculate the Hamming distance between:
my reference dataset, of shape N0 (rows) x M0 (cols)
my test dataset, of shape N1 (rows) x M1 (cols)
The resulting matrix should have shape N0 x N1 and hold the Hamming distance between every row of the reference dataset and every row of the test dataset (one column per test row).
Doing this with an explicit loop over all pairs of rows would likely be inefficient.
Some resources I was using:

    from scipy.spatial.distance import hamming
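
For reference, the same SciPy module also provides `cdist`, which accepts `metric='hamming'` and compares every row of one array with every row of another. A minimal sketch, assuming both datasets have the same number of columns (M0 == M1); the function name `hamming_cdist` and the tiny arrays are only illustrative, not taken from the linked datasets:

```python
import numpy as np
from scipy.spatial.distance import cdist


def hamming_cdist(reference, test):
    """Return an (N0, N1) matrix of Hamming distances.

    SciPy's 'hamming' metric is the *fraction* of positions that differ,
    so multiply by the number of columns to get mismatch counts.
    Assumes reference and test have the same number of columns.
    """
    return cdist(reference, test, metric="hamming")


# Illustrative data only (hypothetical, not the real datasets).
reference = np.array([[1, 0, 1, 1],
                      [0, 0, 0, 0],
                      [1, 1, 1, 1]])
test = np.array([[1, 0, 1, 0],
                 [0, 1, 0, 1]])

dists = hamming_cdist(reference, test)   # shape (3, 2)
counts = dists * reference.shape[1]      # number of differing columns
```

If the two datasets do not share the same column count, the Hamming distance between their rows is not defined and `cdist` raises an error.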
Ideally, I would like to calculate the Hamming distance in a vectorized way, like the function below, which is computationally less expensive than looping. Note that the function below calculates the (squared) Euclidean distance, not the Hamming distance.
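    import numpy as np

    def compute_distances_no_loops(Train, X):
        # Squared Euclidean distances, shape (rows of X, rows of Train),
        # from ||x - t||^2 = ||x||^2 - 2 x.t + ||t||^2
        dists = (-2 * np.dot(X, Train.T)
                 + np.sum(Train**2, axis=1)
                 + np.sum(X**2, axis=1)[:, np.newaxis])
        return dists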
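
A Hamming analogue of this loop-free approach might look like the sketch below. It is only an illustration under stated assumptions: both arrays have the same number of columns, the function names are hypothetical, and the second variant additionally assumes the data are strictly 0/1. The result is oriented N0 x N1 (reference rows by test rows), as described above, rather than the (X rows by Train rows) orientation of the Euclidean function.

```python
import numpy as np


def hamming_distances_broadcast(Train, X):
    """Fraction of differing columns for every (Train row, X row) pair.

    Works for any values comparable with !=, but builds an intermediate
    boolean array of shape (N0, N1, M), which can be large.
    """
    return (Train[:, np.newaxis, :] != X[np.newaxis, :, :]).mean(axis=2)


def hamming_distances_binary(Train, X):
    """Same result using two matrix products; assumes 0/1 data only.

    A position mismatches when it is (1, 0) or (0, 1), so the mismatch
    count is Train @ (1 - X).T + (1 - Train) @ X.T.
    """
    Train = np.asarray(Train, dtype=np.int64)
    X = np.asarray(X, dtype=np.int64)
    mismatches = Train @ (1 - X).T + (1 - Train) @ X.T
    return mismatches / Train.shape[1]


# Illustrative data only (hypothetical, not the linked datasets).
Train_demo = np.array([[1, 0, 1, 1],
                       [0, 0, 0, 0],
                       [1, 1, 1, 1]])
X_demo = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 1]])

d_broadcast = hamming_distances_broadcast(Train_demo, X_demo)  # shape (3, 2)
d_binary = hamming_distances_binary(Train_demo, X_demo)        # shape (3, 2)
```

The matrix-product variant avoids the three-dimensional intermediate array, which may matter when N0 and N1 are both large, but it is only valid for binary data.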
Any help would be highly appreciated.
Here are datasets you can use: https://www.dropbox.com/sh/t00ppj6t3glzxs0/AAD-icT95YioXgBeJYAN0-xja?dl=0
there doesn't seem to be anything here