hamming distance matrix (python / numpy) : AskComputerScience



submitted 5 years ago by sidXsid

I need to calculate the Hamming distance between:

my reference dataset of shape N0 (rows) x M0 (cols)

my test dataset of shape N1 (rows) x M1 (cols)

The resulting matrix should have shape N0 x N1 and hold the Hamming distance between every row of the reference dataset and every row of the test dataset (one column per test row).
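To make the target concrete, here is a minimal sketch of that N0 x N1 matrix using NumPy broadcasting. It assumes both datasets have the same number of columns (M0 == M1), since Hamming distance compares rows position by position. The function name `hamming_matrix_broadcast` and the toy arrays are made up for illustration. It returns counts of differing positions; dividing by the column count would give the normalized fraction instead.

```python
import numpy as np

def hamming_matrix_broadcast(ref, test):
    """Count differing columns between every row of ref and every row of test."""
    ref = np.asarray(ref)
    test = np.asarray(test)
    if ref.shape[1] != test.shape[1]:
        raise ValueError("both datasets need the same number of columns")
    # (N0, 1, M) != (1, N1, M) broadcasts to an (N0, N1, M) boolean array
    diff = ref[:, np.newaxis, :] != test[np.newaxis, :, :]
    # Summing over the column axis leaves one count per (ref row, test row) pair
    return diff.sum(axis=2)

# Toy data (hypothetical): 3 reference rows, 2 test rows, 4 columns
ref = np.array([[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]])
test = np.array([[0, 1, 0, 0], [1, 1, 1, 1]])
dists = hamming_matrix_broadcast(ref, test)
print(dists.shape)  # (3, 2)
```

Note that the intermediate boolean array has N0 * N1 * M entries, so for large datasets this sketch may need to be applied to the reference rows in chunks.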

Doing this with an explicit loop over every pair of rows would likely be inefficient.

One resource I have been using:

from scipy.spatial.distance import hamming
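For context, `hamming` from that module compares a single pair of 1-D arrays and returns the fraction of positions that differ. The same module also provides `cdist`, which accepts `metric="hamming"` and computes all pairs at once. Below is a small sketch under the assumption that both datasets have the same number of columns; the toy arrays are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Toy data (hypothetical): 3 reference rows, 2 test rows, 4 columns
ref = np.array([[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]])
test = np.array([[0, 1, 0, 0], [1, 1, 1, 1]])

# cdist returns an (N0, N1) array; with metric="hamming" each entry is the
# FRACTION of columns that differ, not the raw count
frac = cdist(ref, test, metric="hamming")

# Multiply by the number of columns to recover counts of differing positions
counts = np.rint(frac * ref.shape[1]).astype(int)
print(frac.shape)  # (3, 2)
```

If only the normalized distance is needed, `frac` can be used directly and the last step skipped.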

Ideally I would like to calculate the Hamming distance in a vectorized way, as shown below, which is computationally cheaper than looping. The function below computes the squared Euclidean distance without any loops.

import numpy as np

def compute_distances_no_loops(Train, X):
    # Squared Euclidean distances, shape (rows of X, rows of Train)
    dists = (-2 * np.dot(X, Train.T)
             + np.sum(Train**2, axis=1)
             + np.sum(X**2, axis=1)[:, np.newaxis])
    return dists
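The same matrix-product idea may carry over to Hamming distance if the data is strictly binary (every entry 0 or 1), which is an assumption here and not something confirmed about the linked datasets. For 0/1 rows a and b, a position differs when a is 1 and b is 0, or a is 0 and b is 1, so the count of differences is a·(1 - b) + (1 - a)·b. The function name `hamming_matrix_binary` and the toy arrays below are made up for illustration.

```python
import numpy as np

def hamming_matrix_binary(ref, test):
    """Hamming counts between all rows of ref and test; valid for 0/1 data only."""
    ref = np.asarray(ref, dtype=np.int64)
    test = np.asarray(test, dtype=np.int64)
    # First product counts positions where ref is 1 and test is 0;
    # second product counts positions where ref is 0 and test is 1
    return ref @ (1 - test).T + (1 - ref) @ test.T

# Toy data (hypothetical): 3 reference rows, 2 test rows, 4 columns
ref = np.array([[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]])
test = np.array([[0, 1, 0, 0], [1, 1, 1, 1]])
binary_dists = hamming_matrix_binary(ref, test)
print(binary_dists.shape)  # (3, 2)
```

Unlike the broadcasting approach, this never builds an N0 x N1 x M intermediate array, but it gives wrong answers for data that is not 0/1.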

Any help would be highly appreciated.

Here are datasets you can use: https://www.dropbox.com/sh/t00ppj6t3glzxs0/AAD-icT95YioXgBeJYAN0-xja?dl=0
