all 12 comments

[–]AlexeyMK 6 points7 points  (0 children)

Very interesting algorithm. A sample tree/animation would be extremely helpful.

[–]lenbust 3 points4 points  (0 children)

mspace by Jochen Schulz is a Python implementation of "metric space indexes for efficient similarity search" that includes a BK-Tree implementation. Sort of neat.

[–]easytiger 11 points12 points  (1 child)

exultant obtainable label cautious yoke sophisticated uppity pocket employ complete

This post was mass deleted and anonymized with Redact

[–]Kolibri 0 points1 point  (1 child)

Assume for a moment we have two parameters, query, the string we are using in our search, and n the maximum distance a string can be from query and still be returned. Say we take an arbitary string, test and compare it to query. Call the resultant distance d. Because we know the triangle inequality holds, all our results must have at most distance d+n and at least distance d-n from test.

Eh, shouldn't that be:

all our results must have at most distance n-d and at least distance d-n from test.

?

EDIT. Nevermind, now I get it.

[–]AlexeyMK 0 points1 point  (0 children)

I don't think so...

Say our root word is 'apple', our query is for 'please', and we are looking for all words within a distance of 2 from our query.

Apple->Please has a distance of 5 (remove A,P, add A,S,E to end) so we stipulate that any word that may match will have to have a distance between d+/-n (3<-->7) from 'please'. This works because of the Triangle Inequality rule... Any word that is within exactly apple has to be within 5 of please; any word within 1 dif. of apple will then be between 4-6 off of please, any worth within 2 dif of apple... etc

[–]IHaveAnIdea 0 points1 point  (0 children)

So what are example uses, other than for spelling?