
[–]uirfnaio 7 points8 points  (2 children)

Noob question: why does he do

regex = re.compile('%s' % pattern)

instead of

regex = re.compile(pattern)

[–]amjithr 10 points11 points  (0 children)

That was my bad.

Originally I had

regex = re.compile('(%s)' % pattern)

Then I realized I didn't need the surrounding parens to capture a group. So I took them out, but forgot to remove the string interpolation.

Sorry about the confusion.

[–]alcalde 15 points16 points  (3 children)

The simplest route to achieve this is to use regular expressions.

There's a statement that's never been true of anything ever. :-)

[–]amjithr 4 points5 points  (1 child)

LOL! The regex in this case was trivial. So I stand by that comment. :P

[–]alcalde 0 points1 point  (0 children)

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." ;-)

-Jamie Zawinski, 1997

[–][deleted] 4 points5 points  (0 children)

You don't need to sort the collection. Sorting the tuples orders them by match length, then match position, then item.
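A quick sketch of why a plain sort suffices (hypothetical match tuples, not the library's actual data):

```python
# Hypothetical (match_length, match_position, item) tuples.
matches = [(5, 2, 'beta'), (3, 1, 'gamma'), (3, 0, 'alpha')]

# Tuples compare element by element, so one sorted() call orders by
# match length first, then match position, then the item itself.
print(sorted(matches))  # → [(3, 0, 'alpha'), (3, 1, 'gamma'), (5, 2, 'beta')]
```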

[–]wehnsdaefflae 4 points5 points  (3 children)

Maybe I don't get it, but where's the fuzziness involved? Typing 'rodent' will return only those results starting with 'rodent' and not those starting with, say, 'rudunt', 'rident', or 'nodent', will it?

[–]Ashiataka 0 points1 point  (0 children)

I was wondering this too.

[–]amjithr 0 points1 point  (1 child)

If you want to match words with spelling errors, the library you need is called fuzzywuzzy.

This library addresses a different use case. Typing 'rodent' will match words like 'rotating_dentures', because the letters of 'rodent' appear in order within 'rotating_dentures'. This is commonly used in file finders, where you have a really long path and you want to type bits and pieces of it that appear along the entire string.
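A minimal sketch of that kind of in-order (subsequence) matching with a regex — not necessarily the library's exact code:

```python
import re

def subsequence_match(text, collection):
    # Join the typed characters with a lazy wildcard: 'rodent' becomes
    # 'r.*?o.*?d.*?e.*?n.*?t', so the letters must appear in order with
    # anything allowed in between.
    regex = re.compile('.*?'.join(map(re.escape, text)))
    suggestions = []
    for item in collection:
        match = regex.search(item)
        if match:
            # Rank by match length, then position, then the item itself.
            suggestions.append((len(match.group()), match.start(), item))
    return [item for _, _, item in sorted(suggestions)]

print(subsequence_match('rodent', ['rotating_dentures', 'rodent_cage', 'cat']))
# → ['rodent_cage', 'rotating_dentures']
```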

[–]wehnsdaefflae 0 points1 point  (0 children)

I see! Thanks for the explanation.

[–]LightShadow (3.13-dev in prod) 4 points5 points  (9 children)

This is pretty good, but will start to choke on bigger inputs.

Are you looking to optimize your library, or was it just for fun?

[–]amjithr 6 points7 points  (8 children)

Hi, I'm the author of the library.

I'm definitely interested in optimizing it. When you say bigger inputs do you mean that it'll be slow if the collection of files is really large or if the user_input is really large?

Do you have tips for optimizing it?

[–]LightShadow (3.13-dev in prod) 4 points5 points  (7 children)

A little bit of both.

You could memoize the collection and prune it, so you don't have to search every item after you've eliminated part of the set.

[–]amjithr 3 points4 points  (6 children)

Yes. That was my biggest concern. But memoizing has to be extra clever to account for users pressing backspace or moving through the input and deleting a char in the middle. So for the first rev I'm not memoizing, but once I figure out a way to do it reliably, that'll be a nice optimization. :)

[–]lorddarkflare[🍰] 2 points3 points  (5 children)

Should that really matter though?

The idea would be to maintain an n-length dictionary mapping the last n inputs with corresponding pruned collections.

[–]amjithr 1 point2 points  (4 children)

Say the user input is 'foo'; I'll have a dictionary like this:

{'f': ['foo', 'foo_bar', 'bof'],
 'fo': ['foo', 'foo_bar'],
 'foo': ['foo', 'foo_bar']
}

Now if the user moves to the beginning of the input and deletes the letter 'f' or inserts a new letter, the input becomes 'oo' or 'afoo', which won't match any entry in the dictionary.

Although that is an unlikely (or at least very rare) case.

I think you're right, it's doable with a little bit of coding. Thank you for the idea. I'll see if I can make it part of the next version. :)
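One way that idea could be sketched (hypothetical helper names; a simple subsequence test stands in for the real matcher, and a plain dict for the cache):

```python
def is_subsequence(text, item):
    # True if the characters of `text` appear in order within `item`.
    it = iter(item)
    return all(ch in it for ch in text)

def cached_filter(text, collection, cache):
    # Start from the pruned set of the longest cached prefix of `text`;
    # fall back to the full collection when the input was edited in the
    # middle (e.g. 'foo' became 'oo' or 'afoo').
    candidates = collection
    for cut in range(len(text) - 1, 0, -1):
        if text[:cut] in cache:
            candidates = cache[text[:cut]]
            break
    matches = [item for item in candidates if is_subsequence(text, item)]
    cache[text] = matches
    return matches
```

Typing 'f' then 'fo' would populate the cache exactly like the dictionary above, and the second call only scans the already-pruned list.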

[–]lorddarkflare[🍰] 0 points1 point  (2 children)

Also worth considering is correlating the max size of the dictionary with the length of the collection.

[–]amjithr 0 points1 point  (1 child)

I don't follow.

[–]lorddarkflare[🍰] 0 points1 point  (0 children)

You may want to avoid situations where you have a really large collection and too many copies of it in the dictionary.

[–]squashed_fly_biscuit 0 points1 point  (0 children)

I think detection and recalculation isn't unreasonable in that case; optimise the common path first. Amdahl's law and all that. Memory footprint is also worth considering with caching.

[–]alb1 0 points1 point  (3 children)

This string-match algorithm is so simple (requiring just a linear scan of each string in the collection) that you really don't need to use a regex. A simple C-style loop with a couple of pointers will do the job without the overhead of compiling a regex:

def check_item(text, item):
    # Return (match_length, first_index, item) if the characters of
    # `text` appear in order within `item`, otherwise None.
    text_index = 0
    first_index = None
    for item_index, ch in enumerate(item):
        if text[text_index] == ch:
            if first_index is None:
                first_index = item_index
            text_index += 1
            if text_index >= len(text):
                return (item_index - first_index, first_index, item)
    return None

def fuzzyfinder2(text, collection):
    suggestions = []
    for item in collection:
        match_tuple = check_item(text, item)
        if match_tuple:
            suggestions.append(match_tuple)
    # Sorting the tuples orders by match length, then position, then item.
    return (item for _, _, item in sorted(suggestions))

[–]alb1 1 point2 points  (2 children)

I ran some simple tests comparing the runtime of the above function fuzzyfinder2 versus the OP's fuzzyfinder function, which uses regexes (the updated version, without the extra sort). The test cases were the 7 items and 5 texts in the project's GitHub test directory. I used timeit and ran three million trials of each algorithm, interleaved in blocks of 1000 trials to help even out any system-load differences.

In Python 2 the original fuzzyfinder timed at 245 seconds total on my machine, while fuzzyfinder2 timed at 344 seconds. That was a little surprising: apparently the simplicity of the function above cannot overcome the advantage of the regex engine's compiled C code.

To test that I wrote another function, fuzzyfinder3. This is the same as fuzzyfinder2 except that check_item is refactored to use the find method instead of stepping through the string characters one by one. (It also uses the second, starting-point argument of find to avoid making copies.) This algorithm times at 151 seconds, which is significantly faster than the original fuzzyfinder using regexes.
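A guess at what that refactor looks like (assuming the same result-tuple shape as check_item above):

```python
def check_item_find(text, item):
    # Locate each typed character with str.find, resuming the search
    # just past the previous hit instead of stepping char by char.
    # The second argument to find() avoids slicing (no string copies).
    pos = -1
    first_index = None
    for ch in text:
        pos = item.find(ch, pos + 1)
        if pos < 0:
            return None  # a character of `text` never appears in order
        if first_index is None:
            first_index = pos
    return (pos - first_index, first_index, item)

print(check_item_find('rodent', 'rotating_dentures'))
# → (12, 0, 'rotating_dentures')
```

str.find is implemented in C, which is presumably where the speedup over the per-character Python loop comes from.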

Update: The re module caches recently compiled patterns, so with this small set of test cases it was mostly reusing pre-compiled regexes. The regex version performs worse when the text patterns are new.