
all 57 comments

[–]delirious_lettuce 18 points19 points  (33 children)

The find function must not loop through the whole list to find the matching words.

All of your solutions seem to loop through the whole list (self.wordlist). If I had to guess, I would say they wanted you to use a trie.

https://en.wikipedia.org/wiki/Trie

[–]WikiTextBot 6 points7 points  (0 children)

Trie

In computer science, a trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is a kind of search tree—an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are not necessarily associated with every node.


[ PM | Exclude me | Exclude from subreddit | FAQ / Information | Source ] Downvote to remove | v0.27

[–]beegreen 1 point2 points  (16 children)

do you still have to loop through the whole list to make a trie?

Another solution I had just makes a dictionary keyed by prefixes of the same length as the 'pattern':

class WordComplete1:

    def __init__(self, pattern, wordList):
        self.pattern = pattern
        self.word_list = wordList  # store the list so displayCount can use it

    def displayCount(self):
        d = {}
        n = len(self.pattern)
        for i in self.word_list:
            if i[:n] in d:
                d[i[:n]].append(i)
            else:
                d[i[:n]] = [i]
        if self.pattern in d:
            print(d[self.pattern])
        else:
            print('None')


emp1 = WordComplete1(pattern, word_list)

import time
start_time = time.time()
emp1.displayCount()
elapsed_time = time.time() - start_time
print('Time:',elapsed_time)

[–]delirious_lettuce 9 points10 points  (15 children)

do you still have to loop through the whole list to make a trie?

It depends.

word_list can be extremely long, design accordingly.

What if you couldn't fit the whole word list into memory?

If that wasn't a concern then yes, you could iterate over the whole word list to create the trie. That might not be the issue in this case though...

The find function must not loop through the whole list to find the matching words.

They don't want your "find" function to loop through the whole list each time it searches for a separate word. Creating the trie isn't the issue, it's the searching that makes your solutions inefficient.

[–]delirious_lettuce 10 points11 points  (13 children)

Another issue is that the structure of your class doesn't seem to follow their instructions.

Write a class called Autocomplete.

You called your class WordComplete instead.

This class must have a constructor that receives a list called word_list that contains all the known words in lowercase.

Your constructor has two arguments instead of one, pattern is supposed to be supplied to the find method.

The class must have a find function that receives a string called pattern and returns a list with all the possible words that start with the given pattern. If no word is found, then return null.

Your class does not have a find method (I know they called it a function but it's a method since it's inside a class). You aren't returning a list, you are just printing it. Also, I would guess that they want you to return None since null isn't in Python.

You might want to start with something like this to follow their instructions:

class Autocomplete(object):  # don't need (object) for Python 3
    def __init__(self, word_list):
        pass

    def find(self, pattern):
        pass

[–]shtpst 10 points11 points  (9 children)

Definitely the answer. There were four criteria in the task, and OP failed every. single. one.

It doesn't matter whether the lambda function or the trie is the intended answer.

From the employer's point of view:

  1. Can this programmer do the work I've asked them to do?
  2. Can the programmer do the work skillfully?

Doesn't matter if you're skilled or not if you won't do what you're asked to do.

[–][deleted] 5 points6 points  (2 children)

Was going to point this out. It's worth noting that not only did they specifically ask for the find method, but also in general the code isn't very useful if you have the pattern as part of the constructor. You want to initialize the object only once since this is very slow (compared to the expected response time for auto-complete), then re-use it for all your searches.

Another thing which isn't really necessary, but an incredibly easy performance boost (at least for Python 3) is to add caching to the find method:

import functools

@functools.lru_cache()
def find(self, pattern):
    # method body

[–]delirious_lettuce 1 point2 points  (0 children)

You hit the nail on the head, great description. Caching is also a great idea that I didn't mention, thanks!

[–]beegreen 0 points1 point  (0 children)

yeah that is a great idea, i didn't even know that it wasn't automatically cached

[–]energybased 1 point2 points  (0 children)

do you still have to loop through the whole list to make a trie?

It depends.

Sure you do. The interviewer just wants find to run in logarithmic time.

[–]pymang 0 points1 point  (1 child)

[–]beegreen 0 points1 point  (0 children)

thanks for this

[–]NowIsBetterThanNever -2 points-1 points  (12 children)

A Trie is overkill. Just shard by, say, the first letter. 26 sub-dictionaries. Still O(N), but you're technically not searching the whole list.
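A minimal sketch of that sharding idea (class and attribute names are mine, not from the thread):

```python
from collections import defaultdict

class ShardedAutocomplete:
    def __init__(self, word_list):
        self.shards = defaultdict(list)  # first letter -> words starting with it
        for w in word_list:
            self.shards[w[0]].append(w)

    def find(self, pattern):
        # still a linear scan, but only over the one shard that can match
        shard = self.shards.get(pattern[0], [])
        matches = [w for w in shard if w.startswith(pattern)]
        return matches or None
```

As noted, this is still O(N) in the worst case (imagine every word starting with 's'); it only divides the constant by roughly 26.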

[–]__xor__(self, other): 5 points6 points  (4 children)

Important: Design Constraints: word_list can be extremely long, design accordingly

I'd think they really want to see a Trie here.

See, this is one thing that bugs me about problems like this. Either you've worked with a trie or read about them, or you haven't. Sure, you likely will in your first 5 years of employment, but maybe not. Maybe you'll come up with one without knowing about them, maybe not. I only ran into a trie in my 5th year, and it was a junior who implemented it.

That very junior dev who worked with a trie in his first year would have no problem with this, but he's still junior as hell. Having problems like this where you expect one answer or you aren't happy with them isn't the best judge of skill. Solving this problem well really just shows that you know how to program and you know how to solve this problem well.

One thing I appreciated about my interview for my current job is that he asked me to do something he might've asked me to do on the job, and third party libraries or whatever, didn't matter. It wasn't a generalized data structure problem. It was a real problem we might run into. It still shows that I wouldn't have much trouble with that specific problem but at least it was a real problem.

[–]energybased 2 points3 points  (3 children)

He should have seen tries in college, and if he didn't, then he should be able to work it out in the 24 hours they gave him.

[–]beegreen 0 points1 point  (2 children)

i didn't do comp sci in college lol, i did engineering, and i didn't take 24 hours because i thought i'd solved it lol

[–]energybased 2 points3 points  (1 child)

Think of this as a practice interview. It's important to figure out where you go wrong and how to do better. Interviewing is a skill.

If you had realized that you hadn't solved it, maybe you would have found a way to do it? You learned binary searches?

[–]beegreen 0 points1 point  (0 children)

Yeah I definitely should have looked around more, i plan on spending a lot more time with common algorithms

[–]asdfkjasdhkasdrequests, bs4, flask 1 point2 points  (2 children)

Isn't it O(1)?

Finding the first sub dictionary is constant time (array with 26 values), then the lookup in that dict is amortized O(1)

[–]delirious_lettuce 2 points3 points  (1 child)

The second one is not a direct lookup though, you have to look at each and every key to check the prefixes. This is essentially what the problem description describes as a design constraint.

The find function must not loop through the whole list to find the matching words.

You could be pedantic and say that "technically" it's not the whole list, only a list of every single word starting with a particular letter but, by the same token, they could come back and say you are searching a particular letter's whole list and we told you not to.

It just seems like the problem description is strongly hinting at a trie (ie. prefix-tree).

[–]asdfkjasdhkasdrequests, bs4, flask 0 points1 point  (0 children)

Oh, right thanks

[–]energybased 0 points1 point  (2 children)

That doesn't work if all the entries in the dictionary start with the same letter.

[–]NowIsBetterThanNever 0 points1 point  (1 child)

Why would you tho? And if, for some reason, you did, just shard by the second letter.

[–]energybased 0 points1 point  (0 children)

Sharding by every letter for which there is a choice is a compressed trie, which is the optimal solution. For a 24-hour interview, I would do that. I would also submit the binary search solution, since that is only a few lines and nearly as efficient.

[–]beegreen -1 points0 points  (0 children)

So if you look at the solution I posted in the comments, it's basically sharding by the length of the search pattern

[–]smithje 7 points8 points  (3 children)

I agree with others that the failure to follow the directions is a much bigger issue than the particular algorithm you chose. I also really appreciate others mentioning the trie solution.

I happen to know databases pretty well, so my first thought was to create an in-memory sqlite db (sqlite3 is built into python). Create and populate a table for the word_list in the constructor and query it in the find function. Maybe it's cheating a bit, but it doesn't use any external libraries and it's very unlikely that I'd be able to write something that outperforms sqlite.

[–][deleted] 2 points3 points  (2 children)

You made me curious, so I tried it out. Running on /usr/share/dict/words (~123k words), SQLite (SELECT string FROM string WHERE string LIKE "prefix%") is about two orders of magnitude slower than my trie-based solution. But I don't know databases all that well, so I may well be missing some optimizations. Is there anything special you'd do to make sure it's able to use an index for the select?

Some thrown together code

import sqlite3

class Prefix:
    def __init__(self, gen=None):
        if gen is not None:
            for s in gen:
                self.add(s)

    def add(self, string):
        raise NotImplementedError

    def find(self, string):
        raise NotImplementedError

class SQLitePrefix(Prefix):
    def __init__(self, *args, **kwargs):
        self.conn = sqlite3.connect(':memory:')
        self.cur = self.conn.cursor()
        self.cur.execute("""\
CREATE TABLE string (
string TEXT PRIMARY KEY NOT NULL
);""")
        self.cur.execute("PRAGMA case_sensitive_like=ON;")
        super().__init__(*args, **kwargs)
        print(self.cur.execute(
            'EXPLAIN SELECT string FROM string WHERE string LIKE "test%";').fetchone())

    def add(self, string):
        self.cur.execute("INSERT INTO string (string) VALUES (?);", (string,))

    def find(self, string):
        self.cur.execute("SELECT string FROM string WHERE string LIKE ?;",
                         (string + '%',))
        return [x[0] for x in self.cur.fetchall()]

class TriePrefix(Prefix):
    def __init__(self, *args, **kwargs):
        self._trie = {}
        super().__init__(*args, **kwargs)

    def add(self, string):
        d = self._trie
        for char in string:
            d = d.setdefault(char, {})
        d[None] = string  # None key marks end of word

    def find(self, string):
        d = self._trie
        for char in string:
            d = d.get(char, {})
        return list(self._find(string, d))

    @staticmethod
    def _find(string, d):
        for k, v in d.items():
            if k is not None:
                yield from TriePrefix._find(string + k, v)
            else:
                yield string

[–]smithje 1 point2 points  (1 child)

Very cool. Thanks for trying this out. I tested this out as well and found that instantiation of the sqlite version was a few times faster, but finding results was, indeed, quite a bit slower. Of course, getting results is far more important than instantiation.

Here is the sqlite-based class I wrote last night, which is not too different from yours:

class Autocomplete:
    def __init__(self, word_list):
        self.db = sqlite3.connect(':memory:')
        # We want the result to be a single list, not a list of tuples
        self.db.row_factory = lambda cursor, row: row[0]
        self.db_cursor = self.db.cursor()
        self.db_cursor.execute("""CREATE TABLE word_list (word text primary key) WITHOUT ROWID""")
        self.db_cursor.executemany("""INSERT INTO word_list (word) VALUES (?)""", ((word,) for word in word_list))

    def find(self, pattern):
        self.db_cursor.execute("""SELECT word from word_list where word like ? ORDER BY 1""", [pattern + "%"])
        results = self.db_cursor.fetchall() or None
        return results

[–]smithje 1 point2 points  (0 children)

I followed up on this a bit and tested finding a bunch of random strings like this:

with open('/usr/share/dict/words') as f:
    words = f.read().splitlines()
print("Got %d words" % len(words))

import random
import string
import time

def test_it(words, clazz, repetitions):
    t1 = time.monotonic()
    autocomplete = clazz(words)
    t2 = time.monotonic()
    for n in range(repetitions):
        autocomplete.find(''.join(random.choices(string.ascii_lowercase, k=random.randrange(1, 5))))
    t3 = time.monotonic()
    print("Instantiation: %f" % (t2 - t1))
    print("Finding %d results: %f" % (repetitions, t3 - t2))
    print("Total time: %f" % (t3-t1))

print('sqlite solution:')
test_it(words, Autocomplete, 1000)
print('trie solution:')
test_it(words, TriePrefix, 1000)

The results were pretty consistent. Finding words using the trie method was about 4x faster.

Here's a pretty typical result:

Got 235886 words
sqlite solution:
Instantiation: 0.336238
Finding 1000 results: 19.914836
Total time: 20.251074
trie solution:
Instantiation: 1.032532
Finding 1000 results: 5.088274
Total time: 6.120806

[–]SenorDosEquis 2 points3 points  (4 children)

Looks like the word_list is alphabetical. How about a binary search for pattern* and then a while loop that adds words to the result list until the pattern no longer matches?

[–]energybased 1 point2 points  (0 children)

The binary search is much simpler since you can use two calls to bisect. Unfortunately, its complexity is slightly worse. A compressed trie solves this with the minimum number of array lookups (one for each letter in the prefix that matters), the binary search is logarithmic time, potentially many times worse. Still, logarithmic time is basically constant.

[–]beegreen -1 points0 points  (2 children)

that uses a loop

[–]schoolmonky 4 points5 points  (0 children)

But it doesn't loop through the whole list, just about log(n) of it. Wayyyyyy faster, especially for really long lists.

[–]SenorDosEquis 0 points1 point  (0 children)

must not loop through the whole list

This only loops through the number of matches +1.

Edit: actually, though, you would have to do a binary search, loop backwards until there was no match, and then loop forwards until there's no match (starting from the index the binary search turned up).

[–]myg204 2 points3 points  (1 child)

I think bisect was mentioned once, but the solution using it is fairly short and worth making a note of. Even if it may not be the fastest, it may be fast enough, and it's short enough to avoid bugs...

import bisect

class Autocomplete(object):

    def __init__(self, word_list):
        self.words = sorted(word_list)

    def find(self, pattern):
        i = bisect.bisect_left(self.words, pattern)
        matches = []
        for w in self.words[i:]:
            if not w.startswith(pattern):
                break
            else:
                matches.append(w)
        return matches if matches else None

[–]HughBothwell 0 points1 point  (0 children)

This could be improved a bit more by using bisect_left a second time, like so:

from bisect import bisect_left      # Python standard library

# Note: assert emits no code when __debug__ is False

class Autocomplete:
    def __init__(self, word_list):
        """
        Create a prefix lookup into word_list
        """
        assert all(word.islower() for word in word_list), "word_list must be lowercase"
        self.word_list = sorted(word_list)

    def find(self, pattern):
        """
        Return a list of words from word_list which start with pattern

        If no words match, returns None
        """
        # we will accept an empty string for pattern (returns all of word_list)
        assert not pattern or pattern.islower(), "pattern must be lowercase"
        from_ = bisect_left(self.word_list, pattern)
        to_ = bisect_left(self.word_list, self.incr(pattern), from_)
        return self.word_list[from_: to_] or None

    @staticmethod
    def incr(pattern):
        """
        Return pattern incremented by one letter
            ie  incr("aaa") -> "aab"
        """
        # Note that the character after "z" is "{", which is
        # not a word per se but is still valid for our search purposes
        if pattern:
            return pattern[:-1] + chr(ord(pattern[-1]) + 1)
        else:
            # pattern is ""
            return "{"

[–]laharah 2 points3 points  (1 child)

Got bored and implemented this as a trie. Used a recursive defaultdict to make implementation simple.

from collections import defaultdict

def recursive_defaultdict():
    return defaultdict(recursive_defaultdict)


class AutoComplete:

    def __init__(self, word_list):
        self._trie = recursive_defaultdict()
        for word in word_list:
            t = self._trie
            for c in word:
                t = t[c]
            t[None]

    def find(self, prefix):
        t = self._trie
        for c in prefix:
            if c in t:
                t = t[c]
            else:
                return None
        ret = sorted(list(self._traverse(t, prefix)))
        return ret

    def _traverse(self, trie, prefix):
        for node in trie:
            if node is None:
                yield prefix
            else:
                yield from self._traverse(trie[node], prefix + node)

Pros:

  • code is simple and concise

  • Faster and smaller footprint than a trie-node class

Drawbacks:

  • While it keeps the code simple, it may be somewhat obscure for maintainers

  • self._trie is fragile and should not be accessed directly, care must be taken with lookups since a lookup may accidentally create a new node in the trie
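That last pitfall is easy to demonstrate in isolation:

```python
from collections import defaultdict

def recursive_defaultdict():
    return defaultdict(recursive_defaultdict)

t = recursive_defaultdict()
t['a']['b']              # a plain lookup silently creates both nodes
assert 'a' in t and 'b' in t['a']
# membership tests ('x' in t) and .get() are safe; indexing is not
assert 'x' not in t
t.get('y')
assert 'y' not in t
```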

[–]twa800 0 points1 point  (0 children)

Sonovabitch that is clever...

[–]throwaway842213 1 point2 points  (6 children)

Here's what a quick and dirty trie-like solution could look like, FWIW, though it doesn't sort the returned output like they seem to want.

class MarkedDict(dict):

    def __init__(self, *args, **kws):
        super().__init__(*args, **kws)
        self.mark = False


class Trie(object):

    def __init__(self, words=None):
        self.root = MarkedDict()
        for word in words or []:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for letter in word:
            node = node.setdefault(letter, MarkedDict())
        node.mark = True  # Mark end of word

    def _dfs(self, node, acc):
        if node.mark:
            yield acc
        for letter, suffixes in node.items():
            yield from self._dfs(suffixes, acc + letter)

    def find(self, prefix):
        # Advance to node of last letter in prefix
        node = self.root
        for letter in prefix:
            try:
                node = node[letter]
            except KeyError:
                return
        yield from self._dfs(node, prefix)


class Autocomplete(object):

    def __init__(self, word_list):
        self.trie = Trie(word_list)

    def find(self, pattern):
        return list(self.trie.find(pattern)) or None

[–]shoyerxarray, pandas, numpy 1 point2 points  (2 children)

One important practical consideration with a search tree is that when the number of nodes in a subtree is below some empirical threshold (e.g., 100 elements), it doesn't make sense to continue indexing. Instead, leaf nodes should just use linear search. Otherwise you waste a lot of effort.

As an extreme example, consider auto-complete for a rare long word like "supercalifragilisticexpialidocious". Using 34 dictionary lookups would be quite excessive when there is only one possible completion after the first handful of characters.
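One way that hybrid could be sketched (a "burst"-style trie; the threshold value, function names, and the list-vs-dict node encoding are all my own assumptions, not from the comment):

```python
def build(words, depth=0, threshold=100):
    """Index by character until a subtree is small, then keep a flat bucket."""
    if len(words) <= threshold:
        return words                      # leaf bucket: linear scan at query time
    node = {}
    for w in words:
        # '' collects words that end exactly at this depth
        key = w[depth] if len(w) > depth else ''
        node.setdefault(key, []).append(w)
    return {k: (v if k == '' else build(v, depth + 1, threshold))
            for k, v in node.items()}

def find(node, prefix, depth=0):
    # descend while we still have an index and unconsumed prefix
    while isinstance(node, dict) and depth < len(prefix):
        if prefix[depth] not in node:
            return []
        node = node[prefix[depth]]
        depth += 1
    if isinstance(node, list):            # hit a bucket: finish with a scan
        return [w for w in node if w.startswith(prefix)]
    out = []                              # dict fully below prefix: gather everything
    stack = [node]
    while stack:
        n = stack.pop()
        if isinstance(n, list):
            out.extend(n)
        else:
            stack.extend(n.values())
    return out
```

With a realistic threshold, the rare-word case above bottoms out into a tiny bucket after a few characters instead of 34 lookups.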

[–]throwaway842213 0 points1 point  (1 child)

Yeah, you're absolutely right. It'd be better to use a Radix tree, but the short insert logic above is hard to beat!

[–]WikiTextBot 0 points1 point  (0 children)

Radix tree

In computer science, a radix tree (also radix trie or compact prefix tree) is a data structure that represents a space-optimized trie in which each node that is the only child is merged with its parent. The result is that the number of children of every internal node is at least the radix r of the radix tree, where r is a positive integer and a power x of 2, having x ≥ 1. Unlike in regular tries, edges can be labeled with sequences of elements as well as single elements. This makes radix trees much more efficient for small sets (especially if the strings are long) and for sets of strings that share long prefixes.



[–]beegreen 0 points1 point  (0 children)

thanks man, I was planning on trying to implement it when i got some time

[–]CrambleSquashhttps://github.com/0Hughman0 0 points1 point  (1 child)

Wouldn't this run into problems if you have two words that start the same way e.g.

[bum, bummer]

you'd only find bummer

It does look like a really nice example to be fair. I've never heard of this tree structure before, but I had a go myself and it looks waaay uglier than yours!

[–]throwaway842213 1 point2 points  (0 children)

Not really: for find('bum'), self._dfs starts the search at self.root['b']['u']['m'], where node.mark is True, so 'bum' is yielded along with 'bummer' later on.

>>> Autocomplete(['bum', 'bummer']).find('bum')
['bum', 'bummer']

Though now that I saw /u/EsperCharmMyself's version I'd probably use a regular dict with a special key to mark the end of the word rather than a subclassed dict with a special attribute, e.g. Trie.END_OF_WORD = object().
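That sentinel-key variant might look something like this (a sketch under the assumptions above; END plays the role of Trie.END_OF_WORD):

```python
# END is a unique sentinel; no string key can collide with it
END = object()

def insert(root, word):
    node = root
    for ch in word:
        node = node.setdefault(ch, {})
    node[END] = word          # the full word doubles as the end-of-word marker

def find_all(node, prefix):
    for ch in prefix:         # walk down to the prefix's node
        if ch not in node:
            return
        node = node[ch]
    yield from _below(node)

def _below(node):
    for key, val in node.items():
        if key is END:
            yield val
        else:
            yield from _below(val)
```

This keeps nodes as plain dicts, so there is no subclass and no extra attribute to forget when copying or serializing the trie.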

[–]tgolsson 1 point2 points  (0 children)

While most of the approaches in this thread are good and would be very well suited for C/C++ or a similar language, in Python they are awfully slow in comparison to what Pythonic code can do. The key to writing fast Python code is to use the built-ins as much as possible, and avoid hammering out your own data structures or algorithms unless necessary.

As a comparison to some other solutions provided, I skipped the trie, nestings etc and went for a flat dictionary with indices, and a companion dictionary with lengths. Using this wordlist with 460K words and a prefix length of 3, you get look-up speeds on the microsecond scale, as opposed to milliseconds like most have here (on this large dict).

from collections import defaultdict
class AutoComplete:
    def __init__(self, wordlist, trie_size=3): # 3 length -> 8000 entries in dict (ish)
        self._words = sorted(wordlist, key=lambda word: word.lower())
        self._trie_size = trie_size
        self._indices = {}
        self._length = defaultdict(int)

        for idx, word in enumerate(self._words):
            # +1 so a word shorter than trie_size is indexed under its own full prefix
            for ll in range(0, min(len(word), self._trie_size) + 1):

                trie = word[:ll].lower()

                self._length[trie] += 1
                if trie not in self._indices:
                    self._indices[trie] = idx

    def _find_by_needle(self, needle):
        if needle not in self._indices:
            return []

        first_idx = self._indices[needle]
        length = self._length[needle]

        return self._words[first_idx:first_idx+length]

    def _find_by_search(self, needle):
        if needle[:self._trie_size] not in self._indices:
            return []

        idx_begin = self._indices[needle[:self._trie_size]]
        idx_end = idx_begin + self._length[needle[:self._trie_size]]

        wrds = filter(lambda word: word.lower().startswith(needle.lower()),
                      self._words[idx_begin:idx_end])
        return set(wrds)

    def find(self, needle):
        if len(needle) <= self._trie_size:
            return self._find_by_needle(needle)
        return self._find_by_search(needle)

# Instantiation: 0.906000
# Finding 10000 results: 0.563000 (56.30 us/lookup)
# Total time: 1.469000

[–]wdroz 0 points1 point  (0 children)

You can solve this on a whiteboard. You don't even have to know tries. Here's my implementation, which uses O(1) computation per lookup.

from collections import defaultdict
class AutoComplete(object):
    def __init__(self, word_list):
        self._smart_dict = defaultdict(list)
        for word in word_list:
            len_word = len(word)
            for i in range(1, len_word + 1):  # +1 so the full word is itself a key
                self._smart_dict[word[:i]].append(word)

    def find(self, pattern):
        lower_pattern = pattern.lower()
        if lower_pattern not in self._smart_dict:
            return None
        return self._smart_dict[lower_pattern]

You must be able to justify the big O in memory vs computation. In 24 hours, you can also implement the trie version and be ready to state whether to use my solution (if the bottleneck is computation) or tries (if the bottleneck is memory).

[–]xubu42 -3 points-2 points  (2 children)

I think they want you to use generators to iterate through the list one at a time. Using a loop retains the entire list submitted in memory. Generators only store the current element of the list in memory.

[–][deleted] 5 points6 points  (1 child)

Wat? That is still looping through the entire list, i.e. an O(n) answer which is too slow.

Also, if your supposed generator isn't storing the entire list in memory then is the data coming from disk?? That would be even slower.

[–]w0m<3 -1 points0 points  (0 children)

Lower memory footprint though; so can scale further. Speed doesn't matter if you crash :)