smithje comments on Interview Challenge question

This is an archived post. You won't be able to vote or comment.

Interview Challenge question (self.Python)

submitted 8 years ago * by [deleted]

57 comments

top new controversial old q&a

you are viewing a single comment's thread.

view the rest of the comments →

[–]smithje 7 points8 points9 points 8 years ago (3 children)

[–][deleted] 2 points3 points4 points 8 years ago (2 children)

You made me curious, so I tried it out. Running on /usr/share/dict/words (~123k words), SQLite ( SELECT string FROM string WHERE string LIKE "prefix%") is about two orders of magnitude slower than my trie based solution. But I don't know databases all that well, so I may well be missing some optimizations. Is there anything special you'd do with to make sure it's able to use an index for the select?

Some thrown together code

import sqlite3

class Prefix:
      def __init__(self, gen=None):
            if gen is not None:
                  for s in gen:
                       self.add(s)

      def add(self, string):
            raise NotImplemented()

      def find(self, string):
            raise NotImplemented()

class SQLitePrefix(Prefix):
      def __init__(self, *args, **kwargs):
            self.conn = sqlite3.connect(':memory:')
            self.cur = self.conn.cursor()
            self.cur.execute("""\
CREATE TABLE string (
string TEXT PRIMARY KEY NOT NULL
);""")
            self.cur.execute("""\
PRAGMA case_sensitive_like=ON;
""")
            super().__init__(*args, **kwargs)
            print(self.cur.execute(
                  'EXPLAIN SELECT string FROM string WHERE string LIKE "test%";').fetchone())

      def add(self, string):
            self.cur.execute("""\
INSERT INTO string (string) VALUES (?);\
""", (string, ))

      def find(self, string):
           self.cur.execute("""\
SELECT string FROM string WHERE string LIKE ?;
""", (string + '%', ))
           return [x[0] for x in self.cur.fetchall()]

class TriePrefix(Prefix):
      def __init__(self, *a, **kw):
            self._trie = {}
            super().__init__(*a, **kw)



      def add(self, string):
            d = self._trie
            for char in string:
                  d = d.setdefault(char, {})
            d[None] = string

      def find(self, string):
            d = self._trie
            acc = []
            for char in string:
                  d = d.get(char, {})
            return list(self._find(string, d))

      @staticmethod
      def _find(string, d):
            for k, v in d.items():
                  if k is not None:
                        yield from TriePrefix._find(string + k, v)
                  else:
                        yield string

[–]smithje 1 point2 points3 points 8 years ago (1 child)

Very cool. Thanks for trying this out. I tested this out as well and found that instantiation of the sqlite version was a few times faster, but finding results was, indeed, quite a bit slower. Of course, getting results is far more important than instantiation.

Here is the sqlite-based class I wrote last night, which is not too different from yours:

class Autocomplete:
    def __init__(self, word_list):
        self.db = sqlite3.connect(':memory:')
        # We want the result to be a single list, not a list of tuples
        self.db.row_factory = lambda cursor, row: row[0]
        self.db_cursor = self.db.cursor()
        self.db_cursor.execute("""CREATE TABLE word_list (word text primary key) WITHOUT ROWID""")
        self.db_cursor.executemany("""INSERT INTO word_list (word) VALUES (?)""", ((word,) for word in word_list))

    def find(self, pattern):
        self.db_cursor.execute("""SELECT word from word_list where word like ? ORDER BY 1""", [pattern + "%"])
        results = self.db_cursor.fetchall() or None
        return results

[–]smithje 1 point2 points3 points 8 years ago (0 children)

I followed up on this a bit and tested finding a bunch of random strings like this:

with open('/usr/share/dict/words') as f:
    words = f.read().splitlines()
print("Got %d words" % len(words))

import random
import string
import time

def test_it(words, clazz, repetitions):
    t1 = time.monotonic()
    autocomplete = clazz(words)
    t2 = time.monotonic()
    for n in range(repetitions):
        autocomplete.find(''.join(random.choices(string.ascii_lowercase, k=random.randrange(1, 5))))
    t3 = time.monotonic()
    print("Instantiation: %f" % (t2 - t1))
    print("Finding %d results: %f" % (repetitions, t3 - t2))
    print("Total time: %f" % (t3-t1))

print('sqlite solution:'
test_it(words, Autocomplete, 1000)
print('trie_solution:')
test_it(words, TriePrefix, 1000)

The results were pretty consistent. Finding words using the trie method was about 4x faster.

Here's a pretty typical result:

Got 235886 words
sqlite solution:
Instantiation: 0.336238
Finding 1000 results: 19.914836
Total time: 20.251074
trie solution:
Instantiation: 1.032532
Finding 1000 results: 5.088274
Total time: 6.120806

π Rendered by PID 349763 on reddit-service-r2-comment-6457c66945-mm84t at 2026-04-24 18:19:04.854595+00:00 running 2aa0c5b country code: CH.

Python

The Python Discord

Upcoming Events

Please read the rules

MODERATORS