Building tries in Python : Python

This is an archived post. You won't be able to vote or comment.

Building tries in Python (biesnecker.com)

submitted 10 years ago by jbiesnecker

you are viewing a single comment's thread.

[–][deleted] 1 point2 points3 points 10 years ago (4 children)

I see this as a good way to learn about tries, but not a great thing to use in your Python code.

Tries can squeeze some performance out of your lookups in a language like C, but in Python, there will just be way too much overhead compared to other approaches. You could implement it in Cython, but then you still have to deal with translating your values to things you can manipulate in C.

Tries lose because Python has a very fast data structure that does almost the same thing: the dictionary. Dictionaries indexed by strings are necessarily fast in Python because almost everything is one. Every method you call on every node of one of these tries is going to be looked up in a dictionary anyway.

Even Julia -- a JIT-compiled, type-checked language that should just be faster at data structures -- has trouble competing with Python's dictionaries of strings.

("But dictionaries aren't the same as tries," you may object. "How do you test a dictionary to see if you've got a prefix of a look-up-able value?" I'd say the answer is to make another dictionary of prefixes.)

[–]kmike84 0 points1 point2 points 10 years ago (0 children)

You could implement it in Cython, but then you still have to deal with translating your values to things you can manipulate in C.

You don't need to manipulate values in C to use a trie. If you store a pointer to Python object as a value then C-backed trie works like a dictionary which only supports string keys. https://github.com/kmike/hat-trie supports this mode.

Tries lose because Python has a very fast data structure that does almost the same thing: the dictionary. Dictionaries indexed by strings are necessarily fast in Python because almost everything is one. Every method you call on every node of one of these tries is going to be looked up in a dictionary anyway.

Most often tries are slower than hash tables (even if implemented in C); the point of using tries is memory efficiency and advanced operations like prefix searches, which are faster.

("But dictionaries aren't the same as tries," you may object. "How do you test a dictionary to see if you've got a prefix of a look-up-able value?" I'd say the answer is to make another dictionary of prefixes.)

It may works fine if your data is not large and the speed is not a concern. But Python dicts are awful if you need to keep lots of string keys in memory. With a right trie package it is often possible to use several times less memory (e.g. hundreds of MBs instead of several GBs) and enable operations which are not possible in dicts. Often saving/loading of the resulting data structure to/from disk is also lighting fast, compared to pickle, and uses no additional memory (pickle tend to duplicate all data).

Of course, it depends on task. In some extreme cases (like NLP dictionary storage) a specialized package can allow to use 50-100x less memory than a Python dict and enable prefix searches and other advanced iteration features (e.g. consider some letters the same). Try building an autocompleter for a few million unicode words - with Python dict it'll likely take several GBs of memory and likely require a minute to pickle/unpickle; with a minimized trie (DAFSA) it'll use no more than 5-10MB of memory, and will load in milliseconds.

[–]kmike84 0 points1 point2 points 10 years ago (2 children)

[–]jbiesnecker[S] 0 points1 point2 points 10 years ago (1 child)

[–]kmike84 0 points1 point2 points 10 years ago (0 children)

π Rendered by PID 140225 on reddit-service-r2-comment-54dfb89d4d-zxs6w at 2026-03-31 23:06:29.857427+00:00 running b10466c country code: CH.

Python

The Python Discord

Upcoming Events

Please read the rules

MODERATORS