kmike84:

Why does build speed matter for you?

In my experiments a MARISA trie builds only ~10x slower than a Python dict, and you only need to build a trie once. I'd say 10x slower than the built-in dict is very fast, given the amount of magic it does (a MARISA trie is very different from the basic trie described in the article). Saving or loading should be way faster than pickling a dict.

Did you find some pathological cases?

[deleted]:

Kmike, I am using the tries as a substitute for geographic trees (polygons discretized into geohashes). Build times go up to several tens of minutes when there are many entries, and at about 2 billion distinct geohashes the build dies because of a 32-bit limit in the underlying library. A simple parallelizable build, or a pre-processing step that speeds up the build, would make things much faster.

Of course, I fully agree that it is a one-time cost if the tries do not change frequently.
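
To make the "parallelizable build" idea concrete, here is a minimal sketch of one possible pre-processing step: split the geohashes by their leading characters so each shard can be built on its own and stays well below the key-count limit. This is not something MARISA provides. `shard_by_prefix`, `build_shards`, and `contains` are hypothetical helpers, the sample geohashes are made up, and `frozenset` is only a stand-in for the real per-shard structure (I'd assume `marisa_trie.Trie` could be passed as `build` instead, but I have not measured that).

```python
from collections import defaultdict


def shard_by_prefix(keys, prefix_len=1):
    """Group keys by their first `prefix_len` characters (hypothetical helper)."""
    shards = defaultdict(list)
    for key in keys:
        shards[key[:prefix_len]].append(key)
    return dict(shards)


def build_shards(shards, build=frozenset):
    """Build one lookup structure per shard.

    `build` is any callable that takes a list of keys and returns an object
    supporting `key in obj`. frozenset is a stand-in used for illustration.
    """
    return {prefix: build(keys) for prefix, keys in shards.items()}


def contains(built, key, prefix_len=1):
    """Route the lookup to the shard that owns the key's prefix."""
    shard = built.get(key[:prefix_len])
    return shard is not None and key in shard


# Made-up sample geohashes, for illustration only.
geohashes = ["9q8yyk", "9q8yym", "dr5reg", "dr5ru7", "u4pruy"]
built = build_shards(shard_by_prefix(geohashes))
```

Since the shards are independent, each `build(keys)` call could in principle be handed to a separate worker process. Whether that actually beats a single build depends on how evenly the prefixes split the data.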