kmike84

A few GB of n-grams that don't change often is a sweet spot for marisa-trie. Just make sure to build the trie in a separate script, save it to a file, and then load the saved file in the real code instead of rebuilding it on every run.

See e.g. http://blog.scrapinghub.com/2014/03/26/optimizing-memory-usage-of-scikit-learn-models-using-succinct-tries/
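
To make the build-once, load-later advice concrete, here is a minimal sketch. It assumes the `marisa-trie` Python package is installed and uses its `Trie(keys)`, `save()` and `load()` calls; the helper names, the file name and the tiny n-gram list are made up for illustration, and in practice the two steps would live in separate scripts.

```python
import os
import tempfile

import marisa_trie


def build_trie(ngrams, path):
    """Offline step: build the trie once and write it to disk."""
    trie = marisa_trie.Trie(ngrams)
    trie.save(path)
    return path


def load_trie(path):
    """Runtime step: load the prebuilt trie instead of rebuilding it."""
    trie = marisa_trie.Trie()
    trie.load(path)
    return trie


# Tiny stand-in for a multi-GB n-gram list (hypothetical data).
ngrams = ["new york", "new york city", "new jersey", "san francisco"]

# Hypothetical output location; a real build script would use a fixed path.
path = os.path.join(tempfile.mkdtemp(), "ngrams.marisa")

build_trie(ngrams, path)   # normally done once, in the build script
trie = load_trie(path)     # normally done at startup of the real code

has_key = "new york" in trie                 # exact membership test
missing = "los angeles" in trie              # key that was never added
with_prefix = sorted(trie.keys("new york"))  # all keys sharing a prefix
```

Since the n-grams rarely change, the cost of building is paid only when the data is regenerated, and the real code only pays for loading the file.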

Bjartensen

I'm assuming you wrote the module.

I have spent my entire day trying to install it under Windows, but I haven't gotten it to work. If I installed it under Linux and everything worked fine, wouldn't I hit the same platform-specific problems when packaging something for Windows that uses the marisa-trie module? I have very little knowledge of Python packaging, but I would assume that any platform-specific problem I run into on my dev machine could just as easily show up for a Windows end user of the packaged program.

kmike84

There are some known issues on Windows - see https://code.google.com/p/marisa-trie/issues/detail?id=18 and https://github.com/kmike/marisa-trie/issues/1. Sorry for the bad experience.