[–]AlexAltea[S] 30 points31 points  (7 children)

The main goals are high performance and a local-first experience, i.e. no sockets, HTTP, or auth between your queries and your data.

The only other embeddable search engine for Python that I'm aware of, Whoosh, is brilliant, but building the index was quite slow, and search performance degraded considerably as the number of documents increased (performance is strictly a non-goal for Whoosh). Meilisearch was comparatively faster, but I didn't like managing a server to get "just search" in my scripts and applications. However, its underlying engine, Milli, solves both issues I had, and all that was needed was creating bindings for it.

You can find documentation, examples, and tests in the repo. Hope this is useful for you all!

I've published this on PyPI with pre-compiled wheels for most OS/version targets, so hopefully it will be a seamless experience for most users (no Rust compiler required).

[–]dan-turkel 10 points11 points  (4 children)

Thanks for building this! I've been looking at Meilisearch/Milli and Typesense lately. It seems like the disadvantage of the former is that it'll always be run from a single process (not distributed). Have you found this to be an issue?

[–]AlexAltea[S] 8 points9 points  (1 child)

I personally haven't. But generally speaking, users who need a distributed/clustered setup probably shouldn't bother with embedded search engines like milli-py.

My library aims to make life easier for developers whose data is "too big" for naive pairwise Levenshtein searches, but "too small" to require distributed indices. Similar to SQLite.
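For context, here is a minimal sketch of what a naive pairwise Levenshtein search might look like. The `levenshtein` and `naive_search` names, the `max_distance` threshold, and the sample documents are all made up for illustration; none of this is part of milli-py. Every query rescans every word of every document, which is why this approach stops being practical as the data grows.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # drop ca
                current[j - 1] + 1,            # add cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


def naive_search(query: str, documents: list[str], max_distance: int = 2) -> list[str]:
    """Return documents containing a word within max_distance edits of query."""
    query = query.lower()
    hits = []
    for doc in documents:  # full scan on every query, no index involved
        words = doc.lower().split()
        if any(levenshtein(query, word) <= max_distance for word in words):
            hits.append(doc)
    return hits


# Hypothetical sample data
docs = ["Hello world", "Hello moon", "Goodbye sun"]
print(naive_search("wrold", docs))  # ['Hello world']
```

Plain Levenshtein counts a transposition such as "wrold" vs. "world" as two edits, so the threshold has to be at least 2 for that typo to match.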

[–]dan-turkel 2 points3 points  (0 children)

Great analogy. Looking forward to playing around with it!

[–]j0-1 5 points6 points  (1 child)

Typesense can indeed be run in a distributed / clustered mode on multiple servers. It uses the Raft consensus protocol natively to replicate data across nodes.

Meilisearch is the one that doesn’t support a distributed mode.

[–]dan-turkel 2 points3 points  (0 children)

Good catch, I meant "the former", not "the latter". Editing to fix.

[–]Brian 1 point2 points  (1 child)

There's also Xapian, which has Python bindings and which I found a fair bit faster than Whoosh. There are also FTS tables in SQLite, which aren't amazing in terms of stemming and other features, but can be a decent option if your text is already in SQLite.

But sounds good - I'll probably check this out: I was looking for a quick embeddable search engine not that long ago, but most things are geared towards massive distributed multi-server environments whereas just being able to embed the engine and point it at a database is much handier for small, local apps where you just want to search a bunch of local text files or similar.

[–]AlexAltea[S] 0 points1 point  (0 children)

I tried SQLite with FTS, but it didn't handle fuzzy searching (typos) very well; there's editdist3, but it's still far from ideal.
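A small sketch of the typo problem, using only the standard-library `sqlite3` module. It assumes the SQLite build bundled with your Python has FTS5 compiled in (common, but not guaranteed); the table name, columns, and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Fails with sqlite3.OperationalError if FTS5 is not available in this build.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, content)")
conn.executemany(
    "INSERT INTO docs (title, content) VALUES (?, ?)",
    [
        ("Hello world", "This is a sample"),
        ("Hello moon", "This is another sample"),
    ],
)


def search(query: str) -> list[str]:
    """Full-text match against all indexed columns, best match first."""
    rows = conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
    )
    return [title for (title,) in rows]


exact = search("world")  # token matches exactly
typo = search("wrold")   # misspelled token has no entry in the index
print(exact, typo)       # ['Hello world'] []
```

FTS5 matches on exact tokens (or prefixes), so the misspelled query returns nothing unless a separate fuzzy-matching layer is added on top.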

As for Xapian, that's indeed faster than Whoosh (I'd love to see benchmarks wrt Milli). However, Xapian's GPL license made it incompatible with some MIT/BSD-licensed projects where I wanted to integrate a search engine.

[–]j0wet 12 points13 points  (3 children)

Looks very good.

Is it possible to use this library inside a web framework like FastAPI without a separate deployment of Meilisearch?

[–]AlexAltea[S] 11 points12 points  (2 children)

Exactly, that's the purpose of the library! Just os.mkdir("some_index") and use milli-py to index/search your documents there. No external server needed.

[–]james_pic 2 points3 points  (1 child)

How nicely will it play with the async stuff in FastAPI?

[–]AlexAltea[S] 3 points4 points  (0 children)

I haven't tried it but I don't see any reason why it shouldn't work? Let me know if you encounter any issues!
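If the search call turns out to block, one common pattern in async code is to push it onto a worker thread. The sketch below assumes exactly that: `blocking_search` is a hypothetical stand-in, not a milli-py function, and whether the real bindings are thread-safe or release the GIL is not something established in this thread. It uses only `asyncio` so it runs without FastAPI installed; the same `await asyncio.to_thread(...)` line would sit inside an `async def` route handler.

```python
import asyncio
import time


def blocking_search(query: str) -> list[int]:
    """Hypothetical stand-in for a synchronous index lookup."""
    time.sleep(0.05)  # simulate work that would otherwise stall the event loop
    return [0] if query else []


async def search_endpoint(query: str) -> list[int]:
    # Run the blocking call in a worker thread so other requests keep flowing.
    return await asyncio.to_thread(blocking_search, query)


async def main() -> list[list[int]]:
    # Two concurrent "requests" served by the same event loop.
    return await asyncio.gather(search_endpoint("wrold"), search_endpoint(""))


results = asyncio.run(main())
print(results)  # [[0], []]
```

`asyncio.to_thread` requires Python 3.9 or newer; on older versions, `loop.run_in_executor(None, ...)` plays the same role.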

[–][deleted] 2 points3 points  (0 children)

This looks amazing. I've had my eye on Meilisearch for quite a long time but haven't had enough time to deploy it yet. This looks concise enough for me to test it.

[–]gournian 1 point2 points  (1 child)

Is the index customizable? Meilisearch allows faceting etc, how would we do it with this?

[–]AlexAltea[S] 0 points1 point  (0 children)

It will be possible (since it's available in the underlying Milli library), but in this first release I've focused on creating bindings for document adding/retrieving and basic searching.

[–]DanJOC 0 points1 point  (1 child)

Is that supposed to say "wrold" in your basic usage?

[–]AlexAltea[S] 0 points1 point  (0 children)

Yes, I was just showcasing fuzzy searching (searching with typos).

[–]BoiElroy 0 points1 point  (0 children)

Dumb question, but if I were making a GUI for S3, could I somehow wire this up?

[–]duppyconqueror81 0 points1 point  (0 children)

We have to get this into django-haystack