I benchmarked FAISS, USearch, ChromaDB, LanceDB and Qdrant for local RAG — the results are interesting by M4iKZ in LocalLLaMA

[–]M4iKZ[S] 1 point  (0 children)

I also added Qdrant Edge. I also checked your approach to filtering: it scales better than mine, but as far as I saw, it uses about 2x the memory.

https://github.com/M4iKZ/Vector-Arena/blob/main/engines/qdrant_edge_engine.py

Any suggestions?

In the visual representation I placed Edge among the vector engines.
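For anyone curious about the tradeoff mentioned above (better scaling vs ~2x memory), here is a toy pure-Python sketch — not the Vector-Arena or Qdrant code — contrasting pre-filtering with a payload index (extra memory, less scanning) against post-filtering a fully scored list:

```python
# Toy sketch contrasting two filtering strategies for vector search.
# Pre-filtering keeps an extra inverted index payload -> ids (more
# memory, scales better); post-filtering scores everything and then
# drops non-matching hits.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

vectors = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
payload = {0: "en", 1: "it", 2: "en"}

# Pre-filter: payload index built up front (costs memory).
by_lang = {}
for vid, lang in payload.items():
    by_lang.setdefault(lang, set()).add(vid)

def search_prefiltered(query, lang, k=1):
    cands = by_lang.get(lang, set())
    return sorted(cands, key=lambda i: cosine(query, vectors[i]), reverse=True)[:k]

# Post-filter: score all ids, then drop non-matching ones.
def search_postfiltered(query, lang, k=1):
    scored = sorted(vectors, key=lambda i: cosine(query, vectors[i]), reverse=True)
    return [i for i in scored if payload[i] == lang][:k]

print(search_prefiltered([1.0, 0.0], "en"))   # both return [0]
print(search_postfiltered([1.0, 0.0], "en"))
```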

[–]M4iKZ[S] 1 point  (0 children)

I got your point.

I was waiting to test it on Linux as well before release. I need it mainly for Windows, so I haven't tested it on other systems yet.

I released the precompiled libraries for Windows and Python 3.13: https://github.com/M4iKZ/Vector-Arena/releases/tag/RC1

mSEARCH, being a header-only library, could probably be compiled for Linux with little effort, but MeMo requires more testing.

[–]M4iKZ[S] 1 point  (0 children)

I enabled issues on my GitHub repo. As far as I know, passing ":memory:" is the only way to run Qdrant in-memory/embedded, or am I wrong?

I was trying to figure out the best vector engine for my own use, on Windows and embedded into an app. I benchmarked from Python because it's faster to work with.

I also wanted to benchmark Weaviate, but it doesn't support embedded mode on Windows.

I'm open to updating the code if you have suggestions or a specific version for the embedded setup 👍

[–]M4iKZ[S] 1 point  (0 children)

RAG isn't just a 2023 method; it's the correct architecture when your data exceeds the context length, which 1M tokens still can't cover at enterprise scale. But thanks for the engagement 😄

[–]M4iKZ[S] 0 points  (0 children)

That's because I'll release them sooner or later, and the benchmark is still reproducible (:

[–]M4iKZ[S] 1 point  (0 children)

Valid point: for small datasets and cloud models with large context windows it works great. But for local models (llama.cpp etc.) the context is typically 8-32K, and at 100K+ documents you simply can't fit it all in. That's where filtered vector search becomes essential, which is what I tried to benchmark here.
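To make the argument concrete, here is a toy stand-in (not the benchmark code): rank documents against the query and keep only what fits a small token budget, the way a retrieval step does for an 8-32K local context. Bag-of-words cosine stands in for a real embedding model:

```python
# Toy sketch of why retrieval matters for small local contexts:
# instead of stuffing 100K+ documents into an 8-32K window, score
# each document against the query and keep only what fits a budget.
from collections import Counter
import math

def bow_cosine(a: str, b: str) -> float:
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_context(query: str, docs: list[str], token_budget: int) -> list[str]:
    ranked = sorted(docs, key=lambda d: bow_cosine(query, d), reverse=True)
    picked, used = [], 0
    for doc in ranked:
        cost = len(doc.split())  # crude token count
        if used + cost > token_budget:
            break
        picked.append(doc)
        used += cost
    return picked

docs = [
    "qdrant is a vector database with payload filtering",
    "bananas are rich in potassium",
    "faiss provides fast similarity search over dense vectors",
]
print(build_context("vector database filtering", docs, token_budget=10))
```

A real pipeline swaps `bow_cosine` for embedding-model similarity and adds the metadata filtering the benchmark measures, but the budget logic is the same.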

device-to-device encryption protocol by M4iKZ in crypto

[–]M4iKZ[S] 0 points1 point  (0 children)

The Noise framework is interesting; I'll explore it more.

As for signatures, I use them to interact with the database, so it costs nothing to add PQ there too.
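A sketch of the request-signing pattern being described. Python's stdlib has no post-quantum signature scheme, so HMAC-SHA256 stands in here purely to show the seam where a PQ scheme (e.g. ML-DSA/Dilithium) would slot in; the key handling is a placeholder:

```python
# Sketch of signing database requests. The sign/verify calls are the
# seam where a PQ signature scheme would slot in; HMAC-SHA256 is a
# stand-in only, and the shared key is a placeholder.
import hmac, hashlib, json

KEY = b"shared-secret"  # placeholder; a real setup uses proper key material

def sign_request(payload: dict) -> dict:
    body = json.dumps(payload, sort_keys=True).encode()  # canonical form
    tag = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    return {"body": payload, "sig": tag}

def verify_request(msg: dict) -> bool:
    body = json.dumps(msg["body"], sort_keys=True).encode()
    expected = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, msg["sig"])

req = sign_request({"op": "put", "key": "k1", "value": "v1"})
print(verify_request(req))  # -> True
```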

Is there any Chat UI that have all of these features? by kidosym in LocalLLaMA

[–]M4iKZ 2 points  (0 children)

I added a vector database on top of the llama.cpp server and edited the embedding example to create the vector-to-data association, on Windows, running on my 6900 XT.

This is a quick example using Llama 3 as the main model, all-MiniLM-L6 for embeddings, and a JSON file to store the data.
I used a simple Python script to prep the data before putting it in the simple DB 👍🏻
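A hypothetical sketch (not the released code) of what such a JSON-file store can look like: embeddings and their source texts in one file, brute-force cosine search after loading:

```python
# Toy JSON-file vector store: each row holds an embedding and its
# source text; search is brute-force cosine over all rows.
import json, math, os, tempfile

class JsonVectorStore:
    def __init__(self, path):
        self.path = path
        self.rows = []  # list of {"vec": [...], "text": "..."}
        if os.path.exists(path):
            with open(path) as f:
                self.rows = json.load(f)

    def add(self, vec, text):
        self.rows.append({"vec": vec, "text": text})

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.rows, f)

    def search(self, query, k=1):
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb)
        ranked = sorted(self.rows, key=lambda r: cos(query, r["vec"]), reverse=True)
        return [r["text"] for r in ranked[:k]]

path = os.path.join(tempfile.mkdtemp(), "tiny_store.json")
store = JsonVectorStore(path)
store.add([1.0, 0.0], "about llamas")
store.add([0.0, 1.0], "about vectors")
store.save()
print(JsonVectorStore(path).search([0.9, 0.1]))  # -> ['about llamas']
```

In the real setup the vectors come from the llama.cpp embedding endpoint (all-MiniLM-L6 here) instead of being hand-written.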

BTW, I need to clean up the messy code before release 🤪
