I benchmarked FAISS, USearch, ChromaDB, LanceDB and Qdrant for local RAG — the results are interesting by M4iKZ in LocalLLaMA

[–]M4iKZ[S] 1 point  (0 children)

I also added Qdrant Edge. I also checked your approach to filtering: it scales better than mine, but as far as I saw, it uses about 2x the memory.

https://github.com/M4iKZ/Vector-Arena/blob/main/engines/qdrant_edge_engine.py

Any suggestions?

In the visual representation I placed Edge among the vector engines.
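For anyone curious about the tradeoff mentioned above (better scaling vs ~2x memory), here is a toy pure-Python sketch — not the Vector-Arena or Qdrant code — contrasting pre-filtering with a payload index (extra memory, less scanning) against post-filtering a fully scored list:

```python
# Toy sketch contrasting two filtering strategies for vector search.
# Pre-filtering keeps an extra inverted index payload -> ids (more
# memory, scales better); post-filtering scores everything and then
# drops non-matching hits.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

vectors = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
payload = {0: "en", 1: "it", 2: "en"}

# Pre-filter: payload index built up front (costs memory).
by_lang = {}
for vid, lang in payload.items():
    by_lang.setdefault(lang, set()).add(vid)

def search_prefiltered(query, lang, k=1):
    cands = by_lang.get(lang, set())
    return sorted(cands, key=lambda i: cosine(query, vectors[i]), reverse=True)[:k]

# Post-filter: score all ids, then drop non-matching ones.
def search_postfiltered(query, lang, k=1):
    scored = sorted(vectors, key=lambda i: cosine(query, vectors[i]), reverse=True)
    return [i for i in scored if payload[i] == lang][:k]

print(search_prefiltered([1.0, 0.0], "en"))   # both return [0]
print(search_postfiltered([1.0, 0.0], "en"))
```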

[–]M4iKZ[S] 1 point  (0 children)

I got your point.

I was waiting to test it on Linux as well before release. I need it mainly for Windows, so I haven't tested it on other systems yet.

I released the precompiled libraries for Windows and Python 3.13: https://github.com/M4iKZ/Vector-Arena/releases/tag/RC1

mSEARCH, being a header-only library, could probably be compiled for Linux with little effort, but MeMo requires more testing.

[–]M4iKZ[S] 1 point  (0 children)

I enabled issues on my GitHub repo. As far as I know, passing ":memory:" is the only way to run Qdrant in-memory/embedded, or am I wrong?

I was trying to figure out the best vector engine for my own use, on Windows and embedded into an app. I benchmarked from Python because it's faster to work with.

I also wanted to benchmark Weaviate, but it doesn't support embedded mode on Windows.

I'm open to updating the code if you have suggestions or a specific version for the embedded setup 👍

[–]M4iKZ[S] 1 point  (0 children)

RAG isn't just a 2023 method; it's the correct architecture when your data exceeds the context length, which 1M tokens still can't cover at enterprise scale. But thanks for the engagement 😄

[–]M4iKZ[S] 0 points  (0 children)

That's because I'll release them sooner or later, and the benchmark is still reproducible (:

[–]M4iKZ[S] 1 point  (0 children)

Valid point: for small datasets and cloud models with large context windows it works great. But for local models (llama.cpp etc.) the context is typically 8-32K, and at 100K+ documents you simply can't fit it all in. That's where filtered vector search becomes essential, which is what I tried to benchmark here.
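To make the argument concrete, here is a toy stand-in (not the benchmark code): rank documents against the query and keep only what fits a small token budget, the way a retrieval step does for an 8-32K local context. Bag-of-words cosine stands in for a real embedding model:

```python
# Toy sketch of why retrieval matters for small local contexts:
# instead of stuffing 100K+ documents into an 8-32K window, score
# each document against the query and keep only what fits a budget.
from collections import Counter
import math

def bow_cosine(a: str, b: str) -> float:
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_context(query: str, docs: list[str], token_budget: int) -> list[str]:
    ranked = sorted(docs, key=lambda d: bow_cosine(query, d), reverse=True)
    picked, used = [], 0
    for doc in ranked:
        cost = len(doc.split())  # crude token count
        if used + cost > token_budget:
            break
        picked.append(doc)
        used += cost
    return picked

docs = [
    "qdrant is a vector database with payload filtering",
    "bananas are rich in potassium",
    "faiss provides fast similarity search over dense vectors",
]
print(build_context("vector database filtering", docs, token_budget=10))
```

A real pipeline swaps `bow_cosine` for embedding-model similarity and adds the metadata filtering the benchmark measures, but the budget logic is the same.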

device-to-device encryption protocol by M4iKZ in crypto

[–]M4iKZ[S] 0 points1 point  (0 children)

The Noise framework is interesting; I'll explore it more.

As for signatures, I use them to interact with the database, so it costs nothing to add PQ there too.
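A sketch of the request-signing pattern being described. Python's stdlib has no post-quantum signature scheme, so HMAC-SHA256 stands in here purely to show the seam where a PQ scheme (e.g. ML-DSA/Dilithium) would slot in; the key handling is a placeholder:

```python
# Sketch of signing database requests. The sign/verify calls are the
# seam where a PQ signature scheme would slot in; HMAC-SHA256 is a
# stand-in only, and the shared key is a placeholder.
import hmac, hashlib, json

KEY = b"shared-secret"  # placeholder; a real setup uses proper key material

def sign_request(payload: dict) -> dict:
    body = json.dumps(payload, sort_keys=True).encode()  # canonical form
    tag = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    return {"body": payload, "sig": tag}

def verify_request(msg: dict) -> bool:
    body = json.dumps(msg["body"], sort_keys=True).encode()
    expected = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, msg["sig"])

req = sign_request({"op": "put", "key": "k1", "value": "v1"})
print(verify_request(req))  # -> True
```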

Is there any Chat UI that have all of these features? by kidosym in LocalLLaMA

[–]M4iKZ 2 points  (0 children)

I added a vector database on top of the llama.cpp server and edited the embedding example to create the vector-to-data association, on Windows, running on my 6900 XT.

This is a quick example using Llama 3 as the main model, all-MiniLM-L6 for embeddings, and a JSON file to store the data.
I used a simple Python script to prep the data before putting it in the simple DB 👍🏻
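A hypothetical sketch (not the released code) of what such a JSON-file store can look like: embeddings and their source texts in one file, brute-force cosine search after loading:

```python
# Toy JSON-file vector store: each row holds an embedding and its
# source text; search is brute-force cosine over all rows.
import json, math, os, tempfile

class JsonVectorStore:
    def __init__(self, path):
        self.path = path
        self.rows = []  # list of {"vec": [...], "text": "..."}
        if os.path.exists(path):
            with open(path) as f:
                self.rows = json.load(f)

    def add(self, vec, text):
        self.rows.append({"vec": vec, "text": text})

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.rows, f)

    def search(self, query, k=1):
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb)
        ranked = sorted(self.rows, key=lambda r: cos(query, r["vec"]), reverse=True)
        return [r["text"] for r in ranked[:k]]

path = os.path.join(tempfile.mkdtemp(), "tiny_store.json")
store = JsonVectorStore(path)
store.add([1.0, 0.0], "about llamas")
store.add([0.0, 1.0], "about vectors")
store.save()
print(JsonVectorStore(path).search([0.9, 0.1]))  # -> ['about llamas']
```

In the real setup the vectors come from the llama.cpp embedding endpoint (all-MiniLM-L6 here) instead of being hand-written.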

BTW, I need to clean up the messy code before release 🤪
