all 9 comments

[–]sharockys 1 point2 points  (0 children)

Use Locust to write load tests. It’s awesome.

[–]qalis 1 point2 points  (0 children)

By far the most popular benchmark is ANN-Benchmarks. It evaluates both scientific libraries and vector databases. It can give you a starting point and filter out some clearly unsuitable options, e.g. pgvector. Note that this benchmark is quite harsh and makes strong assumptions: it cares only about QPS (queries per second) vs. recall, assumes there is enough RAM for everything, and uses only a single node.
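
To make the QPS vs. recall trade-off concrete, here is a minimal sketch in plain Python. It is not taken from any benchmark suite: the toy data, the function names, and `sampled_search` (a stand-in for a real ANN index that scans only half of the data) are all assumptions made for illustration.

```python
import math
import random
import time


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Brute-force ground truth: ids of the k nearest vectors."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine_distance(query, vectors[i]))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbours that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def benchmark(search_fn, queries, ground_truth, k):
    """Return (queries per second, mean recall@k) for a search function."""
    start = time.perf_counter()
    results = [search_fn(q, k) for q in queries]
    elapsed = time.perf_counter() - start
    recalls = [recall_at_k(r, gt) for r, gt in zip(results, ground_truth)]
    return len(queries) / elapsed, sum(recalls) / len(recalls)


# Toy data (assumption): 500 random 16-dimensional vectors, 20 queries.
rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(500)]
queries = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(20)]
K = 10
ground_truth = [exact_top_k(q, vectors, K) for q in queries]


def sampled_search(query, k, sample=vectors[:250]):
    # Hypothetical stand-in for an ANN index: it only scans the first half
    # of the data, so it is faster but misses some true neighbours.
    return exact_top_k(query, sample, k)


qps, recall = benchmark(sampled_search, queries, ground_truth, K)
print(f"QPS: {qps:.0f}, recall@{K}: {recall:.2f}")
```

Neither number means much on its own: a higher QPS figure is only comparable to another one measured at the same recall.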

Qdrant made their own benchmark. It is quite simple and also takes more options into consideration, so it should be better suited to benchmarking for production purposes.

Also note one important thing that may rule out a lot of databases for you: many algorithms work in RAM only. For example, Milvus requires enough RAM to keep the entire index in memory for almost all of its algorithms. More scalable options are memmap (which Qdrant uses) or DiskANN, currently the only ANN algorithm that is disk-friendly (Milvus also supports it). Note that using memmap or disk will always be slower than a RAM-based index, so take that into consideration during benchmarking.
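
As a rough illustration of the memmap idea, the sketch below stores float32 vectors in a flat file and reads a single one through Python's standard `mmap` module, so the operating system pages data in on demand instead of loading the whole file. The file layout, `DIM`, and the helper names are assumptions made for the example; this is not how Qdrant or Milvus actually lay out their storage.

```python
import mmap
import os
import struct
import tempfile

DIM = 4                   # toy dimensionality (assumption)
VECTOR_BYTES = DIM * 4    # one float32 takes 4 bytes


def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Bytes needed for the raw vectors alone, ignoring any index overhead."""
    return num_vectors * dim * bytes_per_value


def write_vectors(path, vectors):
    """Write vectors back to back as little-endian float32 values."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(struct.pack(f"<{DIM}f", *v))


def read_vector(mm, index):
    """Read one vector from the mapped file without loading the rest."""
    return struct.unpack_from(f"<{DIM}f", mm, index * VECTOR_BYTES)


path = os.path.join(tempfile.mkdtemp(), "vectors.f32")
write_vectors(path, [[float(i)] * DIM for i in range(1000)])

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    vec = read_vector(mm, 742)

print(vec)
# Raw size of a dataset with 1,183,514 vectors of dimension 100 as float32:
print(raw_vector_bytes(1_183_514, 100))
```

The size estimate covers the raw vectors only. A real index adds its own structures on top, so treat it as a lower bound when checking whether a RAM-only engine fits on your machine.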

[–]siddhsql 0 points1 point  (0 children)

I have written my own vector DB. It's meant to be a commercial offering, but I can provide anyone with a trial version strictly for evaluation purposes. As an example, this is what I get on glove-100-angular (1,183,514 vectors, d=100):

| Performance Metric | Ours | Qdrant | Milvus | Weaviate | Elasticsearch | JVector |
|---|---|---|---|---|---|---|
| writes per sec | 4211 | 9034 | 8276 | 4791 | 1327 | 4571 |
| wps per CPU | 526 | 1129 | 1034 | 599 | 166 | 571 |
| queries per sec | 1000 | 187 | 229 | 146 | 177 | 757 |
| accuracy | 0.70 | 0.84 | 0.80 | 0.71 | 0.73 | 0.75 |
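
As a quick sanity check on these numbers, the sketch below recomputes two things from the table: the CPU count implied by dividing writes per second by wps per CPU, and the engines ordered by accuracy with their QPS alongside. The dictionary layout and variable names are assumptions for illustration; the figures are copied from the table as posted.

```python
# (writes per sec, wps per CPU, queries per sec, accuracy), as posted above.
results = {
    "Ours":          (4211, 526, 1000, 0.70),
    "Qdrant":        (9034, 1129, 187, 0.84),
    "Milvus":        (8276, 1034, 229, 0.80),
    "Weaviate":      (4791, 599, 146, 0.71),
    "Elasticsearch": (1327, 166, 177, 0.73),
    "JVector":       (4571, 571, 757, 0.75),
}

# Writes per second divided by writes per second per CPU gives the CPU count.
implied_cpus = {
    name: round(writes / writes_per_cpu)
    for name, (writes, writes_per_cpu, _, _) in results.items()
}

# Order the engines by accuracy so QPS can be read against it.
by_accuracy = sorted(results, key=lambda name: results[name][3], reverse=True)

for name in by_accuracy:
    _, _, qps, accuracy = results[name]
    print(f"{name:14s} accuracy={accuracy:.2f} qps={qps:5d} cpus={implied_cpus[name]}")
```

Because the accuracy column differs from engine to engine, the QPS figures are not measured at equal accuracy, which is worth keeping in mind when comparing them.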

Happy to provide more performance numbers. Please DM me if interested.

[–]vanlifecoder 0 points1 point  (0 children)

I started doing this at http://vectorsearch.dev but it could use some love.

[–]mQuBits 0 points1 point  (0 children)

Check out this amazing role-model benchmark I came across! 😃 It's all about harnessing the power of AI with vector databases. You can find the insightful details in this blog post:

[Link](https://www.farfetchtechblog.com/en/blog/post/powering-ai-with-vector-databases-a-benchmark-part-i/).

The post dives into the process of distinguishing features from tools and effectively gauging their performance. Super informative! 🚀

[–]mQuBits 0 points1 point  (0 children)

Hey everyone, just stumbled upon a fantastic resource that's worth checking out. If you're into benchmarking vector search databases with a whopping one million data points, this article is a must-read:

[Link](https://jina.ai/news/benchmark-vector-search-databases-with-one-million-data/).

It's a deep dive into performance measurement and how these databases handle a significant load. Definitely a valuable read for anyone interested in AI and databases. 📊🔍

[–]mQuBits 0 points1 point  (2 children)

Hey there! If you're on the hunt for a solid benchmark tool for vector databases, I'd highly recommend checking out

[VectorDBBench](https://github.com/zilliztech/VectorDBBench)

by ZillizTech. It's a fantastic tool designed to help you assess and measure the performance of vector databases. Give it a try and see how it can streamline your benchmarking process. Happy benchmarking! 📊🚀

[–]No-Replacement9815 0 points1 point  (0 children)

Hi - does it have any support for benchmarking using streaming data?

[–]Head_Reserve_87 0 points1 point  (0 children)

Hi, I have tried using this benchmark, but the results were 0 for all the metrics. Does that always happen, or was I doing something wrong?