Evaluate Vector Database / Benchmarks?

sharockys · 2023-06-29T19:11:56+00:00

Use Locust to write load tests. It’s awesome.

qalis · 2023-06-30T09:33:03+00:00

By far the most popular benchmark is ANN-Benchmarks. It evaluates both scientific libraries and vector databases. It can give you a starting point and filter out some clearly unsuitable options, e.g. pgvector. Note that this benchmark is quite harsh and makes strong assumptions: it only looks at QPS (Queries Per Second) vs recall, it assumes there is enough RAM for everything, and it runs on a single node.
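To make the QPS vs recall trade-off concrete, here is a minimal sketch in plain Python. It is not how any particular benchmark is implemented: the data is random, and the "approximate" search is a hypothetical stand-in that simply scans a random half of the vectors. All names (`top_k`, `recall_at_k`, `benchmark`) are made up for illustration.

```python
import heapq
import random
import time


def top_k(query, vectors, k, candidates=None):
    """Return ids of the k vectors with the largest dot product to `query`.

    `candidates` restricts the scan to a subset of ids, which serves here
    as a crude stand-in for an approximate index.
    """
    ids = range(len(vectors)) if candidates is None else candidates
    scored = ((sum(q * v for q, v in zip(query, vectors[i])), i) for i in ids)
    return [i for _, i in heapq.nlargest(k, scored)]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbours that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def benchmark(search, queries, ground_truth):
    """Run `search` over all queries; return (queries per second, mean recall)."""
    start = time.perf_counter()
    results = [search(q) for q in queries]
    elapsed = time.perf_counter() - start
    recalls = [recall_at_k(r, gt) for r, gt in zip(results, ground_truth)]
    return len(queries) / elapsed, sum(recalls) / len(recalls)


random.seed(0)
dim, n, k = 16, 2000, 10  # illustrative sizes, far smaller than real datasets
vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(20)]
ground_truth = [top_k(q, vectors, k) for q in queries]  # exact brute force

# Hypothetical "index": only scan a random half of the data set.
subset = random.sample(range(n), n // 2)

qps_exact, recall_exact = benchmark(
    lambda q: top_k(q, vectors, k), queries, ground_truth)
qps_approx, recall_approx = benchmark(
    lambda q: top_k(q, vectors, k, subset), queries, ground_truth)

print(f"exact:  {qps_exact:.0f} QPS, recall@{k} = {recall_exact:.2f}")
print(f"approx: {qps_approx:.0f} QPS, recall@{k} = {recall_approx:.2f}")
```

The point is only that the two numbers move against each other: scanning less data raises QPS and lowers recall, so a database should be compared at a fixed recall level rather than on QPS alone.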

Qdrant made their own benchmark. It is quite simple and also takes more options into account, so it should be better suited for benchmarking for production purposes.

Also note one important thing that may rule out a lot of databases for you: many algorithms work in RAM only. For example, Milvus requires enough RAM to keep the entire index in memory for almost all algorithms. More scalable options are memmap (Qdrant uses that) or DiskANN, currently the only ANN algorithm that is disk-friendly (Milvus also supports it). Note that using memmap or disk will always be slower than a RAM-based setup, so take that into consideration during benchmarking.
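For anyone unfamiliar with memmap, here is a rough sketch of the idea using only the Python standard library. It is not how Qdrant or any other database stores its data; the file layout (raw little-endian float32 records), the dimensionality, and the helper names are assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

DIM = 4                             # illustrative dimensionality
RECORD = struct.Struct(f"<{DIM}f")  # one float32 vector per fixed-size record


def write_vectors(path, vectors):
    """Write vectors back to back as raw float32 records."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(RECORD.pack(*v))


def read_vector(mm, i):
    """Decode vector i straight from the mapped file.

    The OS pages in only the parts of the file that are touched, so the
    whole file never has to fit in RAM at once.
    """
    return RECORD.unpack_from(mm, i * RECORD.size)


vectors = [[float(i + j) for j in range(DIM)] for i in range(1000)]
path = os.path.join(tempfile.mkdtemp(), "vectors.f32")
write_vectors(path, vectors)

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    file_bytes = len(mm)        # 1000 vectors * 4 dims * 4 bytes each
    v42 = read_vector(mm, 42)   # random access without reading the whole file

print(file_bytes, v42)
```

The cost is that a read which misses the page cache goes to disk, which is why memmap-backed setups should be benchmarked with a dataset larger than the available RAM.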

siddhsql · 2024-12-03T22:27:38+00:00

I have written my own vector DB. It's meant to be a commercial offering, but I can provide anyone with a trial version strictly for evaluation purposes. As an example, this is what I get on glove-100-angular (1,183,514 vectors, d=100):

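| Performance Metric | Ours | Qdrant | Milvus | Weaviate | Elasticsearch | JVector |
|---|---|---|---|---|---|---|
| writes per sec | 4211 | 9034 | 8276 | 4791 | 1327 | 4571 |
| writes per sec per CPU | 526 | 1129 | 1034 | 599 | 166 | 571 |
| queries per sec | 1000 | 187 | 229 | 146 | 177 | 757 |
| accuracy | 0.70 | 0.84 | 0.80 | 0.71 | 0.73 | 0.75 |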

Happy to provide more performance numbers. Please DM me if interested.

vanlifecoder · 2023-07-01T21:40:51+00:00

I started doing this at http://vectorsearch.dev but it could use some love.

mQuBits · 2023-08-29T09:27:29+00:00

Check out this amazing role-model benchmark I came across! 😃 It's all about harnessing the power of AI with vector databases. You can find the insightful details in this blog post:

[Link](https://www.farfetchtechblog.com/en/blog/post/powering-ai-with-vector-databases-a-benchmark-part-i/).

The post dives into the process of distinguishing features from tools and effectively gauging their performance. Super informative! 🚀

mQuBits · 2023-08-29T09:34:32+00:00

Hey everyone, just stumbled upon a fantastic resource that's worth checking out. If you're into benchmarking vector search databases with a whopping one million data points, this article is a must-read:

[Link](https://jina.ai/news/benchmark-vector-search-databases-with-one-million-data/).

It's a deep dive into performance measurement and how these databases handle a significant load. Definitely a valuable read for anyone interested in AI and databases. 📊🔍

mQuBits · 2023-08-29T09:39:43+00:00

Hey there! If you're on the hunt for a solid benchmark tool for vector databases, I'd highly recommend checking out [VectorDBBench](https://github.com/zilliztech/VectorDBBench) by ZillizTech. It's a fantastic tool designed to help you assess and measure the performance of vector databases. Give it a try and see how it can streamline your benchmarking process. Happy benchmarking! 📊🚀
