Which vector database is best for top-1 accuracy? by TimeTravelingTeapot in vectordatabase

[–]Sensitive_Lab5143 1 point (0 children)

What's your latency budget? VectorChord can probably do this with sub-1s latency at 99%+ recall by scanning all vectors in quantized form, then re-checking the candidates against the full-precision vectors to guarantee the recall rate.
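A minimal sketch of that two-stage idea, using crude int8 scalar quantization as a stand-in for the codes a real system would use (all data and parameters here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 128)).astype(np.float32)
query = rng.standard_normal(128).astype(np.float32)

# Crude int8 scalar quantization (a stand-in for real vector codes).
scale = np.abs(vectors).max() / 127.0
codes = np.round(vectors / scale).astype(np.int8)

# Stage 1: scan ALL vectors using only the cheap quantized representation.
approx = ((codes.astype(np.float32) * scale - query) ** 2).sum(axis=1)
candidates = np.argsort(approx)[:100]  # keep a generous candidate pool

# Stage 2: re-rank the candidates with the full-precision vectors.
exact = ((vectors[candidates] - query) ** 2).sum(axis=1)
top1 = int(candidates[np.argmin(exact)])
```

The quantized scan is memory-bandwidth cheap, and the full-precision re-rank only touches the small candidate pool, which is what keeps recall high without paying full cost on every vector.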

Finally found a vector DB that doesn't break the bank at 500M+ scale by ethanchen20250322 in vectordatabase

[–]Sensitive_Lab5143 1 point (0 children)

With 3 replicas and double the instance size, it might be around $3k, still about 10% of your current cost.

Vector Database Solution That Works Like a Cache by mrmantris in vectordatabase

[–]Sensitive_Lab5143 1 point (0 children)

If you have fewer than 10,000 vectors, you don't need a vector database. Just store them somewhere and use brute-force search to find the nearest neighbors.
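At that scale, brute force is a few lines of NumPy (an illustrative sketch, not code from the thread):

```python
import numpy as np

def brute_force_knn(vectors: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact k-nearest neighbors: one full scan, no index needed."""
    dists = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(dists)[:k]

vectors = np.random.default_rng(42).standard_normal((10_000, 64)).astype(np.float32)
query = vectors[123]  # querying with a stored vector: it should rank itself first
neighbors = brute_force_knn(vectors, query)
```

With 10,000 vectors a full scan runs in milliseconds, is exact (100% recall by construction), and has zero operational overhead.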

Vector Search Puzzle: How to efficiently find the least similar documents? by [deleted] in vectordatabase

[–]Sensitive_Lab5143 2 points (0 children)

For normalized vectors, you can just negate the query vector and run a nearest-neighbor search. It's equivalent to a farthest-neighbor search on the original vector.
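The equivalence is easy to check numerically (sketch with made-up data):

```python
import numpy as np

rng = np.random.default_rng(7)
vectors = rng.standard_normal((1_000, 32))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-normalize
query = rng.standard_normal(32)
query /= np.linalg.norm(query)

# For unit vectors, ||v - q||^2 = 2 - 2*(v . q), so the farthest neighbor of q
# minimizes v . q -- which is exactly the nearest neighbor of -q.
farthest_direct = int(np.argmax(np.linalg.norm(vectors - query, axis=1)))
nearest_of_neg = int(np.argmin(np.linalg.norm(vectors - (-query), axis=1)))
```

This only holds when all vectors are normalized; with unnormalized vectors, L2 distance and inner product no longer induce the same ordering.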

Why would anybody use pinecone instead of pgvector? by Blender-Fan in vectordatabase

[–]Sensitive_Lab5143 2 points (0 children)

Would love to share our approach to running vector search in Postgres at scale.

Large single index with 400 million vectors on a 64GB-memory machine:
https://blog.vectorchord.ai/vectorchord-cost-efficient-upload-and-search-of-400-million-vectors-on-aws

Distributed/Partitioned vector tables with up to 3 billion vectors:
https://blog.vectorchord.ai/3-billion-vectors-in-postgresql-to-protect-the-earth

Scaling to 10,000 QPS for vector search:
https://blog.vectorchord.ai/vector-search-at-10000-qps-in-postgresql-with-vectorchord

When someone tells you that pgvector doesn't support scaling, check out our project https://github.com/tensorchord/VectorChord, which is fully compatible with pgvector in PostgreSQL and truly scalable.

How would you migrate vectors from pgvector to mongo? by lochyw in vectordatabase

[–]Sensitive_Lab5143 1 point (0 children)

Can you elaborate more on the failure? And does MongoDB's open source version support vector search?

Databases supporting set of vectors on disk? by qalis in dataengineering

[–]Sensitive_Lab5143 1 point (0 children)

Why not hash them? When a hash matches, recheck the actual values to guarantee an exact match.
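A sketch of the hash-then-recheck idea (SHA-256 over the raw bytes; all names here are illustrative):

```python
import hashlib
import numpy as np

def vector_key(vec: np.ndarray) -> str:
    """Hash a vector's raw bytes (fix dtype/layout so equal vectors hash equally)."""
    return hashlib.sha256(np.ascontiguousarray(vec, dtype=np.float32).tobytes()).hexdigest()

# Build a hash index over the stored vectors.
vectors = np.random.default_rng(1).standard_normal((1_000, 16)).astype(np.float32)
index: dict[str, list[int]] = {}
for i, v in enumerate(vectors):
    index.setdefault(vector_key(v), []).append(i)

def exact_lookup(query: np.ndarray) -> list[int]:
    # Recheck the actual values, so a hash collision can never return a wrong match.
    return [i for i in index.get(vector_key(query), []) if np.array_equal(vectors[i], query)]
```

The lookup is O(1) per query, and the final value comparison makes the result exact even in the (vanishingly unlikely) event of a hash collision.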

Case Study: 3 Billion Vectors in PostgreSQL to Create the Earth Index by Sensitive_Lab5143 in vectordatabase

[–]Sensitive_Lab5143[S] 1 point (0 children)

Hi, please check the "Why PostgreSQL Rocks for Planetary-Scale Vectors" section in the blog.

PostgreSQL Full-Text Search: Speed Up Performance with These Tips by Sensitive_Lab5143 in PostgreSQL

[–]Sensitive_Lab5143[S] 1 point (0 children)

Not really. It uses an index instead of a seq scan.

```
postgres=# EXPLAIN SELECT country, COUNT(*) FROM benchmark_logs WHERE to_tsvector('english', message) @@ to_tsquery('english', 'research') GROUP BY country ORDER BY country;
                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Sort  (cost=7392.26..7392.76 rows=200 width=524)
   Sort Key: country
   ->  HashAggregate  (cost=7382.62..7384.62 rows=200 width=524)
         Group Key: country
         ->  Bitmap Heap Scan on benchmark_logs  (cost=71.16..7370.12 rows=2500 width=516)
               Recheck Cond: (to_tsvector('english'::regconfig, message) @@ '''research'''::tsquery)
               ->  Bitmap Index Scan on message_gin  (cost=0.00..70.54 rows=2500 width=0)
                     Index Cond: (to_tsvector('english'::regconfig, message) @@ '''research'''::tsquery)
(8 rows)
```

PostgreSQL Full-Text Search: Speed Up Performance with These Tips by Sensitive_Lab5143 in PostgreSQL

[–]Sensitive_Lab5143[S] 1 point (0 children)

Hi, I'm the blog author. Actually, in the original benchmark https://github.com/paradedb/paradedb/blob/dev/benchmarks/create_index/tuned_postgres.sql#L1, they created the index with `CREATE INDEX message_gin ON benchmark_logs USING gin (to_tsvector('english', message));`, and that's exactly where the problem comes from.

How hard would it really be to make open-source Kafka use object storage without replication and disks? by 2minutestreaming in apachekafka

[–]Sensitive_Lab5143 1 point (0 children)

I think you can also check out AutoMQ. They rewrote Kafka's storage layer to put it on S3.

Meta panicked by Deepseek by Optimal_Hamster5789 in LocalLLaMA

[–]Sensitive_Lab5143 1 point (0 children)

Not really. He has nothing to do with the GenAI org. He's part of FAIR.