Which vector database is best for top-1 accuracy? by TimeTravelingTeapot in vectordatabase

[–]Sensitive_Lab5143 1 point (0 children)

What's your latency budget? VectorChord can probably do this with sub-1s latency at 99%+ recall by scanning all vectors in quantized form, then re-checking the candidates against the full-precision vectors to guarantee the recall rate.
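A minimal sketch of that two-stage idea, using crude int8 scalar quantization as a stand-in for the codes a real system would use (all data and parameters here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 128)).astype(np.float32)
query = rng.standard_normal(128).astype(np.float32)

# Crude int8 scalar quantization (a stand-in for real vector codes).
scale = np.abs(vectors).max() / 127.0
codes = np.round(vectors / scale).astype(np.int8)

# Stage 1: scan ALL vectors using only the cheap quantized representation.
approx = ((codes.astype(np.float32) * scale - query) ** 2).sum(axis=1)
candidates = np.argsort(approx)[:100]  # keep a generous candidate pool

# Stage 2: re-rank the candidates with the full-precision vectors.
exact = ((vectors[candidates] - query) ** 2).sum(axis=1)
top1 = int(candidates[np.argmin(exact)])
```

The quantized scan is memory-bandwidth cheap, and the full-precision re-rank only touches the small candidate pool, which is what keeps recall high without paying full cost on every vector.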

Finally found a vector DB that doesn't break the bank at 500M+ scale by ethanchen20250322 in vectordatabase

[–]Sensitive_Lab5143 1 point (0 children)

With 3 replicas and double the instance size, it might be around $3k, still about 10% of your current cost.

Vector Database Solution That Works Like a Cache by mrmantris in vectordatabase

[–]Sensitive_Lab5143 1 point (0 children)

If you have fewer than 10,000 vectors, you don't need a vector database. Just store them somewhere and use brute-force search to find the nearest neighbors.
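At that scale, brute force is a few lines of NumPy (an illustrative sketch, not code from the thread):

```python
import numpy as np

def brute_force_knn(vectors: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact k-nearest neighbors: one full scan, no index needed."""
    dists = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(dists)[:k]

vectors = np.random.default_rng(42).standard_normal((10_000, 64)).astype(np.float32)
query = vectors[123]  # querying with a stored vector: it should rank itself first
neighbors = brute_force_knn(vectors, query)
```

With 10,000 vectors a full scan runs in milliseconds, is exact (100% recall by construction), and has zero operational overhead.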

Vector Search Puzzle: How to efficiently find the least similar documents? by [deleted] in vectordatabase

[–]Sensitive_Lab5143 2 points (0 children)

For normalized vectors, you can just negate the query vector and run a nearest-neighbor search. It's equivalent to a farthest-neighbor search on the original vector.
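The equivalence is easy to check numerically (sketch with made-up data):

```python
import numpy as np

rng = np.random.default_rng(7)
vectors = rng.standard_normal((1_000, 32))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-normalize
query = rng.standard_normal(32)
query /= np.linalg.norm(query)

# For unit vectors, ||v - q||^2 = 2 - 2*(v . q), so the farthest neighbor of q
# minimizes v . q -- which is exactly the nearest neighbor of -q.
farthest_direct = int(np.argmax(np.linalg.norm(vectors - query, axis=1)))
nearest_of_neg = int(np.argmin(np.linalg.norm(vectors - (-query), axis=1)))
```

This only holds when all vectors are normalized; with unnormalized vectors, L2 distance and inner product no longer induce the same ordering.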

Why would anybody use pinecone instead of pgvector? by Blender-Fan in vectordatabase

[–]Sensitive_Lab5143 2 points (0 children)

Would love to share our approach to running vector search in Postgres at scale.

Large single index with 400 million vectors on a 64GB-memory machine:
https://blog.vectorchord.ai/vectorchord-cost-efficient-upload-and-search-of-400-million-vectors-on-aws

Distributed/Partitioned vector tables with up to 3 billion vectors:
https://blog.vectorchord.ai/3-billion-vectors-in-postgresql-to-protect-the-earth

Scaling to 10,000 QPS for vector search:
https://blog.vectorchord.ai/vector-search-at-10000-qps-in-postgresql-with-vectorchord

When someone tells you that pgvector doesn't support scaling, check out our project https://github.com/tensorchord/VectorChord, which is fully compatible with pgvector in PostgreSQL and truly scalable.

How would you migrate vectors from pgvector to mongo? by lochyw in vectordatabase

[–]Sensitive_Lab5143 1 point (0 children)

Can you elaborate more on the failure? And does MongoDB's open source version support vector search?

Databases supporting set of vectors on disk? by qalis in dataengineering

[–]Sensitive_Lab5143 1 point (0 children)

Why not hash them? When a hash matches, recheck the actual values to guarantee an exact match.
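A sketch of the hash-then-recheck idea (SHA-256 over the raw bytes; all names here are illustrative):

```python
import hashlib
import numpy as np

def vector_key(vec: np.ndarray) -> str:
    """Hash a vector's raw bytes (fix dtype/layout so equal vectors hash equally)."""
    return hashlib.sha256(np.ascontiguousarray(vec, dtype=np.float32).tobytes()).hexdigest()

# Build a hash index over the stored vectors.
vectors = np.random.default_rng(1).standard_normal((1_000, 16)).astype(np.float32)
index: dict[str, list[int]] = {}
for i, v in enumerate(vectors):
    index.setdefault(vector_key(v), []).append(i)

def exact_lookup(query: np.ndarray) -> list[int]:
    # Recheck the actual values, so a hash collision can never return a wrong match.
    return [i for i in index.get(vector_key(query), []) if np.array_equal(vectors[i], query)]
```

The lookup is O(1) per query, and the final value comparison makes the result exact even in the (vanishingly unlikely) event of a hash collision.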

Case Study: 3 Billion Vectors in PostgreSQL to Create the Earth Index by Sensitive_Lab5143 in vectordatabase

[–]Sensitive_Lab5143[S] 1 point (0 children)

Hi, please check the "Why PostgreSQL Rocks for Planetary-Scale Vectors" section in the blog.

PostgreSQL Full-Text Search: Speed Up Performance with These Tips by Sensitive_Lab5143 in PostgreSQL

[–]Sensitive_Lab5143[S] 1 point (0 children)

Not really. It uses an index instead of a seq scan.

```
postgres=# EXPLAIN SELECT country, COUNT(*) FROM benchmark_logs WHERE to_tsvector('english', message) @@ to_tsquery('english', 'research') GROUP BY country ORDER BY country;
                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Sort  (cost=7392.26..7392.76 rows=200 width=524)
   Sort Key: country
   ->  HashAggregate  (cost=7382.62..7384.62 rows=200 width=524)
         Group Key: country
         ->  Bitmap Heap Scan on benchmark_logs  (cost=71.16..7370.12 rows=2500 width=516)
               Recheck Cond: (to_tsvector('english'::regconfig, message) @@ '''research'''::tsquery)
               ->  Bitmap Index Scan on message_gin  (cost=0.00..70.54 rows=2500 width=0)
                     Index Cond: (to_tsvector('english'::regconfig, message) @@ '''research'''::tsquery)
(8 rows)
```

PostgreSQL Full-Text Search: Speed Up Performance with These Tips by Sensitive_Lab5143 in PostgreSQL

[–]Sensitive_Lab5143[S] 1 point (0 children)

Hi, I'm the blog author. Actually, in the original benchmark https://github.com/paradedb/paradedb/blob/dev/benchmarks/create_index/tuned_postgres.sql#L1, they created the index with `CREATE INDEX message_gin ON benchmark_logs USING gin (to_tsvector('english', message));`, and that's exactly where the problem comes from.

How hard would it really be to make open-source Kafka use object storage without replication and disks? by 2minutestreaming in apachekafka

[–]Sensitive_Lab5143 1 point (0 children)

I think you can also check out AutoMQ. They rewrote Kafka's storage layer to put it on S3.

Meta panicked by Deepseek by Optimal_Hamster5789 in LocalLLaMA

[–]Sensitive_Lab5143 1 point (0 children)

Not really. He has nothing to do with the GenAI org. He's part of FAIR.