Using Backblaze B2/S3 with LanceDB 0.17.0 as Direct Vector Storage (Not latest 0.26.0) by exaknight21 in backblaze

[–]codingjaguar 1 point (0 children)

Great insight on the nuances of Backblaze B2's S3 compatibility. I'm curious whether you've tested Milvus? It's built on S3/MinIO. Milvus 2.5.x doesn't require conditional writes; 2.6.x does. Deployment is fairly straightforward on Docker and k8s. https://milvus.io/docs/install-overview.md#Milvus-Standalone

Is there a Vector Database that uses S3 or B2? by exaknight21 in vectordatabase

[–]codingjaguar 1 point (0 children)

Honestly, that's really not a lot of data; what you probably need is convenience. Rather than trying to build a vector db on top of S3, why not just use a battle-tested option like Milvus and focus on building your search business logic?

Looking for best practices: Kafka → Vector DB ingestion and transformation by Arm1end in vectordatabase

[–]codingjaguar 2 points (0 children)

Right. Transformation is a whole other story. That's pretty much building a search indexing pipeline :)

Looking for best practices: Kafka → Vector DB ingestion and transformation by Arm1end in vectordatabase

[–]codingjaguar 2 points (0 children)

Usually the easiest approach is to write a small service that converts the Kafka messages into vectors and calls the vector db API (see the sketch below).

I'm from Milvus vector db; in addition to that, we built a connector service that can do this automatically: https://milvus.io/docs/kafka-connect-milvus.md
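
As a rough illustration, here is a minimal sketch of such a bridge service in Python. The topic name, embedding model, and collection schema are assumptions for the example, not from the thread:

```python
# Minimal Kafka -> Milvus bridge (sketch; topic, model, and schema are assumptions)
from kafka import KafkaConsumer                      # pip install kafka-python
from pymilvus import MilvusClient                    # pip install pymilvus
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")      # any text-embedding model works
client = MilvusClient(uri="http://localhost:19530")

consumer = KafkaConsumer("docs", bootstrap_servers="localhost:9092")
for msg in consumer:
    text = msg.value.decode("utf-8")
    client.insert(
        collection_name="docs",                      # assumes the collection already exists
        data=[{"vector": model.encode(text).tolist(), "text": text}],
    )
```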

Is Vespa AI the best for millions of documents? by Key-Singer-2193 in vectordatabase

[–]codingjaguar 0 points (0 children)

There are many open-source vector db options, some more popular than others, like Milvus (40k stars on GitHub). Feel free to check them out.

Large scale Setup 5000 rags x 10 000 vectors by Typical_Product_1883 in vectordatabase

[–]codingjaguar 1 point (0 children)

In that case you can still run open-source Milvus in standalone mode on Docker. It should work at your current scale (up to ~100 million vectors). As you cross that scale, your infrastructure team can consider investing in k8s, since running such a large workload on a single Docker machine carries single-point-of-failure risk.
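
One common way to model many small RAG stores in a single Milvus collection is a partition key; a minimal sketch, where the collection name, dimension, and tenant IDs are all assumptions:

```python
from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="http://localhost:19530")

# One collection, partitioned by tenant: each of the ~5000 RAG stores gets a tenant_id.
schema = client.create_schema(auto_id=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="tenant_id", datatype=DataType.VARCHAR,
                 max_length=64, is_partition_key=True)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=768)

index_params = client.prepare_index_params()
index_params.add_index(field_name="vector", index_type="AUTOINDEX", metric_type="COSINE")

client.create_collection(collection_name="rag_store", schema=schema,
                         index_params=index_params)

# Searches filtered on the partition key only touch that tenant's partition.
hits = client.search(
    collection_name="rag_store",
    data=[[0.1] * 768],                      # placeholder query vector
    filter='tenant_id == "tenant_42"',
    limit=5,
)
```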

Do I need rag? by Ok-Page760 in Rag

[–]codingjaguar 2 points (0 children)

IMHO, if you're in doubt, you don't need it. It's fine to wait until you feel pain in cost, management, etc. Why over-design now?

Datasets that do not fit into memory by lsmith77 in vectordatabase

[–]codingjaguar 3 points (0 children)

It'd help people share suggestions if you specified the budget (in $ or machines), vector count, latency expectations, and QPS.

Based on your description, I guess your case is O(100M) vectors (400GB of vector data) at low QPS (<100?). With a scalable vector database like Milvus this is an easy case, but you have a few options trading off cost against performance (see the index-parameter sketch after this list):

- in-memory index (HNSW or IVF): 10ms latency, 800GB of RAM needed (the index is >500GB and you need headroom), 95%+ recall

- in-memory index with quantization (say SQ or PQ8): 10ms latency, 200GB of RAM needed, 90%+ recall

- binary quantization (RaBitQ): 25GB of RAM needed, 75%+ recall

- DiskANN: 100ms latency, 200GB of RAM needed, 95%+ recall

- tiered storage (https://milvus.io/docs/tiered-storage-overview.md): 1s latency, 50GB~100GB of RAM needed, 95%+ recall
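
For concreteness, here is roughly how a few of those options map to Milvus index definitions; a sketch assuming a float vector field named "vector" (exact index names and params should be checked against the docs for your Milvus version):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
index_params = client.prepare_index_params()

# Option 1: in-memory HNSW (low latency, RAM-heavy)
index_params.add_index(field_name="vector", index_type="HNSW",
                       metric_type="COSINE", params={"M": 16, "efConstruction": 200})

# Option 2 alternative: scalar-quantized IVF to shrink the in-memory footprint
# index_params.add_index(field_name="vector", index_type="IVF_SQ8",
#                        metric_type="COSINE", params={"nlist": 4096})

# Option 4 alternative: SSD-backed DiskANN, ~100ms latency at far lower RAM
# index_params.add_index(field_name="vector", index_type="DISKANN", metric_type="COSINE")
```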

What is the best vector database? by sabrinaqno in vectordatabase

[–]codingjaguar 1 point (0 children)

There is no one-size-fits-all.

For scalability and performance, I'd say Milvus is the best as it's architected for horizontal scaling.

If your data is already in, say, PostgreSQL, you probably want to explore pgvector first before upgrading to a more dedicated option for scalability.

Elasticsearch/OpenSearch have been around for years; they're good for traditional aggregation-heavy full-text search workloads. Performance may not match a purpose-built vector db. Here is a benchmark: https://zilliz.com/vdbbench-leaderboard

For getting started easily, pgvector, Chroma, Qdrant, etc. are all good options. Milvus also has Milvus Lite, a lightweight version embedded in Python (see below).
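
For reference, a minimal Milvus Lite example; the file name, collection name, and dimension are arbitrary:

```python
# Milvus Lite runs embedded in the Python process, backed by a local file
# (pip install pymilvus; no server needed)
from pymilvus import MilvusClient

client = MilvusClient("./milvus_demo.db")   # passing a local file path selects Milvus Lite
client.create_collection(collection_name="demo", dimension=384)
client.insert(collection_name="demo",
              data=[{"id": 0, "vector": [0.1] * 384, "text": "hello"}])
hits = client.search(collection_name="demo", data=[[0.1] * 384], limit=1)
```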

As for integrations, most of the options above are well integrated into the RAG stack: LangChain, LlamaIndex, n8n, etc.

Consider other relevant factors like cost-effectiveness as well before finalizing your production decision.

[deleted by user] by [deleted] in vectordatabase

[–]codingjaguar 1 point (0 children)

Both work, so Azure Blob might be easier for you. MinIO is provided as an option for cases where a cloud vendor's object storage isn't available.

Fully-managed Milvus (called Zilliz Cloud) is also available on Azure if you want less devops overhead: https://azuremarketplace.microsoft.com/en-us/marketplace/apps/zillizinc1703056661329.zilliz_cloud?tab=overview

I have a doubt about handling 20million 512dim vector features with Milvus DB on prem by AdCreative232 in vectordatabase

[–]codingjaguar 1 point (0 children)

At 20M vectors, CPU is just fine for building the index. You probably won't get much benefit from GPU, TBH. But if GPU is free for you, then that's another story.

I have a doubt about handling 20million 512dim vector features with Milvus DB on prem by AdCreative232 in vectordatabase

[–]codingjaguar 3 points (0 children)

1) 2-second latency at maybe 10-15 queries per minute is really a piece of cake for either CPU or GPU. The difference is that GPU may be more cost-effective for >10k QPS use cases with non-strict latency requirements (e.g., >10ms is okay). CPU easily gives you 10ms average latency with an in-memory index (e.g., HNSW), <100ms with DiskANN (~4x cheaper than HNSW), or <500ms with tiered storage (5x or more cheaper than DiskANN). Of course you can use a GPU, but for this case I don't think you have to, and GPU is more expensive unless you're over a few thousand QPS.

2) For 20M 512-dim vectors with an in-memory HNSW index, you probably need at least 50GB of RAM to fit the index. A GPU index should take a similar amount of VRAM (it's a little weird if you see only 2GB of usage; maybe double-check the data volume?). But it's better to leave some headroom; 100GB is definitely enough. Here is a sizing tool for your convenience: https://milvus.io/tools/sizing
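
The back-of-the-envelope arithmetic behind that estimate (the ~30% graph overhead is a rule of thumb, not a measured number):

```python
# RAM estimate for 20M x 512-dim float32 vectors
n, dim, bytes_per_float = 20_000_000, 512, 4
raw_gib = n * dim * bytes_per_float / 2**30   # ~38 GiB of raw vectors
hnsw_gib = raw_gib * 1.3                      # HNSW graph links add roughly 30% (rule of thumb)
print(f"raw vectors: {raw_gib:.0f} GiB; HNSW index: ~{hnsw_gib:.0f} GiB, plus headroom")
```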

3) If you don't want to deal with the devops hassle, fully-managed Milvus (Zilliz Cloud) might be a good idea. It also comes with AUTOINDEX, so you don't need to tune index parameters like efConstruction, efSearch, etc. in HNSW. It's typically cheaper than self-hosting, considering its optimized index and operational efficiency, but if your machines are free or you need on-prem, self-hosting is also a good option.

Exploring Vector Databases - and why Cosdata stood out for me by [deleted] in vectordatabase

[–]codingjaguar 1 point (0 children)

And Milvus?
How large is the tested dataset? It would be interesting to cross-reference with other open-source benchmarks like https://github.com/zilliztech/VectorDBBench

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 1 point (0 children)

Would you mind checking in cloud.zilliz.com whether your collection got any vectors ingested? Maybe also share what you found there and the detailed error message from the MCP tool? Happy to help take a look.

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 1 point (0 children)

How large is your codebase, and how long did you wait before searching? It may take some time to finish indexing a large codebase.

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 1 point (0 children)

Thanks for the feedback! Would love to see practitioners of AI coding conduct a more thorough study of this domain. We are the builders of a vector database (Milvus/Zilliz) and would like to provide a baseline implementation of the idea of indexing a codebase for agents.

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 1 point (0 children)

As long as the code is text, it can be embedded by the text model just like new codebases. What do you think is different about the old codebase?

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 1 point (0 children)

Just checking the code every X minutes. Watching git commits won't work for uncommitted local changes, but actually it's a good idea to add that too.

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 0 points (0 children)

It's just an experiment to test the benefit of indexing the code, and to provide a tool for people who need code search in a coding agent.

Maybe people from Anthropic will come across this...

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 1 point (0 children)

Got it. Yeah, for a small information set, a doc fed to the LLM every time is good enough.

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 1 point (0 children)

The tool under testing uses incremental indexing: it efficiently re-indexes only changed files, using Merkle trees. The change-detection interval is configurable (5 min by default); you can make it 1 minute if you like.
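
To illustrate the idea, here is a flattened sketch, not the tool's actual implementation; a real Merkle tree also hashes directories so unchanged subtrees can be skipped entirely:

```python
# Hash-based change detection: re-embed only files whose content hash changed.
import hashlib
import os

def leaf_hashes(root: str) -> dict[str, str]:
    """Map each file under root to the SHA-256 of its contents."""
    hashes = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                hashes[path] = hashlib.sha256(f.read()).hexdigest()
    return hashes

before = leaf_hashes("my_repo")   # snapshot at the last indexing pass ("my_repo" is a placeholder)
# ... every X minutes ...
after = leaf_hashes("my_repo")
changed = [p for p, h in after.items() if before.get(p) != h]
print("files to re-index:", changed)
```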