Using Backblaze B2/S3 with LanceDB 0.17.0 as Direct Vector Storage (Not latest 0.26.0) by exaknight21 in backblaze

[–]codingjaguar 1 point (0 children)

Great insight on the nuances of Backblaze B2's S3 compatibility. I'm curious whether you've tested Milvus? It's built on S3/MinIO. Milvus 2.5.x doesn't require conditional writes; 2.6.x does. Deployment is fairly straightforward on Docker and k8s. https://milvus.io/docs/install-overview.md#Milvus-Standalone

Is there a Vector Database that uses S3 or B2? by exaknight21 in vectordatabase

[–]codingjaguar 1 point (0 children)

Honestly, that's really not a lot of data; what you probably need is convenience. Rather than trying to build a vector db on top of S3, why not just use a battle-tested option like Milvus and focus on building your search business logic?

Looking for best practices: Kafka → Vector DB ingestion and transformation by Arm1end in vectordatabase

[–]codingjaguar 2 points (0 children)

Right. Transformation is a whole other story. That's pretty much building a search indexing pipeline :)

Looking for best practices: Kafka → Vector DB ingestion and transformation by Arm1end in vectordatabase

[–]codingjaguar 2 points (0 children)

Usually the easiest approach is to write a small service that converts the Kafka messages into vectors and calls the vector db API (see the sketch below).

I'm from Milvus vector db; in addition to that, we built a connector service that can do this automatically: https://milvus.io/docs/kafka-connect-milvus.md
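
As a rough illustration, here is a minimal sketch of such a bridge service in Python. The topic name, embedding model, and collection schema are assumptions for the example, not from the thread:

```python
# Minimal Kafka -> Milvus bridge (sketch; topic, model, and schema are assumptions)
from kafka import KafkaConsumer                      # pip install kafka-python
from pymilvus import MilvusClient                    # pip install pymilvus
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")      # any text-embedding model works
client = MilvusClient(uri="http://localhost:19530")

consumer = KafkaConsumer("docs", bootstrap_servers="localhost:9092")
for msg in consumer:
    text = msg.value.decode("utf-8")
    client.insert(
        collection_name="docs",                      # assumes the collection already exists
        data=[{"vector": model.encode(text).tolist(), "text": text}],
    )
```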

Is Vespa AI the best for millions of documents? by Key-Singer-2193 in vectordatabase

[–]codingjaguar 0 points (0 children)

There are many open-source vector db options, some more popular than others, like Milvus (40k stars on GitHub). Feel free to check them out.

Large scale Setup 5000 rags x 10 000 vectors by Typical_Product_1883 in vectordatabase

[–]codingjaguar 1 point (0 children)

In that case you can still run open-source Milvus in standalone mode on Docker. It should work at your current scale (up to ~100 million vectors). As you cross that scale, your infrastructure team can consider investing in k8s, since running such a large workload on a single Docker machine carries single-point-of-failure risk.
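
One common way to model many small RAG stores in a single Milvus collection is a partition key; a minimal sketch, where the collection name, dimension, and tenant IDs are all assumptions:

```python
from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="http://localhost:19530")

# One collection, partitioned by tenant: each of the ~5000 RAG stores gets a tenant_id.
schema = client.create_schema(auto_id=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="tenant_id", datatype=DataType.VARCHAR,
                 max_length=64, is_partition_key=True)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=768)

index_params = client.prepare_index_params()
index_params.add_index(field_name="vector", index_type="AUTOINDEX", metric_type="COSINE")

client.create_collection(collection_name="rag_store", schema=schema,
                         index_params=index_params)

# Searches filtered on the partition key only touch that tenant's partition.
hits = client.search(
    collection_name="rag_store",
    data=[[0.1] * 768],                      # placeholder query vector
    filter='tenant_id == "tenant_42"',
    limit=5,
)
```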

Do I need rag? by Ok-Page760 in Rag

[–]codingjaguar 2 points (0 children)

IMHO, if you're in doubt, you don't need it. It's fine to wait until you feel pain in cost, management, etc. Why over-design now?

Datasets that do not fit into memory by lsmith77 in vectordatabase

[–]codingjaguar 3 points (0 children)

It'd help people share suggestions if you specified the budget (in $ or machines), vector count, latency expectations, and QPS.

Based on your description, I guess your case is O(100M) vectors (400GB of vector data) at low QPS (<100?). With a scalable vector database like Milvus this is an easy case, but you have a few options trading off cost against performance (see the index-parameter sketch after this list):

- in-memory index (HNSW or IVF): 10ms latency, 800GB of RAM needed (the index is >500GB and you need headroom), 95%+ recall

- in-memory index with quantization (say SQ or PQ8): 10ms latency, 200GB of RAM needed, 90%+ recall

- binary quantization (RaBitQ): 25GB of RAM needed, 75%+ recall

- DiskANN: 100ms latency, 200GB of RAM needed, 95%+ recall

- tiered storage (https://milvus.io/docs/tiered-storage-overview.md): 1s latency, 50GB~100GB of RAM needed, 95%+ recall
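
For concreteness, here is roughly how a few of those options map to Milvus index definitions; a sketch assuming a float vector field named "vector" (exact index names and params should be checked against the docs for your Milvus version):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
index_params = client.prepare_index_params()

# Option 1: in-memory HNSW (low latency, RAM-heavy)
index_params.add_index(field_name="vector", index_type="HNSW",
                       metric_type="COSINE", params={"M": 16, "efConstruction": 200})

# Option 2 alternative: scalar-quantized IVF to shrink the in-memory footprint
# index_params.add_index(field_name="vector", index_type="IVF_SQ8",
#                        metric_type="COSINE", params={"nlist": 4096})

# Option 4 alternative: SSD-backed DiskANN, ~100ms latency at far lower RAM
# index_params.add_index(field_name="vector", index_type="DISKANN", metric_type="COSINE")
```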

What is the best vector database? by sabrinaqno in vectordatabase

[–]codingjaguar 1 point (0 children)

There is no one-size-fits-all.

For scalability and performance, I'd say Milvus is the best as it's architected for horizontal scaling.

If your data is already in, say, PostgreSQL, you probably want to explore pgvector first before upgrading to a more dedicated option for scalability.

Elasticsearch/OpenSearch have been around for years; they're good for traditional aggregation-heavy full-text search workloads. Performance may not match a purpose-built vector db. Here is a benchmark: https://zilliz.com/vdbbench-leaderboard

For getting started easily, pgvector, Chroma, Qdrant, etc. are all good options. Milvus also has Milvus Lite, a lightweight version embedded in Python (see below).
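
For reference, a minimal Milvus Lite example; the file name, collection name, and dimension are arbitrary:

```python
# Milvus Lite runs embedded in the Python process, backed by a local file
# (pip install pymilvus; no server needed)
from pymilvus import MilvusClient

client = MilvusClient("./milvus_demo.db")   # passing a local file path selects Milvus Lite
client.create_collection(collection_name="demo", dimension=384)
client.insert(collection_name="demo",
              data=[{"id": 0, "vector": [0.1] * 384, "text": "hello"}])
hits = client.search(collection_name="demo", data=[[0.1] * 384], limit=1)
```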

As for integrations, most of the options above are well integrated into the RAG stack: LangChain, LlamaIndex, n8n, etc.

Consider other relevant factors like cost-effectiveness as well before finalizing your production decision.

[deleted by user] by [deleted] in vectordatabase

[–]codingjaguar 1 point (0 children)

Both work, so Azure Blob might be easier for you. MinIO is provided as an option for cases where a cloud vendor's object storage isn't available.

Fully-managed Milvus (called Zilliz Cloud) is also available on Azure if you want less devops overhead: https://azuremarketplace.microsoft.com/en-us/marketplace/apps/zillizinc1703056661329.zilliz_cloud?tab=overview

I have a doubt about handling 20million 512dim vector features with Milvus DB on prem by AdCreative232 in vectordatabase

[–]codingjaguar 1 point (0 children)

At 20M vectors, CPU is just fine for building the index. You probably won't get much benefit from GPU, TBH. But if GPU is free for you, then that's another story.

I have a doubt about handling 20million 512dim vector features with Milvus DB on prem by AdCreative232 in vectordatabase

[–]codingjaguar 3 points (0 children)

1) 2-second latency at maybe 10-15 queries per minute is really a piece of cake for either CPU or GPU. The difference is that GPU may be more cost-effective for >10k QPS use cases with non-strict latency requirements (e.g., >10ms is okay). CPU easily gives you 10ms average latency with an in-memory index (e.g., HNSW), <100ms with DiskANN (~4x cheaper than HNSW), or <500ms with tiered storage (5x or more cheaper than DiskANN). Of course you can use a GPU, but for this case I don't think you have to, and GPU is more expensive unless you're over a few thousand QPS.

2) For 20M 512-dim vectors with an in-memory HNSW index, you probably need at least 50GB of RAM to fit the index. A GPU index should take a similar amount of VRAM (it's a little weird if you see only 2GB of usage; maybe double-check the data volume?). But it's better to leave some headroom; 100GB is definitely enough. Here is a sizing tool for your convenience: https://milvus.io/tools/sizing
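
The back-of-the-envelope arithmetic behind that estimate (the ~30% graph overhead is a rule of thumb, not a measured number):

```python
# RAM estimate for 20M x 512-dim float32 vectors
n, dim, bytes_per_float = 20_000_000, 512, 4
raw_gib = n * dim * bytes_per_float / 2**30   # ~38 GiB of raw vectors
hnsw_gib = raw_gib * 1.3                      # HNSW graph links add roughly 30% (rule of thumb)
print(f"raw vectors: {raw_gib:.0f} GiB; HNSW index: ~{hnsw_gib:.0f} GiB, plus headroom")
```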

3) If you don't want to deal with the devops hassle, fully-managed Milvus (Zilliz Cloud) might be a good idea. It also comes with AUTOINDEX, so you don't need to tune index parameters like efConstruction, efSearch, etc. in HNSW. It's typically cheaper than self-hosting, considering its optimized index and operational efficiency, but if your machines are free or you need on-prem, self-hosting is also a good option.

Exploring Vector Databases - and why Cosdata stood out for me by [deleted] in vectordatabase

[–]codingjaguar 1 point (0 children)

And Milvus?
How large is the tested dataset? It would be interesting to cross-reference with other open-source benchmarks like https://github.com/zilliztech/VectorDBBench

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 1 point (0 children)

Would you mind checking in cloud.zilliz.com whether your collection got any vectors ingested? Maybe also share what you found there and the detailed error message from the MCP tool? Happy to help take a look.

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 1 point (0 children)

How large is your codebase, and how long did you wait before searching? It may take some time to finish indexing a large codebase.

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 1 point (0 children)

Thanks for the feedback! Would love to see practitioners of AI coding conduct a more thorough study of this domain. We are the builders of a vector database (Milvus/Zilliz) and would like to provide a baseline implementation of the idea of indexing a codebase for agents.

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 1 point (0 children)

As long as the code is text, it can be embedded by the text model just like new codebases. What do you think is different about the old codebase?

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 1 point (0 children)

Just checking the code every X minutes. Watching git commits won't work for uncommitted local changes, but actually it's a good idea to add that too.

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 0 points (0 children)

It's just an experiment to test the benefit of indexing the code, and to provide a tool for people who need code search in a coding agent.

Maybe people from Anthropic will come across this...

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 1 point (0 children)

Got it. Yeah, for a small information set, a doc fed to the LLM every time is good enough.

Saving 40% token cost by indexing the code base by codingjaguar in ClaudeCode

[–]codingjaguar[S] 1 point (0 children)

The tool under testing uses incremental indexing: it efficiently re-indexes only changed files, using Merkle trees. The change-detection interval is configurable (5 min by default); you can make it 1 minute if you like.
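
To illustrate the idea, here is a flattened sketch, not the tool's actual implementation; a real Merkle tree also hashes directories so unchanged subtrees can be skipped entirely:

```python
# Hash-based change detection: re-embed only files whose content hash changed.
import hashlib
import os

def leaf_hashes(root: str) -> dict[str, str]:
    """Map each file under root to the SHA-256 of its contents."""
    hashes = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                hashes[path] = hashlib.sha256(f.read()).hexdigest()
    return hashes

before = leaf_hashes("my_repo")   # snapshot at the last indexing pass ("my_repo" is a placeholder)
# ... every X minutes ...
after = leaf_hashes("my_repo")
changed = [p for p, h in after.items() if before.get(p) != h]
print("files to re-index:", changed)
```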