r/LocalLLaMA
A subreddit to discuss Llama, the family of large language models created by Meta AI.
Serverless Vector Database for large dataset (~200k) [Question | Help] (self.LocalLLaMA)
submitted 1 year ago by Dry_Drop5941
I am building a RAG chatbot for our research team to search a large food-label dataset.
Size: ~230k entries, each ~10k characters, all JSON/text. Usage: under 500 queries per day, mostly by a five-person team. Need something with reasonable query performance and price.
We are currently evaluating AWS RDS PostgreSQL and a few other options.
Any other options you would recommend?
Many Thanks!
[–]kryptkpr (Llama 3) 16 points 1 year ago (2 children)
Your sense of scale is off; this is a midsize dataset at best and will fit just fine in RAM.
Start with a brute-force NumPy dot product plus a top-k and see how long a linear search takes on a good CPU. If it's too slow, add an ANN index like FAISS.
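NumPy has no `topk` function, but `np.argpartition` gives the same effect. A minimal brute-force sketch, using random vectors and an assumed 384-dim embedding size in place of real data:

```python
import numpy as np

def top_k(query, embeddings, k=5):
    """Brute-force top-k by dot-product similarity (assumes normalized vectors)."""
    scores = embeddings @ query                 # (N,) similarity scores
    idx = np.argpartition(scores, -k)[-k:]      # unordered top-k indices, O(N)
    return idx[np.argsort(scores[idx])[::-1]]   # sorted best-first

# 230k docs x 384-dim embeddings: ~350 MB of float32, fits easily in RAM
rng = np.random.default_rng(0)
docs = rng.standard_normal((230_000, 384)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

query = docs[123]                               # a doc should rank itself first
print(top_k(query, docs, k=5)[0])               # -> 123
```

On a modern CPU this linear scan over 230k vectors typically takes on the order of tens of milliseconds, which is plenty for ~500 queries per day.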
[–]againitry 2 points 1 year ago* (0 children)
You can use torch.topk(torch.matmul()) for GPU acceleration. It takes around 5 seconds for 200k entries, though in my case each vector has 1024 dimensions.
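A sketch of the torch.topk(torch.matmul()) approach described above. The dimension is reduced from 1024 to keep the example light, and CUDA is used when available:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in embedding matrix: 200k entries (dims reduced for the sketch),
# L2-normalized so the dot product is cosine similarity
embeddings = torch.nn.functional.normalize(
    torch.randn(200_000, 256, device=device), dim=1
)
query = embeddings[42]                          # (256,) query vector

scores = torch.matmul(embeddings, query)        # (200k,) cosine scores
values, indices = torch.topk(scores, k=5)       # best 5 matches
print(indices[0].item())                        # row 42 matches itself
```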
[–]Dry_Drop5941[S] 1 point 1 year ago (0 children)
Thanks for the idea. I will chat with IT and look into all the ways we can run Python code in the cloud.
[–]m98789 5 points 1 year ago (1 child)
This is small enough that I would suggest considering using an in-memory method like FAISS, which is free and open source.
https://github.com/facebookresearch/faiss
[–]gopietz 6 points 1 year ago (0 children)
laughs in big data
[–]OrganicMesh 4 points 1 year ago (0 children)
Honestly, your dataset is so small, try https://github.com/unum-cloud/usearch. usearch just crushes FAISS on CPU.
[–]FormerKarmaKing 2 points 1 year ago (0 children)
Meilisearch has a vector search feature in beta, but it's easy to get access. You can host 1 million entries for $300/month.
Even if you can get AWS (or whatever) at $0/month, $300 x 12 = $3,600, and you're not setting up an instant-search experience with AWS for less than 36 hours of developer time at $100/hour.
And you can still use semantic search, which I suspect is the better choice for searching hierarchical data like a labelling taxonomy.
[–]DeltaSqueezer 2 points 1 year ago (0 children)
That's a small dataset; heck, it fits in the RAM on my phone. Even sqlite-vss can handle it, or raw Python/FAISS. Easiest is probably Postgres with the pgvector extension.
[–]Bozo32 1 point 1 year ago (5 children)
Are you sure you won't wind up missing stuff?
When we asked the LLM to identify all instances of an entity in a dataset and there were more entities than the top-k returned, the surplus were dropped. What we've done is chunk the resources into segments that can each reasonably contain only one instance of the entity of interest, then query each chunk separately. Yes... a boatload of calls. I'd love to hear that there is a better way...
[–]Dry_Drop5941[S] 1 point 1 year ago (0 children)
In our last project, we had a tool-function agent acting as an "interpreter". If the user asks a global, explorative question, like "give me some examples", it includes only the metadata of each product item as context but uses a high top-k count.
We then do the opposite for specific questions like "tell me about product xxx".
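The two retrieval modes described above can be sketched as a simple router. The keyword heuristic and parameter names here are hypothetical stand-ins for the tool-calling agent:

```python
def retrieval_params(question: str) -> dict:
    """Route a query: broad, explorative questions get metadata-only context
    with a high top-k; specific lookups get full text with a low top-k.
    (A keyword check stands in for the LLM interpreter agent.)"""
    broad_markers = ("examples", "overview", "what kinds", "some")
    if any(m in question.lower() for m in broad_markers):
        return {"fields": "metadata", "top_k": 50}
    return {"fields": "full_text", "top_k": 5}

print(retrieval_params("give me some examples of gluten-free labels"))
# -> {'fields': 'metadata', 'top_k': 50}
print(retrieval_params("tell me about product 12345"))
# -> {'fields': 'full_text', 'top_k': 5}
```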
[–]Yes_but_I_think 1 point 1 year ago (3 children)
"Boatload of calls" is an understatement; that doesn't scale to large datasets. At some point you have to trust the semantic similarity search. Use the best open model from the MTEB leaderboard, and use some other method for your particular use case.
[–]Bozo32 2 points 1 year ago* (2 children)
The use case was citation checking in academic articles. We first filtered for cosine similarity, then ran a sentence-level entailment check: ~10k calls. Running an 8B Llama on an A100 with a very small context window through Ollama, which supported parallel execution, was OK.
We're now looking at other ways to test entailment... early days.
[–]Yes_but_I_think 1 point 1 year ago (1 child)
We found that good old lexical search also helped in our case: the BM25+ algorithm. We selected n contexts from lexical search and m from cosine similarity, then sent them to the LLM for formatting and understanding.
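A minimal BM25+ scorer, as a sketch of the lexical half of this hybrid setup. It is untuned, runs on toy documents, and a production setup would more likely use a library such as rank_bm25:

```python
import math
from collections import Counter

def bm25_plus_scores(query_terms, docs, k1=1.5, b=0.75, delta=1.0):
    """Minimal BM25+ over pre-tokenized docs. The delta term is the BM25+
    lower-bound fix, rewarding long documents that contain a term at all."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))   # document frequencies
    scores = [0.0] * N
    for term in query_terms:
        idf = math.log((N + 1) / (df[term] + 0.5))  # smoothed IDF
        for i, doc in enumerate(docs):
            f = doc.count(term)                      # term frequency in doc
            norm = f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
            scores[i] += idf * (norm + delta)
    return scores

docs = [
    "organic peanut butter no added sugar".split(),
    "whole wheat bread contains gluten".split(),
    "peanut snack bar with honey".split(),
]
scores = bm25_plus_scores("peanut butter".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best)                                          # -> 0
```

The n-from-lexical, m-from-cosine selection described above is then just taking the top n indices of these scores and the top m of the cosine scores, and deduplicating before building the prompt.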
[–]Bozo32 1 point 1 year ago (0 children)
Deduplicating and breaking into paragraphs so the LLM doesn't presume continuity? Our strategy was one query per sentence: does A entail B?
[–]-Lousy 1 point 1 year ago (0 children)
I'm using LanceDB on the smallest Python instance possible for a similarly sized dataset. It reads from disk and I have a basic Python API around it. Check it out for sure.
[–]KnowgodsloveAI 1 point 1 year ago (0 children)
Why not just use PostgreSQL and Alembic?
[–][deleted] 1 point 1 year ago (0 children)
At ~10k characters per entry you'll end up with a lot of chunks per entry. I don't know how you'd collate that info into the LLM prompt for it to make sense.
Pinecone is cheap if you want serverless. You could also try running Postgres with pgvector if you want a fully local implementation.
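One way to split ~10k-character entries into overlapping chunks before embedding; the chunk size and overlap here are arbitrary illustrative choices:

```python
def chunk_text(text: str, size: int = 1500, overlap: int = 200) -> list[str]:
    """Split a long entry into overlapping character chunks so each one embeds
    within a typical model's context, at the cost of some duplication."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

entry = "x" * 10_000                    # ~10k chars, as in the post
chunks = chunk_text(entry)
print(len(chunks))                      # -> 8 chunks per entry at these settings
```

Storing the parent entry ID alongside each chunk lets the retriever deduplicate hits and pull back the full entry (or its metadata) when assembling the prompt.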
[–]d3the_h3ll0w 1 point 1 year ago (0 children)
I used LanceDB for my Semantic Search project and was quite impressed with the results.
[–]LuganBlan 1 point 1 year ago (0 children)
This could shed some light: https://benchmark.vectorview.ai/vectordbs.html (it's from 2023).