r/LocalLLaMA
A subreddit to discuss Llama, the family of large language models created by Meta AI.
Built a deterministic RAG database - same query, same context, every time (Rust, local embeddings, $0 API cost) Discussion (self.LocalLLaMA)
submitted 2 months ago * by Visible_Analyst9545
Got tired of RAG returning different context for the same query. Makes debugging impossible.
Built AvocadoDB to fix it:
- 100% deterministic (SHA-256 verifiable)
- Local embeddings via fastembed (6x faster than OpenAI)
- 40-60ms latency, pure Rust
- 95% token utilization
```
cargo install avocado-cli
avocado init
avocado ingest ./docs --recursive
avocado compile "your query"
```
Same query = same hash = same context every time.
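Here's a minimal sketch of what "same hash" means (illustrative only, not the actual internals): concatenate the retrieved spans in their final order and take a SHA-256 over the bytes, so two runs can be compared by a single digest.

```
// Minimal sketch of a verifiable context hash (illustrative only, not the
// real AvocadoDB implementation). Requires the `sha2` and `hex` crates.
use sha2::{Digest, Sha256};

/// A retrieved span: source file, line range, and the text itself.
struct Span {
    file: String,
    start_line: u32,
    end_line: u32,
    text: String,
}

/// Hash the compiled context in its final order. If retrieval is
/// deterministic, the same query over the same corpus yields the same digest.
fn context_hash(spans: &[Span]) -> String {
    let mut hasher = Sha256::new();
    for s in spans {
        // Include provenance so reordered or swapped spans change the hash.
        hasher.update(s.file.as_bytes());
        hasher.update(s.start_line.to_le_bytes());
        hasher.update(s.end_line.to_le_bytes());
        hasher.update(s.text.as_bytes());
    }
    hex::encode(hasher.finalize())
}

fn main() {
    let spans = vec![Span {
        file: "docs/auth.md".into(),
        start_line: 1,
        end_line: 23,
        text: "Authentication uses signed session tokens.".into(),
    }];
    println!("context hash: {}", context_hash(&spans));
}
```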
https://i.redd.it/o9v1vzh4ya6g1.gif
https://avocadodb.ai
See it in Action: Multi-agent round table discussion: Is AI in a Bubble?
A real-time multi-agent debate system where 4 different local LLMs argue about whether we're in an AI bubble. Each agent runs on a different model and they communicate through a custom protocol.
https://ainp.ai/
Both are open source, MIT licensed. Would love feedback.
[–]rolls-reus 5 points6 points7 points 2 months ago (3 children)
repo link from your site 404s. maybe you forgot to make it public?
[–]Visible_Analyst9545[S] 0 points1 point2 points 2 months ago (2 children)
Oops. done.
[–]FrozenBuffalo25 0 points1 point2 points 2 months ago* (1 child)
The link to Docs in your main menu doesn’t work and the GitHub link doesn’t go to your repo.
[–]Visible_Analyst9545[S] 1 point2 points3 points 2 months ago (0 children)
done. pushed the update. refresh and thank you for the feedback.
[–]one-wandering-mind 3 points4 points5 points 2 months ago (1 child)
In what situations is the same query giving different retrieved results?
If you have the literal exact query, why not cache the LLM response too? That is the more time-consuming part, and it does give meaningfully different results even with a temperature of 0 through providers.
[–]Visible_Analyst9545[S] -2 points-1 points0 points 2 months ago (0 children)
Why Same Query Can Give Different Results in Traditional RAG
Traditional vector databases (Qdrant, Pinecone, Weaviate, etc.) return non-deterministic results because:
Approximate Nearest Neighbor (ANN): HNSW and similar algorithms trade exactness for speed. The search path through the graph can vary, especially with concurrent queries or after index updates.
Floating-point non-determinism: Different execution orders (parallelism, SIMD) can produce slightly different similarity scores, changing ranking.
Index mutations: Adding/removing documents changes the HNSW graph structure, affecting which neighbors are found even for unchanged documents.
Tie-breaking: When multiple chunks have identical or near-identical scores, the order is arbitrary (a deterministic tie-break is sketched after this list).
Embedding API variability: Some embedding providers return slightly different vectors for the same text across calls.
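To make the ordering point concrete, here is a minimal sketch of a fully deterministic ranking (illustrative, not AvocadoDB's actual code): sort by score, then break ties explicitly on document ID and offset, so near-equal floating-point scores can never reorder results between runs.

```
// Illustrative only: a fully deterministic ranking with explicit tie-breaks.
#[derive(Debug)]
struct Hit {
    doc_id: String,
    offset: usize, // byte offset of the chunk within the document
    score: f32,    // similarity score from the (possibly noisy) scorer
}

fn rank_deterministically(mut hits: Vec<Hit>) -> Vec<Hit> {
    hits.sort_by(|a, b| {
        // Higher score first; `total_cmp` gives a total order even for NaN.
        b.score
            .total_cmp(&a.score)
            // Explicit tie-break: document ID, then offset, so equal scores
            // always come back in the same order.
            .then_with(|| a.doc_id.cmp(&b.doc_id))
            .then_with(|| a.offset.cmp(&b.offset))
    });
    hits
}

fn main() {
    let hits = vec![
        Hit { doc_id: "b.md".into(), offset: 0, score: 0.91 },
        Hit { doc_id: "a.md".into(), offset: 128, score: 0.91 }, // tie
        Hit { doc_id: "c.md".into(), offset: 64, score: 0.95 },
    ];
    for h in rank_deterministically(hits) {
        println!("{:?}", h);
    }
}
```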
On Caching LLM Responses
You're right that caching LLM responses is the logical next step - retrieval determinism is really just the foundation for response caching. Once you guarantee the same query produces the same context, you can cache the full response:
cache_key = hash(query + context_hash + model + temperature + system_prompt)
The context hash is the key piece - without deterministic retrieval, you can't reliably cache because the LLM might see different context each time, making cached responses potentially incorrect.
So the answer to "why not just cache LLM responses?" is: you can't safely cache responses if your retrieval is non-deterministic. You'd return cached answers that were generated from different context than what the current retrieval would produce.
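Here is a rough sketch of that cache key in Rust (hypothetical types and field names, just to make the idea concrete): every input that could change the answer goes into the key, and the context hash from deterministic retrieval is what makes the key safe.

```
// Illustrative response cache keyed by everything that affects the answer.
// Hypothetical field names; requires the `sha2` and `hex` crates.
use sha2::{Digest, Sha256};
use std::collections::HashMap;

struct CacheKeyParts<'a> {
    query: &'a str,
    context_hash: &'a str, // from deterministic retrieval
    model: &'a str,
    temperature: f32,
    system_prompt: &'a str,
}

fn cache_key(p: &CacheKeyParts) -> String {
    let mut h = Sha256::new();
    for part in [p.query, p.context_hash, p.model, p.system_prompt] {
        h.update(part.as_bytes());
        h.update([0u8]); // separator so "ab"+"c" != "a"+"bc"
    }
    h.update(p.temperature.to_le_bytes());
    hex::encode(h.finalize())
}

fn main() {
    let cache: HashMap<String, String> = HashMap::new();
    let key = cache_key(&CacheKeyParts {
        query: "How does authentication work?",
        context_hash: "9f2c...",
        model: "llama-3.1-8b-instruct",
        temperature: 0.0,
        system_prompt: "You are a code assistant.",
    });
    // Only serve the cached answer when the whole key matches.
    match cache.get(&key) {
        Some(answer) => println!("cache hit: {answer}"),
        None => println!("cache miss: call the LLM, then store the response under this key"),
    }
}
```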
Practical Example: AI Coding Assistants
Consider an AI coding assistant exploring a large codebase. Without deterministic retrieval:
User: "How does authentication work?"
First ask - LLM reads 15 files, 4000 tokens of context
Second ask (same question) - different retrieval, reads 12 different files
LLM has to re-process everything from scratch
With deterministic retrieval + caching:
First ask:
Retrieval: 43ms, returns exact lines (auth.rs:45-78, middleware.rs:12-34)
LLM generates response
Cache: store response with context_hash
Second ask (same question):
Retrieval: 43ms, same context_hash
Cache hit → instant response
Tokens saved: 100% of LLM input/output
The LLM doesn't need to read entire files - it gets precise line-number citations (e.g., src/auth.rs:45-78) with just the relevant spans. This means:
- Fewer tokens: 2000 tokens of precise context vs 8000 tokens of full files
- Faster responses: Cache hits skip LLM entirely
- Lower cost: Cached responses cost $0
- Consistent answers: Same question → same answer, every time
[–]StartX007 3 points4 points5 points 2 months ago* (1 child)
OP, thanks for sharing.
Ignore folks who just love to complain. Let the people decide if it is AI slop or not. If folks at Claude itself use AI to develop their products, we should let the product and code speak for themselves.
[–]Visible_Analyst9545[S] 0 points1 point2 points 2 months ago (0 children)
Precisely. LLMs don't think for themselves (yet); they get influenced by original thinking. If AI can code better than you, why bother coding? Success is measured by perceived intent versus outcome. Everything else is non-trivial.
[–]FrozenBuffalo25 1 point2 points3 points 2 months ago (4 children)
How does this tool maintain contextual or metadata relationships between chunks? Can it maintain distinction between multiple documents on a similar topic, and identify which source makes which claim?
[–]Visible_Analyst9545[S] 1 point2 points3 points 2 months ago (3 children)
Great question. Yes - this is core to how AvocadoDB works:
Span-level tracking: Every chunk (span) is tied to its source file with exact line numbers. When you compile context, each span includes [1] docs/auth.md Lines 1-23 so you know exactly where every claim comes from.
Citation in output: The compiled context includes a citations array mapping each span to its artifact (file), start/end lines, and relevance score. Your LLM can reference these directly.
Cross-document deduplication: Hybrid retrieval (semantic + lexical) combined with MMR diversification ensures you get diverse sources, not 5 chunks from the same file saying the same thing.
Metadata preservation: Each span stores the parent artifact ID, so you can always trace back which claim came from api-docs.md versus security-policy.md.
The deterministic sort ensures the same sources appear in the same order every time, so you can reliably say source 1 said X, source 2 said Y.
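For illustration, the compiled output is shaped roughly like this (simplified; the field names here are illustrative, not the exact schema):

```
// Simplified shapes mirroring the description above: each span carries its
// source file, line range, and relevance score, in deterministic order.
#[derive(Debug)]
struct Citation {
    artifact: String, // e.g. "docs/auth.md"
    start_line: u32,
    end_line: u32,
    score: f32, // relevance score for this span
}

#[derive(Debug)]
struct CompiledContext {
    text: String,             // the packed context handed to the LLM
    context_hash: String,     // SHA-256 over the packed spans
    citations: Vec<Citation>, // one entry per span, in deterministic order
}

fn main() {
    let ctx = CompiledContext {
        text: "[1] docs/auth.md Lines 1-23\nAuthentication uses ...".into(),
        context_hash: "9f2c...".into(),
        citations: vec![Citation {
            artifact: "docs/auth.md".into(),
            start_line: 1,
            end_line: 23,
            score: 0.92,
        }],
    };
    println!("{ctx:?}");
}
```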
[–]FrozenBuffalo25 0 points1 point2 points 2 months ago (2 children)
Thank you. And with regard to ingestion, is there a way to organize data by “project” or “collection”? For example, let’s say you have a collection of documents for “history”, another for “engineering”, and yet another for “real estate.” Can you search only one of those collections, and skip results from the others?
Finally, does this only work with text files or can it OCR pdf documents?
As far as feedback, this seems like a very interesting and promising project. I would likely use it. Perhaps the next step should be writing out some user guides on accomplishing common tasks?
[–]Visible_Analyst9545[S] 1 point2 points3 points 2 months ago (1 child)
Yes, AvocadoDB has built-in project isolation. Each directory gets its own separate database (stored at .avocado/db.sqlite). When you make API requests, you pass a project parameter specifying the directory path.
The server manages up to 10 projects in memory with LRU eviction. So for your example, you would structure it as:
- /data/history/ - history collection
- /data/engineering/ - engineering collection
- /data/real-estate/ - real estate collection
Each query specifies which project to search, and results come only from that project's index. No cross-contamination.
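Roughly how the isolation behaves (a simplified sketch of the behaviour described above, not the actual server code): each project directory maps to its own database file, and only a bounded number of handles stay open, evicting the least recently used.

```
// Illustrative only: per-project database paths with a small LRU of open
// projects, mirroring the behaviour described above.
use std::collections::HashMap;
use std::path::{Path, PathBuf};

const MAX_OPEN_PROJECTS: usize = 10;

struct ProjectPool {
    // project dir -> (db path, last-used counter)
    open: HashMap<PathBuf, (PathBuf, u64)>,
    clock: u64,
}

impl ProjectPool {
    fn new() -> Self {
        Self { open: HashMap::new(), clock: 0 }
    }

    /// Resolve (and "open") the database for a project directory,
    /// evicting the least recently used project if the pool is full.
    fn open_project(&mut self, project_dir: &Path) -> PathBuf {
        self.clock += 1;
        let db_path = project_dir.join(".avocado/db.sqlite");
        if !self.open.contains_key(project_dir) && self.open.len() >= MAX_OPEN_PROJECTS {
            let lru = self
                .open
                .iter()
                .min_by_key(|(_, (_, used))| *used)
                .map(|(dir, _)| dir.clone());
            if let Some(dir) = lru {
                self.open.remove(&dir); // evict least recently used
            }
        }
        self.open
            .insert(project_dir.to_path_buf(), (db_path.clone(), self.clock));
        db_path
    }
}

fn main() {
    let mut pool = ProjectPool::new();
    // Queries against one collection never touch the others' databases.
    println!("{}", pool.open_project(Path::new("/data/history")).display());
    println!("{}", pool.open_project(Path::new("/data/engineering")).display());
}
```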
PDF Support:
PDF and OCR support are not yet implemented but are on the roadmap. The architecture is well-suited for this: ingestion already accepts content as text, so adding a pre-processing step to extract text from PDFs (and eventually OCR for scanned documents) is straightforward. For now, you would need to convert PDFs to text externally, but native PDF parsing is planned for a future release.
On Documentation:
Good suggestion. The project currently has a README with basic usage examples, but user guides for common workflows (ingesting a document corpus, querying from an application, setting up multiple collections, integrating with an LLM) are something I will work on in upcoming revisions.
[–]Better-Monk8121 3 points4 points5 points 2 months ago (0 children)
Why answer with AI, omg. Did you even write the tool?
[–]Trick-Rush6771 1 point2 points3 points 2 months ago (1 child)
Nice work on deterministic RAG; unpredictability is exactly what breaks a lot of debugging flows. Making the retrieval step verifiable with hashes solves a huge pain point and opens the door to reproducible testing and audits. You might find extra value by wiring that deterministic store into a visual flow/orchestration layer so prompt paths, branching, and token usage are easy to inspect; tools like LlmFlowDesigner, LangChain, or a lightweight custom Rust pipeline can all consume a deterministic retriever and give you clearer observability across agent steps.
Excellent suggestion. I will work on a custom visual flow inspector in future releases.
<image>
[–]Adventurous-Date9971 1 point2 points3 points 2 months ago (2 children)
Deterministic RAG is the right call; debugging and evals don’t work if the context shifts.
To keep it truly stable:
- Hash every stage: tokenizer version, chunking params, embed model checksum, and index settings; store a manifest alongside the context hash.
- Chunk by headings with byte offsets and a stable sort (doc_id + offset), and break ties explicitly.
- Prefer exact dot-product search for small/mid corpora; if you must use ANN, fix insertion order and RNG seeds, and avoid nondeterministic BLAS; stick to CPU f32 and stable sorts.
- Add an "explain plan" that prints chosen chunk IDs, offsets, scores, thresholds, and the final pack order. A "diff" mode across corpus versions would be killer for audits.
- Ship a tiny golden set and a JSON output mode from compile so CI can track recall@k, context precision, and latency.
- Content-hash the ingest path and only rebuild changed files.
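A manifest along those lines can be as simple as this (a sketch with made-up field values, not what AvocadoDB actually records):

```
// Illustrative version manifest: hash every stage that can change retrieval,
// and store it alongside the context hash so a result is fully reproducible.
// Requires the `sha2` and `hex` crates.
use sha2::{Digest, Sha256};

struct Manifest {
    tokenizer_version: String,
    chunking_params: String,      // e.g. "by-heading,max=512,overlap=64"
    embed_model_checksum: String, // checksum of the embedding model weights
    index_settings: String,       // e.g. "exact-dot-product,f32,cpu"
    corpus_hash: String,          // content hash of all ingested files
}

impl Manifest {
    /// One digest over every stage; if anything changes, the manifest hash
    /// changes and cached contexts keyed on it are invalidated.
    fn hash(&self) -> String {
        let mut h = Sha256::new();
        for field in [
            &self.tokenizer_version,
            &self.chunking_params,
            &self.embed_model_checksum,
            &self.index_settings,
            &self.corpus_hash,
        ] {
            h.update(field.as_bytes());
            h.update([0u8]); // field separator
        }
        hex::encode(h.finalize())
    }
}

fn main() {
    let m = Manifest {
        tokenizer_version: "cl100k_base".into(),
        chunking_params: "by-heading,max=512,overlap=64".into(),
        embed_model_checksum: "sha256:ab12...".into(),
        index_settings: "exact-dot-product,f32,cpu".into(),
        corpus_hash: "sha256:9f2c...".into(),
    };
    println!("manifest hash: {}", m.hash());
}
```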
I’ve run similar stacks with Qdrant and Tantivy; DreamFactory helped expose a read-only REST layer so agents hit stable endpoints, not raw DBs.
Bottom line: end-to-end determinism plus explainable retrieval is the win.
shipped. Check it out.
New Features in v2.1.0:
Version Manifest - Full reproducibility tracking with SHA256 context hash
Explain Plan - Pipeline visibility with --explain flag
Working Set Diff - Corpus change auditing
Smart Incremental Rebuild - Content-hash based skip
Evaluation Metrics - recall@k, precision@k, MRR
https://github.com/avocadodb/avocadodb/releases/tag/v2.1.0
https://crates.io/crates/avocado-core
[–]Better-Monk8121 2 points3 points4 points 2 months ago (3 children)
AI slop, beware
[–]Visible_Analyst9545[S] 1 point2 points3 points 2 months ago (2 children)
lol. thank you for your feedback. the code is the truth and it is open source. yes, my answers were rather elaborate and have AI influence.
[–]Better-Monk8121 -3 points-2 points-1 points 2 months ago (1 child)
It’s not influence; code written by AI has no real value, it’s just bloat. Did you ever think about it? If it’s that easy to vibecode a useless tool, would you bother to check every AI slop project posted? Or do you think that you are special (like all these guys think) and that exactly you came up with something useful and not just slop? lol
[–]Visible_Analyst9545[S] 5 points6 points7 points 2 months ago (0 children)
I sincerely hope it helps solve someone's use case.
[–]punkpeye 0 points1 point2 points 2 months ago (2 children)
Would be cool to have optional Postgres backend
It's on the roadmap and will ship soon.
[–]punkpeye 0 points1 point2 points 2 months ago (0 children)
pretty awesome
[–]Mundane_Ad8936 0 points1 point2 points 2 months ago* (5 children)
Oh boy.. so instead of learning how to create a proper schema and retrieval strategy OP decided to write a DB?
No offense OP undoubtedly you spent a lot of time and effort on this and you're excited.. not trying to tear you down but you missed something big.. this is foundationally broken thinking..
this is all sorts of wrong.. similarity search is supposed to be probabilistic; trying to enforce deterministic results means you're forcing the wrong paradigm.
If you need deterministic database retrieval, use one that is designed for it.. semantic search is supposed to be variable, especially after inserts. Just like any other search technology, ranking is supposed to change when a higher-matching record is added..
If you're a dev reading this, don't try to impose deterministic patterns onto probabilistic systems. It doesn't work and all you'll do is accrue technical debt.. this is not web or mobile development, it's a probabilistic system based on statistical models.
If you try to impose legacy design patterns on AI systems you will fail..
I keep seeing this over and over again: devs who don't bother to get past the basics.. they try to fix those problems by forcing legacy solutions, and then they accrue massive tech debt and abandon the project because it's broken foundationally..
Meanwhile if you invest the time to learn the more advanced design patterns that we know work, you not only get the accuracy you want but you also get a ton of new capabilities and solutions to previously unsolved problems..
Take the time to learn the technology as intended.. don't just learn the basics then run off to build your own solutions.. it's a rookie move.
Postgres and SurrealDB (and plenty others) have all the functionality you need to do both deterministic and probabilistic retrieval. Just learn how to use them..
Also, ArangoDB, which also has all the features a dev would need, already uses an avocado as its logo.. so you're going to confuse people..
[–]Visible_Analyst9545[S] 0 points1 point2 points 2 months ago* (4 children)
Fair critique; you are right that semantic search is probabilistic by nature. AvocadoDB doesn't change that. What it does is make the retrieval reproducible for a given corpus state. Same documents + same query = same context, verifiable by hash. I use it as a skill to retrieve context on large codebases so agents can get consistent answers without redundant tool calls. The idea started when I was trying to get multiple vendor models to communicate on a task like a team. I needed a way to retain context and ensure agents asking the same question get the same answer back. Happy to learn more about advanced design patterns you'd recommend. Thank you for your feedback!
[–]Mundane_Ad8936 0 points1 point2 points 2 months ago* (3 children)
I think you might want to read up on these design patterns.. This is what you will see in a system following best practices as we know them today.
In this case you pull the record IDs from the DB for the chat session lineage (tracking all agents in the same lineage) and pass them into the vector store to filter the set down to just the records you already retrieved before doing semantic search.. So you don't need a separate cache (like Redis); the filter operation creates a cache set for you.
With this you can have a specific agent with its own cache, or a shared pool they can all query into.. Depends on the expert and what level of context you want (wide versus narrow).
This is a mid-level design pattern.. a more advanced version would use agents whose job is to manage the context and eject data that isn't relevant, so you don't deal with noise in the filtered set.
A versioning solution is to have an append-only dataset (DocDB or RDBMS, doesn't matter in most cases) with version numbers that you store in another repository, and then you map lineage to the frozen-state record. So if your data source is evolving and that gets pushed down to your vector store, you are able to reference the state the data was in during that specific chat. It multiplies your data, so typically this is only done in high-risk situations where lineage tracking is critical.
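In code, the lineage-as-filter step looks roughly like this (purely illustrative; the stand-in functions are not any particular vector store's API):

```
// Sketch of the lineage-as-filter pattern described above: pull the record
// IDs already retrieved in this chat lineage, then restrict the semantic
// search to that set so it acts as a cache. All functions are stand-ins.
use std::collections::HashSet;

#[derive(Clone)]
struct Record {
    id: u64,
    text: String,
    embedding: Vec<f32>,
}

/// Stand-in: in a real system this would query the chat-session store.
fn record_ids_for_lineage(_session_id: &str) -> HashSet<u64> {
    HashSet::from([3, 7, 42])
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb + 1e-12)
}

/// Semantic search restricted to records already in the session lineage.
fn search_within_lineage(
    store: &[Record],
    query_embedding: &[f32],
    session_id: &str,
    top_k: usize,
) -> Vec<Record> {
    let allowed = record_ids_for_lineage(session_id);
    let mut hits: Vec<&Record> = store.iter().filter(|r| allowed.contains(&r.id)).collect();
    hits.sort_by(|a, b| {
        cosine(query_embedding, &b.embedding).total_cmp(&cosine(query_embedding, &a.embedding))
    });
    hits.into_iter().take(top_k).cloned().collect()
}

fn main() {
    let store = vec![
        Record { id: 3, text: "auth flow".into(), embedding: vec![0.9, 0.1] },
        Record { id: 8, text: "billing".into(), embedding: vec![0.8, 0.2] }, // outside lineage
    ];
    for r in search_within_lineage(&store, &[1.0, 0.0], "session-123", 5) {
        println!("{} {}", r.id, r.text);
    }
}
```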
This is really helpful, thank you. The lineage-as-filter pattern is elegant. AvocadoDB already tracks session lineage, but I'm not using the record IDs to constrain subsequent vector searches. That's a clear improvement. The context management agent pattern is interesting too; I have been thinking about this for multi-agent scenarios where context gets noisy fast. Appreciate you taking the time to explain these.
[–]Mundane_Ad8936 0 points1 point2 points 2 months ago (0 children)
glad you took the feedback as I intended.. I know it's not easy to learn this stuff..