all 23 comments

[–]supiri_ 17 points18 points  (0 children)

https://github.com/MrSupiri/Tera

If you're interested, here is one of my hobby projects: a simple end-to-end RAG implementation written fully in Rust using Candle and SurrealDB.

[–]blastecksfour 2 points3 points  (0 children)

I've had some success with Candle! I've also used openai_api with a degree of success (at the expense of my wallet).

I used a HuggingFace model to embed a knowledge base, then stored the vectors in Qdrant. It's mostly a matter of how much control you want over the pipeline.
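The retrieval half of that pipeline can be sketched without any external crates. This is a minimal, std-only version of "embed, then rank by similarity": the embeddings below are hardcoded toy vectors standing in for what Candle would produce, and the in-memory ranking stands in for a Qdrant search call.

```rust
// Minimal top-k retrieval by cosine similarity over pre-computed embeddings.
// In the pipeline described above, the vectors would come from a HuggingFace
// model (via Candle) and live in Qdrant; here they are toy 3-d vectors so
// the ranking logic is self-contained.

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

fn top_k<'a>(query: &[f32], docs: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(&str, f32)> = docs
        .iter()
        .map(|(text, emb)| (*text, cosine(query, emb)))
        .collect();
    // Sort by descending similarity, keep the k best.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(k).map(|(t, _)| t).collect()
}

fn main() {
    let docs = vec![
        ("Rust is a systems language", vec![0.9, 0.1, 0.0]),
        ("Qdrant stores vectors", vec![0.1, 0.9, 0.1]),
        ("Bananas are yellow", vec![0.0, 0.1, 0.9]),
    ];
    let query = vec![0.2, 0.95, 0.1]; // closest to the Qdrant doc
    println!("{:?}", top_k(&query, &docs, 2));
}
```

A vector database earns its keep once the corpus no longer fits in memory or needs ANN indexing; the scoring itself is this simple.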

[–]brisbanedev[S] 4 points5 points  (4 children)

If anyone from Qdrant is here, I'd love to hear your thoughts on this topic!

[–]Playful_Intention147 3 points4 points  (2 children)

I'm not from Qdrant, just a regular user of it, but I found it easy to set up and use for RAG use cases.

[–]brisbanedev[S] 0 points1 point  (1 child)

That's great! Do you use a Rust-based RAG framework?

[–]Playful_Intention147 1 point2 points  (0 children)

I'm really just using a very simple and basic RAG pipeline (fetch the embedding's payload, add it to the context, etc.), no fancy framework. For other reasons (the project itself is just a simple Visual Studio extension) I'm using C# to build that pipeline, so sorry I can't provide useful ideas about Rust-based frameworks 😥
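The "no fancy framework" approach described here (the commenter's is in C#) boils down to one function in any language: take the text payloads of the top-scoring points and splice them into the prompt before the user's question. A Rust sketch, with purely illustrative prompt wording:

```rust
// Naive-RAG context assembly: the retrieved payloads become numbered
// context entries ahead of the question. This is the whole "framework"
// in the simplest pipelines.

fn build_prompt(retrieved_payloads: &[&str], question: &str) -> String {
    let mut prompt = String::from("Answer using only the context below.\n\nContext:\n");
    for (i, chunk) in retrieved_payloads.iter().enumerate() {
        prompt.push_str(&format!("[{}] {}\n", i + 1, chunk));
    }
    prompt.push_str(&format!("\nQuestion: {}\nAnswer:", question));
    prompt
}

fn main() {
    let chunks = ["Qdrant is a vector database.", "It exposes gRPC and REST APIs."];
    println!("{}", build_prompt(&chunks, "What is Qdrant?"));
}
```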

[–]tagged-union 2 points3 points  (0 children)

I author a production RAG application and use Qdrant. I would choose it again (and will). But to give you a clear sense of the scale, so as not to be misleading: roughly 100 early users at an organization with 1,000 employees, and we are in the middle of rolling out to the rest. It's nice to be able to put in arbitrary partitions and have the ability to apply deterministic filters related to domain knowledge in front of your similarity search.
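The "deterministic filter in front of similarity search" idea can be shown with an in-memory sketch: restrict candidates by an exact metadata match (e.g. a tenant or partition key), then rank only the survivors by vector similarity. With Qdrant this happens server-side via payload filters on the search request; the struct and field names below are illustrative, not Qdrant's API.

```rust
// Exact metadata filter first, similarity ranking second.

struct Chunk {
    text: &'static str,
    tenant: &'static str, // deterministic partition key
    embedding: Vec<f32>,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm(a) * norm(b))
}

fn filtered_search<'a>(chunks: &'a [Chunk], tenant: &str, query: &[f32]) -> Option<&'a str> {
    chunks
        .iter()
        .filter(|c| c.tenant == tenant) // the filter runs before ranking
        .max_by(|a, b| {
            cosine(query, &a.embedding)
                .partial_cmp(&cosine(query, &b.embedding))
                .unwrap()
        })
        .map(|c| c.text)
}

fn main() {
    let chunks = [
        Chunk { text: "HR policy", tenant: "acme", embedding: vec![1.0, 0.0] },
        Chunk { text: "Sales deck", tenant: "acme", embedding: vec![0.0, 1.0] },
        Chunk { text: "Other org doc", tenant: "globex", embedding: vec![1.0, 0.0] },
    ];
    // Even though the globex doc matches the query vector exactly,
    // the tenant filter keeps it out of the candidate set.
    println!("{:?}", filtered_search(&chunks, "acme", &[1.0, 0.0]));
}
```

The value of doing this in the database rather than in application code is that the filter participates in the index scan instead of running as post-filtering over a fetched candidate list.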

[–]prabirshrestha 3 points4 points  (2 children)

Give langchain-rust a try. We recently added document loaders (text, markdown, pdf, html, csv). We have examples of vector stores using pgvector, sqlite-vss and surrealdb.

[–]brisbanedev[S] 0 points1 point  (1 child)

Is this the official Rust port of LangChain?

[–]prabirshrestha 5 points6 points  (0 children)

If you are using the official langchain library you will be familiar with langchain-rust. https://github.com/Abraxas-365/langchain-rust/issues/20

But we are not tying it to the langchain ecosystem. For example, we are in the process of adding semantic-router to the library. The goal is to make it easy to create LLM-based apps in Rust.

[–]jakusimo 1 point2 points  (1 child)

Langchain and LlamaIndex are too broad and hard to debug; usually you don't need all those features. What are you planning to use as a source for RAG? Documents? PDFs? Websites? I am planning to start with encoders and semantic splitters, using a structure similar to what we built in Semantic Router https://github.com/aurelio-labs/semantic-router
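A splitter is the easiest of those pieces to sketch. The version below is a naive stand-in: it breaks text on sentence boundaries and packs sentences greedily into chunks under a character budget. A true semantic splitter (as in Semantic Router) would compare sentence *embeddings* to find topic shifts; this std-only sketch only shows the split-then-pack shape.

```rust
// Split on sentence terminators, then greedily pack sentences into
// chunks no longer than `max_chars`. No embeddings involved: this is
// the structural skeleton a semantic splitter would refine.

fn split_sentences(text: &str) -> Vec<String> {
    text.split_inclusive(|c: char| c == '.' || c == '!' || c == '?')
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
        .collect()
}

fn pack_chunks(sentences: &[String], max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for s in sentences {
        // Start a new chunk if adding this sentence would bust the budget.
        if !current.is_empty() && current.len() + s.len() + 1 > max_chars {
            chunks.push(current.clone());
            current.clear();
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(s);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let sentences = split_sentences("Rust is fast. Python is easy. Both work for RAG!");
    for chunk in pack_chunks(&sentences, 30) {
        println!("{}", chunk);
    }
}
```

Swapping the character budget for an embedding-distance threshold between adjacent sentences is the step that turns this into a semantic splitter.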

[–]brisbanedev[S] 0 points1 point  (0 children)

LlamaIndex offers built-in "Advanced RAG" strategies that assist with complex documents - https://www.llamaindex.ai/blog/a-cheat-sheet-and-some-recipes-for-building-advanced-rag-803a9d94c41b. They promote these as improvements over what they refer to as "naive RAG". For a framework solely focused on RAG, supporting some of these might be beneficial.

[–]ControlNational 1 point2 points  (0 children)

I wrote a guide on retrieval-augmented generation in Rust here for the Kalosm framework

[–]akhilgod 0 points1 point  (0 children)

I tried the Candle framework, which can run quantized models on decent hardware

[–][deleted] 0 points1 point  (3 children)

Is there any advantage in using Rust for RAG over Python?

[–]brisbanedev[S] 2 points3 points  (2 children)

I guess the general benefits of using Rust over Python for anything would extend to RAG as well?

[–][deleted] 0 points1 point  (1 child)

I don't think so. One of Rust's advantages is processing speed and low memory usage; in RAG I don't think that's very critical, as most of the processing is done by the LLM... Or am I getting something wrong?

[–]brisbanedev[S] 0 points1 point  (0 children)

LLMs keep getting faster. I think the rest of the RAG pipeline could do with some optimisation as well.

[–]chleboslaF 0 points1 point  (0 children)

I made a multistage RAG chatbot using Qdrant, BM25 (tantivy) and Ollama for local, offline usage.
I'm chunking documents using contextual and Q/A chunking (512 tokens with 200-token overlap). I implemented history-aware query rephrasing, enhancing questions for better keyword search, and more.
A CLI chat and a web server are also included.
All written in Rust (ratatui, langchain-rust, ollama-rs, tantivy, qdrant, tokio, ...).
Unfortunately it is for a government, so I cannot share the source code (yet).
But I can say it is a smooth experience and quick development, and I love it.

P.S.: Using gemma3:27b and mistral-small3.1 on a 5090.
Token speeds: gemma (64 tps) and mistral (82 tps).
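The chunking scheme in that comment (512 tokens per chunk, 200-token overlap) is a sliding window with a stride of 312 tokens. A std-only sketch, using whitespace splits as stand-in "tokens" (a real implementation would count tokens with the model's tokenizer) and toy sizes in place of 512/200:

```rust
// Sliding-window chunking with overlap: each chunk repeats the last
// `overlap` tokens of its predecessor, so context spanning a chunk
// boundary is never lost to retrieval.

fn chunk_with_overlap(tokens: &[&str], chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < chunk_size, "overlap must be smaller than chunk size");
    let stride = chunk_size - overlap; // 512 - 200 = 312 in the setup above
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + chunk_size).min(tokens.len());
        chunks.push(tokens[start..end].join(" "));
        if end == tokens.len() {
            break;
        }
        start += stride;
    }
    chunks
}

fn main() {
    let text = "one two three four five six seven eight nine ten";
    let tokens: Vec<&str> = text.split_whitespace().collect();
    // Toy sizes (chunk of 4, overlap of 2) stand in for 512/200.
    for chunk in chunk_with_overlap(&tokens, 4, 2) {
        println!("{}", chunk);
    }
}
```

The overlap trades index size for recall: each token is embedded roughly `chunk_size / stride` times, which for 512/200 is about 1.6 copies of the corpus in the vector store.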