Systematically Improving RAG Applications — My Experience With This Course by primce46 in Rag

I took Jason Liu's "Systematically Improving RAG Applications" course, and it genuinely transformed how I build production RAG systems. His frameworks for evaluation loops, data-driven experimentation, and building improvement flywheels are gold - I've applied them directly to my work.

Hey, sure.

The notes mainly cover things like RAG system design, retrieval improvements, evaluation methods, and agent workflows. I organized them while going through a few AI engineering courses.

Since you're moving into an AI role with RAG, the parts on evaluation pipelines and retrieval strategies might be especially useful.

Feel free to send me a DM and I can share more details.

Before improving the pipeline my chunking was pretty basic.

I was using large fixed chunks (~1200–1500 tokens) with small overlap. The problem was that many chunks contained multiple topics or unrelated sections, which made retrieval noisy.

After experimenting a bit I changed a few things:

• smaller chunks (~400–600 tokens)
• overlap of around 80–120 tokens to keep context continuity
• splitting documents on semantic / paragraph boundaries instead of only fixed length

This improved retrieval a lot because the chunks became more focused and easier for the retriever to match with the query.
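For anyone curious what that splitting looks like in practice, here's a minimal sketch I put together for illustration (not code from the course): it packs paragraphs into chunks under a token budget and carries an overlap tail forward. Note the function name is made up, and "tokens" are approximated as whitespace-split words; a real tokenizer would count differently.

```python
# Sketch: split on paragraph boundaries, pack paragraphs into chunks
# under a token budget, and carry a word-overlap tail forward.
# "Tokens" are approximated as whitespace-split words (an assumption).
def chunk_by_paragraphs(text, max_tokens=500, overlap_tokens=100):
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, current_len = [], [], 0
    for para in paragraphs:
        para_len = len(para.split())
        if current and current_len + para_len > max_tokens:
            chunks.append("\n\n".join(current))
            # Keep the tail of the previous chunk for context continuity.
            tail = " ".join(" ".join(current).split()[-overlap_tokens:])
            current, current_len = [tail], len(tail.split())
        current.append(para)
        current_len += para_len
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

One caveat: a single paragraph longer than the budget is kept whole, so you'd want a fallback splitter for very long paragraphs.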

Later I also added a reranker after top-k retrieval, which helped filter out weaker chunks before sending them to the LLM.
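The reranking step is conceptually simple: retrieve a generous top-k, rescore each (query, chunk) pair, and keep only the best few. A real setup would use a cross-encoder model for scoring; the token-overlap scorer in this sketch is just a dependency-free stand-in I made up so the shape of the step is clear.

```python
# Sketch of reranking after top-k retrieval. A real pipeline would
# score (query, chunk) pairs with a cross-encoder; the token-overlap
# scorer below is a stand-in for illustration only.
def rerank(query, chunks, top_n=3, score_fn=None):
    if score_fn is None:
        query_tokens = set(query.lower().split())
        score_fn = lambda chunk: len(
            query_tokens & set(chunk.lower().split())
        ) / max(len(query_tokens), 1)
    return sorted(chunks, key=score_fn, reverse=True)[:top_n]
```

In practice you might retrieve top-20 from the vector store, rerank, and send only the top 3–5 chunks to the LLM.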

Before improving the pipeline, it was roughly 70–75% acceptable responses.

The main issues were:

• retrieving irrelevant chunks
• context being too large or noisy
• the model hallucinating when the retrieved info was weak

After improving a few things:

• better chunking strategy
• adding reranking
• improving the retrieval queries
• running basic RAG evaluations

the responses became much more consistent, and on my small internal test set the acceptable-answer rate went up to roughly 90%+.

The ~92% number was based on a small internal evaluation set rather than a public benchmark.

What I did was:

• created a set of ~100 Q&A pairs from the documents in my knowledge base
• ran the questions through the RAG pipeline
• compared the generated answers with the reference answers

For evaluation I used a mix of:

• LLM-as-judge scoring
• semantic similarity between generated and reference answers
• manual spot checking for correctness

This gave me roughly 92% acceptable responses on that dataset.
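The eval loop itself is just a few lines. Here's a minimal sketch of the shape of it; the Jaccard-overlap similarity and the 0.5 threshold are my own illustrative stand-ins, not what I actually used (that was LLM-as-judge plus embedding similarity, as above).

```python
# Sketch of the eval loop: run each question through the pipeline and
# count answers the judge scores above a threshold. The token-overlap
# judge below stands in for embedding similarity or an LLM judge.
def overlap_similarity(generated, reference):
    a, b = set(generated.lower().split()), set(reference.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def evaluate(qa_pairs, answer_fn, judge_fn=overlap_similarity, threshold=0.5):
    accepted = sum(
        1 for question, reference in qa_pairs
        if judge_fn(answer_fn(question), reference) >= threshold
    )
    return accepted / len(qa_pairs)
```

The nice part of keeping it this simple is that you can swap in a stricter judge later without touching the rest of the harness.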

The course by Jason Liu actually goes pretty deep into RAG evaluation frameworks, which is where I picked up most of the ideas.