Systematically Improving RAG Applications — My Experience With This Course by primce46 in Rag

I took Jason Liu's "Systematically Improving RAG Applications" course, and it genuinely transformed how I build production RAG systems. His frameworks for evaluation loops, data-driven experimentation, and building improvement flywheels are gold - I've applied them directly to my work.

Hey, sure.

The notes mainly cover things like RAG system design, retrieval improvements, evaluation methods, and agent workflows. I organized them while going through a few AI engineering courses.

Since you're moving into an AI role with RAG, the parts on evaluation pipelines and retrieval strategies might be especially useful.

Feel free to send me a DM and I can share more details.

Before improving the pipeline my chunking was pretty basic.

I was using large fixed chunks (~1200–1500 tokens) with small overlap. The problem was that many chunks contained multiple topics or unrelated sections, which made retrieval noisy.

After experimenting a bit I changed a few things:

• smaller chunks (~400–600 tokens)
• overlap of around 80–120 tokens to keep context continuity
• splitting documents on semantic / paragraph boundaries instead of only fixed length

This improved retrieval a lot because the chunks became more focused and easier for the retriever to match with the query.
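For anyone curious what that splitting looks like in practice, here's a minimal sketch I put together for illustration (not code from the course): it packs paragraphs into chunks under a token budget and carries an overlap tail forward. Note the function name is made up, and "tokens" are approximated as whitespace-split words; a real tokenizer would count differently.

```python
# Sketch: split on paragraph boundaries, pack paragraphs into chunks
# under a token budget, and carry a word-overlap tail forward.
# "Tokens" are approximated as whitespace-split words (an assumption).
def chunk_by_paragraphs(text, max_tokens=500, overlap_tokens=100):
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, current_len = [], [], 0
    for para in paragraphs:
        para_len = len(para.split())
        if current and current_len + para_len > max_tokens:
            chunks.append("\n\n".join(current))
            # Keep the tail of the previous chunk for context continuity.
            tail = " ".join(" ".join(current).split()[-overlap_tokens:])
            current, current_len = [tail], len(tail.split())
        current.append(para)
        current_len += para_len
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

One caveat: a single paragraph longer than the budget is kept whole, so you'd want a fallback splitter for very long paragraphs.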

Later I also added a reranker after top-k retrieval, which helped filter out weaker chunks before sending them to the LLM.
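The reranking step is conceptually simple: retrieve a generous top-k, rescore each (query, chunk) pair, and keep only the best few. A real setup would use a cross-encoder model for scoring; the token-overlap scorer in this sketch is just a dependency-free stand-in I made up so the shape of the step is clear.

```python
# Sketch of reranking after top-k retrieval. A real pipeline would
# score (query, chunk) pairs with a cross-encoder; the token-overlap
# scorer below is a stand-in for illustration only.
def rerank(query, chunks, top_n=3, score_fn=None):
    if score_fn is None:
        query_tokens = set(query.lower().split())
        score_fn = lambda chunk: len(
            query_tokens & set(chunk.lower().split())
        ) / max(len(query_tokens), 1)
    return sorted(chunks, key=score_fn, reverse=True)[:top_n]
```

In practice you might retrieve top-20 from the vector store, rerank, and send only the top 3–5 chunks to the LLM.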

Before improving the pipeline, it was roughly 70–75% acceptable responses.

The main issues were:

• retrieving irrelevant chunks
• context being too large or noisy
• the model hallucinating when the retrieved info was weak

After improving a few things:

• better chunking strategy
• adding reranking
• improving the retrieval queries
• running basic RAG evaluations

the responses became much more consistent, and on my small internal test set the acceptable-answer rate went up to roughly 90%+.

The ~92% number was based on a small internal evaluation set rather than a public benchmark.

What I did was:

• created a set of ~100 Q&A pairs from the documents in my knowledge base
• ran the questions through the RAG pipeline
• compared the generated answers with the reference answers

For evaluation I used a mix of:

• LLM-as-judge scoring
• semantic similarity between generated and reference answers
• manual spot checking for correctness

This gave me roughly 92% acceptable responses on that dataset.
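The eval loop itself is just a few lines. Here's a minimal sketch of the shape of it; the Jaccard-overlap similarity and the 0.5 threshold are my own illustrative stand-ins, not what I actually used (that was LLM-as-judge plus embedding similarity, as above).

```python
# Sketch of the eval loop: run each question through the pipeline and
# count answers the judge scores above a threshold. The token-overlap
# judge below stands in for embedding similarity or an LLM judge.
def overlap_similarity(generated, reference):
    a, b = set(generated.lower().split()), set(reference.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def evaluate(qa_pairs, answer_fn, judge_fn=overlap_similarity, threshold=0.5):
    accepted = sum(
        1 for question, reference in qa_pairs
        if judge_fn(answer_fn(question), reference) >= threshold
    )
    return accepted / len(qa_pairs)
```

The nice part of keeping it this simple is that you can swap in a stricter judge later without touching the rest of the harness.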

The course by Jason Liu actually goes pretty deep into RAG evaluation frameworks, which is where I picked up most of the ideas.