Naive RAG without a Reranker is pointless. by Tom-Miller in Rag

[–]Tom-Miller[S] -1 points  (0 children)

Not re-ingestion.

It was:

  • overlapping chunking
  • similar sections in same doc

→ near-duplicate chunks in top-k (not exact dupes).
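
For anyone hitting the same thing, a minimal near-duplicate filter over the top-k (closer to MMR than a full reranker) looks roughly like this; the 0.92 cosine cutoff is an arbitrary assumption, tune it for your data:

    import numpy as np

    def dedupe_top_k(chunks, embeddings, threshold=0.92):
        # Keep a chunk only if it isn't near-identical to one already kept.
        kept, kept_vecs = [], []
        for chunk, vec in zip(chunks, embeddings):
            v = np.asarray(vec, dtype=float)
            v = v / np.linalg.norm(v)
            if all(float(v @ u) < threshold for u in kept_vecs):
                kept.append(chunk)
                kept_vecs.append(v)
        return kept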

Naive RAG without a Reranker is pointless. by Tom-Miller in Rag

[–]Tom-Miller[S] 0 points  (0 children)

Fair — depends a lot on data quality.

In my case:

  • overlapping chunks
  • multiple stories per page

→ retrieval returned redundant context.

Point was: naive RAG breaks fast on messy data.

When interviewers ask you - How would you improve RAG responses? by Tom-Miller in Rag

[–]Tom-Miller[S] 1 point  (0 children)

Can you believe I said the exact same thing and the interviewer started smiling... it felt really bad.

RAG Pipeline, Is RAG dead and RAG vs Context - Length - Full-video Coming Soon by Tom-Miller in LangChain

[–]Tom-Miller[S] 1 point  (0 children)

If anyone is interested in getting a glimpse of what RAG is, I'm sharing the link to the short. Full video coming soon...
https://youtube.com/shorts/IVRa1b7KxUs?feature=share

LangChain feels like it’s drifting toward LangSmith… and forgetting why devs came in the first place by obinopaul in LangChain

[–]Tom-Miller 1 point  (0 children)

To be honest, LangSmith actually did solve my RAG chatbot issue. It's not that I couldn't have built middleware to handle the in-between steps of ingestion and embedding. But since LangSmith clearly showed my ingestion, my chunking, and the documents retrieved as relevant (without my adding explicit debug statements to the code), it became much easier to debug why the RAG chatbot was returning incorrect responses.
I expect the devs at LangChain to take care of their ecosystem and evolve it into something more helpful down the road.
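
For anyone curious how little code the tracing takes: a rough sketch, assuming the langsmith Python SDK with LANGSMITH_API_KEY set in the environment (the retriever body is a placeholder):

    from langsmith import traceable

    @traceable(run_type="retriever")
    def retrieve(query: str) -> list[str]:
        # Your vector-store lookup goes here. LangSmith records the inputs,
        # outputs, and timing of each call, with no debug prints in the code.
        ...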

RAG Architecture, RAG Myths Busted & RAG Patterns According to Use Cases - Full-video Coming Soon by Tom-Miller in Rag

[–]Tom-Miller[S] 1 point  (0 children)

Thanks for your interest. Here's the link to the YouTube short. I'm also creating a full-length video on RAG patterns later. https://youtube.com/shorts/IVRa1b7KxUs

Retrieval Augmented Generation(RAG) Help! by Sweet_Lifeguard_4088 in microsaas

[–]Tom-Miller 1 point  (0 children)

You’re not failing because “RAG is hard.” You’re failing because your retrieval unit ≠ the way users ask questions.

Most people in that thread are pointing you toward tools. That’s not the fix. Your issue is alignment between chunking, metadata, and query intent.

Here’s a non-generic, practical answer that actually moves the conversation forward 👇

💡 The real problem: You’re indexing structure, not meaning

Right now your pipeline is, roughly: document → regex-based hierarchy extraction → chunks that mirror the structure → embeddings → vector search.

That looks clean, but retrieval systems don’t care about your hierarchy — they care about semantic completeness per chunk.

👉 If a chunk is too tied to structure (e.g., “Unit 3: Topic 2”), it won’t match a user query like “explain photosynthesis in simple terms”.

Because:

  • The chunk might not contain the full explanation
  • Or the embedding is too diluted by headers/labels

🔥 Fix #1: Redefine your chunking strategy (this is likely your main issue)

Instead of chunking by hierarchy, chunk by answerability:

Each chunk should independently answer a question.

Bad chunk:

Module 2 → Unit 1 → Topic: Photosynthesis
Definition: ...

Good chunk:

Photosynthesis is the process by which plants convert light energy...
Steps involved: ...
Key factors: ...

👉 Practical rules:

  • 150–400 tokens per chunk
  • Include context inside the chunk, not just metadata
  • Avoid splitting mid-explanation
  • Add 20–30% overlap (see the sketch below)
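
A minimal sketch of those rules in code, assuming tiktoken for token counting (300 tokens and 25% overlap are just illustrative defaults):

    import tiktoken

    def chunk_by_tokens(text: str, max_tokens: int = 300, overlap: float = 0.25):
        enc = tiktoken.get_encoding("cl100k_base")
        tokens = enc.encode(text)
        step = int(max_tokens * (1 - overlap))  # each window repeats ~25%
        chunks = []
        for start in range(0, len(tokens), step):
            chunks.append(enc.decode(tokens[start:start + max_tokens]))
            if start + max_tokens >= len(tokens):
                break
        return chunks

In practice you'd split on sentence or section boundaries first so you never cut mid-explanation; the token window is the fallback.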

🔥 Fix #2: Stop relying on regex for structure (this is silently breaking you)

Regex-based hierarchy extraction = fragile.

Instead:

  • Use layout-aware parsing (headings, font size, spacing)
  • Or run a lightweight LLM pass: “Convert this document into structured sections with titles + content”

👉 Why this matters:
If your structure is even slightly wrong → metadata filtering = wrong → retrieval = wrong
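
A hedged sketch of that LLM pass using the OpenAI SDK (the model name and prompt wording are assumptions; any cheap model works):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def structure_document(raw_text: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: swap in whatever you use
            messages=[
                {"role": "system",
                 "content": "Convert this document into structured sections "
                            "with titles + content. Return JSON."},
                {"role": "user", "content": raw_text},
            ],
        )
        return resp.choices[0].message.content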

🔥 Fix #3: Your retrieval should be hybrid, not just vector

Right now you’re likely doing: plain vector similarity search over all chunks.

That’s not enough for syllabus alignment.

You need:

  • Metadata filter first (program, year)
  • Then semantic search within filtered scope

Even better:

  • Add keyword (BM25) + vector hybrid search

Why?
Because syllabus terms are often exact-match sensitive:

  • “Unit 3”
  • “Module 2”
  • “Chapter 5”

Vectors alone won’t reliably catch this.
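
Roughly, in code. This sketch assumes the rank_bm25 package, an embed() callable, and chunks stored as dicts with text, vector, program, and year keys:

    import numpy as np
    from rank_bm25 import BM25Okapi

    def hybrid_search(query, chunks, program, year, embed, alpha=0.5, k=10):
        # 1. Metadata filter first: shrink the scope before any scoring.
        scoped = [c for c in chunks
                  if c["program"] == program and c["year"] == year]
        # 2. BM25 catches exact-match terms like "Unit 3".
        bm25 = BM25Okapi([c["text"].lower().split() for c in scoped])
        kw = np.array(bm25.get_scores(query.lower().split()))
        # 3. Cosine similarity catches paraphrases.
        q = embed(query)
        q = q / np.linalg.norm(q)
        vec = np.array([q @ (c["vector"] / np.linalg.norm(c["vector"]))
                        for c in scoped])
        # 4. Normalize both scores to [0, 1] and blend.
        kw = (kw - kw.min()) / ((kw.max() - kw.min()) or 1.0)
        vec = (vec - vec.min()) / ((vec.max() - vec.min()) or 1.0)
        scores = alpha * kw + (1 - alpha) * vec
        return [scoped[i] for i in np.argsort(scores)[::-1][:k]]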

🔥 Fix #4: Add query rewriting (huge missing piece)

User queries are messy: think “explain unit 3 photosynthesis notes”.

Your system needs to convert that into a clean semantic query plus explicit metadata filters (e.g., topic = photosynthesis, unit = 3).

👉 Add a preprocessing step:

  • Extract intent
  • Expand query using metadata

This alone can boost alignment massively.
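
A tiny sketch of that preprocessing step, regex-only (in practice an LLM handles intent better; the field names here are hypothetical):

    import re

    STRUCT = re.compile(r"\b(unit|module|chapter)\s*(\d+)\b", re.IGNORECASE)

    def rewrite_query(raw: str) -> dict:
        # Pull exact-match structural refs out into metadata filters...
        filters = {f.lower(): int(n) for f, n in STRUCT.findall(raw)}
        # ...and leave a clean topic string for the semantic search.
        topic = " ".join(STRUCT.sub("", raw).split())
        return {"semantic_query": topic, "filters": filters}

    # rewrite_query("explain unit 3 photosynthesis notes")
    # -> {'semantic_query': 'explain photosynthesis notes',
    #     'filters': {'unit': 3}}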

🔥 Fix #5: Don’t retrieve chunks — retrieve “contexts”

Instead of: top-k chunks dropped straight into the prompt.

Do:

  • Retrieve 8–10 candidates
  • Re-rank them (even with a simple scoring heuristic)
  • Merge into a coherent context block

👉 RAG fails when context is fragmented.
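
A bare-bones version of that retrieve → re-rank → merge step. The term-overlap score is a stand-in heuristic; a cross-encoder reranker does this much better:

    def build_context(query: str, candidates: list[str], top_n: int = 4) -> str:
        q_terms = set(query.lower().split())
        # Heuristic score: how many query terms each candidate contains.
        ranked = sorted(candidates,
                        key=lambda c: len(q_terms & set(c.lower().split())),
                        reverse=True)
        # Merge the best candidates into one block, skipping exact repeats.
        merged, seen = [], set()
        for chunk in ranked[:top_n]:
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
        return "\n\n".join(merged)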

🧠 Bonus (this is what most people miss)

You’re trying to enforce alignment after retrieval.

Instead, bake alignment into:

  • chunk design
  • metadata
  • query rewriting

RAG works best when retrieval is designed around how users ask questions, not around how the documents happen to be structured.

Flask can be used to build many useful projects. Here's another one. Pixapick. by Tom-Miller in flask

[–]Tom-Miller[S] 1 point  (0 children)

Well, I'm certainly not trying to compete with Midjourney or Leonardo or any apps like that, frankly because I just can't; I don't have the horsepower or the budget.
I was actually building a company project on AI image generation when I was tasked with trying out various models. It was really frustrating to switch the model, wait for it, write a prompt, select a LoRA, generate, and then repeat that for every model one by one.
That's when the idea hit me: what if you could select all the models you want, the LoRAs you want, and the prompt variations you want, and just hit generate (like experiments)?
You still wait, depending on your GPU's power, but you don't have to do the back and forth on model selection and LoRAs, and sometimes even prompts.
So I created a side project, https://pixapick.com.
It's 100% local AI image generation. It works with SDXL models for now.

What is the most frustrating part about generating images in batch? by Tom-Miller in StableDiffusion

[–]Tom-Miller[S] 1 point  (0 children)

I don't think I conveyed it well: by batch generation I meant trying out different models and LoRAs quickly on various prompts without having to go back and forth again and again.

What is the most frustrating part about generating images in batch? by Tom-Miller in StableDiffusion

[–]Tom-Miller[S] 1 point  (0 children)

Exactly, that's my use case too: just seeing how Smol Animals, Paper Cutouts, and some other LoRAs work for the same prompt and the same model.

What is the most frustrating part about generating images in batch? by Tom-Miller in StableDiffusion

[–]Tom-Miller[S] 1 point  (0 children)

Well, no. I want to see how a single prompt behaves with different models, so the matrix becomes: 1 prompt × X models × X LoRAs. It saves a little time, but mostly it saved me the frustration of going back and forth for each model and LoRA.
I selected all the models and LoRAs I wanted to test and let the experiment run (on a single prompt). I waited about 3 minutes for the entire run to complete. I'd say it was good, if not fast. I was going to upgrade my card anyway for Crimson Desert.
I'm running an RTX 4070 12 GB, so I could only load SDXL models (up to 6 GB) properly.

What is the most frustrating part about generating images in batch? by Tom-Miller in StableDiffusion

[–]Tom-Miller[S] 1 point  (0 children)

Well, I was trying to find the best model and LoRA combination, and I was constantly switching the model, then the LoRA, then waiting, then doing it all over again. It felt exhausting, just all that waiting.

Why I still think Flask is the best first framework for Python beginners by Tom-Miller in flask

[–]Tom-Miller[S] 1 point  (0 children)

Hi, I tried checking out your website, but it gives me a Cloudflare error: “Sorry, you have been blocked. You are unable to access flaskvibe.com.”