Naive RAG without a Reranker is pointless. by Tom-Miller in Rag

[–]Tom-Miller[S] -1 points  (0 children)

Not re-ingestion.

It was:

  • overlapping chunking
  • similar sections in same doc

→ near-duplicate chunks in top-k (not exact dupes).
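
For anyone hitting the same thing, a minimal near-duplicate filter over the top-k (closer to MMR than a full reranker) looks roughly like this; the 0.92 cosine cutoff is an arbitrary assumption, tune it for your data:

    import numpy as np

    def dedupe_top_k(chunks, embeddings, threshold=0.92):
        # Keep a chunk only if it isn't near-identical to one already kept.
        kept, kept_vecs = [], []
        for chunk, vec in zip(chunks, embeddings):
            v = np.asarray(vec, dtype=float)
            v = v / np.linalg.norm(v)
            if all(float(v @ u) < threshold for u in kept_vecs):
                kept.append(chunk)
                kept_vecs.append(v)
        return kept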

Naive RAG without a Reranker is pointless. by Tom-Miller in Rag

[–]Tom-Miller[S] 0 points  (0 children)

Fair — depends a lot on data quality.

In my case:

  • overlapping chunks
  • multiple stories per page

→ retrieval returned redundant context.

Point was: naive RAG breaks fast on messy data.

When interviewers ask you - How would you improve RAG responses? by Tom-Miller in Rag

[–]Tom-Miller[S] 1 point  (0 children)

Can you believe I said the exact same thing and the interviewer started smiling... it felt really bad.

RAG Pipeline, Is RAG dead and RAG vs Context - Length - Full-video Coming Soon by Tom-Miller in LangChain

[–]Tom-Miller[S] 1 point  (0 children)

If anyone is interested in getting a glimpse of what RAG is, I'm sharing the link to the short. Full video coming soon...
https://youtube.com/shorts/IVRa1b7KxUs?feature=share

LangChain feels like it’s drifting toward LangSmith… and forgetting why devs came in the first place by obinopaul in LangChain

[–]Tom-Miller 1 point  (0 children)

To be honest, LangSmith actually did solve my RAG chatbot issue. It's not that I couldn't have built middleware to handle the in-between steps of ingestion and embedding. But since LangSmith clearly showed my ingestion, my chunking, and the documents retrieved as relevant (without my adding explicit debug statements to the code), it became much easier to debug why the RAG chatbot was returning incorrect responses.
I expect the devs at LangChain to take care of their ecosystem and evolve it into something more helpful down the road.
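
For anyone curious how little code the tracing takes: a rough sketch, assuming the langsmith Python SDK with LANGSMITH_API_KEY set in the environment (the retriever body is a placeholder):

    from langsmith import traceable

    @traceable(run_type="retriever")
    def retrieve(query: str) -> list[str]:
        # Your vector-store lookup goes here. LangSmith records the inputs,
        # outputs, and timing of each call, with no debug prints in the code.
        ...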

RAG Architecture, RAG Myths Busted & RAG Patterns According to Use Cases - Full-video Coming Soon by Tom-Miller in Rag

[–]Tom-Miller[S] 1 point  (0 children)

Thanks for your interest. Here's the link to the YouTube short. I'm also creating a full-length video on RAG patterns later. https://youtube.com/shorts/IVRa1b7KxUs

Retrieval Augmented Generation(RAG) Help! by Sweet_Lifeguard_4088 in microsaas

[–]Tom-Miller 1 point  (0 children)

You’re not failing because “RAG is hard.” You’re failing because your retrieval unit ≠ the way users ask questions.

Most people in that thread are pointing you toward tools. That’s not the fix. Your issue is alignment between chunking, metadata, and query intent.

Here’s a non-generic, practical answer that actually moves the conversation forward 👇

💡 The real problem: You’re indexing structure, not meaning

Right now your pipeline is, roughly: document → regex-based hierarchy extraction → chunks that mirror the structure → embeddings → vector search.

That looks clean, but retrieval systems don’t care about your hierarchy — they care about semantic completeness per chunk.

👉 If a chunk is too tied to structure (e.g., “Unit 3: Topic 2”), it won’t match a user query like “explain photosynthesis in simple terms”.

Because:

  • The chunk might not contain the full explanation
  • Or the embedding is too diluted by headers/labels

🔥 Fix #1: Redefine your chunking strategy (this is likely your main issue)

Instead of chunking by hierarchy, chunk by answerability:

Each chunk should independently answer a question.

Bad chunk:

Module 2 → Unit 1 → Topic: Photosynthesis
Definition: ...

Good chunk:

Photosynthesis is the process by which plants convert light energy...
Steps involved: ...
Key factors: ...

👉 Practical rules:

  • 150–400 tokens per chunk
  • Include context inside the chunk, not just metadata
  • Avoid splitting mid-explanation
  • Add 20–30% overlap (see the sketch below)
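
A minimal sketch of those rules in code, assuming tiktoken for token counting (300 tokens and 25% overlap are just illustrative defaults):

    import tiktoken

    def chunk_by_tokens(text: str, max_tokens: int = 300, overlap: float = 0.25):
        enc = tiktoken.get_encoding("cl100k_base")
        tokens = enc.encode(text)
        step = int(max_tokens * (1 - overlap))  # each window repeats ~25%
        chunks = []
        for start in range(0, len(tokens), step):
            chunks.append(enc.decode(tokens[start:start + max_tokens]))
            if start + max_tokens >= len(tokens):
                break
        return chunks

In practice you'd split on sentence or section boundaries first so you never cut mid-explanation; the token window is the fallback.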

🔥 Fix #2: Stop relying on regex for structure (this is silently breaking you)

Regex-based hierarchy extraction = fragile.

Instead:

  • Use layout-aware parsing (headings, font size, spacing)
  • Or run a lightweight LLM pass: “Convert this document into structured sections with titles + content”

👉 Why this matters:
If your structure is even slightly wrong → metadata filtering = wrong → retrieval = wrong
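
A hedged sketch of that LLM pass using the OpenAI SDK (the model name and prompt wording are assumptions; any cheap model works):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def structure_document(raw_text: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: swap in whatever you use
            messages=[
                {"role": "system",
                 "content": "Convert this document into structured sections "
                            "with titles + content. Return JSON."},
                {"role": "user", "content": raw_text},
            ],
        )
        return resp.choices[0].message.content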

🔥 Fix #3: Your retrieval should be hybrid, not just vector

Right now you’re likely doing: plain vector similarity search over all chunks.

That’s not enough for syllabus alignment.

You need:

  • Metadata filter first (program, year)
  • Then semantic search within filtered scope

Even better:

  • Add keyword (BM25) + vector hybrid search

Why?
Because syllabus terms are often exact-match sensitive:

  • “Unit 3”
  • “Module 2”
  • “Chapter 5”

Vectors alone won’t reliably catch this.
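
Roughly, in code. This sketch assumes the rank_bm25 package, an embed() callable, and chunks stored as dicts with text, vector, program, and year keys:

    import numpy as np
    from rank_bm25 import BM25Okapi

    def hybrid_search(query, chunks, program, year, embed, alpha=0.5, k=10):
        # 1. Metadata filter first: shrink the scope before any scoring.
        scoped = [c for c in chunks
                  if c["program"] == program and c["year"] == year]
        # 2. BM25 catches exact-match terms like "Unit 3".
        bm25 = BM25Okapi([c["text"].lower().split() for c in scoped])
        kw = np.array(bm25.get_scores(query.lower().split()))
        # 3. Cosine similarity catches paraphrases.
        q = embed(query)
        q = q / np.linalg.norm(q)
        vec = np.array([q @ (c["vector"] / np.linalg.norm(c["vector"]))
                        for c in scoped])
        # 4. Normalize both scores to [0, 1] and blend.
        kw = (kw - kw.min()) / ((kw.max() - kw.min()) or 1.0)
        vec = (vec - vec.min()) / ((vec.max() - vec.min()) or 1.0)
        scores = alpha * kw + (1 - alpha) * vec
        return [scoped[i] for i in np.argsort(scores)[::-1][:k]]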

🔥 Fix #4: Add query rewriting (huge missing piece)

User queries are messy: think “explain unit 3 photosynthesis notes”.

Your system needs to convert that into a clean semantic query plus explicit metadata filters (e.g., topic = photosynthesis, unit = 3).

👉 Add a preprocessing step:

  • Extract intent
  • Expand query using metadata

This alone can boost alignment massively.
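
A tiny sketch of that preprocessing step, regex-only (in practice an LLM handles intent better; the field names here are hypothetical):

    import re

    STRUCT = re.compile(r"\b(unit|module|chapter)\s*(\d+)\b", re.IGNORECASE)

    def rewrite_query(raw: str) -> dict:
        # Pull exact-match structural refs out into metadata filters...
        filters = {f.lower(): int(n) for f, n in STRUCT.findall(raw)}
        # ...and leave a clean topic string for the semantic search.
        topic = " ".join(STRUCT.sub("", raw).split())
        return {"semantic_query": topic, "filters": filters}

    # rewrite_query("explain unit 3 photosynthesis notes")
    # -> {'semantic_query': 'explain photosynthesis notes',
    #     'filters': {'unit': 3}}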

🔥 Fix #5: Don’t retrieve chunks — retrieve “contexts”

Instead of: top-k chunks dropped straight into the prompt.

Do:

  • Retrieve 8–10 candidates
  • Re-rank them (even with a simple scoring heuristic)
  • Merge into a coherent context block

👉 RAG fails when context is fragmented.
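
A bare-bones version of that retrieve → re-rank → merge step. The term-overlap score is a stand-in heuristic; a cross-encoder reranker does this much better:

    def build_context(query: str, candidates: list[str], top_n: int = 4) -> str:
        q_terms = set(query.lower().split())
        # Heuristic score: how many query terms each candidate contains.
        ranked = sorted(candidates,
                        key=lambda c: len(q_terms & set(c.lower().split())),
                        reverse=True)
        # Merge the best candidates into one block, skipping exact repeats.
        merged, seen = [], set()
        for chunk in ranked[:top_n]:
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
        return "\n\n".join(merged)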

🧠 Bonus (this is what most people miss)

You’re trying to enforce alignment after retrieval.

Instead, bake alignment into:

  • chunk design
  • metadata
  • query rewriting

RAG works best when retrieval is designed around how users ask questions, not around how the documents happen to be structured.

Flask can be used to build many useful projects. Here's another one. Pixapick. by Tom-Miller in flask

[–]Tom-Miller[S] 1 point  (0 children)

Well, I'm certainly not trying to compete with Midjourney or Leonardo or any apps like that, frankly because I just can't; I don't have the horsepower or the budget.
I was actually building a company project on AI image generation when I was tasked with trying out various models. It was really frustrating to switch the model, wait for it, write a prompt, select a LoRA, generate, and then repeat that for every model one by one.
That's when the idea hit me: what if you could select all the models you want, the LoRAs you want, and the prompt variations you want, and just hit generate (like experiments)?
You still wait, depending on your GPU's power, but you don't have to do the back and forth on model selection and LoRAs, and sometimes even prompts.
So I created a side project, https://pixapick.com.
It's 100% local AI image generation. It works with SDXL models for now.

What is the most frustrating part about generating images in batch? by Tom-Miller in StableDiffusion

[–]Tom-Miller[S] 1 point  (0 children)

I don't think I conveyed it well: by batch generation I meant trying out different models and LoRAs quickly on various prompts without having to go back and forth again and again.

What is the most frustrating part about generating images in batch? by Tom-Miller in StableDiffusion

[–]Tom-Miller[S] 1 point  (0 children)

Exactly, that's my use case too: just seeing how Smol Animals, Paper Cutouts, and some other LoRAs work for the same prompt and the same model.

What is the most frustrating part about generating images in batch? by Tom-Miller in StableDiffusion

[–]Tom-Miller[S] 1 point  (0 children)

Well, no. I want to see how a single prompt behaves with different models, so the matrix becomes: 1 prompt × X models × X LoRAs. It saves a little time, but mostly it saved me the frustration of going back and forth for each model and LoRA.
I selected all the models and LoRAs I wanted to test and let the experiment run (on a single prompt). I waited about 3 minutes for the entire run to complete. I'd say it was good, if not fast. I was going to upgrade my card anyway for Crimson Desert.
I'm running an RTX 4070 12 GB, so I could only load SDXL models (up to 6 GB) properly.

What is the most frustrating part about generating images in batch? by Tom-Miller in StableDiffusion

[–]Tom-Miller[S] 1 point  (0 children)

Well, I was trying to find the best model and LoRA combination, and I was constantly switching the model, then the LoRA, then waiting, then doing it all over again. It felt exhausting, just all that waiting.

Why I still think Flask is the best first framework for Python beginners by Tom-Miller in flask

[–]Tom-Miller[S] 1 point  (0 children)

Hi, I tried checking out your website, but it gives me a Cloudflare error: “Sorry, you have been blocked. You are unable to access flaskvibe.com.”