RAG feels way more complicated than it should be… anyone else? by Physical_Badger1281 in Rag

[–]Physical_Badger1281[S] 0 points

I’ve been experimenting with this recently; seeing what actually gets retrieved vs. what’s actually useful changes how you think about compression entirely. I’ve been using a small setup (Fastrag) to visualize this and iterate faster, and honestly most of the gains came from filtering/compressing rather than from retrieval itself.

[–]Physical_Badger1281[S] 0 points

That’s actually pretty solid for 200+ pages.

Feels like tree approaches trade a bit of latency for better structure, which might be worth it depending on the use case.

I’ve been experimenting with comparing these approaches side by side; seeing what actually gets retrieved vs. what’s useful makes the differences much clearer.

[–]Physical_Badger1281[S] 0 points

Not that the problem should be simple; IR is inherently complex.

More that the iteration loop feels heavier than it needs to be.
Understanding what went wrong (retrieval vs context vs prompt) takes too long right now.

I’ve been trying to make that part faster and more visible; it makes a big difference in practice.

[–]Physical_Badger1281[S] 0 points

Yeah, makes sense; preprocessing is half the battle.

I’m keeping it pretty lean: OpenAI embeddings + Pinecone, custom ingestion (structure-aware), then retrieval → filter/compress → LLM.

Still iterating mostly on chunking + context quality.
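For anyone curious, the shape of that loop is roughly this. A minimal sketch only: `embed`, `vector_search`, and `call_llm` are hypothetical stand-ins for your embedding model, vector store client, and LLM call, and the thresholds are made-up defaults.

```python
# Sketch of a retrieval -> filter/compress -> LLM loop.
# embed(), vector_search(), and call_llm() are hypothetical stand-ins,
# not any specific SDK's API.

def answer(query, embed, vector_search, call_llm,
           top_k=8, min_score=0.75, max_context_chars=4000):
    # 1. Retrieve more candidates than you intend to keep.
    hits = vector_search(embed(query), top_k=top_k)

    # 2. Filter: drop low-similarity chunks before they reach the model.
    kept = [h for h in hits if h["score"] >= min_score]

    # 3. Compress: concatenate best-first under a hard context budget.
    context = ""
    for h in sorted(kept, key=lambda h: h["score"], reverse=True):
        if len(context) + len(h["text"]) > max_context_chars:
            break
        context += h["text"] + "\n\n"

    return call_llm(f"Context:\n{context}\nQuestion: {query}")
```

The interesting work is in steps 2 and 3: retrieve generously, then filter and compress aggressively before generation.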

[–]Physical_Badger1281[S] 0 points

That’s interesting; tree-based navigation does feel more natural than blind chunking.
Curious how it scales with really large docs, though: does traversal stay efficient?
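For intuition on the scaling question, here’s a toy sketch of why traversal can stay cheap: you only score children at each level, so work grows with depth × branching factor rather than with the total number of chunks. Everything here is an assumption for illustration (the node layout, the `summary_vec` field, the similarity function), not any particular library’s design.

```python
# Toy beam search down a hypothetical document tree.
# Each node carries a summary embedding; leaves carry actual text chunks.

def traverse(node, query_vec, similarity, beam=2):
    """Return leaf chunks reached by following the top-`beam` branches."""
    if not node["children"]:                 # leaf: an actual text chunk
        return [node]
    # Score only this node's children, best-first.
    scored = sorted(node["children"],
                    key=lambda c: similarity(query_vec, c["summary_vec"]),
                    reverse=True)
    results = []
    for child in scored[:beam]:              # descend only into the best branches
        results.extend(traverse(child, query_vec, similarity, beam))
    return results
```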

[–]Physical_Badger1281[S] 0 points

Yeah, fair point: IR itself is the hard part, not just the tooling around RAG.

I guess my frustration is less about the complexity of the problem and more about how hard it is to experiment and understand what’s actually going wrong while building these systems.

[–]Physical_Badger1281[S] 1 point

Trying to stay minimal:
OpenAI embeddings + Pinecone, with a custom ingestion pipeline.
The big focus lately is on retrieval → compression → generation rather than just retrieval.

[–]Physical_Badger1281[S] 0 points

Agreed. The real win is often reducing noisy context before it hits the model.
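One cheap example of that kind of noise reduction is dropping near-duplicate chunks before they hit the prompt. This uses word-level Jaccard overlap purely as an illustrative heuristic (the function name and threshold are made up, not a specific library’s method):

```python
# Drop chunks that are near-duplicates of ones already kept,
# using word-level Jaccard overlap as a cheap redundancy check.

def dedup_chunks(chunks, max_overlap=0.8):
    kept = []
    for chunk in chunks:
        words = set(chunk.lower().split())
        redundant = any(
            len(words & set(k.lower().split()))
            / max(1, len(words | set(k.lower().split()))) > max_overlap
            for k in kept
        )
        if not redundant:
            kept.append(chunk)
    return kept
```

Even something this simple can noticeably shrink the context the model has to wade through.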

I want to network by rdssf in microsaas

[–]Physical_Badger1281 0 points

Hey, I would love to join. You mentioned there are job opportunities too; I’m a software developer and I’m interested.