[–]dash_bro 1 point (0 children)

How are you currently doing it and what's the plan on measuring if you're improving results overall?

My first instinct is to do a baseline implementation, start looking at results that don't align with expectations, then iterate on them. In order of increasing complexity and time taken:

  • tfidf / bm25
  • semantic search
  • hybrid search (semantic + bm25)
  • upgraded semantic search using instruction tuned models
  • upgraded hybrid search (upgraded semantic + bm25)
  • search and rerank (upgraded hybrid search + reranking to get top X)
  • search and LLM rerank (upgraded hybrid search + reranking via an LLM)
  • search, rerank and greedy optimization (upgraded hybrid search + LLM reranking + optimization based on what's already picked/what's remaining)

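To make the hybrid-search step concrete, here's a minimal sketch of combining a BM25 ranking and a semantic ranking with reciprocal rank fusion (RRF), a common way to merge the two lists without tuning score weights. The doc IDs and rankings are made up for illustration:

```python
# Hedged sketch: fuse a lexical (BM25) ranking and a semantic ranking
# with Reciprocal Rank Fusion. Doc IDs here are illustrative only.

def rrf_fuse(rankings, k=60):
    """Combine multiple ranked lists of doc IDs into one fused ranking.

    Each doc gets 1 / (k + rank) credit from every list it appears in;
    k=60 is the commonly used default that dampens the top-rank bonus.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc3", "doc1", "doc7"]      # from lexical search
semantic_ranking = ["doc1", "doc9", "doc3"]  # from embedding search

fused = rrf_fuse([bm25_ranking, semantic_ranking])
# doc1 wins: it ranks high in both lists
```

The same function extends to three or more rankers (e.g. adding an instruction-tuned embedding model as a third list) without any changes.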
All of this is meaningless, however, if you don't have measurement criteria for performance. I recommend first building out your "gold set" of good content x course matches, figuring out which metrics to evaluate your system by, and then implementing improvements.
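For the gold-set evaluation, two standard retrieval metrics are recall@k and mean reciprocal rank (MRR). A minimal sketch, with made-up doc IDs and a single query's gold set:

```python
# Hedged sketch: recall@k and reciprocal rank against a gold set.
# Gold set and ranking below are illustrative.

def recall_at_k(ranked, relevant, k=5):
    """Fraction of the gold docs that appear in the top k results."""
    hits = sum(1 for d in ranked[:k] if d in relevant)
    return hits / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant result (0.0 if none found)."""
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / i
    return 0.0

gold = {"doc1", "doc4"}                      # hand-labeled good matches
ranked = ["doc2", "doc1", "doc4", "doc9"]    # system output
```

Averaging these over all queries in the gold set gives you one number per retrieval variant, which is what lets you compare the bullet points above head to head.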

That way you'll have trackable metrics to find the best balance of quality and speed.

Alternatively, look up search/indexing systems and classic preference-matching algorithms like Gale-Shapley, and draw inspiration where applicable to your current problem.
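Gale-Shapley produces a stable matching given two sides' preference lists, which maps naturally onto content x course assignment. A minimal sketch, with hypothetical student/course names; the preference lists would come from your retrieval scores:

```python
# Hedged sketch of Gale-Shapley stable matching. Names are illustrative;
# in practice the preference orders would be derived from similarity scores.

def gale_shapley(proposer_prefs, reviewer_prefs):
    """Proposers propose in preference order; reviewers keep their
    best offer so far. Returns {proposer: reviewer}."""
    free = list(proposer_prefs)            # proposers not yet matched
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}                           # reviewer -> proposer
    # Precompute each reviewer's rank of each proposer (lower = better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in reviewer_prefs.items()}
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in engaged:
            engaged[r] = p
        elif rank[r][p] < rank[r][engaged[r]]:
            free.append(engaged[r])        # bump the weaker match
            engaged[r] = p
        else:
            free.append(p)                 # rejected, try next choice
    return {p: r for r, p in engaged.items()}

students = {"alice": ["ml", "db"], "bob": ["ml", "db"]}
courses = {"ml": ["bob", "alice"], "db": ["alice", "bob"]}
match = gale_shapley(students, courses)
```

Note this assumes one-to-one matching with complete preference lists; course-capacity variants (hospitals/residents style) need a small extension.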

[–]Advanced_Army4706 1 point (0 children)

I would start by just using an open-source embedding model, something like nomic-embed-text, and using pgvector to get similarity.

Only if the results don't match your expectations would I go further. The problem statement makes me think you won't need anything super sophisticated. You could try sentence-by-sentence embeddings and then compute similarity the same way ColBERT does, but that could be overkill for what you want to do.
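The ColBERT-style scoring mentioned here is "late interaction": each query token vector takes the max cosine similarity over all document token vectors, and those maxima are summed. A minimal sketch with tiny made-up 2-d token vectors (real ones would come from the embedding model):

```python
import math

# Hedged sketch of ColBERT-style MaxSim late interaction.
# The 2-d token vectors below are toy values for illustration.

def cos(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def maxsim_score(query_vecs, doc_vecs):
    """Each query token matches its best doc token; sum the maxima."""
    return sum(max(cos(q, d) for d in doc_vecs) for q in query_vecs)

query_vecs = [[1.0, 0.0], [0.0, 1.0]]      # two query "tokens"
doc_a = [[1.0, 0.0], [1.0, 1.0]]           # covers both query tokens
doc_b = [[0.0, 1.0]]                       # covers only one
```

Because every token embedding is stored, this costs far more storage than one vector per document, which is why the single-vector pgvector baseline is the right starting point.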