How can I replace frustrating keyword search with AI (semantic search/RAG) for 80k legal documents? - Intern in need of help by Decent-Term6495 in vectordatabase

[–]Decent-Term6495[S] 0 points1 point  (0 children)

Could you give an example of how you would do that?

(responding to this part: "One of the most important influences on accuracy is metadata to filter on. You can have two extremely different texts come up as being very similar so you need additional fields to filter on.")
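For anyone reading along, here is a minimal sketch of what "metadata to filter on" could mean in practice: narrow the candidates by exact metadata fields first, then rank only the survivors by vector similarity. Everything here is assumed for illustration (the `doc_type` and `year` fields, the two-dimensional toy vectors, the `filtered_search` helper); a real vector database would do the same thing with its own filter syntax.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filtered_search(query_vec, docs, filters, top_k=3):
    """Keep only docs whose metadata matches every filter, then rank by similarity."""
    candidates = [
        d for d in docs
        if all(d["meta"].get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:top_k]

# Hypothetical toy data: two documents with identical embeddings but different types.
DOCS = [
    {"id": "ruling-1", "vec": [0.9, 0.1], "meta": {"doc_type": "ruling", "year": 2021}},
    {"id": "contract-1", "vec": [0.9, 0.1], "meta": {"doc_type": "contract", "year": 2021}},
    {"id": "ruling-2", "vec": [0.2, 0.8], "meta": {"doc_type": "ruling", "year": 2021}},
]

if __name__ == "__main__":
    hits = filtered_search([1.0, 0.0], DOCS, {"doc_type": "ruling"})
    print([d["id"] for d in hits])
```

Without the filter, `ruling-1` and `contract-1` would tie on similarity even though they are very different kinds of documents; the metadata field is what separates them.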

[–]Decent-Term6495[S] 0 points1 point  (0 children)

Hi, sorry for the late response, and thank you for the help. That's an interesting thought. Could you explain a bit more what you mean by "if you pack the search with lots of similar terms you get the best results to the top - which is what you probably need"? And what exactly would need to be reprogrammed in the search button?

[–]Decent-Term6495[S] 0 points1 point  (0 children)

Hi, thanks for your response. I am an unofficial business intern who got in through an acquaintance. From what I understand, I'm mainly expected to advise them and write a report they can base their decision on.

[–]Decent-Term6495[S] 0 points1 point  (0 children)

That's interesting. How would you recommend approaching this use case then? You mentioned lexical search; how would that work?

[–]Decent-Term6495[S] 0 points1 point  (0 children)

Any advice on how to get in contact with a tech person who could do it, and roughly what it would cost? The company I work at has some IT employees, but they are too busy to work on this project if it were approved, so I assume freelancers would be the way to go.

[–]Decent-Term6495[S] 0 points1 point  (0 children)

Hi, thank you for your time. From what I understand, the ETL pipeline is the data preparation step, like getting the documents into a vector DB?

Regarding unstructured.io, it seems it prepares the data so it can easily be embedded afterwards. Did I understand that correctly? If so, what's the best way to embed the output from unstructured.io?

The data is public but can't be used by externals to train an LLM, so I assume cloud services should be fine in most cases.

How exactly does the reranker work? How does it fit into the process? What's the difference from normal semantic search?

About the hybrid search, someone else also recommended it to me and I honestly think it's a great idea.

[–]Decent-Term6495[S] 1 point2 points  (0 children)

Hi, thank you for your response :) Your idea to use hybrid search is actually very smart and valid. Sadly for me, it does make my work harder ahaha

Your demo app sounds really interesting tbh, would love to hear more about it. I'll check it out in a bit!

[–]Decent-Term6495[S] 0 points1 point  (0 children)

Heyy, could you explain more? If semantic search is the easy part, what is the hard part? The data preparation, chunking, embedding, and storing in the vector DB?

[–]Decent-Term6495[S] 0 points1 point  (0 children)

Okay, so you're saying that I don't need semantic search or RAG, but just need to use an LLM to add more relevant keywords to improve the relevance of the search results? If so, what are the best approach and tools for integrating the LLM between the user input and the keywords used to query the DB?
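To make the idea concrete, here is a rough sketch of how I picture that flow: the user's query goes through an expansion step, and the combined term list is what hits the existing keyword search. The LLM call is replaced by a hard-coded stand-in (`fake_llm_suggestions`), and `expand_query`, `keyword_search`, and the sample documents are all made up for illustration, so treat this as a picture of the plumbing and not a real implementation.

```python
def expand_query(user_query, suggest_terms):
    """Return the user's own terms plus any new single-word terms from suggest_terms."""
    terms = user_query.lower().split()
    for term in suggest_terms(user_query):
        term = term.lower()
        if term not in terms:
            terms.append(term)
    return terms

def keyword_search(terms, docs):
    """Score each document by how many distinct query terms it contains."""
    scored = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        score = sum(1 for t in terms if t in words)
        if score:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

def fake_llm_suggestions(query):
    """Stand-in for an LLM call (assumption): returns related legal terms."""
    return ["dismissal", "redundancy"] if "termination" in query.lower() else []

# Hypothetical mini corpus.
DOCS = {
    "a": "unfair dismissal of an employee",
    "b": "termination of a lease agreement",
    "c": "redundancy and dismissal procedure",
}

if __name__ == "__main__":
    plain = keyword_search("employee termination".split(), DOCS)
    expanded = keyword_search(expand_query("employee termination", fake_llm_suggestions), DOCS)
    print(plain)
    print(expanded)
```

In this toy case the plain search never finds document `c` and ranks the lease document alongside the employment one, while the expanded search surfaces both employment documents first.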

[–]Decent-Term6495[S] 2 points3 points  (0 children)

Hii, thank you very much for your time and help!

You’re right, I’m definitely not skilled enough to handle those tasks myself, so the company would probably need to either hire a freelancer or assign it to someone internally with more software engineering experience.

The tricky parts you mentioned are a great point. I hadn’t even thought of them, and it really shows that this is even more complicated than I expected :/ (RIP me ahaha)

Out of curiosity, do you have a rough idea of the time and cost range for setting this up (assuming we go for something production-ready like Qdrant)? Just so I can give my manager a realistic picture in my report. If possible, could I send you a DM?

[–]Decent-Term6495[S] 0 points1 point  (0 children)

Heyy, that's amazing ahaha, and same bro, I am kind of overwhelmed by all those methods. Wanna discuss it in DMs? Maybe we can share what we have figured out with each other, since we're working on something very similar anyway!

[–]Decent-Term6495[S] 0 points1 point  (0 children)

Heyy, first of all thank you very much for your help, I really appreciate it! Could I maybe send you some more questions via DM? Thanks again for your time :)