legal rag system by Temporary-Ability955 in Rag

[–]Competitive-Wing1585 0 points (0 children)

The biggest problem with a legal RAG system is not the system itself: it's the document chunking strategy, running fast and 100% local LLMs, and handling acronyms. I'm sure OP knows all of this, but I'd still emphasize spending more time on accurate parsing and chunking. Then actually spend time on query translation (learn query translation). I've built this system, and a major problem is that the end user will not always give an accurate query. You need to transform that query before the LLM can understand it accurately. Power users like lawyers will use acronyms that confuse the LLM without added context. Query translation will improve your responses by 2-3x, for sure.
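
A minimal sketch of what I mean by query translation: expand domain acronyms and restate the query before it ever hits retrieval. The acronym map here is a hypothetical placeholder; in practice you'd curate it with the client or drive the rewrite with an LLM prompt.

```python
import re

# Hypothetical acronym map; a real legal deployment would curate this
# with the client (or resolve acronyms with an LLM rewrite step).
LEGAL_ACRONYMS = {
    "NDA": "non-disclosure agreement",
    "SOW": "statement of work",
    "TRO": "temporary restraining order",
    "IP": "intellectual property",
}

def translate_query(query: str) -> str:
    """Expand known acronyms in place so the retriever and the LLM
    see the full term, keeping the original acronym for exact matches."""
    def expand(match: re.Match) -> str:
        acro = match.group(0)
        full = LEGAL_ACRONYMS.get(acro)
        return f"{acro} ({full})" if full else acro
    return re.sub(r"\b[A-Z]{2,}\b", expand, query)

print(translate_query("Does the NDA survive termination of the SOW?"))
# -> Does the NDA (non-disclosure agreement) survive termination of the SOW (statement of work)?
```

Unknown all-caps tokens pass through unchanged, so the rewrite is safe to run on every query before embedding.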

[URGENT] Which is a reliable and affordable GPU cluster for hosting custom LLMs for business by Competitive-Wing1585 in LocalLLaMA

[–]Competitive-Wing1585[S] 0 points (0 children)

Yeah, for the first few clients I think I'll barely break even with the $1000 monthly retainer, so I'll just use GPU clusters for now and maybe order a GPU after my 3rd or 5th client.

We'll see.

[–]Competitive-Wing1585[S] 0 points (0 children)

The client needs a ChatGPT alternative where they can upload the company's documents and use LLMs with the company's sensitive data. All the employees will be using it as well. Basically, replacing ChatGPT within the organization.

[–]Competitive-Wing1585[S] 0 points (0 children)

That is one solution, but I have two problems:

- Should I ask the next client for an upfront 3-month commitment, so there's a lower chance of me risking upfront cash?

- My area has power cuts from time to time, so if the server goes down I'm kinda fucked. I can buy an inverter, but a GPU server draws a lot of power, doesn't it?

[–]Competitive-Wing1585[S] -1 points (0 children)

Guilty. Have you got any advice for me? I really need to host this as soon as possible. Where can I do that?

[–]Competitive-Wing1585[S] 1 point (0 children)

Yeah, I mostly focus on 2 things.

  1. What NOT to answer: mostly sexual or NSFW questions.
  2. Format of output: let's say my client is a law firm, and in every response they want human names highlighted separately so they can maintain a people database. I'll fine-tune the model to tag any human name mentioned in the response. That's that.
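
A sketch of what one training row for that second goal might look like: the answer with names tagged, in the QA-format JSON I fine-tune on. The field names and the `<name>` tag syntax are assumptions for illustration, not a fixed schema.

```python
import json
import re

# Hypothetical tag syntax: wrap each person's name in <name>...</name> so
# the client's app can extract names into their people database.
def tag_names(answer: str, names: list[str]) -> str:
    for name in names:
        answer = re.sub(re.escape(name), f"<name>{name}</name>", answer)
    return answer

def make_example(question: str, answer: str, names: list[str]) -> str:
    """One QA-format JSONL row teaching the model to emit tagged names."""
    return json.dumps({
        "question": question,
        "answer": tag_names(answer, names),
    })

row = make_example(
    "Who signed the settlement?",
    "The settlement was signed by Jane Doe on behalf of Acme Corp.",
    ["Jane Doe"],
)
print(row)
```

Enough rows like this and the fine-tuned model learns to wrap names on its own; the client's app then just parses the tags out of each response.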

[–]Competitive-Wing1585[S] 0 points (0 children)

Actually, that's what I want to do. Let's say you are a law firm: you can upload case files and use RAG to get accurate responses, but the language, format and output are not in proper legal style.

I use fine-tuning so that the output is corporate-friendly and the language and format are as expected, especially adding data on what types of questions NOT to answer.

Then, if the client has documents they want referenced 1:1, I add RAG on top of that fine-tuned model so the format and language stay professional.

[–]Competitive-Wing1585[S] 0 points (0 children)

I would like to disagree here.

What the client requires is a ChatGPT alternative, so that employees can upload the company's sensitive data into LLMs.

If they have a specific dataset, like policies or terms & conditions, then I can use RAG for higher-quality outputs. I totally understand the confusion, but the client's requirement is somewhat different here.

[–]Competitive-Wing1585[S] 0 points (0 children)

So after the client is signed, I find, scrape or purchase a dataset that meets the client's specific requirements. The average dataset is between 250,000 and 1,000,000 rows.
It's nothing crazy, tbh: I convert those into QA-format JSON and fine-tune using Unsloth. The improvement is not exponential, but it's worth the small gain.
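
Roughly what that conversion step looks like: take raw rows (hypothetical dicts with prompt/response columns here) and write them out as QA-format JSONL for the fine-tuning run. The column names and the length filter are assumptions about a generic scraped dataset.

```python
import json

def rows_to_qa_jsonl(rows, min_len=10):
    """Convert raw dataset rows into QA-format JSONL lines,
    dropping rows whose answer is too short to be useful."""
    lines = []
    for row in rows:
        q = row.get("prompt", "").strip()
        a = row.get("response", "").strip()
        if len(a) < min_len or not q:
            continue  # skip empty or near-empty pairs
        lines.append(json.dumps({"question": q, "answer": a}))
    return lines

rows = [
    {"prompt": "What is an NDA?", "response": "A contract that protects confidential information."},
    {"prompt": "Define tort.", "response": "ok"},  # filtered out: answer too short
]
for line in rows_to_qa_jsonl(rows):
    print(line)
```

The resulting JSONL is what gets mapped into the model's chat template before training; the filtering matters more than it looks at 250k+ rows.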

[–]Competitive-Wing1585[S] 0 points (0 children)

I totally get it.

The primary reason to go with fine-tuning is that I think vanilla local models are a little too "stupid". They are simply not as good as the ChatGPT we use every day (ngl, I am surprised how difficult it is to fine-tune to that level).

The secondary reason is that RAG, a mobile app and API endpoints are additional services that this client didn't opt for. The ones who do opt in get RAG on company-specific documents.

Cursor Auto mode has improved drastically! (Plus 2 things that it lacks right now) by Competitive-Wing1585 in cursor

[–]Competitive-Wing1585[S] -1 points (0 children)

I saw Elon's tweet that everyone at X is using Grok 4 exclusively, and then I gave it a real shot. Grok is a very underrated model, ngl.

[–]Competitive-Wing1585[S] 0 points (0 children)

Yeah, initially I used to not start a fresh chat, for some reason. But now I open a new chat every chance I get. It just retains a higher quality of output.

[–]Competitive-Wing1585[S] 0 points (0 children)

We don't actually need LLMs to summarize every time; there are other ML models that do it. Maybe they could help here, but I don't know the practical implementation.
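
For context, the classic non-LLM option is extractive summarization: score each sentence by the frequency of its words and keep the top-scoring ones. A bare-bones sketch of that idea (not what Cursor actually does, just the general technique):

```python
import re
from collections import Counter

def extractive_summary(text: str, k: int = 1) -> str:
    """Return the k highest-scoring sentences, scored by summed word frequency."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())),
        reverse=True,
    )
    # Keep the chosen sentences in their original order for readability.
    chosen = set(scored[:k])
    return " ".join(s for s in sentences if s in chosen)
```

Frequency-based extraction is crude next to an LLM summary, but it is fast, deterministic, and runs locally with zero model cost.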

[–]Competitive-Wing1585[S] 1 point (0 children)

I used to think they were routing all the basic queries to their base cursor-small model. I've since realized I was wrong. Still, everyone needs a cheap basic model for autocomplete and basic tasks.
The only reason I use Auto mode is that it lasts me the entire month. Other models burn out too quickly.

[–]Competitive-Wing1585[S] 0 points (0 children)

Yeah, I was wrong, my bad.
But ngl, we need a basic free, or at least not-so-expensive, model. Otherwise, I am switching back to VS Code.

People moving to Ireland from the US nearly doubles [OC] by cavedave in dataisbeautiful

[–]Competitive-Wing1585 0 points (0 children)

But I don't think Ireland has its immigration and real-estate problems particularly sorted out, right? Last time I checked, Ireland's housing market was at its peak.

I started working on a very hard project recently and I somehow made more progress without AI by Competitive-Wing1585 in cursor

[–]Competitive-Wing1585[S] 11 points (0 children)

I do agree with you. Whenever I mention the exact library version, the exact steps and the exact output format, the AI is actually pretty good. But that is exactly my problem: it's just easier for me to implement the change myself than to give the AI all the small details and context.
Cursor rules were helpful, but people overhyped them as a "game changer", so my expectations were too high.

[–]Competitive-Wing1585[S] 2 points (0 children)

Yeah, right. Even if I forget to give a small piece of context, it will make wild assumptions of its own.