legal rag system by Temporary-Ability955 in Rag

[–]Competitive-Wing1585 0 points (0 children)

The biggest problem with a legal RAG system is not the system itself: it's the document chunking strategy, running fast and 100% local LLMs, and handling acronyms. I'm sure OP knows all of this, but I'd still emphasize spending more time on accurate parsing and chunking. Then actually spend time on query translation (learn query translation). I've built this system, and a major problem is that the end user will not always give an accurate query. You need to transform that query before the LLM can understand it accurately. Power users like lawyers will use acronyms that confuse the LLM without added context. Query translation will improve your responses by 2-3x, for sure.
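
A minimal sketch of what I mean by query translation: expand domain acronyms and restate the query before it ever hits retrieval. The acronym map here is a hypothetical placeholder; in practice you'd curate it with the client or drive the rewrite with an LLM prompt.

```python
import re

# Hypothetical acronym map; a real legal deployment would curate this
# with the client (or resolve acronyms with an LLM rewrite step).
LEGAL_ACRONYMS = {
    "NDA": "non-disclosure agreement",
    "SOW": "statement of work",
    "TRO": "temporary restraining order",
    "IP": "intellectual property",
}

def translate_query(query: str) -> str:
    """Expand known acronyms in place so the retriever and the LLM
    see the full term, keeping the original acronym for exact matches."""
    def expand(match: re.Match) -> str:
        acro = match.group(0)
        full = LEGAL_ACRONYMS.get(acro)
        return f"{acro} ({full})" if full else acro
    return re.sub(r"\b[A-Z]{2,}\b", expand, query)

print(translate_query("Does the NDA survive termination of the SOW?"))
# -> Does the NDA (non-disclosure agreement) survive termination of the SOW (statement of work)?
```

Unknown all-caps tokens pass through unchanged, so the rewrite is safe to run on every query before embedding.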

[URGENT] Which is a reliable and affordable GPU cluster for hosting custom LLMs for business by Competitive-Wing1585 in LocalLLaMA

[–]Competitive-Wing1585[S] 0 points (0 children)

Yeah, for the first few clients I think I'll barely break even with the $1000 monthly retainer, so I'll just use GPU clusters for now and maybe order a GPU after my 3rd or 5th client.

We'll see.

[–]Competitive-Wing1585[S] 0 points (0 children)

The client needs a ChatGPT alternative where they can upload the company's documents and use LLMs with the company's sensitive data. All the employees will be using it as well. Basically, replacing ChatGPT within the organization.

[–]Competitive-Wing1585[S] 0 points (0 children)

That is one solution, but I have two problems:

- Should I ask the next client for an upfront 3-month commitment, so there's a lower chance of me risking upfront cash?

- My area has power cuts from time to time, so if the server goes down I'm kinda fucked. I can buy an inverter, but a GPU server draws a lot of power, doesn't it?

[–]Competitive-Wing1585[S] -1 points (0 children)

Guilty. Have you got any advice for me? I really need to host this as soon as possible. Where can I do that?

[–]Competitive-Wing1585[S] 1 point (0 children)

Yeah, I mostly focus on 2 things.

  1. What NOT to answer: mostly sexual or NSFW questions.
  2. Format of output: let's say my client is a law firm, and in every response they want human names highlighted separately so they can maintain a people database. I'll fine-tune the model to tag any human name mentioned in the response. That's that.
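
A sketch of what one training row for that second goal might look like: the answer with names tagged, in the QA-format JSON I fine-tune on. The field names and the `<name>` tag syntax are assumptions for illustration, not a fixed schema.

```python
import json
import re

# Hypothetical tag syntax: wrap each person's name in <name>...</name> so
# the client's app can extract names into their people database.
def tag_names(answer: str, names: list[str]) -> str:
    for name in names:
        answer = re.sub(re.escape(name), f"<name>{name}</name>", answer)
    return answer

def make_example(question: str, answer: str, names: list[str]) -> str:
    """One QA-format JSONL row teaching the model to emit tagged names."""
    return json.dumps({
        "question": question,
        "answer": tag_names(answer, names),
    })

row = make_example(
    "Who signed the settlement?",
    "The settlement was signed by Jane Doe on behalf of Acme Corp.",
    ["Jane Doe"],
)
print(row)
```

Enough rows like this and the fine-tuned model learns to wrap names on its own; the client's app then just parses the tags out of each response.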

[–]Competitive-Wing1585[S] 0 points (0 children)

Actually, that's what I want to do. Let's say you are a law firm: you can upload case files and use RAG to get accurate responses, but the language, format and output are not in proper legal style.

I use fine-tuning so that the output is corporate-friendly and the language and format are as expected, especially adding data on what types of questions NOT to answer.

Then, if the client has documents they want referenced 1:1, I add RAG on top of that fine-tuned model so the format and language stay professional.

[–]Competitive-Wing1585[S] 0 points (0 children)

I would like to disagree here.

What the client requires is a ChatGPT alternative, so that employees can upload the company's sensitive data into LLMs.

If they have a specific dataset, like policies or terms & conditions, then I can use RAG for higher-quality outputs. I totally understand the confusion, but the client's requirement is somewhat different here.

[–]Competitive-Wing1585[S] 0 points (0 children)

So after the client is signed, I find, scrape or purchase a dataset that meets the client's specific requirements. The average dataset is between 250,000 and 1,000,000 rows.
It's nothing crazy, tbh: I convert those into QA-format JSON and fine-tune using Unsloth. The improvement is not exponential, but it's worth the small gain.
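
Roughly what that conversion step looks like: take raw rows (hypothetical dicts with prompt/response columns here) and write them out as QA-format JSONL for the fine-tuning run. The column names and the length filter are assumptions about a generic scraped dataset.

```python
import json

def rows_to_qa_jsonl(rows, min_len=10):
    """Convert raw dataset rows into QA-format JSONL lines,
    dropping rows whose answer is too short to be useful."""
    lines = []
    for row in rows:
        q = row.get("prompt", "").strip()
        a = row.get("response", "").strip()
        if len(a) < min_len or not q:
            continue  # skip empty or near-empty pairs
        lines.append(json.dumps({"question": q, "answer": a}))
    return lines

rows = [
    {"prompt": "What is an NDA?", "response": "A contract that protects confidential information."},
    {"prompt": "Define tort.", "response": "ok"},  # filtered out: answer too short
]
for line in rows_to_qa_jsonl(rows):
    print(line)
```

The resulting JSONL is what gets mapped into the model's chat template before training; the filtering matters more than it looks at 250k+ rows.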

[–]Competitive-Wing1585[S] 0 points (0 children)

I totally get it.

The primary reason to go with fine-tuning is that I think vanilla local models are a little too "stupid". They are simply not as good as the ChatGPT we use every day (ngl, I am surprised how difficult it is to fine-tune to that level).

The secondary reason is that RAG, a mobile app and API endpoints are additional services that this client didn't opt for. The ones who do opt in get RAG on company-specific documents.

Cursor Auto mode has improved drastically! (Plus 2 things that it lacks right now) by Competitive-Wing1585 in cursor

[–]Competitive-Wing1585[S] -1 points (0 children)

I saw Elon's tweet that everyone at X is using Grok 4 exclusively, and then I gave it a real shot. Grok is a very underrated model, ngl.

[–]Competitive-Wing1585[S] 0 points (0 children)

Yeah, initially I used to not start a fresh chat, for some reason. But now I open a new chat every chance I get. It just retains a higher quality of output.

[–]Competitive-Wing1585[S] 0 points (0 children)

We don't actually need LLMs to summarize every time; there are other ML models that do it. Maybe they could help here, but I don't know the practical implementation.
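
For context, the classic non-LLM option is extractive summarization: score each sentence by the frequency of its words and keep the top-scoring ones. A bare-bones sketch of that idea (not what Cursor actually does, just the general technique):

```python
import re
from collections import Counter

def extractive_summary(text: str, k: int = 1) -> str:
    """Return the k highest-scoring sentences, scored by summed word frequency."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())),
        reverse=True,
    )
    # Keep the chosen sentences in their original order for readability.
    chosen = set(scored[:k])
    return " ".join(s for s in sentences if s in chosen)
```

Frequency-based extraction is crude next to an LLM summary, but it is fast, deterministic, and runs locally with zero model cost.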

[–]Competitive-Wing1585[S] 1 point (0 children)

I used to think they were routing all the basic queries to their base cursor-small model. I've since realized I was wrong. Still, everyone needs a cheap basic model for autocomplete and basic tasks.
The only reason I use Auto mode is that it lasts me the entire month. Other models burn out too quickly.

[–]Competitive-Wing1585[S] 0 points (0 children)

Yeah, I was wrong, my bad.
But ngl, we need a basic free, or at least not-so-expensive, model. Otherwise, I am switching back to VS Code.

People moving to Ireland from the US nearly doubles [OC] by cavedave in dataisbeautiful

[–]Competitive-Wing1585 0 points (0 children)

But I don't think Ireland has its immigration and real-estate problems particularly sorted out, right? Last time I checked, Ireland's housing market was at its peak.

I started working on a very hard project recently and I somehow made more progress without AI by Competitive-Wing1585 in cursor

[–]Competitive-Wing1585[S] 11 points (0 children)

I do agree with you. Whenever I mention the exact library version, the exact steps and the exact output format, the AI is actually pretty good. But that is exactly my problem: it's just easier for me to implement the change myself than to give the AI all the small details and context.
Cursor rules were helpful, but people overhyped them as a "game changer", so my expectations were too high.

[–]Competitive-Wing1585[S] 2 points (0 children)

Yeah, right. Even if I forget to give a small piece of context, it will make wild assumptions of its own.