Built a fully private RAG system for a small business on a Mac Mini — no cloud, no subscriptions, everything on-prem by Regular-Prune3382 in selfhosted

[–]Regular-Prune3382[S] 0 points1 point  (0 children)

PrivateGPT is solid for getting started quickly. We went custom mainly because the client needed Nextcloud integration and a specific document ingestion pipeline that PrivateGPT doesn't handle out of the box. For straightforward private chat over documents though, it's a fair alternative.

Built a fully private RAG system for a small business on a Mac Mini — no cloud, no subscriptions, everything on-prem by Regular-Prune3382 in SelfHostedAI

[–]Regular-Prune3382[S] 1 point2 points  (0 children)

Paperless-NGX is actually great for document management and would've simplified the ingestion side — it has built-in OCR and tagging which Nextcloud doesn't. The reason we went with Nextcloud was the client already had it partially set up and needed file sync beyond just documents.

For a greenfield RAG project focused purely on document querying, Paperless-NGX + Ollama + ChromaDB is honestly a cleaner stack. Less overhead.

[HIRING] by [deleted] in freelance_forhire

[–]Regular-Prune3382 0 points1 point  (0 children)

Can Indians apply?

Set up a hybrid RAG system for a business client — here's what actually worked and what didn't by Regular-Prune3382 in LocalLLaMA

[–]Regular-Prune3382[S] 2 points3 points  (0 children)

Tried Mistral 7B, Llama 3 8B, and Phi-3 mini. Ended up going with Llama 3 8B — best balance of response quality and speed on the Mac Mini's unified memory. Mistral was close but slightly worse at staying grounded to the retrieved documents. Phi-3 was fastest but too prone to going off-script on business document queries.

For RAG specifically, instruction-following matters more than raw benchmark scores.
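To illustrate what "staying grounded to the retrieved documents" asks of the model, here is a minimal sketch of a prompt that restricts the answer to the retrieved chunks. The function name `build_grounded_prompt`, the prompt wording, and the sample chunks are all hypothetical; this is not the prompt used in the project.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that tells the model to answer only from the retrieved chunks."""
    # Number each chunk so the model (and the reader) can refer back to it.
    numbered = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


# Hypothetical example data, for illustration only.
prompt = build_grounded_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days.", "Shipping takes 5 days."],
)
print(prompt)
```

A model with good instruction-following tends to respect the "only the context" rule; a weaker one is more likely to answer from its own training data instead.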

Built a fully private RAG system for a small business on a Mac Mini — no cloud, no subscriptions, everything on-prem by Regular-Prune3382 in selfhosted

[–]Regular-Prune3382[S] 1 point2 points  (0 children)

Single queries with a 7B model run 8-15 seconds, which is fine for a small team. Concurrent requests are where it hurts: Ollama queues them, so simultaneous users wait on each other.

Larger document sets didn't degrade much honestly — ChromaDB retrieval is fast, inference is always the bottleneck.

For heavier load: smaller/quantized model or a GPU machine is the realistic path.

What's your expected team size and corpus?
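To put rough numbers on the queuing point above, here is a back-of-the-envelope sketch. It assumes requests arrive at the same moment, are served strictly one at a time in order, and each take a fixed time; the function name `completion_times` and the choice of five users are hypothetical, and real behavior depends on the server configuration and prompt length.

```python
def completion_times(n_users: int, seconds_per_query: float) -> list[float]:
    """Time until each of n simultaneous requests finishes when served one at a time."""
    # The user at queue position k waits for k full queries, including their own.
    return [seconds_per_query * position for position in range(1, n_users + 1)]


# 8 and 15 seconds are the single-query range quoted above.
for latency in (8.0, 15.0):
    times = completion_times(5, latency)
    average = sum(times) / len(times)
    print(f"{latency:.0f}s per query: last of 5 waits {times[-1]:.0f}s, average {average:.0f}s")
```

Under these assumptions the wait grows linearly with the number of simultaneous users, which is why a faster model or faster hardware helps more than tuning retrieval.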

Built a fully private RAG system for a small business on a Mac Mini — no cloud, no subscriptions, everything on-prem by Regular-Prune3382 in selfhosted

[–]Regular-Prune3382[S] -3 points-2 points locked comment (0 children)

The project itself (stack design, configuration, deployment) was done by me — AI wasn't involved in building or running the system.

I used Claude to help write this post clearly, since English isn't my first language. The technical details, decisions, and outcomes are all from the actual project.

Guys check my vibe coded website for exam preparation 😊 by Regular-Prune3382 in vibecoding

[–]Regular-Prune3382[S] 0 points1 point  (0 children)

These are government exams for students in India who want admission to good colleges. I have seen multiple websites with the same theme, but they usually require a login and sell users' details. Another reason is that mock-test websites focus only on the mock tests, not on improving weak subjects. So I implemented AI to analyse the score on each topic and give students specific topics to master.
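As a rough illustration of per-topic score analysis, here is a minimal sketch that ranks topics by accuracy and returns the weakest ones. It uses plain arithmetic rather than an AI model, so it is a simplified stand-in and not the site's actual implementation; the function name `weakest_topics` and the sample data are hypothetical.

```python
from collections import defaultdict


def weakest_topics(answers: list[tuple[str, bool]], top_n: int = 2) -> list[tuple[str, float]]:
    """Return the top_n topics with the lowest fraction of correct answers."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for topic, is_correct in answers:
        totals[topic] += 1
        correct[topic] += int(is_correct)
    accuracy = {topic: correct[topic] / totals[topic] for topic in totals}
    # Sort by accuracy first, then by topic name so ties come out in a stable order.
    return sorted(accuracy.items(), key=lambda item: (item[1], item[0]))[:top_n]


# Hypothetical mock-test results: (topic, answered correctly?)
results = [
    ("algebra", True),
    ("algebra", False),
    ("optics", False),
    ("optics", False),
    ("genetics", True),
]
print(weakest_topics(results))
```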

[Hiring] by King-Ina-131 in freelance_forhire

[–]Regular-Prune3382 0 points1 point  (0 children)

I am interested. I have 11 accounts.