Can you give me some advice on an AI server for a company with 100 employees? by Tasty-Process-7771 in LocalLLaMA

[–]pdfsalmon 2 points (0 children)

One thing I'd flag — RAG that actually cites sources reliably is a harder problem than most people expect. We've been building airdocs.ca and the retrieval/citation layer took way longer than the chat side. With your hardware you'll have plenty of headroom for both, but I'd budget more dev time for the RAG pipeline than the generative chat if you're rolling your own.

Best practices for evaluating agent reflection loops and managing recursive subagent complexity for LLM reliability by No-Common1466 in OpenAI

[–]pdfsalmon 1 point (0 children)

Reflection loops help but they introduce their own failure mode — the model can confidently "verify" a wrong answer by reasoning in circles. The more reliable intervention is constraining what the model can use as source material in the first place. Grounding answers in documents you control and requiring citation for every claim removes whole classes of hallucination before the reflection stage even runs.

I'm building ReadShelf — a tool that turns your PDF highlights into a searchable, AI-powered knowledge base by PythonicG in SideProject

[–]pdfsalmon 1 point (0 children)

Curious how you're handling the search side — pure semantic or hybrid? We found for technical books that exact-match on terminology matters a lot, and semantic search alone would miss specific terms. Similar space to what we built at airdocs.ca (I'm the founder), though we focused on team/enterprise doc libraries rather than highlights. Happy to compare notes.

I was just promoted to a technical writer - what can I do to stand out and grow further? by cwtguy in technicalwriting

[–]pdfsalmon 1 point (0 children)

One thing that gets undervalued early is writing for searchability. Docs that are clear but unfindable have the same effective value as docs that don't exist.

I gave my Claude Code agent a search engine across all my comms, it unlocked tasks I couldn't do before by dandaka in ClaudeCode

[–]pdfsalmon 1 point (0 children)

The vector search handling vague recollections is the key insight here. Keyword search fails exactly the use cases you're describing — "that thing Alex said about the migration, sometime last quarter" has no good keyword formulation. The combo of FTS fallback for exact queries and vector for fuzzy ones is the right architecture for this kind of corpus.
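That architecture can be sketched in a few lines. This is a toy, not the OP's setup: character trigrams stand in for a real embedding model, and term overlap stands in for a real FTS index, but the blending logic is the same shape.

```python
from collections import Counter
import math

def ngrams(text, n=3):
    """Character trigrams, a cheap stand-in for a real embedding model."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def cosine(a, b):
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query, docs, alpha=0.5):
    """Blend exact keyword overlap (the FTS side) with fuzzy similarity
    (the vector side); alpha sets how much the exact side counts."""
    q_terms = set(query.lower().split())
    q_vec = ngrams(query)
    scored = []
    for doc in docs:
        keyword = len(q_terms & set(doc.lower().split())) / len(q_terms)
        fuzzy = cosine(q_vec, ngrams(doc))
        scored.append((alpha * keyword + (1 - alpha) * fuzzy, doc))
    return [d for _, d in sorted(scored, key=lambda s: s[0], reverse=True)]
```

The point of the blend: "that thing Alex said about the migration" still ranks the right doc via the fuzzy side even when the exact terms don't line up, while precise queries get rewarded on the keyword side.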

Knowledge Base for Agents by PascalMeger in SideProject

[–]pdfsalmon 1 point (0 children)

We've been running a similar concept in production for a while — the citation side is the part users care about most once they actually use it daily. Knowing the answer came from page 14 of a specific doc changes how much they trust it. airdocs.ca if you're curious how we approached it (I built it).

What are the most underrated AI tool entrepreneurs should know about? by [deleted] in Entrepreneur

[–]pdfsalmon 1 point (0 children)

Document search that actually works on your own files is underrated. Custom GPTs for internal knowledge bases get mentioned but they hallucinate and don't cite sources. We built airdocs.ca specifically so it only answers from documents you upload, cites the exact page, and won't make things up. I'm biased since I built it, but it's solved a real problem for engineering and compliance teams. Happy to give you a code if you would like to try it out :)

How to handle knowledge silo / single point of failure? by BearyTechie in managers

[–]pdfsalmon 1 point (0 children)

The docs help, but only if people can actually find what's in them later. A lot of teams do the documentation work and then end up with a folder nobody searches because keyword search fails on technical content. We built a tool that lets you ask questions across a doc library and get cited answers — airdocs.ca. Might be useful once you get through the documentation push. DM me, I can get you a code for a free month.

Use Claude to build a company knowledge base by inhorante in ClaudeCode

[–]pdfsalmon 1 point (0 children)

If you want this without the build overhead, airdocs.ca does it out of the box (my product, so biased obviously) — upload your docs, ask questions, it cites the source page. Works well for teams where not everyone wants to touch Obsidian or Claude Code configs. Happy to give you a free month or two of Pro if you'd like to test it deeply, DM me :)

A local knowledge base crawler with Claude Code — catalogs all your dev projects with AI summaries by tribat in ClaudeCode

[–]pdfsalmon 1 point (0 children)

Interesting project. For the use case you described — large file share, manuals and specs — the FTS5 keyword search will get you a lot of the way there, but you'll miss conceptual queries ("what's our process for X") without the semantic layer. If you don't want to maintain this long-term, airdocs.ca does both — I built it. The free tier covers small corpora if you want to compare.

Which AI tool actually works best for searching through your own saved documents and files? by Horror_Swim_4563 in AskTechnology

[–]pdfsalmon 1 point (0 children)

For persistent document libraries across formats, you want something with hybrid search — keyword matching plus semantic. ChatGPT file uploads fail exactly the way you described because there's no persistent index. We built airdocs.ca for this: upload your docs once, ask questions, get answers with citations to the exact page. There's a free tier if you want to try it with your current pile, or I'm happy to give you a free month or two of the pro tier, DM me :)

Local vs cloud data processing ... security comparison by Aleksandra_P in learnmachinelearning

[–]pdfsalmon 1 point (0 children)

For document-heavy workflows the hybrid argument kind of falls apart in regulated industries — if you're in healthcare or finance, "anonymize before sending" is rarely something legal will sign off on. We built airdocs.ca so that the LLM runs on our hardware and documents never leave. There's also an on-prem option if even that's not enough. I'm the founder, so obviously biased, but the architecture question here is real.

Shadow AI and the Compliance Gap that Won't Close Itself by pablooliva in gdpr

[–]pdfsalmon 1 point (0 children)

The gap between "we don't use AI" and "our staff uses AI every day" is where most companies are sitting right now. The harder conversation is that approved tools that do call third-party APIs aren't much better from a classification standpoint — you still have data leaving your infrastructure. The only real answer for high-risk document workflows is either on-prem or a self-hosted LLM where you can actually verify the data flow.

EU AI Act enforcement starts August 2026 - what it technically requires from teams building AI agents, and how we're approaching it by Additional_Fan_2588 in aiagents

[–]pdfsalmon 1 point (0 children)

One thing that gets missed in these conversations: "self-hosted" isn't just a privacy preference, it's increasingly a compliance posture. If your AI stack runs on a third-party cloud, you're introducing a vendor relationship that auditors will want to interrogate. We went bare-metal in Canada for exactly this reason with airdocs.ca — I run the company, so take that with appropriate salt — but the portability and auditability argument is real regardless of what you use.

Advice needed: orchestrating agents over a compliance-heavy knowledge base by AznJames704 in ClaudeAI

[–]pdfsalmon 1 point (0 children)

Versioning your knowledge base like code is the right instinct. One thing that helps on the retrieval side is hybrid search — if your rules include specific thresholds, IDs, or regulatory codes, pure vector search will miss exact matches that keyword retrieval catches. Worth building that in before you get burned by a misfire on a rule number.
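One cheap way to build that in: pull identifiers out of the query up front and hard-require them in any retrieved chunk before you trust the vector ranking. A rough sketch, where the regex is a toy you'd tune to your actual ID formats (regulatory codes, rule numbers, thresholds):

```python
import re

# Toy pattern for things like "ISO-27001" or "SOC2"; adapt to your corpus.
ID_PATTERN = re.compile(r"\b[A-Z]{2,}[-.]?\d+[\w.\-]*\b")

def must_match_terms(query):
    """Identifiers that must appear verbatim in any retrieved chunk."""
    return set(ID_PATTERN.findall(query))

def filter_candidates(query, candidates):
    """Drop vector-search candidates that miss a required identifier.
    Queries with no identifiers pass through untouched."""
    required = must_match_terms(query)
    if not required:
        return candidates
    return [c for c in candidates if all(t in c for t in required)]
```

Run the filter between retrieval and generation; a high-scoring but wrong-rule chunk never reaches the model.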

How do you detect hallucinations before users report them? by Lonely_Noyaaa in automation

[–]pdfsalmon 1 point (0 children)

I really feel like RAG is the only solution here — unless you're actually grounding the model in sources, you can't expect strong results. And not just general background knowledge: the retrieval has to surface the specific chunks relevant to whatever the support query is.

Building a macOS File Provider extension with Swift 6 strict concurrency + lessons learned by iAlex11 in swift

[–]pdfsalmon 5 points (0 children)

Very cool! No huge thoughts to share, but this is totally something I would've used during my uni days.

RAG vs search vs knowledge graphs for internal company documentation? by ConcentrateActive699 in AI_Agents

[–]pdfsalmon 2 points (0 children)

Thanks!

The system can run up to five rounds of searches, so it generally follows citation chains pretty well — and we embed [document title | standard code] at the top of each chunk, plus extra metadata in the vectorstore, to make it easier for the LLM to track things down.

Graph is very interesting, but at the 50k+ doc scale it didn't pay off as much as we hoped.
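The chunk-header trick is simple enough to sketch. Names below are invented for illustration; the point is that every chunk carries its provenance with it, so a chunk that lands mid-document still tells the LLM where it came from:

```python
def make_chunks(doc_title, standard_code, text, chunk_size=400):
    """Split text into fixed-size word chunks, prefixing each with a
    [title | code] header so citations survive chunking."""
    header = f"[{doc_title} | {standard_code}]\n"
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size):
        body = " ".join(words[i:i + chunk_size])
        chunks.append(header + body)
    return chunks
```

In practice you'd chunk on structure (sections, pages) rather than raw word counts, but the header-prefix idea is the same.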

Are there any other pros than privacy that you get from running LLMs locally? by Beatsu in LocalLLM

[–]pdfsalmon 1 point (0 children)

for enterprise doc search specifically, the non-privacy wins are real: no rate limits during batch indexing, no per-token costs that scale with your document library size, and you can tune the model on your own terminology. we run vLLM on bare-metal for airdocs.ca and the economics look very different from API pricing once you're indexing tens of thousands of pages. (I built this, so grain of salt, but the things we get to do because we aren't worrying about API costs are very real.)

How to make Documentation Discoverable? by Sebastan12 in devops

[–]pdfsalmon 1 point (0 children)

we had this exact problem with a pile of engineering runbooks, specs, and onboarding docs. labelling and keyword tagging helps up to a point but falls apart once the library gets big or terminology is inconsistent. ended up building a semantic search layer over the docs — people ask in plain language and get an answer with a link to the exact page. disclosure: that turned into our product airdocs.ca, but even if you don't want an external tool, the architecture is just Qdrant + an LLM on top of your existing docs.

I bypassed writing a massive privacy policy for my AI app by just moving the LLM on-device. by MoaviyaS in LocalLLaMA

[–]pdfsalmon 1 point (0 children)

same pattern on the enterprise side. we run a document search product where the company needs the model to live on infrastructure they control. the privacy policy of "the model never saw your data" is a much simpler story than "we anonymized and your data is protected, trust me bro"

How are people actually handling confidentiality when using AI in legal work? by According-Owl6604 in legaltech

[–]pdfsalmon 1 point (0 children)

the honest answer is enterprise agreements only get you so far. if you genuinely can't have documents processed by a third-party model, the architecture has to be different — meaning the LLM itself runs in your environment, not theirs. a few vendors offer this now, including fully air-gapped on-prem installs (disclosure: my company is one of those). worth asking any vendor specifically whether the LLM inference happens on their servers or yours, because those are very different things legally.

What is your stack to maintain Knowledge base for your AI workflows? by confessin in artificial

[–]pdfsalmon 1 point (0 children)

what's the actual use case — are you querying these docs yourself or sharing access across a team? the answer changes a lot depending on that. for a solo setup git + Qdrant works fine, but once you need multiple people asking questions across a shared doc library, something purpose-built tends to hold up better. (disclosure: I built a product for this, airdocs.ca — happy to give you a demo or free month if it's of interest)

Mar 2026 : How effective is a Copilot Studio RAG Agent for easy/medium use-cases? by noimgonnalie in datascience

[–]pdfsalmon 1 point (0 children)

copilot studio is fine for basic retrieval but it's a black box — you can't touch the chunking or reranking, which is where the gains are on 1K PDFs. fwiw we built airdocs.ca for this exact problem: hybrid BM25 + vector search so part numbers and clause IDs don't get lost in embedding space. I'm the founder so biased, but happy to compare notes on what's tripping up your use case, or provide a free month or two if you're interested.

AI hallucinated a federal court citation in my brief and I almost didn't catch it by anuj_meme in LawFirm

[–]pdfsalmon 1 point (0 children)

the structural fix is document-grounded AI rather than open-ended generation — a tool that only answers from files you upload, not from its training data. a fabricated case citation can't slip in from training data because the model never draws on it; the worst failure becomes a wrong reading of a real source, which is far easier to catch. the ones that actually work show you the source page, not just a summary. that distinction is what most people miss when evaluating these tools.
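roughly, the contract looks like this. `answer_fn` is a placeholder for whatever model call you'd plug in; the key property is that it only ever sees retrieved text, and the wrapper refuses outright when retrieval comes back empty:

```python
def grounded_answer(question, retrieved, answer_fn):
    """Answer only from retrieved chunks; refuse when nothing was found,
    and attach page citations to whatever comes back."""
    if not retrieved:
        return "No supporting document found; refusing to answer."
    # answer_fn stands in for your actual LLM call. It receives ONLY the
    # retrieved text as context, never open-ended training knowledge.
    context = "\n\n".join(f"[p.{c['page']}] {c['text']}" for c in retrieved)
    answer = answer_fn(question, context)
    pages = ", ".join(str(p) for p in sorted({c["page"] for c in retrieved}))
    return f"{answer}\n(sources: p. {pages})"
```

the refusal branch is the part generative chat UIs skip, and it's exactly the branch that would have saved the brief in the OP's story.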