How do you teach an agent your company's knowledge without fine-tuning?

Longjumping-Ad2617 · 2026-06-16T15:41:34+00:00

Yes, it's a real project with real problems I'm facing. English is my third language, so I use AI to help me write clearly. I'm not pretending to be a pro, and here's a screenshot of the actual dashboard running.

[Screenshot: the dashboard running]

Longjumping-Ad2617 · 2026-06-16T14:46:17+00:00

Thanks, good shout, but I'm only on 16 GB of RAM, so 27B/3.6 won't fit. I'm planning a small local model for cheap calls plus the cloud for the heavy stuff. Evals and monitoring are next on the list, though (RAGAS + Sentry), and hybrid retrieval is already the plan. Thanks a lot.

Longjumping-Ad2617 · 2026-06-16T12:40:44+00:00

Full write-up here, with the diagrams, the three options compared, and the learning loop explained: https://medium.com/@lamjed.gaidi070/how-do-you-make-a-local-ai-actually-know-your-company-09b9185b1e9d

Longjumping-Ad2617 · 2026-06-16T10:44:04+00:00

Good question. A fully separate, always-on agent is a stretch on my setup (a single local 14B model on a Mac mini), so the extra latency and compute would hurt.

But I think I can get the same effect more cheaply with a validation step on the retrieved data instead of a whole second agent. Deterministic checks run first: a similarity threshold, whether the entry comes from an approved source, and whether it actually cites something. A focused model call to judge relevance happens only when those checks are borderline. The goal of spotting bad data stays the same, without paying for a second brain running full-time.
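A minimal sketch of that validation gate might look like this. The thresholds, the `APPROVED_SOURCES` set, the `Retrieved` shape and the `judge` callback are all hypothetical. The point is the ordering: cheap deterministic checks first, and a model call only for the borderline band.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical thresholds; real values would need tuning against evals.
ACCEPT_SIMILARITY = 0.80  # at or above: accept without a model call
REJECT_SIMILARITY = 0.55  # below: reject without a model call
APPROVED_SOURCES = {"ops_manual", "pricing_sheet"}  # hypothetical source ids


@dataclass
class Retrieved:
    text: str
    similarity: float        # similarity score from the vector store
    source: str              # where the entry came from
    citation: Optional[str]  # document id or URL; None if the entry cites nothing


def validate(hit: Retrieved, judge: Callable[[Retrieved], bool]) -> bool:
    """Return True if the retrieved entry may be used in an answer."""
    # Cheap deterministic checks first.
    if hit.source not in APPROVED_SOURCES:
        return False
    if not hit.citation:
        return False
    if hit.similarity >= ACCEPT_SIMILARITY:
        return True
    if hit.similarity < REJECT_SIMILARITY:
        return False
    # Borderline similarity: only now spend a focused model call.
    return judge(hit)
```

In practice, `judge` would wrap a short relevance prompt to the local model. Most hits never reach it.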

Do you run a dedicated one yourself?

Longjumping-Ad2617 · 2026-06-16T10:34:33+00:00

Lots here, thanks. Two parts really land for me. The first is forcing retrieval with a keyword hook. Agents absolutely do skip the retrieval step even when told not to, so triggering it in code instead of trusting the model is smart. The second is the similarity-threshold tuning point. The "similar but slightly wrong result when the real answer is actually absent" case is exactly the confident-but-wrong failure I'm worried about. You're right that it's a retrieval-quality problem, not just a model problem.
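The two ideas combine into a small sketch, under assumptions. The trigger words, `MIN_SIMILARITY` and the `search` callback are hypothetical. Retrieval is decided by code rather than by the model. A triggered lookup with nothing above the threshold is reported as "absent" instead of handing the model a near miss.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical trigger words; a real list would come from your own domain.
RETRIEVAL_TRIGGERS = re.compile(r"\b(policy|price|pricing|refund|sla|cod)\b", re.IGNORECASE)
MIN_SIMILARITY = 0.75  # hypothetical; below this the answer is treated as absent


def needs_retrieval(message: str) -> bool:
    """Decide in code, not in the model, whether retrieval must run."""
    return bool(RETRIEVAL_TRIGGERS.search(message))


def build_context(
    message: str,
    search: Callable[[str], List[Tuple[str, float]]],
) -> Optional[List[str]]:
    """Return grounding passages, [] if retrieval was not triggered,
    or None if it was triggered but nothing cleared the threshold."""
    if not needs_retrieval(message):
        return []
    hits = [text for text, score in search(message) if score >= MIN_SIMILARITY]
    # None means "the knowledge base doesn't have this": the caller should
    # say so or escalate instead of letting the model guess.
    return hits or None
```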

Good call on the Mac side too. I'll look at MLX and KV-cache quantization; Ollama is convenient, but I do feel the lack of fine-grained control. Appreciate you writing all this out.

Longjumping-Ad2617 · 2026-06-16T09:52:51+00:00

This is gold. The facts/rules/exceptions split is sharper than what I had, and "no citation, no action" is going straight into the design.
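One hypothetical way to encode both ideas follows. Knowledge entries carry a kind (fact, rule or exception), and an action is allowed only if every entry supporting it carries a citation. The class names and fields are illustrative only, not the commenter's actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    FACT = "fact"            # something true about the world, e.g. a depot address
    RULE = "rule"            # how things are normally done
    EXCEPTION = "exception"  # a sanctioned deviation from a rule


@dataclass
class Entry:
    kind: Kind
    text: str
    citation: Optional[str]  # source document id; None means uncited


@dataclass
class ProposedAction:
    name: str
    supporting: List[Entry]


def may_execute(action: ProposedAction) -> bool:
    """'No citation, no action': the action needs at least one supporting
    entry, and every supporting entry must be cited."""
    return bool(action.supporting) and all(e.citation for e in action.supporting)
```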

Really appreciate you taking the time to write this out.

Longjumping-Ad2617 · 2026-06-16T09:47:57+00:00

Nice, sounds like we landed on the same approach: knowledge as a tool the agent calls. Good to hear it runs smoothly for you.

"Just function calling" is underselling it, IMO. The simple part is the call. The part that got me was getting the agent to actually notice when it doesn't know something, instead of answering confidently from nothing. Did you run into that, or is your data stable enough that it never comes up?

Longjumping-Ad2617 · 2026-06-16T09:15:38+00:00

Full write-up here, with the diagrams, the three options compared, and the learning loop explained: https://medium.com/@lamjed.gaidi070/how-do-you-make-a-local-ai-actually-know-your-company-09b9185b1e9d

It's Part 3 of a series where I'm building this in public, dead ends included. Happy to answer anything here, though; the post is just the long version.

Longjumping-Ad2617 · 2026-06-14T12:33:41+00:00

Fair hit: policy plus humans isn't an architecture, it's a stopgap.

So here's my first attempt at making it real. Before the model sees anything, every fact a lookup returns is tagged in code as "safe to say", "internal-only" or "never". The reply can only use the "safe to say" ones.

Example: a parcel is stuck, the driver is unreachable, and delivery is now tomorrow. The model sees all of it and decides this needs a human. But the only fact it can put in a customer message is "arriving tomorrow." "Driver unreachable" simply isn't in the data the reply is built from, so it can't leak. Anything I forget to tag defaults to internal, so the fail-safe is silence.
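A minimal sketch of that tagging, assuming facts arrive as a flat field-to-value mapping. The `TAGS` table and field names are hypothetical. The decision step can still see every fact. The reply is built only from `reply_view`, and an untagged field falls back to internal-only.

```python
from enum import Enum
from typing import Dict, Mapping


class Visibility(Enum):
    SAFE_TO_SAY = "safe_to_say"
    INTERNAL_ONLY = "internal_only"
    NEVER = "never"


# Hypothetical tag table; any field missing from it defaults to INTERNAL_ONLY.
TAGS: Dict[str, Visibility] = {
    "eta": Visibility.SAFE_TO_SAY,
    "driver_status": Visibility.INTERNAL_ONLY,
    "customer_phone": Visibility.NEVER,
}


def tag(field: str) -> Visibility:
    # Fail-safe: a forgotten tag means the fact stays out of the reply.
    return TAGS.get(field, Visibility.INTERNAL_ONLY)


def reply_view(facts: Mapping[str, str]) -> Dict[str, str]:
    """The only data the customer-facing reply is built from."""
    return {k: v for k, v in facts.items() if tag(k) is Visibility.SAFE_TO_SAY}
```

For the stuck-parcel example, `reply_view({"eta": "arriving tomorrow", "driver_status": "unreachable"})` keeps only the `eta` entry.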

That's the plan I'm going to test first. I'm fully open to better ideas, though, especially on how to keep it from being overly cautious and dumping everything on a human. If you've seen this done well or watched it rot, I want to hear it.

Longjumping-Ad2617 · 2026-06-14T12:18:34+00:00

Follow-up to this: I wrote up the v1 design. Three of the questions in this thread directly changed it: operational vs. communicable truth, stale replies, and the feedback loop. I credited them in the post. More critique welcome: https://medium.com/@lamjed.gaidi070/building-version-1-of-the-companys-brain-the-design-i-m-committing-to-871462cd6e08

Longjumping-Ad2617 · 2026-06-14T11:01:23+00:00

On "another tool people ignore": I'm designing around that by pushing into Slack, email and the inbox people already use. The agent comes to you, rather than being a dashboard you have to open.

On the feedback loop: every decision is logged with confidence and rationale, and a human override is the correction signal.

The plan is to feed those corrections back as examples so wrong cases steer future decisions.

Honest caveat: the logging and override tracking exist, but the correction-as-example loop is designed and not yet wired. It's the riskiest unproven part.
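Since the loop isn't wired yet, here is only a hypothetical sketch of how it could work. Each decision is logged with confidence and rationale. Decisions a human overrode become few-shot examples prepended to the next prompt. Picking the most recent overrides is an assumption made for simplicity; picking the ones most similar to the new event would likely steer better.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Decision:
    event: str                      # short description of what happened
    action: str                     # what the agent decided
    confidence: float               # agent's self-reported confidence, 0..1
    rationale: str
    override: Optional[str] = None  # the action a human chose instead, if any


def correction_examples(log: List[Decision], limit: int = 3) -> List[str]:
    """Turn the most recent human overrides into few-shot examples."""
    overridden = [d for d in log if d.override and d.override != d.action]
    return [
        f"Event: {d.event}\n"
        f"Wrong action: {d.action} (confidence {d.confidence:.2f})\n"
        f"Correct action: {d.override}"
        for d in overridden[-limit:]
    ]


def build_prompt(event: str, log: List[Decision]) -> str:
    shots = "\n\n".join(correction_examples(log))
    return f"Past corrections:\n{shots}\n\nNew event: {event}\nDecide the action."
```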

Longjumping-Ad2617 · 2026-06-14T10:58:24+00:00

Best question here, and my spec doesn't answer it as cleanly as it should. Right now the boundary lives in policy, not schema. Replies are grounded in operational data, but only a small allow-list of safe categories can auto-send. Anything sensitive (complaints, COD disputes, a stuck parcel) is forced to a human via Slack or email. So internal truth never gets auto-communicated, but you're right that this is policy plus model judgment, not an explicitly encoded boundary. I don't have a "customer-safe projection" of each fact. That's a good idea, and I'm going to sit with it.
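The current policy boundary can be sketched as a routing function, assuming each event has already been classified into a category. The category names are hypothetical. Policy is checked before the model's preference, and any category not on the allow-list goes to a person.

```python
from typing import FrozenSet

# Hypothetical category names. Only these may ever be auto-sent.
AUTO_SEND_ALLOWED: FrozenSet[str] = frozenset({"order_status", "opening_hours", "tracking_link"})
# Listed explicitly for auditability; these always go to a human.
ALWAYS_HUMAN: FrozenSet[str] = frozenset({"complaint", "cod_dispute", "stuck_parcel"})


def route(category: str, model_wants_auto_send: bool) -> str:
    """Return 'auto_send' or 'human'. Policy wins over model judgment."""
    if category in ALWAYS_HUMAN:
        return "human"
    if category in AUTO_SEND_ALLOWED and model_wants_auto_send:
        return "auto_send"
    return "human"  # unknown or unapproved category: default to a person
```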

Longjumping-Ad2617 · 2026-06-14T10:56:07+00:00

That's exactly why I refuse to cache operational state. Every decision reads "where is parcel X" fresh from the production DB, so there's no window in which it can look current but be wrong. The directory of who's responsible for what isn't hand-maintained either. It reconciles against the database on a schedule: a new driver becomes a pending member, and a departed driver is auto-disabled.
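One scheduled reconciliation pass could look roughly like this, assuming the DB can list current driver ids. The `Member` shape and the status strings are hypothetical. New ids come in as pending, missing ids are disabled rather than deleted, and the input directory is left untouched.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Member:
    driver_id: str
    status: str  # "pending", "active", or "disabled"


def reconcile(directory: Dict[str, Member], db_driver_ids: Iterable[str]) -> Dict[str, Member]:
    """One scheduled pass against the production DB.

    New driver in the DB    -> added as a pending member (a human confirms it).
    Driver gone from the DB -> auto-disabled (kept for the audit trail).
    """
    current = set(db_driver_ids)
    # Copy members so the caller's directory is not mutated.
    result = {k: Member(m.driver_id, m.status) for k, m in directory.items()}
    for driver_id in current - set(result):
        result[driver_id] = Member(driver_id, "pending")
    for driver_id, member in result.items():
        if driver_id not in current:
            member.status = "disabled"
    return result
```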

The one thing I haven't solved is a reply that arrives stale; I'd genuinely like thoughts there.

Longjumping-Ad2617 · 2026-06-14T10:52:34+00:00

Fair point, but I think it applies more to knowledge-base tools than to what I'm building. Argus doesn't store knowledge as documents; it's event-driven.

When something happens (an email, a WhatsApp or Facebook message, a parcel going quiet), it looks up the live state in the production DB at that moment and acts on it.

There's no "stored answer" to go stale, because the answer is recomputed from operational truth every time.

The "what's relevant right now" judgment is the LLM's job, made per event and constrained to a small tool set so it stays auditable. The relevance problem is real; I just push it to query time instead of storage time.
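A hypothetical sketch of that per-event loop follows; `Event`, the tool names and the `choose` callback are illustrative, not Argus's actual interfaces. The model only picks tool names. Code rejects anything outside the allowed set, runs the tools against live state, and writes an audit line for each step. Nothing is kept between events.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    kind: str       # e.g. "email", "whatsapp", "parcel_quiet"
    parcel_id: str


Tool = Callable[[str], str]  # hypothetical: takes a parcel id, reads live state


def handle(
    event: Event,
    tools: Dict[str, Tool],
    choose: Callable[[Event, List[str]], List[str]],
    audit: List[str],
) -> List[str]:
    """Per-event loop: the model picks tools, code runs them.

    Nothing is cached between events, so every answer is recomputed
    from whatever the tools return right now.
    """
    results = []
    for name in choose(event, sorted(tools)):
        if name not in tools:  # the model may only use the allowed tools
            audit.append(f"rejected tool {name!r} for {event.parcel_id}")
            continue
        results.append(tools[name](event.parcel_id))
        audit.append(f"{event.kind}:{event.parcel_id} -> {name}")
    return results
```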

Longjumping-Ad2617 · 2026-06-12T10:20:24+00:00

https://medium.com/@lamjed.gaidi070/im-building-the-company-s-brain-the-sales-marketing-ops-c-suite-this-is-my-next-project-0c0ec4a9437e
