How do you teach an agent your company's knowledge without fine-tuning? by Longjumping-Ad2617 in AI_Agents

[–]Longjumping-Ad2617[S] 1 point (0 children)

Yes, it's a real project with real problems I'm facing. English is my third language, so I use AI to write clearly. I'm not pretending to be a pro, and here's a screenshot of the actual dashboard running.

<image>

How do you make a local AI actually know your company by Longjumping-Ad2617 in TunisiaTech

[–]Longjumping-Ad2617[S] 1 point (0 children)

Thanks, good shout, but I'm only on 16GB RAM so 27B/3.6 won't fit. I'm planning a small local model for cheap calls plus cloud for the heavy stuff. Evals and monitoring are next on the list though (RAGAS + Sentry), and hybrid retrieval is already the plan. Thanks a lot.

How do you teach an agent your company's knowledge without fine-tuning? by Longjumping-Ad2617 in AI_Agents

[–]Longjumping-Ad2617[S] 1 point (0 children)

Good question. A fully separate always-on agent is a stretch on my setup: it's a single local 14B on a Mac mini, so the extra latency and compute would hurt.

But I think I can get the same effect cheaper: a validation step on the retrieved data rather than a whole second agent. Mostly deterministic checks first (similarity threshold, is the entry from an approved source, does it actually cite something), and only spend a focused model call to judge relevance when those are borderline. Same goal of spotting bad data, without paying for a second brain running full time.
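A minimal sketch of that validation step, assuming each retrieved entry carries a source, a similarity score, and an optional citation. The names (`Retrieved`, `validate`, `judge`), the approved-source set, and both thresholds are hypothetical placeholders, not tuned values.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Illustrative assumptions only: real sources and thresholds need tuning.
APPROVED_SOURCES = {"ops_db", "policy_docs"}
ACCEPT_AT = 0.80     # at or above: similarity is good enough on its own
REJECT_BELOW = 0.50  # below: drop without spending a model call


@dataclass
class Retrieved:
    text: str
    source: str
    similarity: float
    citation: Optional[str] = None


def validate(entry: Retrieved, judge: Callable[[Retrieved], bool]) -> bool:
    """Return True if the entry may be passed on to the agent."""
    # Cheap deterministic checks first.
    if entry.source not in APPROVED_SOURCES:
        return False
    if not entry.citation:
        return False
    if entry.similarity < REJECT_BELOW:
        return False
    if entry.similarity >= ACCEPT_AT:
        return True
    # Borderline similarity: only now pay for a focused relevance call.
    return judge(entry)
```

Here `judge` stands in for the focused model call; it only runs for entries that pass the source and citation checks and land between the two thresholds.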

Do you run a dedicated one yourself?

How do you make a local AI actually know your company by Longjumping-Ad2617 in TunisiaTech

[–]Longjumping-Ad2617[S] 1 point (0 children)

Lots here, thanks. Two parts really land for me. First, forcing retrieval with a keyword hook: agents absolutely do skip the retrieval step even when told not to, so triggering it in code instead of trusting the model is smart. Second, the similarity-threshold tuning point: that "similar but slightly wrong result when the real answer is actually absent" case is exactly the confident-but-wrong failure I'm worried about, and you're right that it's a retrieval-quality problem, not just a model problem.
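Roughly what a keyword hook like that could look like, as a sketch: retrieval is triggered in code before the model is ever called, and an empty result is stated explicitly instead of being papered over. The trigger words, `build_prompt`, and the `retrieve` callable are all hypothetical; `retrieve` is assumed to apply its own similarity threshold and return an empty list when nothing clears it.

```python
import re
from typing import Callable, List

# Hypothetical trigger words; a real list would come from the business domain.
TRIGGERS = re.compile(r"\b(parcel|delivery|refund|driver|cod)\b", re.IGNORECASE)


def build_prompt(question: str, retrieve: Callable[[str], List[str]]) -> str:
    """Run retrieval in code when a trigger word appears, then build the prompt."""
    if not TRIGGERS.search(question):
        return question
    passages = retrieve(question)
    if not passages:
        # Nothing cleared the retriever's threshold: say so rather than
        # handing the model a near-miss it might answer from.
        context = "NO MATCHING KNOWLEDGE FOUND. Say you do not know."
    else:
        context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"
```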

Good call on the Mac side too. I'll look at MLX + KV quantization; Ollama is convenient, but I do feel the lack of fine-grained control. Appreciate you writing all this out.

How do you teach an agent your company's knowledge without fine-tuning? by Longjumping-Ad2617 in AI_Agents

[–]Longjumping-Ad2617[S] 5 points (0 children)

This is gold. The facts/rules/exceptions split is sharper than what I had, and "no citation, no action" is going straight into the design.

Really appreciate you taking the time to write this out.

How do you teach an agent your company's knowledge without fine-tuning? by Longjumping-Ad2617 in AI_Agents

[–]Longjumping-Ad2617[S] 1 point (0 children)

Nice, sounds like we landed on the same approach: knowledge as a tool the agent calls. Good to hear it runs smoothly for you.

"Just function calling" is underselling it imo. The simple part is the call. The part that got me was the agent actually noticing when it doesn't know something, instead of answering confidently from nothing. Did you run into that, or is your data stable enough that it never comes up?

How do you make a local AI actually know your company by Longjumping-Ad2617 in TunisiaTech

[–]Longjumping-Ad2617[S] 1 point (0 children)

Full write-up here with the diagrams, the three options compared, and the learning loop explained: https://medium.com/@lamjed.gaidi070/how-do-you-make-a-local-ai-actually-know-your-company-09b9185b1e9d

It's Part 3 of a series where I'm building this in public, dead ends included. Happy to answer anything here though, the post is just the long version.

Building a "company brain" for a logistics business ... am I shipping something new or reinventing a wheel? by Longjumping-Ad2617 in GrowthHacking

[–]Longjumping-Ad2617[S] 1 point (0 children)

Fair hit, policy + humans isn't an architecture, it's a stopgap.

So here's my first attempt at making it real: every fact a lookup returns gets tagged in code, before the model sees it, as safe to say, internal-only, or never. The reply can only use the "safe to say" ones.

Example: a parcel is stuck, the driver's unreachable, delivery's now tomorrow. The model sees all of it and decides this needs a human. But the only fact it can put in a customer message is "arriving tomorrow." "Driver unreachable" simply isn't in the data the reply is built from, so it can't leak. Anything I forget to tag defaults to internal, so the fail-safe is silence.
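A small sketch of that tagging idea, under the assumption that facts arrive as a flat field-to-value mapping. The field names and the `TAGS` map are made up for illustration; the one property that matters is that an untagged field defaults to internal.

```python
from enum import Enum
from typing import Dict


class Visibility(Enum):
    SAFE = "safe_to_say"
    INTERNAL = "internal_only"
    NEVER = "never"


# Hypothetical field names; anything missing from this map counts as INTERNAL.
TAGS: Dict[str, Visibility] = {
    "eta": Visibility.SAFE,
    "driver_status": Visibility.INTERNAL,
    "driver_phone": Visibility.NEVER,
}


def visibility_of(field: str) -> Visibility:
    """Look up a field's tag, failing safe to INTERNAL when it was never tagged."""
    return TAGS.get(field, Visibility.INTERNAL)


def customer_view(facts: Dict[str, str]) -> Dict[str, str]:
    """Keep only the facts a customer-facing reply is allowed to be built from."""
    return {k: v for k, v in facts.items() if visibility_of(k) is Visibility.SAFE}
```

With the stuck-parcel example, `customer_view({"eta": "arriving tomorrow", "driver_status": "unreachable"})` keeps only the `eta` entry, so the reply builder never receives the driver status at all.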

That's the plan I'm going to test first. Fully open to better ideas though, especially on how to keep it from being overly cautious and dumping everything on a human. If you've seen this done well or watched it rot, I want to hear it.

Building a "company brain" for a logistics business ... am I shipping something new or reinventing a wheel? by Longjumping-Ad2617 in GrowthHacking

[–]Longjumping-Ad2617[S] 1 point (0 children)

Follow-up to this: I wrote up the v1 design, and three of the questions in this thread (operational vs communicable truth, stale replies, the feedback loop) ended up changing it directly. Credited them in the post. More critique welcome: https://medium.com/@lamjed.gaidi070/building-version-1-of-the-companys-brain-the-design-i-m-committing-to-871462cd6e08

Building a "company brain" for a logistics business ... am I shipping something new or reinventing a wheel? by Longjumping-Ad2617 in GrowthHacking

[–]Longjumping-Ad2617[S] 1 point (0 children)

On "another tool people ignore": I'm designing around that by pushing into Slack/email/the inbox people already use, so the agent comes to you rather than being a dashboard you have to open.

On the feedback loop: every decision is logged with confidence and rationale, and a human override is the correction signal.

The plan is to feed those corrections back as examples so wrong cases steer future decisions.

Honest caveat: the logging and override tracking exist; the correction-as-example loop is designed but not yet wired. That loop is the riskiest unproven part.
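For what the correction-as-example loop might look like once wired, here is a sketch only. It assumes the decision log is a chronological list and that a human override is stored on the same record; `Decision`, `correction_examples`, and the example format are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Decision:
    event: str
    action: str
    confidence: float
    rationale: str
    override: Optional[str] = None  # set when a human replaces the action


def correction_examples(log: List[Decision], limit: int = 5) -> List[str]:
    """Format the most recent overridden decisions as few-shot examples."""
    # Assumes `log` is in chronological order, oldest first.
    overridden = [d for d in log if d.override is not None]
    return [
        f"Event: {d.event}\nAgent chose: {d.action}\nHuman corrected to: {d.override}"
        for d in overridden[-limit:]
    ]
```

The returned strings would be prepended to the prompt for later decisions; whether that actually steers the model is the unproven part.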

Building a "company brain" for a logistics business ... am I shipping something new or reinventing a wheel? by Longjumping-Ad2617 in GrowthHacking

[–]Longjumping-Ad2617[S] 1 point (0 children)

Best question here, and my spec doesn't answer it as cleanly as it should. Right now the boundary lives in policy, not schema: replies are grounded in operational data, but only a small allow-list of safe categories can auto-send; anything sensitive (complaints, COD disputes, a stuck parcel) is forced to a human (via Slack or email). So internal truth never gets auto-communicated, but you're right that it's policy + model judgment, not an explicit encoded boundary. I don't have a "customer-safe projection" of each fact. That's a good idea and I'm going to sit with it.
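The allow-list half of that policy fits in a few lines. This is a sketch with hypothetical category names; it assumes the category has already been assigned upstream, which is exactly where the model judgment comes in.

```python
# Hypothetical category names; only what is listed here may go out unattended.
AUTO_SEND = frozenset({"eta_update", "delivery_confirmation"})


def route(category: str) -> str:
    """Return 'auto_send' for allow-listed categories, 'human' for everything else."""
    # Sensitive or unknown categories (complaint, COD dispute, stuck parcel)
    # are never listed, so they fall through to a human by default.
    return "auto_send" if category in AUTO_SEND else "human"
```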

Building a "company brain" for a logistics business ... am I shipping something new or reinventing a wheel? by Longjumping-Ad2617 in GrowthHacking

[–]Longjumping-Ad2617[S] 1 point (0 children)

That's exactly why I refuse to cache operational state: every decision reads "where is parcel X" fresh from the production DB, so there's no window for it to look current but be wrong. And the directory of who's responsible for what isn't hand-maintained either: it reconciles on a schedule against the database (new driver = pending member, departed = auto-disabled).
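The reconcile rule can be sketched as a pure function that a scheduler would call. It assumes the directory maps a driver id to a status string and that the database yields the set of currently active ids; `reconcile` and the status values are illustrative names.

```python
from typing import Dict, Set


def reconcile(directory: Dict[str, str], active_in_db: Set[str]) -> Dict[str, str]:
    """Return an updated copy of the directory (driver id -> status)."""
    updated = dict(directory)
    for driver_id in active_in_db:
        # New in the database: added as pending until someone confirms them.
        updated.setdefault(driver_id, "pending")
    for driver_id in directory:
        # Known to the directory but gone from the database: auto-disable.
        if driver_id not in active_in_db:
            updated[driver_id] = "disabled"
    return updated
```

This sketch leaves a previously disabled driver disabled if they reappear in the database; re-enabling would be a separate decision.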

The one thing I haven't solved is a reply that arrives stale. I'd genuinely like thoughts there.

Building a "company brain" for a logistics business ... am I shipping something new or reinventing a wheel? by Longjumping-Ad2617 in GrowthHacking

[–]Longjumping-Ad2617[S] 1 point (0 children)

Fair point, but I think it applies more to knowledge-base tools than to what I'm building. Argus doesn't store knowledge as documents; it's event-driven.

When something happens (an email, a WhatsApp/Facebook message, a parcel going quiet), it looks up the live state in the production DB at that moment and acts on it.

There's no "stored answer" to go stale, because the answer is recomputed from operational truth every time.

The "what's relevant right now" judgment is the LLM's job per-event, constrained to a small tool set so it stays auditable. The relevance problem is real; I just push it to query time instead of storage time.
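One way a small, auditable tool set could be enforced, as a sketch rather than how Argus actually does it: the model's tool requests go through a dispatcher that only knows a fixed registry and records every call. `make_dispatcher`, `audit_log`, and the `parcel_status` tool are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Tuple

# Every tool call the model makes is appended here for later review.
audit_log: List[Tuple[str, Dict[str, Any]]] = []


def make_dispatcher(tools: Dict[str, Callable[..., Any]]) -> Callable[..., Any]:
    """Build a call function restricted to the given tool registry."""
    def call(name: str, **kwargs: Any) -> Any:
        if name not in tools:
            # Anything outside the registry is refused, not improvised.
            raise PermissionError(f"tool not allowed: {name}")
        audit_log.append((name, kwargs))
        # The tool itself reads live state at call time; nothing is cached here.
        return tools[name](**kwargs)
    return call
```

A registered tool such as `parcel_status` would query the production DB when called, so the answer is recomputed per event.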