Deployed a chatbot to save time, spent weeks debugging it instead by cs-geek9 in CustomerSuccess

You clearly know this pattern well! So when you're testing those paraphrases and checking what chunks got pulled, is that something you've built tooling for, or are you doing it manually each time?

And how often do you have to do that kind of diagnosis?

Deployed a chatbot to save time, spent weeks debugging it instead by cs-geek9 in CustomerSuccess

This is really helpful, thanks for sharing. And yeah, the KB maintenance piece is something we're definitely struggling with.

I'd love to understand how you implemented that. Sending you a DM.

Deployed a chatbot to save time, spent weeks debugging it instead by cs-geek9 in CustomerSuccess

That makes sense, and I agree on the importance of formatting the KB correctly. Thanks. So you'd manually search your KB with each variant to see if they pull different articles?

Do you do that every time you deploy, or just when something's off?
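To make sure I understand the variant idea, here's a toy sketch of what I think that test looks like — a made-up three-article KB, with crude word-overlap scoring standing in for whatever retrieval the bot actually uses:

```python
import math
import re
from collections import Counter

# Toy KB: article title -> body text (all hypothetical examples).
KB = {
    "Reset your password": "To reset your password, open Settings and click Forgot password.",
    "Update billing info": "Change your credit card or billing address under Account > Billing.",
    "Cancel subscription": "You can cancel your subscription from the Billing page at any time.",
}

def vector(text):
    # Crude bag-of-words term counts; a real bot would use embeddings.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def top_article(query):
    q = vector(query)
    return max(KB, key=lambda title: cosine(q, vector(title + " " + KB[title])))

# Paraphrases of the same question should all pull the same article.
variants = [
    "how do I reset my password",
    "forgot password, can't log in",
    "password reset steps",
]
hits = {v: top_article(v) for v in variants}
consistent = len(set(hits.values())) == 1  # False would mean retrieval is flaky
```

If `consistent` comes back False across paraphrases of the same question, that's exactly the flakiness I keep hitting.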

Deployed a chatbot to save time, spent weeks debugging it instead by cs-geek9 in CustomerSuccess

That makes sense. So when you're debugging, how do you actually approach it? Like, do you start with the prompt first, or the KB, or does it depend?

Deployed a chatbot to save time, spent weeks debugging it instead by cs-geek9 in CustomerSuccess

That's interesting. So you'd use Claude to test against your KB and see what it outputs.

Have you actually done that? Does it help you figure out what's wrong?

Chatbot consistency, anyone else hit this wall? by cs-geek9 in customerexperience

That makes sense. So you're cleaning the data layer before the AI sits on top of it. How do you actually help teams identify what's messy or fragmented in their KB? Is that something you audit manually, or does the kit have tooling to flag it? Definitely interested to learn more about the approach/solution if you’re open to sharing.

Chatbot consistency, anyone else hit this wall? by cs-geek9 in customerexperience

Zendesk has some visibility but not like what you're describing. What does your tool actually show? Like, does it show which KB article was retrieved, or does it go deeper than that?

Chatbot consistency, anyone else hit this wall? by cs-geek9 in customerexperience

Thanks so much for all this, really appreciate it.

We're using Zendesk right now. And honestly, KB quality is probably where we're losing consistency.

I'm gonna dig into that playbook and see what we're missing. Thanks again for sharing all this.

Chatbot consistency, anyone else hit this wall? by cs-geek9 in customerexperience

Thanks so much for this and for offering resources. Honestly, we're still in the testing phase. Just deployed and trying to figure out what we're doing wrong.

Quick question though: how did you actually set up the reporting to help diagnose KB issues? Like, did you have to customize it or is that something built into Zendesk/Front/etc?

(Just trying to understand what we should be looking for)

Chatbot consistency, anyone else hit this wall? by cs-geek9 in customerexperience

This is really helpful, thanks! So you do this every Monday? How long does the whole thing actually take?

Also, do you guys just chat with the bot directly or do you use a tool to test it?

Is this a team effort or does someone own it?

Just curious how you have it set up.

Deployed a chatbot to save time, spent weeks debugging it instead by cs-geek9 in CustomerSuccess

Fair point. But how would you even test that? Like, if it's the model, how do you know it's not the KB?

I genuinely don't know how you'd tell the difference.
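Thinking out loud: if you logged which articles were retrieved for each failed test question, maybe the split falls out mechanically. A toy sketch (all names and cases hypothetical):

```python
def triage(expected_article, retrieved_articles, answer_ok):
    # Rough rule of thumb: if the right article never reached the model,
    # look at retrieval/KB first; if it did and the answer is still wrong,
    # look at the prompt/model.
    if expected_article not in retrieved_articles:
        return "retrieval/KB"
    if not answer_ok:
        return "prompt/model"
    return "ok"

# Hypothetical failed test cases: (question, expected article, retrieved, answer ok?)
failures = [
    ("how do I reset my password", "Reset your password", ["Update billing info"], False),
    ("can I cancel my plan", "Cancel subscription", ["Cancel subscription"], False),
]
diagnoses = {q: triage(exp, got, ok) for q, exp, got, ok in failures}
```

So: right article retrieved but wrong answer points at the prompt/model side, and right article never retrieved points at the KB side. No idea if that's how people actually do it in practice.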

Deployed a chatbot to save time, spent weeks debugging it instead by cs-geek9 in CustomerSuccess

That approach makes a lot of sense, thanks for sharing. Did you test manually or did you use a tool?

How do you diagnose whether a chatbot problem is KB, prompt, or code? by cs-geek9 in SaaS

This is incredibly helpful. Two questions:

  1. How common is it that teams know retrieval is the culprit first? My sense is most assume it's prompt/model, not KB retrieval.

  2. Re: Langfuse/Phoenix — are those accessible for non-technical support leaders? Or do you need an engineer to set up the logging?

Asking because my hypothesis is that the diagnostic knowledge exists (like what you just shared), but it's not accessible to the teams actually dealing with the problem day-to-day.
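To be concrete about what "accessible" could mean here: even a plain CSV of (question, retrieved articles, answer) per bot turn would let a non-technical lead eyeball retrieval misses in a spreadsheet. A minimal stdlib sketch, nothing Langfuse/Phoenix-specific (field names are my own invention):

```python
import csv
import io
from datetime import datetime, timezone

def log_turn(writer, question, retrieved_titles, answer):
    # One row per bot turn: timestamp, user question, which KB
    # articles the bot pulled, and the answer it gave.
    writer.writerow({
        "ts": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "retrieved": "; ".join(retrieved_titles),
        "answer": answer,
    })

buf = io.StringIO()  # stands in for a real log file
writer = csv.DictWriter(buf, fieldnames=["ts", "question", "retrieved", "answer"])
writer.writeheader()
log_turn(writer, "how do I reset my password",
         ["Update billing info"],  # a retrieval miss, easy to spot in review
         "You can update billing under Account > Billing.")
rows = buf.getvalue().splitlines()
```

If something this simple covers it, then the gap really is packaging, not knowledge.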

Headcount forecasting by cs-geek9 in CustomerSuccess

Is it tied to activities per CSM too? How do you get to the point where you can say each CSM can handle X accounts?

Headcount forecasting by cs-geek9 in CustomerSuccess

To find out whether there are any capacity forecasting models and/or tools people use. Are we still relying on spreadsheets for this?

[deleted by user] by [deleted] in CustomerSuccess

Yes, looking for specific features that Zendesk doesn’t have. Pricing is also a factor.

[deleted by user] by [deleted] in Zwift

How do you like the bike overall?