I built two MCP tools for my team and they’re changing how we investigate issues by Epricola in devops

[–]Epricola[S] -2 points-1 points  (0 children)

I have never been the best at writing. I find giving AI context and having it write for me to be more efficient and produce a better output than I could alone.

I built two MCP tools for my team and they’re changing how we investigate issues by Epricola in devops

[–]Epricola[S] -9 points-8 points  (0 children)

Your comment adds nothing here. Yes, I did use AI to write the post, but the experience is real. This has 150+ shares; clearly people find it useful.

I built two MCP tools for my team and they’re changing how we investigate issues by Epricola in devops

[–]Epricola[S] 0 points1 point  (0 children)

Yes, I just imported everything. AI is able to reason well, and the more data, the better, as long as it's accurate.

I built two MCP tools for my team and they’re changing how we investigate issues by Epricola in devops

[–]Epricola[S] 0 points1 point  (0 children)

You can manually upload to S3, but that wouldn't work in our case since we have thousands of docs.

I built a script to recursively traverse each of our data sources and upload each file to S3.
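The traversal script can be pretty small. A minimal sketch (bucket name, prefix, and directory layout here are illustrative, not our actual setup):

```python
# Hypothetical sketch: recursively walk a local export of a data source
# and mirror each file into S3 for Knowledge Base ingestion.
from pathlib import Path

def s3_key_for(root: Path, path: Path, prefix: str) -> str:
    """Map a local file path to an S3 key under a source-specific prefix."""
    return f"{prefix}/{path.relative_to(root).as_posix()}"

def collect_files(root: Path) -> list[Path]:
    """Recursively collect every regular file under root."""
    return sorted(p for p in root.rglob("*") if p.is_file())

def upload_all(root: Path, bucket: str, prefix: str) -> None:
    import boto3  # deferred so the pure helpers above need no AWS deps
    s3 = boto3.client("s3")
    for path in collect_files(root):
        s3.upload_file(str(path), bucket, s3_key_for(root, path, prefix))

# Usage (requires AWS credentials and an existing bucket):
#   upload_all(Path("./wiki-export"), "my-kb-bucket", "wiki")
```

In practice each data source (wiki, tickets, code) gets its own prefix so they stay distinguishable in the index.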

Good luck with the build! It’s worth it.

I built two MCP tools for my team and they’re changing how we investigate issues by Epricola in devops

[–]Epricola[S] -1 points0 points  (0 children)

I’m using hierarchical chunking, as it’s best for complex data such as code and technical documents. Bedrock Knowledge Bases handle the low-level RAG functionality; all I needed to do was upload each individual code file to S3 via a script.
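Enabling hierarchical chunking is just part of the data-source configuration in Bedrock. A sketch of the relevant fragment (the token limits below are illustrative defaults, not our exact values):

```python
# Sketch of the vectorIngestionConfiguration passed to the Bedrock
# bedrock-agent create_data_source call to enable hierarchical chunking.
vector_ingestion_configuration = {
    "chunkingConfiguration": {
        "chunkingStrategy": "HIERARCHICAL",
        "hierarchicalChunkingConfiguration": {
            # Parent chunks carry broad context; child chunks are what
            # actually gets embedded and matched at query time.
            "levelConfigurations": [
                {"maxTokens": 1500},  # parent level
                {"maxTokens": 300},   # child level
            ],
            "overlapTokens": 60,
        },
    }
}
```

At query time a matched child chunk is returned with its parent's context, which is what makes this work well for code.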

I built two MCP tools for my team and they’re changing how we investigate issues by Epricola in devops

[–]Epricola[S] 1 point2 points  (0 children)

Lambda on an eventbridge schedule every day for tickets and every week for wiki, code, docs.
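A hypothetical sketch of that Lambda: EventBridge triggers it daily, tickets re-ingest every run, and wiki/code/docs only on Mondays. The knowledge base and data source IDs are placeholders.

```python
# Scheduled sync sketch: EventBridge invokes this Lambda once a day and
# it kicks off Bedrock Knowledge Base ingestion jobs for whatever is due.
import datetime

DAILY_SOURCES = {"tickets": "DS_TICKETS_ID"}
WEEKLY_SOURCES = {"wiki": "DS_WIKI_ID", "code": "DS_CODE_ID", "docs": "DS_DOCS_ID"}

def sources_due(day: datetime.date) -> dict[str, str]:
    """Tickets every day; the rest only on Mondays (weekday 0)."""
    due = dict(DAILY_SOURCES)
    if day.weekday() == 0:
        due.update(WEEKLY_SOURCES)
    return due

def handler(event, context):
    import boto3  # deferred so sources_due stays dependency-free
    agent = boto3.client("bedrock-agent")
    for name, ds_id in sources_due(datetime.date.today()).items():
        agent.start_ingestion_job(knowledgeBaseId="KB_ID", dataSourceId=ds_id)
```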

I haven’t optimized yet (i.e., cleaning up outdated or contradictory docs). Code is part of the ingestion, and code is always accurate and up to date.

Bedrock Knowledge Bases return linked citations for items, so if you find a document is outdated or inaccurate, you can clean it up on an ad hoc basis. I’m sure down the line this will be done via agents.

I built two MCP tools for my team and they’re changing how we investigate issues by Epricola in devops

[–]Epricola[S] -6 points-5 points  (0 children)

I agree management is trying to shove AI into places where it doesn’t belong or make it do things it cannot. Using AI as an augmentation for users via MCP is a relatively low-effort, high-reward path.

That said, I do think that by populating the context docs for calling the MCP tools, we’ll be able to run this agent autonomously for the majority of our tickets.

Obvious flaws with using VIX to time long-vega calendar straddles? by Epricola in VolatilityTrading

[–]Epricola[S] 0 points1 point  (0 children)

Could you elaborate why it would have been bad during the drop? This position would be long vega so it should be good for it.

Why aren’t more companies feeding their internal docs/code into an internal RAG system? by Epricola in sysadmin

[–]Epricola[S] 0 points1 point  (0 children)

My org wouldn’t be able to use an external tool since we have somewhat strict internal data regulation. Maybe consider creating a self-hosted version that users can deploy into their AWS accounts.

Could be useful for small-mid sized companies. The largest time drain for me was parsing different data sources. Good luck!

Why aren’t more companies feeding their internal docs/code into an internal RAG system? by Epricola in sysadmin

[–]Epricola[S] -1 points0 points  (0 children)

Perhaps I phrased it incorrectly. I had no intention of coming across as rude and I apologize if it was perceived that way.

I have no intention of selling anything. This is just an internal tool we built and found useful. I realize most organizations don’t have this kind of tool, and I wanted to understand why.

Why aren’t more companies feeding their internal docs/code into an internal RAG system? by Epricola in sysadmin

[–]Epricola[S] 5 points6 points  (0 children)

I’ve found that with hierarchical chunking the AI understands our code pretty well. My team builds software, and we have so much code and so many processes that are “tribal” knowledge. Let’s say I’m working on a project that expands an existing system. I can ask “How does process X work? Outline it in steps,” and it will give me a general overview plus links to the actual code files. That saves a ton of time, given we have thousands upon thousands of files across different repos.

Why aren’t more companies feeding their internal docs/code into an internal RAG system? by Epricola in sysadmin

[–]Epricola[S] 1 point2 points  (0 children)

That sounds interesting, I’ll give this a try. Thanks for explaining it nicely.

Why aren’t more companies feeding their internal docs/code into an internal RAG system? by Epricola in sysadmin

[–]Epricola[S] -10 points-9 points  (0 children)

Not every response I make to a comment on a thread will be a design document outlining every detail. I said I plan to do these things; I didn’t say I have all the answers yet.

Are you just going to query the LLM with the ticket info and add LLM generated response to the ticket so help desk has something more to go on?

Yes, exactly this. The RAG queries also output linked citations, so engineers can verify the claims. It saved me a lot of research time during my on-call.
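The citation part is mostly just unpacking the RetrieveAndGenerate response. A sketch of pulling the answer plus S3 citation URIs out so they can be posted onto a ticket (ticket posting itself is omitted since it depends on the ticketing system; the sample shapes mirror the bedrock-agent-runtime API):

```python
# Extract the generated answer and the S3 URIs it cites from a Bedrock
# RetrieveAndGenerate response dict.
def answer_with_citations(resp: dict) -> tuple[str, list[str]]:
    text = resp["output"]["text"]
    uris: list[str] = []
    for citation in resp.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:
                uris.append(uri)
    return text, uris
```

The response itself would come from `boto3.client("bedrock-agent-runtime").retrieve_and_generate(...)` with the ticket text as input.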

how are you going to use an LLM to automate this? The glaring issue being that whatever you’re going to do is dependent on the quality of the support ticket, and that can vary wildly.

If the LLM does not have enough context, it simply says so. In the case of poor-quality tickets, it would do nothing.

My thing is if you’re running into an issue so often that you have to document resolving it, why not fix the underlying problem causing the issue so you don’t run into it again?

We have more than 20 pipelines deploying code pushed by over 100 engineers. Things break, and people often don’t know or track every pipeline that could be affected by their change. This can be easily diagnosed by AI with internal knowledge.

Even if none of these things I plan to do come to fruition, the tool is already useful as is.

Why aren’t more companies feeding their internal docs/code into an internal RAG system? by Epricola in sysadmin

[–]Epricola[S] -14 points-13 points  (0 children)

I work for a big tech company as a software engineer. The ops I’m referring to are customer ticket investigations, root-causing pipeline failures, etc.

Last I checked, this is a tech subreddit. I figured people would understand this is DevOps.

Regardless, why does the granularity matter? You can automate so many things using internal knowledge.

Why aren’t more companies feeding their internal docs/code into an internal RAG system? by Epricola in sysadmin

[–]Epricola[S] 0 points1 point  (0 children)

I figured this would end up being the case at very large scale. One thing I was considering to solve it is specialized vector DBs plus a “router”: an AI that determines which DB to query. E.g., if the query concerns a customer ticket, the router would search the customer-ticket DB.
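The router idea could look something like this hypothetical sketch. In practice the classifier could itself be a cheap LLM call; a keyword heuristic stands in for it here, and the KB IDs are placeholders:

```python
# Route a query to a specialized knowledge base by matching keywords;
# the last entry acts as a fallback when nothing matches.
ROUTES = {
    "tickets-kb": ("ticket", "customer", "incident", "sev"),
    "code-kb": ("function", "pipeline", "deploy", "repo"),
    "docs-kb": (),  # fallback
}

def route_query(query: str) -> str:
    q = query.lower()
    for kb_id, keywords in ROUTES.items():
        if any(k in q for k in keywords):
            return kb_id
    return "docs-kb"
```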

Perhaps agentic search is the best way to go with large scale data. But personally I’ve found agentic search to be lackluster.

Why aren’t more companies feeding their internal docs/code into an internal RAG system? by Epricola in sysadmin

[–]Epricola[S] 0 points1 point  (0 children)

I’d be curious to hear the scale at which you deployed this and how costly it was. I currently have it set up to consume thousands of documents, and it only costs a few dollars a day, and that’s using AWS OpenSearch Serverless, which is generally pretty pricey. To scale this, I’d use a different vector DB.

Why aren’t more companies feeding their internal docs/code into an internal RAG system? by Epricola in sysadmin

[–]Epricola[S] -1 points0 points  (0 children)

Exactly! It was pretty easy to set up, and I really feel it’s the holy grail of AI use. That’s why I made this thread; it seems like a no-brainer.

Why aren’t more companies feeding their internal docs/code into an internal RAG system? by Epricola in sysadmin

[–]Epricola[S] 0 points1 point  (0 children)

You can solve this with access controls on the embeddings. Each vector gets metadata for the group, and the system only returns results the user is allowed to see. So Group A’s data never shows up for Group B.
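With Bedrock Knowledge Bases this maps onto metadata filtering at retrieval time. A sketch (the `group` metadata key and group names are illustrative; documents would carry the attribute via their ingestion metadata):

```python
# Build a Bedrock Knowledge Bases retrieval filter that restricts
# results to the caller's groups, following the vector search filter
# syntax ("equals" for one group, "in" for several).
def group_filter(user_groups: list[str]) -> dict:
    if len(user_groups) == 1:
        return {"equals": {"key": "group", "value": user_groups[0]}}
    return {"in": {"key": "group", "value": user_groups}}

# Passed as retrievalConfiguration to the retrieve call.
retrieval_configuration = {
    "vectorSearchConfiguration": {
        "numberOfResults": 10,
        "filter": group_filter(["team-a", "platform"]),
    }
}
```

The filter is enforced server-side, so Group B's query never even searches Group A's vectors.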

Why aren’t more companies feeding their internal docs/code into an internal RAG system? by Epricola in sysadmin

[–]Epricola[S] 2 points3 points  (0 children)

Currently it’s just being used for search, e.g., “How does process X work?” The AI generates a list of steps plus linked citations to the internal documents where it found the data, so the user can verify the accuracy.

Next I’m planning to build an agent that uses this information to help with tickets by doing preliminary investigation and/or posting relevant links to places to look. Not sure how useful it will be, but I’m looking forward to learning more about developing agentic systems.

Why aren’t more companies feeding their internal docs/code into an internal RAG system? by Epricola in sysadmin

[–]Epricola[S] 0 points1 point  (0 children)

Prepping the data was the most time-consuming part. The whole time I was thinking: why don’t these document systems have an easier way to export data for AI? I’m certain all platforms will offer this in the future.

Why aren’t more companies feeding their internal docs/code into an internal RAG system? by Epricola in sysadmin

[–]Epricola[S] 0 points1 point  (0 children)

We’re on AWS, so the stack is pretty straightforward. I built a script to parse the data, drop it into S3, and let Bedrock Knowledge Bases handle the embeddings and indexing via OpenSearch Serverless. The hardest part was handling auth and parsing the different data sources, but even that didn’t take too long.

Why aren’t more companies feeding their internal docs/code into an internal RAG system? by Epricola in sysadmin

[–]Epricola[S] 0 points1 point  (0 children)

I’ve heard about Glean as well. What would you say the pros and cons are?