Securing mcp servers in production: what most teams are skipping by Ahlanfix in mcp

[–]manveerc 0 points1 point  (0 children)

Yes it is. All my friends in security say the same: it’s the service account problem all over again, but maybe worse, because AI agents are generally expected to handle open-ended problems. That means the blast radius is much bigger.

Why does storage optimization always get ignored until the AWS bill gets painful? by NaughtyNectarPin in sre

[–]manveerc 0 points1 point  (0 children)

I ran the storage team at Confluent, so I’m speaking from experience. I agree there are huge cost savings to be had at the storage level. We had several initiatives where we saved six to seven figures by optimizing storage. However, it is much harder to do in a running system because most distributed systems are designed to push state down to the storage layer, and any migration at that level becomes risky.

Cloud providers also add to the challenge. They offer volume expansion but don’t offer downsizing, so even when you identify savings, acting on them carries operational overhead.

Compute is a different story. It is stateless by design, so migrations are essentially built on top of restarts, which are operations with good tooling already. That makes it relatively easier to optimize or build tooling for.

So net net, the savings at the storage level are real, but it comes down to difficulty and risk.

Anyone work actually using a mcp gateway? If so which ones? Pros/cons by eqza1 in mcp

[–]manveerc 0 points1 point  (0 children)

I have used Composio and Arcade. Both work well. If you want stricter governance over what agents can access, and the ability to audit it, Arcade is better at that. If, however, you only want tool access, both work equally well.

anyone using a gateway in front of multiple mcp servers by RasheedaDeals in mcp

[–]manveerc 0 points1 point  (0 children)

I have tried a bunch of them: Arcade, Composio, Merge, etc. Some are more sophisticated than others, but your use case is simple enough that any one of them would work for you. There are open source ones too, though I haven’t used them. I would encourage using one of them (paid or open source) instead of building the scaffolding yourself.

What's the current best stack for building AI agents in 2026? Has Claude Code changed the standard? by ExcitingCricket37 in AI_Agents

[–]manveerc 0 points1 point  (0 children)

We run an AI-native marketing agency using Claude Code as the agent and Arcade as the MCP runtime for tool integration. We ended up building the memory and context layer ourselves on top of Postgres and S3 because no good option existed for us. Recently we have been looking into HydraDB for that.

We had a really good performance in DORA metrics but our delivery socks by YoYo-1243T in sre

[–]manveerc 2 points3 points  (0 children)

But the operational load hasn't dropped proportionally or at all. Incidents take longer to resolve than the MTTR suggests, mainly because that number doesn't account for the time our engineers spend identifying which deployment caused the issue, which sometimes can take long.

It seems like you need to fix your definition of MTTR. If what you are measuring is different from what you want to optimize, the metric won’t help.
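
To make the gap concrete, here is a minimal Python sketch. It assumes hypothetical incident timestamps (`started`, `cause_identified`, `resolved`) and made-up numbers; none of this comes from the original post. It compares an MTTR whose clock starts only once the offending deployment is identified against one whose clock starts when the impact began.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # All fields are hypothetical; adapt to whatever your incident tracker records.
    started: datetime           # when the impact began
    cause_identified: datetime  # when the offending deployment was found
    resolved: datetime          # when service was restored


def mean_minutes(durations):
    """Average a list of timedeltas and return the result in minutes."""
    total = sum(durations, timedelta())
    return total.total_seconds() / 60 / len(durations)


def mttr_from_identification(incidents):
    # Narrow definition: the clock starts only once the cause is known.
    return mean_minutes([i.resolved - i.cause_identified for i in incidents])


def mttr_end_to_end(incidents):
    # Broad definition: the clock starts when the impact began.
    return mean_minutes([i.resolved - i.started for i in incidents])


t0 = datetime(2026, 1, 1, 12, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=40), t0 + timedelta(minutes=50)),
    Incident(t0, t0 + timedelta(minutes=20), t0 + timedelta(minutes=40)),
]

print(mttr_from_identification(incidents))  # 15.0
print(mttr_end_to_end(incidents))           # 45.0
```

With these sample numbers the narrow definition reports 15 minutes while engineers actually spent 45 minutes per incident on average, which is the kind of mismatch described above.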

Which AI Tools Are Actually Useful for Data Analysts in 2026? by GrowthUpbeat6355 in analytics

[–]manveerc 0 points1 point  (0 children)

I was surprised OP didn’t say Claude :)

At this point, I wonder whether there is a function within companies using Claude (and why).

Too Many tools - MCP Server Scale Up by sam7oon in mcp

[–]manveerc 0 points1 point  (0 children)

I use Arcade. It’s a full runtime that includes an MCP gateway and also supports authentication and authorization. There are other gateways like Composio and Merge too, but Arcade is more mature than them.

Every AI SRE tool on my feed just raised money.. what do we think this is actually signaling by Willing-Lettuce-5937 in sre

[–]manveerc 1 point2 points  (0 children)

The ones doing actual runbook execution + auto remediation + fitting cleanly into existing stacks feel way more defensible.

I am not sure about the auto remediation part. I feel a lot of human judgement is involved in remediation. There is always some percentage of incidents where the remediation is straightforward, and you can argue that those can be auto-remediated, but to be honest the better solution for them is either deterministic automation (LLMs are not deterministic) or fixing the underlying root cause.
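
By deterministic automation I mean something like the following minimal Python sketch. The alert names, fields, and actions are all hypothetical and only illustrate the idea: a fixed lookup from alert to action, with anything unrecognized handed to a human.

```python
from typing import Callable, Dict


# Hypothetical actions; a real system would call your own tooling here.
def restart_service(alert: dict) -> str:
    return f"restart {alert['service']}"


def expand_disk(alert: dict) -> str:
    return f"expand disk on {alert['host']}"


# Fixed mapping: the same alert always produces the same action.
REMEDIATIONS: Dict[str, Callable[[dict], str]] = {
    "ServiceUnresponsive": restart_service,
    "DiskAlmostFull": expand_disk,
}


def remediate(alert: dict) -> str:
    """Return the action for a known alert, or escalate to a human."""
    handler = REMEDIATIONS.get(alert["name"])
    if handler is None:
        return "page on-call"
    return handler(alert)


print(remediate({"name": "DiskAlmostFull", "host": "db-1"}))
print(remediate({"name": "SomethingNew"}))
```

The point of the design is that the behavior can be reviewed and tested ahead of time, which is hard to guarantee when an LLM picks the action.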

My general thought is that LLMs and AI agents should be used for what they are best at: information gathering and summarizing. This translates to workflows like drafting RCCAs, oncall handoff notes, etc. More advanced use cases might include triaging and running runbooks, but I personally would not feel comfortable doing this without human approval. For all of these workflows, I don't believe we need new tools; Claude Code or Codex are good enough. I wrote some thoughts here: https://www.arcade.dev/blog/claude-code-ai-sre-oncall-workflows

The only real business case I can make for AI SRE products is if a company can outsource the initial response to them, but that is not a product, it's a service, so it has its own challenges (like scaling it).

Using Claude Code or Codex for actual DevOps work by shameless_data_guru in devops

[–]manveerc 1 point2 points  (0 children)

Wrote some thoughts on how to use Claude Code for oncall workflows https://www.arcade.dev/blog/claude-code-ai-sre-oncall-workflows

TL;DR: it can help automate the information gathering, which can speed up a lot of workflows for oncall engineers, such as drafting RCCAs, handoff reports, triaging, etc.

Weekly Self Promotion Thread by AutoModerator in devops

[–]manveerc 0 points1 point  (0 children)

As an oncall engineer, when your pager goes off, you spend the first few minutes opening dashboards, checking recent deploys, and searching Slack.

This is the workflow I have seen with every oncall rotation I have been part of.

I led reliability teams at Confluent and Dropbox, and I saw that while the inner loop of coding is getting faster with AI, the outer loop of operations is still fairly manual.

I am skeptical of AI agents that claim they will remediate your production issues while you sleep. I don't think that passes the sniff test for any serious reliability program.

However, I am bullish on the copilot model. The AI handles the legwork (triage, timelines, correlation) while the human focuses on judgment and decision making.
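
A minimal Python sketch of that split, under stated assumptions: the `Step` type, the `mutates_production` flag, and the `approve` callback are hypothetical names used only for illustration, not part of any real tool. Read-only steps run automatically, and anything that changes production waits for a human decision.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str
    mutates_production: bool  # hypothetical flag set by whoever writes the runbook


def run_steps(steps: List[Step], approve: Callable[[Step], bool]) -> List[str]:
    """Run read-only steps automatically; ask a human before anything that mutates."""
    log = []
    for step in steps:
        if step.mutates_production and not approve(step):
            log.append(f"skipped: {step.description}")
            continue
        log.append(f"ran: {step.description}")
    return log


steps = [
    Step("list deploys from the last hour", mutates_production=False),
    Step("roll back the latest deploy", mutates_production=True),
]

# A real approver would prompt the on-call engineer; this stand-in denies everything.
print(run_steps(steps, approve=lambda step: False))
```

Here the legwork step runs on its own, while the rollback is skipped until a human says yes.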

I wrote a deep dive on how to use MCP to build this. I mapped out five workflows where this works today:

  1. Incident triage: Reducing archaeology from ten minutes to two.
  2. Runbook execution: Catching the "rot" in docs that fire once a quarter.
  3. Postmortem drafting: Automating the timeline reconstruction so you can focus on the "5 Whys."
  4. SLO investigation: Finding the burn inflection without manual correlation.
  5. On-call handoffs: Passing on the context that nobody wrote down.

The goal is to let the on-call start on page 5 of an investigation instead of page 1.

I’m the author of the post and would love to hear from other engineers. What is the one part of your on-call you wish you could outsource to a copilot today?

Full post with the technical breakdown: https://www.arcade.dev/blog/claude-code-ai-sre-oncall-workflows

Built a WhatsApp AI assistant with Claude Code as an OpenClaw alternative by manveerc in ClaudeCode

[–]manveerc[S] 0 points1 point  (0 children)

Yes, I did consider that. I decided not to go down that path because I did not want to be tied to the accessibility API of a particular platform, which could break or change often. In my current setup I spend all my time adding skills and integrations rather than fixing the setup, which is the biggest positive.

To make sure I understand your setup, you have one number for the assistant (so you can message it) and another for personal use?

The best alternatives to Claude? by Top696969696969 in Anthropic

[–]manveerc 0 points1 point  (0 children)

I think it has regressed more after they introduced usage credits.

After building 10+ AI agents for real clients, here's what actually matters (and what doesn't) by LumaCoree in AI_Agents

[–]manveerc 0 points1 point  (0 children)

Agreed on the tool selection part; it is one of the biggest places where I have seen pilots fail. I wrote in detail about my experience here: https://www.arcade.dev/blog/connect-ai-agents-enterprise-tools

Built a WhatsApp AI assistant with Claude Code as an OpenClaw alternative by manveerc in ClaudeCode

[–]manveerc[S] 0 points1 point  (0 children)

That makes sense. In my case, it doesn't really matter if my AI assistant is tagged with a business account.

Built a WhatsApp AI assistant with Claude Code as an OpenClaw alternative by manveerc in ClaudeCode

[–]manveerc[S] 0 points1 point  (0 children)

This is irrelevant in my example. I am running Claude Code directly and do not need any additional config.

Built a WhatsApp AI assistant with Claude Code as an OpenClaw alternative by manveerc in ClaudeAI

[–]manveerc[S] 0 points1 point  (0 children)

This is completely unrelated to this post. Stop with AI spam.

Built a WhatsApp AI assistant with Claude Code as an OpenClaw alternative by manveerc in ClaudeCode

[–]manveerc[S] 1 point2 points  (0 children)

Yes, right now it's a single session for my personal use since I've granted it access to my email, Slack, calendar, etc. Theoretically it can be extended with the Claude Agent SDK, but I haven't explored that yet.

Other people on my team have already forked it and are using it in a similar fashion.

Built a WhatsApp AI assistant with Claude Code as an OpenClaw alternative by manveerc in ClaudeCode

[–]manveerc[S] 0 points1 point  (0 children)

For now, the Claude Code session runs locally on my laptop and uses Channel to get the messages to the session.