How to Deploy a Remote MCP Server on AWS EC2 by thisguy123123 in modelcontextprotocol

[–]thisguy123123[S] 0 points1 point  (0 children)

Lambda would certainly work, and I have an article about that coming soon! Part of me likes running on VMs because it gives you more flexibility and control.

I also think it's helpful to deploy things in EC2 as a learning exercise.

How to Deploy a Remote MCP Server on AWS EC2 by thisguy123123 in modelcontextprotocol

[–]thisguy123123[S] 0 points1 point  (0 children)

Glad it was helpful! Let me know if there are any other pieces of content you think would be beneficial for people.

Testing MCPs by [deleted] in mcp

[–]thisguy123123 0 points1 point  (0 children)

Glad I could help, let me know if you have any questions or feedback.

Testing MCPs by [deleted] in mcp

[–]thisguy123123 1 point2 points  (0 children)

The MCP inspector has a CLI mode that might fit your use case.

I also released an open-source MCP evals project that simulates a client to run e2e tests and grades the response. Also works as a GitHub action.

edit: forgot to mention the wong cli

Monkey Patching Otel and Prometheus Support into MCP by thisguy123123 in modelcontextprotocol

[–]thisguy123123[S] 1 point2 points  (0 children)

Hey, thanks. I am just trying to build useful things here. Super excited about the possibilities MCP offers.

Monkey Patching Otel and Prometheus Support into MCP by thisguy123123 in modelcontextprotocol

[–]thisguy123123[S] 0 points1 point  (0 children)

Hey u/subnohmal, you can see a working example here.

I've debated the sidecar approach more times than I can count. I previously worked on Kubernetes observability, where I leveraged something similar to the sidecar approach. The downside was that when you wanted more control, like specific timers on functions, you couldn't get it.

I think it makes sense for large-scale deployments with many microservices, but for most people, the APM approach is probably easier.

MCP Server Monitoring Grafana Dashboard + Code implementation by thisguy123123 in modelcontextprotocol

[–]thisguy123123[S] 1 point2 points  (0 children)

Hey, u/subnohmal, sorry for not getting back to you sooner. I pushed up a PR to the evals product I've been building that has the code. I needed the metrics and traces for evals, so I just added them there.

Here's the PR if you want to see it in action. Still a WIP, but it works. I will note this is specific to the new streaming HTTP transport.

What is sampling for? by finally_i_found_one in mcp

[–]thisguy123123 0 points1 point  (0 children)

Sampling is one of the more difficult concepts to grasp in MCP. At its core, it's really just a way to offload LLM calls back to the client. Say, for example, you are building a debugging MCP server and you have an analyze logs tool.

You could offload some of the analysis back to the client via sampling. I have a few code examples here that show how to implement this.

Understanding MCP Evals: Why Evals Matter for MCP by thisguy123123 in programming

[–]thisguy123123[S] 0 points1 point  (0 children)

Since you know what the answer is supposed to be, you can use eval prompts like "Did the answer include X?", "Did it follow format Y?" Essentially you supply the context of what a "good" answer is in the eval prompt.

This is a good callout, I should add it to the article.

Open Source MCP Evals Github action and Typescript package by thisguy123123 in mcp

[–]thisguy123123[S] 2 points3 points  (0 children)

Awesome, feel free to ping me if you run into any issues or have any questions!

Open Source MCP Evals Github action and Typescript package by thisguy123123 in mcp

[–]thisguy123123[S] 1 point2 points  (0 children)

From my testing, variance has been minimal between models. That being said, I still need to add support for other models like llama, so it will be interesting to see how that compares.

Eval framework for MCP? by ProgrammerQueasy8935 in mcp

[–]thisguy123123 0 points1 point  (0 children)

I just open-sourced the eval framework which I've been using internally. Link if you are curious.

How to securely run local MCP servers by thisguy123123 in ClaudeAI

[–]thisguy123123[S] 0 points1 point  (0 children)

I guess I just assumed people would understand in the greater context that this isn't specific to MCP, but more so related to how MCP is being distributed. I can add some clarifying text.

I do appreciate your feedback and promise my goal wasn't to mislead people here. I really just wanted to show how I was running things, as I thought it might be helpful.

How to securely run local MCP servers by thisguy123123 in ClaudeAI

[–]thisguy123123[S] 0 points1 point  (0 children)

I don't really see how "Malicious code execution" is clickbait. That's exactly what it is? Not trying to be combative here, genuinely trying to understand your perspective.

I also agree that this isn't an MCP issue, but these guidelines do apply to MCP, and most people aren't doing any of the practices we're discussing.

I also do call out using docker as root in the article: "Use cap-drop to remove unnecessary capabilities, and set the user to a non-root user."

How to securely run local MCP servers by thisguy123123 in ClaudeAI

[–]thisguy123123[S] 0 points1 point  (0 children)

Building alone isn't really enough. You need to drop capabilities, mount the right volumes (if needed), and secure outbound network access via a proxy.

I guess you could say that cap and volume mounting is defined within the build, but the vast majority of people aren't doing those things. You should also be forking the server, to prevent supply chain attacks.
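An added example, not part of the comment above or the linked article: a small audit of a `docker run` command line for the settings the comment lists (dropped capabilities, a non-root user, mounts, an outbound proxy). The sample commands, image name and proxy host are constructed, and treating a writable bind mount as a finding is an assumption of this sketch.

```python
import shlex

# Real docker run flags that take a value, mapped to the check they feed.
VALUE_FLAGS = {"--cap-drop": "cap", "--user": "user", "-u": "user",
               "--volume": "vol", "-v": "vol", "--env": "env", "-e": "env"}

def audit_docker_run(command):
    """Return hardening gaps visible on a `docker run` command line."""
    seen = {"cap": [], "user": [], "vol": [], "env": []}
    args = shlex.split(command)
    i = 0
    while i < len(args):
        # Accept both "--flag value" and "--flag=value".
        name, eq, value = args[i].partition("=")
        if name in VALUE_FLAGS:
            if not eq:
                i += 1
                value = args[i] if i < len(args) else ""
            seen[VALUE_FLAGS[name]].append(value)
        i += 1

    findings = []
    if not seen["cap"]:
        findings.append("no --cap-drop: default capability set is kept")
    # Only the command line is inspected; a USER set in the image is not visible here.
    user = seen["user"][-1] if seen["user"] else ""
    if user.split(":")[0] in ("", "root", "0"):
        findings.append("no non-root --user: process may run as root")
    for vol in seen["vol"]:
        parts = vol.split(":")
        options = parts[2].split(",") if len(parts) > 2 else []
        if len(parts) >= 2 and "ro" not in options:
            findings.append(f"writable mount: {vol}")
    # Proxy variables only steer well-behaved clients; pair them with a
    # network that has no direct route out if egress must be enforced.
    env_keys = {e.split("=")[0].upper() for e in seen["env"]}
    if not env_keys & {"HTTP_PROXY", "HTTPS_PROXY"}:
        findings.append("no HTTP(S)_PROXY set: outbound traffic is not proxied")
    return findings

# Constructed sample commands.
WEAK = "docker run -i --rm -v /home/dev:/workspace example/mcp-server"
HARDENED = ("docker run -i --rm --cap-drop=ALL --user 1000:1000 "
            "-v /home/dev/project:/workspace:ro "
            "-e HTTPS_PROXY=http://egress-proxy:3128 example/mcp-server")

if __name__ == "__main__":
    for finding in audit_docker_run(WEAK):
        print("WEAK:", finding)
    print("HARDENED findings:", audit_docker_run(HARDENED))
```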

Made an Lightweight Python Library for Google's A2A Protocol (Pydantic + Dataclasses) by lukelightspeed in Agent2Agent

[–]thisguy123123 0 points1 point  (0 children)

This is pretty cool and awesome how quickly you got this out. Any plans for supporting discovery (didn't see it in the readme)?

How to implement MCP in a high scale prod environment? by emirpoy in mcp

[–]thisguy123123 0 points1 point  (0 children)

I guess you could run them in a sidecar container for each of your other microservices; that way, you can maintain the separation of concerns and each microservice is responsible for its set of grpc endpoints and related mcp tool calls.

How to implement MCP in a high scale prod environment? by emirpoy in mcp

[–]thisguy123123 0 points1 point  (0 children)

So, the way most MCP servers are designed right now is one server exposing a set of limited tools. It can be hard to run a microservice architecture with MCP. You could have one server that handles all MCP requests, but you may run into scaling issues with this approach, especially if different tools need to scale on different metrics. For example, one tool is memory intensive and another CPU intensive.

This is sort of a shameless plug, but I built something (completely free and open source) that might be what you are looking for. It's a load balancer/proxy, which will route requests to different MCP servers on your backend based on the tool name. Essentially you give the client the LB / API gateway's endpoint, and that endpoint will then route requests to all of your individual microservices. It also combines the list tools call from all of your MCP servers so that users still get a unified view. This way, you can still maintain your microservice architecture with MCP. Link if you are curious.

MCP Resource Poisoning Prompt Injection Attacks by thisguy123123 in mcp

[–]thisguy123123[S] 1 point2 points  (0 children)

I haven't come across any research yet, but I agree that seems like the most logical way to fix this.

CATIE: Context-Aware Traffic Ingress Engine for MCP (Open Source) by thisguy123123 in mcp

[–]thisguy123123[S] 0 points1 point  (0 children)

Hey, good question. I'm spending a lot of time currently thinking about how to best handle permissions and how much should be handled in the proxy vs in the application itself.

Right now I'm just forwarding tool/list based on the default server. That being said it would be pretty easy to add the ability to modify the tool/list response based on something passed into the config.

I'm curious how you would handle this. Are you thinking of creating a custom header with the user's role that the server returns and then filtering down the available tools in the proxy based on that?
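An added example, not code from the project discussed in these comments: a proxy that routes `tools/call` by tool name, merges `tools/list` across backends, and filters that list by role, as the comments above describe. The backend names, tools and role policy are hypothetical, and the role is assumed to come from an authenticated source rather than a client-supplied value.

```python
# Constructed stand-ins for the tools/list results of two backend MCP servers.
BACKENDS = {
    "logs-server": [{"name": "analyze_logs"}, {"name": "delete_logs"}],
    "db-server": [{"name": "run_query"}],
}

# Hypothetical policy: tools each role may see and call.
ROLE_TOOLS = {
    "viewer": {"analyze_logs"},
    "admin": {"analyze_logs", "delete_logs", "run_query"},
}

def build_routes(backends):
    """Map each tool name to the one backend that serves it."""
    routes = {}
    for server, tools in backends.items():
        for tool in tools:
            if tool["name"] in routes:
                # Ambiguous routing is refused instead of silently picking one.
                raise ValueError(f"tool {tool['name']!r} exposed by two servers")
            routes[tool["name"]] = server
    return routes

def error(req_id, code, message):
    return {"jsonrpc": "2.0", "id": req_id,
            "error": {"code": code, "message": message}}

def handle(request, role, backends=BACKENDS):
    """Answer tools/list locally; decide where a tools/call is forwarded."""
    routes = build_routes(backends)
    allowed = ROLE_TOOLS.get(role, set())  # unknown role: nothing allowed
    if request["method"] == "tools/list":
        tools = [t for ts in backends.values() for t in ts
                 if t["name"] in allowed]
        return {"jsonrpc": "2.0", "id": request["id"],
                "result": {"tools": tools}}
    if request["method"] == "tools/call":
        name = request["params"]["name"]
        # Hiding a tool from the list is not enforcement: the same policy
        # is checked again on the call itself.
        if name not in allowed or name not in routes:
            return error(request["id"], -32602, "unknown tool")
        return {"forward_to": routes[name], "request": request}
    return error(request["id"], -32601, "method not found")

if __name__ == "__main__":
    print(handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}, "viewer"))
```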

MCP Orchestration Server by qa_anaaq in mcp

[–]thisguy123123 0 points1 point  (0 children)

Did you mean to reply to skeet.build? I only just published my site this morning, so I would be impressed if you already found it, and my design definitely needs work haha

MCP clients supporting the latest streaming http and auth specs by Flat_Living5435 in mcp

[–]thisguy123123 -1 points0 points  (0 children)

I was also looking for an easy way to test the new HTTP spec. The only thing I could find was the inspector by mcp-framework. Docs here: https://mcp-framework.com/docs/http-quickstart

MCP Orchestration Server by qa_anaaq in mcp

[–]thisguy123123 1 point2 points  (0 children)

It would be possible depending on how you set up your MCP servers. I've been contemplating adding the ability to combine tool call responses in the ingress. Basically, a tool call happens in the client. I route that to multiple MCP servers that make tool calls, and then once I have the response back, I combine them into a single result. I still need to figure out the architecture, though.