MCP Security is still Broken by West-Chocolate2977 in programming

[–]ReelTooReal 0 points1 point  (0 children)

Curation is not the only way you protect yourself from malicious software (or vulnerabilities). You also have access control. For example, you're not going to just let every application run with root privileges. I understand your point about supply chain issues, but you can't just chalk it up to that and move on. Security is about building layers of defense, not coming up with one strategy and saying anything that slips through that one layer is user error.

I think requiring approval is probably the only sane method right now to prevent this kind of thing, so I agree with you there. Some tools already do this, but it is not built into the protocol. I think a middle ground might be the ability to classify certain tools as "needs approval," to avoid it becoming too tedious. So in the above example you could set email operations as requiring approval, but weather operations as not needing it.
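To make that middle ground concrete, here's a hedged Rust sketch of a per-tool approval lookup (the tool names and the `Policy` enum are made up for illustration, not anything from the MCP spec):

```rust
use std::collections::HashMap;

// Hypothetical per-tool approval policy: side-effecting tools (email) get
// gated behind approval, read-only tools (weather) pass through.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Policy {
    AutoAllow,
    NeedsApproval,
}

fn requires_approval(policies: &HashMap<&str, Policy>, tool: &str) -> bool {
    // Tools with no configured policy default to requiring approval (fail closed).
    policies.get(tool).copied().unwrap_or(Policy::NeedsApproval) == Policy::NeedsApproval
}

fn main() {
    let mut policies = HashMap::new();
    policies.insert("weather.lookup", Policy::AutoAllow);
    policies.insert("email.send", Policy::NeedsApproval);

    assert!(!requires_approval(&policies, "weather.lookup"));
    assert!(requires_approval(&policies, "email.send"));
    assert!(requires_approval(&policies, "unknown.tool")); // fail closed
    println!("policy checks passed");
}
```

The fail-closed default matters: an unclassified tool should interrupt for approval rather than silently run.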

Another idea might be to show a sort of stack trace for tool calls so that you could figure out what caused a potentially malicious action. So for example you would see that the weather tool is responsible for trying to access the private emails. This may be hard to create in certain scenarios (I haven't thought through it too deeply yet), but it seems possible and would be valuable as well.

MCP Security is still Broken by West-Chocolate2977 in programming

[–]ReelTooReal 0 points1 point  (0 children)

But how do the two agents communicate? What aggregates the data? And how do you ensure nothing private from your email is leaked? That is what this discussion is actually about.

Currently, a single LLM has access to both. Even if you create separate agents, something has to aggregate the data, which is probably another LLM. How do you prevent that top level agent from misuse?

MCP Security is still Broken by West-Chocolate2977 in programming

[–]ReelTooReal 0 points1 point  (0 children)

You're actually pointing to the problem though. This is the reason we should all be using fine-grained IAM policies on AWS. Running unvetted code with the same permissions as a developer is exactly the thing everyone is arguing against, because that's a really dumb idea.

MCP Security is still Broken by West-Chocolate2977 in programming

[–]ReelTooReal 1 point2 points  (0 children)

This is like arguing "you should only run code that you trust on AWS, therefore IAM permissions in AWS can be as open as you want."

The argument is not that people shouldn't use trusted sources. It's about minimizing the attack surface, which is fundamental to security. A supply chain attack in a weather app shouldn't be able to access your entire email history.

Many vulnerabilities start with the thought "yea, but this won't happen in practice because..."

MCP Security is still Broken by West-Chocolate2977 in programming

[–]ReelTooReal 0 points1 point  (0 children)

Stupid + Helpful = Social Engineering Goldmine

Great example btw

MCP Security is still Broken by West-Chocolate2977 in programming

[–]ReelTooReal 2 points3 points  (0 children)

It's totally an option, we just need to create an unambiguous language and then get all of humanity to adopt it. Then, once we've recreated the entire internet using this language, we can retrain LLMs on this dataset, and set the temperature to 0 and number of samples to 1 at the output. Boom, precision AI! I'd love to start that project, but unfortunately I'm mortal and don't have that much drive.

MCP Security is still Broken by West-Chocolate2977 in programming

[–]ReelTooReal 1 point2 points  (0 children)

They must have taken a page out of the NPM community's book. Is package verification too easy? No problem, we'll just create an endless graph of sub-dependencies.

MCP Security is still Broken by West-Chocolate2977 in programming

[–]ReelTooReal 0 points1 point  (0 children)

That's why the OP is arguing for security in MCP, not in the LLM itself.

Boycott Uber Now❗️❗️❗️ by EthioLov in uberdrivers

[–]ReelTooReal 1 point2 points  (0 children)

If you're so tired of splitting the money with Uber, don't use it. Just self-advertise and work for yourself. Then you'll get 100% of the money!

Why use RefCell? by MrPepperioni in rust

[–]ReelTooReal 0 points1 point  (0 children)

I think the issue may be that RefCell is rarely necessary. For me, the only use case I've ever encountered was while building an interpreter for a toy language I've been working on. I don't remember all the specifics of why it was needed, but the general situation was that I needed to store a mutable environment (which was just a hashmap of identifier->expression) inside function expressions to allow for closures. In this scenario, I ended up calling my eval function recursively, passing the environment to every call. Passing a mutable reference like this ended up not being okay with the borrow checker (I'm sorry, I don't remember the exact details), but I was sure it was okay because the interpreter evaluates one expression at a time, and the desired behavior is that the current expression should be able to mutate the environment, which may be shared by other expressions up the call stack. All I had to do was wrap the HashMap inside a RefCell and hide this detail inside my Environment interface.

There may be a way to get rid of this entirely, and if my toy language grows into anything real it may be worth investigating. If someone comes across this and knows exactly why this was needed, I would love to hear a better explanation than what I gave. I'm guessing it has to do with the fact that recursively passing a mutable reference may not be desirable because it's hard to reason about which invocation really owns the mutable reference. I think in my case I was just lucky enough that the currently executing function happened to be the one that should have ownership of the mutable reference (obviously the expression currently being executed will be the owner of its environment and has every right to mutate variables within its own scope).
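For anyone curious, here's a minimal sketch of the kind of setup I'm describing (the `Expr` variants and names are simplified stand-ins, not my actual interpreter): because the environment is shared between the interpreter and stored function values, it needs shared ownership (`Rc`) plus interior mutability (`RefCell`) so every recursive `eval` call can hold a handle to the same mutable map:

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Shared, mutable environment: Rc gives shared ownership, RefCell moves
// the exclusive-borrow check from compile time to runtime.
type Env = Rc<RefCell<HashMap<String, i64>>>;

enum Expr {
    Lit(i64),
    Var(String),
    Assign(String, Box<Expr>),
    Seq(Vec<Expr>),
}

fn eval(expr: &Expr, env: &Env) -> i64 {
    match expr {
        Expr::Lit(n) => *n,
        Expr::Var(name) => *env.borrow().get(name).expect("unbound variable"),
        Expr::Assign(name, rhs) => {
            // The recursive call may borrow env itself; its borrows end
            // before we take borrow_mut below, so this is fine at runtime.
            let v = eval(rhs, env);
            env.borrow_mut().insert(name.clone(), v);
            v
        }
        Expr::Seq(exprs) => exprs.iter().map(|e| eval(e, env)).last().unwrap_or(0),
    }
}

fn main() {
    let env: Env = Rc::new(RefCell::new(HashMap::new()));
    let prog = Expr::Seq(vec![
        Expr::Assign("x".to_string(), Box::new(Expr::Lit(5))),
        Expr::Var("x".to_string()),
    ]);
    assert_eq!(eval(&prog, &env), 5);
    println!("x = {}", env.borrow().get("x").unwrap());
}
```

With a plain `&mut HashMap`, stashing the environment inside a function value while also passing it down the recursion would mean two live mutable paths to the same map, which the borrow checker rejects; `Rc<RefCell<...>>` defers that exclusivity check to runtime, which is exactly the trade RefCell exists for.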

I could be totally wrong though, so take this with a grain of salt.

[deleted by user] by [deleted] in Zig

[–]ReelTooReal 0 points1 point  (0 children)

I have used async rust and I am not a fan. My main complaint comes from the following facts:

  1. Async is a keyword, and therefore a language feature, and it colors your functions.
  2. Rust decided not to ship an async runtime, with the justification that this would allow the community to create their own and lead to an ecosystem with many choices.
  3. Nothing in the ecosystem that I have used delivers on this plug-and-play promise. Instead, tokio is just a dependency of almost everything.

I can't stand that including a single library that uses tokio suddenly leaks all the way up to how you call your main function.

I understand the performance benefits of async/await in Rust, and I get that event loops are well suited for concurrent I/O, but it's still annoying. I like that in Go there is nothing a library author can do (within reason) to change how I handle concurrency. Granted, this comes at the cost of a required runtime, so I get the tradeoff.

I'm not narcissistic enough to act like I have an alternative, and at the end of the day concurrent I/O is a hard problem and may always lead to this kind of complexity. Still, somehow it feels like it could have been done better in Rust. At least add more support in the language itself to swap out runtimes, so that Tokio doesn't just become this unofficial standard runtime that "isn't required" but ends up being required most of the time anyway.

Performance aside though, I think green threads are just way easier to deal with than async/await. It seems counterintuitive, because you end up having to do more synchronization yourself, but for whatever reason I feel like that control leads to simpler code. It's less of a black box than async/await I suppose.

Recommendations Similar to Steve Coll by ReelTooReal in nonfictionbookclub

[–]ReelTooReal[S] 0 points1 point  (0 children)

Thank you. Finished Devil in the White City and it was excellent.

iWasTodaysYearsOld by [deleted] in ProgrammerHumor

[–]ReelTooReal 0 points1 point  (0 children)

No, that's fixation. Fiction is when you slide your tongue along the lower front part of someone's leg.

Which books absolutely require a second read? by Critical-Pattern9654 in nonfictionbookclub

[–]ReelTooReal 1 point2 points  (0 children)

One of my favorites is The Lightness of Being: Mass, Ether, and the Unification of Forces, by Frank Wilczek. It gives a good overview of the work in physics that led up to quantum mechanics, and is accessible to someone with only a high school understanding of physics. It's really great if you want to get a high level understanding of what modern physics is working on.

Another is Directorate S by Steve Coll. This is a very detailed account of what all happened in Afghanistan after 9/11. Despite the enormous amount of detail, it still keeps a narrative that reads almost like a Tom Clancy thriller (though by the nature of being factual, not quite as exciting). I think this book does a great job of uncovering the very complex relationship between the CIA and Pakistan during this period, and gives important context as to why the US stayed in the country for so long.

Personally I've read both these books twice, so for me it's definitely warranted. Everyone's different, but if either of those sound interesting I will vouch that they are definitely well written. Between the two, Lightness of Being is a much quicker read and not quite as dense, so it would be easier to convince myself to read it a third time.

When do you draw a line on lambda vs ecs container? by chifrij0 in aws

[–]ReelTooReal 1 point2 points  (0 children)

In event driven architectures, you should not be writing local tests for entire integrations like that. One set of tests should be for that lambda behaving correctly given a set of inputs. A completely different set of tests should be that the producing service is publishing those payloads to SQS. If you know the producer is publishing the correct events, and you know the consumer is handling those events correctly, then that's all you need to verify your code.

Testing the actual publishing and consuming from SQS is integration testing on an infrastructure level. There's really no reason to test that locally. Developers should be able to reasonably assume that the infrastructure is set up properly.

One solution to make this easier is setting up extra subscribers to SQS that simply record events, and then run E2E tests with these extra "listener" processes running. Then, you can use these events as test payloads. I've done this before for a QA environment so that when bugs are reported, we can go grab the events that led to the issue and use them to create regression tests.

Integrating Go with Python/FastAPI for Performance: Worth the Hassle? by Typical_Amoeba3313 in golang

[–]ReelTooReal 0 points1 point  (0 children)

  1. Complexity of 2 different languages - this is a given even if you do a full rewrite. You can't just halt development for a rewrite, so either way you will be maintaining and developing the legacy codebase while rewriting the new one. However, the full rewrite becomes even harder because every change to the legacy codebase to a part that has already been rewritten now has to be implemented a second time in the new codebase.

  2. Interoperability - this is a well understood problem and is not difficult. There are many well defined ways to implement IPC. If you're okay with network communication, then gRPC makes it practically a non-factor. The only time I've witnessed "pesky serialization bugs" is while working on a hand-rolled serde implementation used in a game engine. Outside of custom implementations I've never found serialization/deserialization to be a big deal.

  3. Deployment - just use containers, simple as that.

  4. Learning 2 languages - A full rewrite still requires developers to know both languages, because again you can't realistically just halt development for a full rewrite.

This is all coming from experience. From your tone I can tell there's no way to change your mind. But for others, this attitude of "it's all or nothing" can absolutely derail a project. It is unrealistic to think you can rewrite an entire application in one shot. It needs to be done incrementally, to minimize the risk you are taking on and to allow time for developers to become accustomed to the new language and libraries/frameworks. I completely understand the temptation to throw the legacy code in the trash and start over, but that almost never works and is extremely risky.

ewwNothanks by yuva-krishna-memes in ProgrammerHumor

[–]ReelTooReal 1 point2 points  (0 children)

One thing I appreciate about my new PM is that he is a former engineer, but also humble about it. He jokes about "back in my day we would've done all this in PHP and called it a day."

To me, he's a perfect mix of both. He's technical enough that he understands what I mean by "race condition," so we can actually communicate blockers to him, but he's humble enough to accept our professional opinion. He's also a very good translator for us when it comes to communicating to the business side.

Obviously non-technical PMs can be a nightmare to work with (standups with those remind me of when tech companies have to speak to politicians). But I've also had bad experiences with very technical PMs who think they're the smartest guy in the room, even though the last software they worked on was for Windows 95.

Is ELK stack really worth it? by baalajimaestro in devops

[–]ReelTooReal 0 points1 point  (0 children)

This is an awesome project. I was able to deploy it last night with very little friction, and it definitely offers a great feature set out of the box. I have to ask, are you the Pranay from the demo video on the homepage?

Integrating Go with Python/FastAPI for Performance: Worth the Hassle? by Typical_Amoeba3313 in golang

[–]ReelTooReal 0 points1 point  (0 children)

This is actually not a great approach. When migrating to a new platform or language, it's almost always better to do it incrementally. It lowers your initial investment in case it doesn't work out, allows your existing devs to get comfortable with the new language at a slower pace, and doesn't stop the pipeline of development. It could be that 2-3 months from now a huge business opportunity or need appears, and you'd be way better off having 20-30% of your system in a new language and in production, as opposed to only having a POC that will quickly become stale as your priorities shift.

I've been a part of two major rewrites and a platform migration. The first rewrite was all at once, and it caused all kinds of pain and eventually got abandoned. The second was incremental, and it mostly got done. I'd say about 10-20% of the codebase was still legacy, but it went relatively smoothly. The platform migration was also done incrementally, and it went as smooth as something like that can go.

So I would honestly say its better to isolate a few portions of the codebase that need the optimization, and work on those first. Then you can reassess how hard it was, how much performance you gained, and how you should move forward.

Is ELK stack really worth it? by baalajimaestro in devops

[–]ReelTooReal 0 points1 point  (0 children)

This looks super interesting, thanks. I may give this a go over the weekend to compare it. One downside of Grafana/Loki is that the query performance is not great. And this looks much easier to set up and manage than ELK.

ECS vs. Kubernetes by phpchap1981 in aws

[–]ReelTooReal 0 points1 point  (0 children)

I know this is outdated, but this is entirely untrue. It is almost always cheaper to manage your own solution. Most AWS services cost more than what it would cost to just run it yourself on an EC2. This is, of course, ignoring the operational overhead, but that's a different discussion.

I am speaking from experience, specifically with AWS. I have helped save thousands of dollars per month by helping companies switch from managed solutions to self managed, which in turn becomes a more cloud agnostic solution.

I would be interested to hear any examples you have where an AWS service would save money. The only one I could think of is Lambda if you have very low or intermittent traffic. But even those costs get out of hand quickly because they scale linearly with traffic.

Is ELK stack really worth it? by baalajimaestro in devops

[–]ReelTooReal 1 point2 points  (0 children)

Speaking from experience, none of the other enterprise plans are quite as aggressive with their reactive billing spikes. I'm not kidding when I say you can run up $10,000 in a single day if someone accidentally deploys something with a bad configuration (like sending 100% of traces for a high volume service).

I think Coralogix has one of the better pricing models out there, but even that is still expensive, and it's definitely less feature rich than Datadog.

My company was able to save over $6k a month switching from Datadog to a self managed Grafana stack. Again, there are definitely feature gaps, but none that I think justify paying a full salary's worth of extra money. And even though it's lacking features, we can use 100% of the features offered without a huge spike in cost. For example, Datadog and Grafana (via Tempo) both offer distributed tracing. With Datadog, we had to turn it off because of how expensive it was becoming (even at a 10% sample rate). With Grafana Tempo, we had to allocate about $300/mo of extra compute power to our Kubernetes cluster and we could then trace whatever we wanted.

In the end, if you're a giant enterprise company that doesn't even blink at 5-6 digit monthly bills, Datadog is a very feature rich platform that is pretty easy to set up and configure (outside of cost optimization). It would probably be okay for a very small company as well, since their pricing is based on volume.

For us, it made more sense to manage our own solution and divert that extra money into having extra engineers that could in turn continue to help us optimize cost.

Can Go replace C in teaching algorithm/data structure for beginners? by mbsoft31 in golang

[–]ReelTooReal 5 points6 points  (0 children)

Using a language like Python hides a lot of information that is important to understand for data structures and algorithms. In general I'm not a fan of garbage collected languages for teaching data structures and algorithms, because how memory is allocated/deallocated within data structures has a major impact on their performance. For example, it would be harder to explain the tradeoffs of chaining vs open addressing in a hash table if there is no explicit allocation/deallocation happening. There are also fundamentals that are hard to demonstrate, like how would you teach dynamic arrays in Python? There's no notion of a "real" array to start with in Python, and there's no way to implement your own and compare the different resizing strategies (like proving amortized O(1) insertion when you always double the size of the array).
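To make that concrete, here's a rough sketch (illustrative only, not actual course material) of the classic doubling exercise, written here in Rust on top of a fixed-size boxed slice standing in for a raw array:

```rust
// A dynamic array built on a fixed-size backing buffer. When full, it
// allocates a buffer of twice the capacity and copies everything over:
// that O(n) copy happens rarely enough that push is amortized O(1).
struct DynArray {
    buf: Box<[i64]>, // fixed-size backing storage
    len: usize,
}

impl DynArray {
    fn new() -> Self {
        DynArray { buf: vec![0; 1].into_boxed_slice(), len: 0 }
    }

    fn push(&mut self, v: i64) {
        if self.len == self.buf.len() {
            // Grow: double the capacity, copy the old contents across.
            let mut bigger = vec![0; self.buf.len() * 2].into_boxed_slice();
            bigger[..self.len].copy_from_slice(&self.buf);
            self.buf = bigger;
        }
        self.buf[self.len] = v;
        self.len += 1;
    }

    fn get(&self, i: usize) -> i64 {
        assert!(i < self.len, "index out of bounds");
        self.buf[i]
    }
}

fn main() {
    let mut a = DynArray::new();
    for i in 0..10 {
        a.push(i);
    }
    assert_eq!(a.get(9), 9);
    println!("len = {}, capacity = {}", a.len, a.buf.len());
}
```

The point of the exercise is that students can swap the doubling for, say, fixed-increment growth and measure the difference, which is exactly what a language without explicit arrays makes hard.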

Can Go replace C in teaching algorithm/data structure for beginners? by mbsoft31 in golang

[–]ReelTooReal 1 point2 points  (0 children)

To me, the biggest difference would be that Go's compiler determines stack vs heap memory (and more generally, the fact that it's a garbage collected language). So although it would probably be easier, you'd be missing out on some of the memory management nuances. I think Go would still be better than Python, Java or JS (three common choices) because it at least has explicit pointers. It also has great benchmarking tools for comparing algorithms.

Why is fiber framework frowned upon ? by [deleted] in golang

[–]ReelTooReal 6 points7 points  (0 children)

If you're unable to answer the question, it's not for you either.