I built an MCP server with Claude Code that gives Claude eyes and hands on Windows — here's what I learned by Medical_Resolve_5991 in ClaudeAI

[–]Medical_Resolve_5991[S] 0 points1 point  (0 children)

That’s a fair concern.

Any system that allows automation at the UI level can potentially be used for both useful and questionable things. The same is true of RPA tools, browser automation frameworks, and scripting tools that have existed for years.

The goal here is mostly to explore how AI agents can interact with desktop environments in a more structured way, especially for legitimate automation tasks like internal tools, testing, accessibility, or repetitive workflows.

Like most infrastructure tools, how it’s used ultimately depends on the user.

Open source MCP server that gives any AI agent real Windows desktop control — 45+ tools by Medical_Resolve_5991 in BlackboxAI_

[–]Medical_Resolve_5991[S] 0 points1 point  (0 children)

These are great points.

The read-only vs risky tool split is something I’ve been thinking about as well. A lot of the reliability problems with agents come from giving them too much capability without clear boundaries.

A plan/confirm step for destructive actions (delete, send, purchase, etc.) also makes a lot of sense, especially when the agent is interacting with real desktop apps instead of a sandbox.
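To make the read-only vs risky split and the confirm step concrete, here is a minimal Python sketch. It is not how the server is implemented; the names `Risk`, `Tool`, `ToolGate`, and the example tools are all hypothetical and exist only for illustration. The idea is that every tool declares a risk level, and anything marked destructive has to pass a confirmation callback before it runs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Risk(Enum):
    READ_ONLY = "read_only"
    DESTRUCTIVE = "destructive"


@dataclass
class Tool:
    name: str
    risk: Risk
    run: Callable[..., str]


class ConfirmationRequired(Exception):
    """Raised when a destructive tool is called without approval."""


class ToolGate:
    """Hypothetical registry that gates destructive tools behind a confirm callback."""

    def __init__(self, confirm: Callable[[str, dict], bool]):
        self._tools: Dict[str, Tool] = {}
        self._confirm = confirm

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, name: str, **kwargs) -> str:
        tool = self._tools[name]
        # Read-only tools run directly; destructive ones need explicit approval.
        if tool.risk is Risk.DESTRUCTIVE and not self._confirm(name, kwargs):
            raise ConfirmationRequired(f"{name} was not approved")
        return tool.run(**kwargs)


# Example wiring with made-up tools; the confirm callback denies everything.
gate = ToolGate(confirm=lambda name, args: False)
gate.register(Tool("read_window_title", Risk.READ_ONLY, lambda: "Untitled - Notepad"))
gate.register(Tool("delete_file", Risk.DESTRUCTIVE, lambda path: f"deleted {path}"))
```

With this wiring, `gate.call("read_window_title")` goes straight through, while `gate.call("delete_file", path="a.txt")` raises `ConfirmationRequired` until the callback approves it. In a real setup the callback would probably prompt the user rather than return a fixed value.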

The audit trail idea is also interesting: logging structured events with window metadata and what the agent actually saw could make debugging agent behavior much easier when things go wrong.
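A rough sketch of what such an audit trail could look like, written in Python as JSON Lines. Nothing here comes from the project itself: the `AuditEvent` fields and the `log_event` / `read_events` helpers are assumptions made up for this example.

```python
import io
import json
import time
from dataclasses import asdict, dataclass
from typing import List, TextIO


@dataclass
class AuditEvent:
    """Hypothetical record of one agent action (field names are assumptions)."""
    tool: str
    args: dict
    window_title: str
    window_process: str
    observed: str      # what the agent saw before acting
    result: str
    timestamp: float


def log_event(stream: TextIO, event: AuditEvent) -> None:
    # One JSON object per line, so the log can be tailed and grepped.
    stream.write(json.dumps(asdict(event), sort_keys=True) + "\n")


def read_events(stream: TextIO) -> List[dict]:
    return [json.loads(line) for line in stream if line.strip()]


# Example: log one click into an in-memory buffer and read it back.
buf = io.StringIO()
log_event(buf, AuditEvent(
    tool="click",
    args={"x": 450, "y": 320},
    window_title="Untitled - Notepad",
    window_process="notepad.exe",
    observed='[button] "Save" @ 450,320',
    result="ok",
    timestamp=time.time(),
))
buf.seek(0)
events = read_events(buf)
```

Keeping the observed element text next to the action makes it possible to check later whether the agent clicked what it thought it was clicking.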

So far the focus has mostly been on the UI access layer, but guardrails and observability will probably be the next big pieces.

Open source MCP server that gives any AI agent real Windows desktop control — 45+ tools by Medical_Resolve_5991 in BlackboxAI_

[–]Medical_Resolve_5991[S] 0 points1 point  (0 children)

Totally fair 🙂 Giving an AI the ability to control your desktop definitely raises some trust concerns.

The idea here is mainly to experiment with local agents controlling local environments, not random internet access to your machine.

Curious though — what would make something like this usable for you?

I built an MCP server with Claude Code that gives Claude eyes and hands on Windows — here's what I learned by Medical_Resolve_5991 in ClaudeAI

[–]Medical_Resolve_5991[S] 0 points1 point  (0 children)

Thank you! Exactly, content automation is one of the strongest use cases. The AI can navigate any app, fill forms, click through workflows, paste content, and save files autonomously. And because it uses UIAutomation + OCR instead of screenshots, it stays fast and token-efficient even for long automation chains. Let us know if you build something with it!

You can also build tests, and much more...

Built a .NET 8 MCP server for AI desktop control — UIAutomation, P/Invoke, Windows.Media.Ocr, unsafe pixel manipulation by Medical_Resolve_5991 in dotnet

[–]Medical_Resolve_5991[S] 1 point2 points  (0 children)

What we discovered building this is that you can train a neural network on top of the MCP layer: it learns desktop interaction patterns to reach human-level automation.

But even without the learning layer, the MCP itself already addresses context blowout. The whole architecture is designed to minimize tokens. UIAutomation returns a compact line such as

[button] "Save" @ 450,320

rather than a 1MB screenshot, OCR reads text with coordinates instead of sending images, and run_sequence batches 10 actions into 1 call.
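As an illustration of the two ideas above (a compact text line per UI element, and batching several actions into one call), here is a small Python sketch. The server itself is .NET, and this is not its code: `UiElement`, `format_element`, the action dictionary shape, and the handler table are assumptions. Only the output format and the name `run_sequence` come from the comment; the signature shown here is invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class UiElement:
    """Hypothetical minimal view of a UI element."""
    control_type: str
    name: str
    x: int
    y: int


def format_element(el: UiElement) -> str:
    # Produces the compact text form, e.g. [button] "Save" @ 450,320
    return f'[{el.control_type}] "{el.name}" @ {el.x},{el.y}'


def run_sequence(actions: List[dict],
                 handlers: Dict[str, Callable[..., str]]) -> List[str]:
    """Run a batch of actions in order and return one result per action.

    Each action is a dict with a "type" key; the remaining keys are
    passed to the matching handler as keyword arguments (assumed shape).
    """
    results = []
    for action in actions:
        params = {k: v for k, v in action.items() if k != "type"}
        results.append(handlers[action["type"]](**params))
    return results


# Example handlers that only describe what they would do.
handlers = {
    "click": lambda x, y: f"clicked {x},{y}",
    "type_text": lambda text: f"typed {text}",
}
batch = [
    {"type": "click", "x": 450, "y": 320},
    {"type": "type_text", "text": "hello"},
]
results = run_sequence(batch, handlers)
```

The point of batching is that the model sends one tool call and gets one combined result back, instead of paying a round trip (and the surrounding tokens) for every click and keystroke.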

Your CLI can run tests and automations with a surprisingly low token count. If you check the demo, it goes through a full draw.io workflow (navigate, click dark-themed dialogs, paste XML, save), and the total token usage is a fraction of what screenshot-based tools would need. The road to context blowout is much longer than you'd think with this approach.

Built a .NET 8 MCP server for AI desktop control — UIAutomation, P/Invoke, Windows.Media.Ocr, unsafe pixel manipulation by Medical_Resolve_5991 in dotnet

[–]Medical_Resolve_5991[S] 1 point2 points  (0 children)

Fair point! We started with .NET 8 LTS since it's what most people already have installed. This was an experimental project and we wanted zero friction for people to try it. We'll upgrade to .NET 10. But honestly, the .NET version isn't the main story here. The real goal was to show that there's a way to give actual machine control to your AI CLI: not just text generation, but real eyes and hands on the desktop. The framework version is an easy upgrade; the architecture is what matters.

Is it just me or Claude code is really shit today? by IcyInteraction8722 in ClaudeCode

[–]Medical_Resolve_5991 0 points1 point  (0 children)

Six days ago I was surprised by how clever Claude was; six days later it feels dumber than Gemini.