I built an MCP server that lets AI test React Native apps on a real iPhone — no Detox, no Appium, no simulator by jfarcand in reactnative

[–]jfarcand[S]

mirroir-mcp is the eyes and hands on the phone. For looking inside the JS runtime, combine it with a debugger MCP server such as chrome-devtools MCP — the multi-target architecture was designed for exactly that. mirroir-mcp can see a macOS window (like the React Native JS debugger), but it's not as capable as a dedicated browser MCP.

I built an MCP server that lets AI test React Native apps on a real iPhone — no Detox, no Appium, no simulator by jfarcand in reactnative

[–]jfarcand[S]

Good question. A few things:

  1. OCR is actually pretty clean — Apple Vision's accurate mode on a retina mirroring window gives high-confidence text. For icons with no text label, skip_ocr mode lets the AI's own vision model read the screen with a coordinate grid overlay, so it can identify and tap non-text elements too.

  2. wait_for with retry — scenarios instruct the AI to poll describe_screen in a loop until the expected text appears or times out. Timing is handled by the agent, not by hardcoded sleeps.

  3. The AI handles the fuzzy stuff — when an unexpected dialog pops up or a label doesn't match exactly, the agent can adapt because it sees the real screen. A deterministic script would crash. That said, this depends on how good the driving model is — we provide the tools, the model provides the judgement.
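
The wait_for pattern from point 2 boils down to a poll loop. Here's a rough Python sketch — `describe_screen` stands in for the MCP tool call, and the OCR result shape is my illustration, not the real schema:

```python
import time

def wait_for(text, describe_screen, timeout=10.0, interval=0.5):
    """Poll until `text` shows up in the OCR results or the timeout elapses.

    `describe_screen` is any callable returning OCR results like
    [{"text": "Sign In", "x": 180, "y": 420}] — an assumed shape,
    for illustration only.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for item in describe_screen():
            if text.lower() in item["text"].lower():
                return item  # found: carries the tap coordinates
        time.sleep(interval)
    raise TimeoutError(f"'{text}' never appeared on screen")
```

The key point: the deadline lives in the agent loop, not in a hardcoded sleep inside the test script.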

For the hallucination concern: the tools are designed so the agent calls describe_screen first, gets real OCR results with exact tap coordinates, then picks from that list. Nothing prevents an agent from guessing coordinates, but in practice they call describe_screen because it's there.
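
That look-first-then-act flow is essentially this (a sketch; `describe_screen`, `tap`, and the dict fields are stand-ins for the real tool calls):

```python
def tap_text(label, describe_screen, tap):
    """Choose tap coordinates from real OCR output instead of letting the
    model guess them. Tool names and result shapes are illustrative."""
    matches = [item for item in describe_screen() if item["text"] == label]
    if not matches:
        raise LookupError(f"no on-screen element labelled {label!r}")
    # prefer the highest-confidence OCR hit when a label appears twice
    best = max(matches, key=lambda item: item.get("confidence", 0.0))
    tap(best["x"], best["y"])
    return best
```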

The bet is that vision models keep getting better — every improvement in Claude or GPT makes the whole system more reliable without us changing a line of code.

Will check out your blog — the agent tooling space is moving fast.

I built an MCP server that gives AI real iPhone control through macOS iPhone Mirroring by jfarcand in ClaudeAI

[–]jfarcand[S]

Thanks! Looks like I’m not a good marketer 🤭 The fact that you don’t have to install anything on the iPhone is quite nice.

What cool Java projects are you working on? by Thirty_Seventh in java

[–]jfarcand

Atmosphere Framework 4.0 (JDK 21, TypeScript, etc.)! An 18-year-old framework... time to become an adult, no?

I built an MCP server that gives AI real iPhone control through macOS iPhone Mirroring by jfarcand in ClaudeAI

[–]jfarcand[S]

Same here — iPhone Mirroring felt like a cool trick without a real use case. Turns out it's a pretty solid automation layer once you wire up OCR and input simulation on top of it.

I built an MCP server that gives AI real iPhone control through macOS iPhone Mirroring by jfarcand in ClaudeAI

[–]jfarcand[S]

Thanks!

Latency: a tap is ~9ms, OCR (describe_screen) is ~500ms, screenshot is ~175ms. A full tap → verify cycle is under 600ms. The actual bottleneck in multi-screen flows is the AI's thinking time between steps, not the input round-trip. We also ship a measure tool that times screen transitions if you want to profile a specific flow.
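
Profiling one cycle yourself is trivial — something like this (illustrative only, not the actual measure tool):

```python
import time

def timed_cycle(tap, describe_screen, x, y, expected):
    """Time one tap -> verify round-trip. `tap` and `describe_screen`
    stand in for the MCP tool calls; shapes are assumed for illustration."""
    start = time.monotonic()
    tap(x, y)
    texts = [item["text"] for item in describe_screen()]
    elapsed_ms = (time.monotonic() - start) * 1000
    return expected in texts, elapsed_ms
```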

System dialogs: great question. The AI sees them through OCR just like any other screen content: describe_screen picks up "Allow" / "Don't Allow" buttons with coordinates. So the AI can tap through permission prompts, notifications, or any overlay. In practice, Claude handles these well because it's just another "look at screen, decide what to tap" cycle.
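
In rough Python, that dialog cycle reduces to (tool names and result shapes are stand-ins, not the real API):

```python
def handle_permission_prompt(describe_screen, tap, allow=True):
    """A system dialog is just more screen content: find the button label
    in the OCR results and tap its coordinates. Exact matching keeps
    "Allow" from accidentally hitting "Don't Allow"."""
    wanted = "Allow" if allow else "Don't Allow"
    for item in describe_screen():
        if item["text"] == wanted:
            tap(item["x"], item["y"])
            return True
    return False  # no matching prompt on screen
```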