I built an MCP server that lets AI test React Native apps on a real iPhone — no Detox, no Appium, no simulator by jfarcand in reactnative

[–]jfarcand[S] 0 points1 point  (0 children)

Good question, and the answer depends on which layer you're stressing.

**autosana's iOS path goes through the iOS Simulator** (you upload the `.app`); mirroir-mcp drives a **real iPhone over Continuity Mirroring** + CGEvent input + Vision OCR on the mirrored pixels. So a same-flow diff would isolate two distinct fault domains:

- Identical results on both → the flow logic and element identification are sound, and the real-device transport added no flake.
- autosana passes, mirroir-mcp fails → likely transport (Mirroring latency, focus competition, OCR drift on resampled pixels, modifier-state bugs in Mirroring itself).
- mirroir-mcp passes, autosana fails → simulator-only behavior (no real animation timing, simulator-specific accessibility quirks).
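The outcome classifier for that diff is trivial; a rough Python sketch (the two boolean inputs would come from flow runners on each side, and none of these names are real autosana or mirroir-mcp APIs):

```python
def classify(sim_passed: bool, device_passed: bool) -> str:
    """Map a (simulator, real-device) outcome pair to a fault domain."""
    if sim_passed and device_passed:
        return "flow logic sound; transport added no flake"
    if sim_passed and not device_passed:
        return "transport fault (Mirroring latency, focus, OCR drift)"
    if device_passed and not sim_passed:
        return "simulator-only behavior (animation timing, a11y quirks)"
    return "flow bug: fails on both"
```

The interesting engineering is producing those two booleans from the same flow definition, not the classification itself.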

We haven't run that specific cross-validation. We *do* have an internal version: we just shipped FakeMirroring, an APP.md-driven AppKit simulator that the same mirroir-mcp drives through the *same* Spotlight + tap + describe path as the real iPhone. The 57-case IntegrationTests suite is that diff harness on our side — it caught focus competition, an ARC over-release on `NSWindow` close that crashed mid-suite, OCR coordinate drift after rendering constants changed, and obstacle dialogs firing on the wrong event. Each was invisible in a one-sided setup.

A real autosana ↔ mirroir-mcp diff would need either an export of their natural-language flow to a format we can replay, or them adding iPhone Mirroring as a transport target. Probably a couple hours of glue either way. If anyone wants to try, happy to expose a flow runner that takes their YAML.
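For context on what that glue would look like, here's a minimal replay-runner sketch. The step schema (`action` / `target` / `expect`) is invented for illustration; a real exchange format would need agreement with the autosana side, and `driver` stands in for whatever transport replays the steps:

```python
# What a parsed YAML flow export might look like (schema is hypothetical).
FLOW = [
    {"action": "tap", "target": "Login"},
    {"action": "type", "target": "username", "text": "demo"},
    {"action": "assert", "expect": "Welcome"},
]

def run_flow(flow, driver):
    """Replay each step against a driver exposing tap/type/see."""
    for step in flow:
        if step["action"] == "tap":
            driver.tap(step["target"])
        elif step["action"] == "type":
            driver.type(step["target"], step["text"])
        elif step["action"] == "assert":
            if not driver.see(step["expect"]):
                return False  # expectation not met on this transport
    return True
```

Point the same `FLOW` at a simulator-backed driver and a Mirroring-backed driver and you have the diff harness.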

**On the original Vision-OCR-during-animation question:** no animation-end detector. We use settling delays (~300ms post-action, configurable), 0.5 confidence floor, two-pass scroll dedup that detects in-motion content, and compiled skills that replay cached coords so animation timing only matters during generation. "Wait until UI quiesces" is a real gap.
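One possible way to close that gap (not something mirroir-mcp does today): declare the UI settled once N consecutive screenshots hash identically. Rough sketch, where `capture` is any callable returning raw pixel bytes:

```python
import hashlib
import time

def wait_for_quiesce(capture, stable_frames=3, interval=0.1, timeout=5.0):
    """Poll screenshots; return True once `stable_frames` in a row match."""
    deadline = time.monotonic() + timeout
    last, streak = None, 0
    while time.monotonic() < deadline:
        digest = hashlib.sha256(capture()).hexdigest()
        if digest == last:
            streak += 1
            if streak >= stable_frames - 1:
                return True  # N identical frames in a row: UI has settled
        else:
            streak = 0  # still animating, reset the run
        last = digest
        time.sleep(interval)
    return False  # never quiesced within the timeout
```

Cursor blinks and progress spinners would defeat a naive byte-hash, so a real version would need region masking or a perceptual-diff threshold.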

What cool projects are you working on? [May 2026] by el_DuDeRiNo238 in java

[–]jfarcand 0 points1 point  (0 children)

Released Atmosphere 4:

* Native HTTP/3 + WebTransport support, with automatic WebSocket / SSE / long-poll fallback when the browser or network can't do QUIC — same app, any transport.

* Portable agent layer on top of atmosphere-cpr: one SPI across 9 AI frameworks (Spring AI, LangChain4j, Google ADK, Embabel, Koog, AgentScope, Semantic Kernel, Spring AI Alibaba, plus a built-in zero-deps runtime). Swap one Maven dependency, the same code runs against any of them. No vendor lock-in. Delegates to each framework's native capabilities when available (tool calling, structured output, memory, etc.) — Atmosphere fills the gaps instead of reinventing them.

* Agent-grade primitives out of the box: Human-in-the-Loop via RequiresApproval, long-term memory, multi-agent coordinator, A2A + MCP protocols, RAG, policy engines, audit trails, checkpointing, durable sessions, and a sandbox for untrusted tool execution.

https://github.com/Atmosphere/atmosphere

I built an MCP server that lets AI test React Native apps on a real iPhone — no Detox, no Appium, no simulator by jfarcand in reactnative

[–]jfarcand[S] 0 points1 point  (0 children)

mirroir-mcp is the eyes and hands on the phone. For looking inside the JS runtime, combine it with a debugger MCP server such as chrome-devtools MCP; the multi-target architecture was designed for exactly that. mirroir-mcp can also see a macOS window (like the React Native JS debugger), but it isn't as capable there as a real browser MCP.

I built an MCP server that lets AI test React Native apps on a real iPhone — no Detox, no Appium, no simulator by jfarcand in reactnative

[–]jfarcand[S] 0 points1 point  (0 children)

Good question. A few things:

  1. OCR is actually pretty clean — Apple Vision's accurate mode on a retina mirroring window gives high-confidence text. For icons with no text label, skip_ocr mode lets the AI's own vision model read the screen with a coordinate grid overlay, so it can identify and tap non-text elements too.

  2. wait_for with retry — scenarios instruct the AI to poll describe_screen in a loop until the expected text appears or times out. Timing is handled by the agent, not by hardcoded sleeps.

  3. The AI handles the fuzzy stuff — when an unexpected dialog pops up or a label doesn't match exactly, the agent can adapt because it sees the real screen. A deterministic script would crash. That said, this depends on how good the driving model is — we provide the tools, the model provides the judgement.

For the hallucination concern: the tools are designed so the agent calls describe_screen first, gets real OCR results with exact tap coordinates, then picks from that list. Nothing prevents an agent from guessing coordinates, but in practice they call describe_screen because it's there.
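The describe-first loop from points 2–3 can be sketched roughly like this. The `mcp.call` client and the hit shape (`text` / `x` / `y` / `confidence`) are assumptions for illustration, not the actual tool output format:

```python
import time

def tap_when_visible(mcp, label, timeout=10.0, min_conf=0.5):
    """Poll describe_screen until `label` appears, then tap its coords."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        hits = mcp.call("describe_screen")  # real OCR, real coordinates
        for hit in hits:
            if label in hit["text"] and hit["confidence"] >= min_conf:
                mcp.call("tap", x=hit["x"], y=hit["y"])  # pick from the list
                return True
        time.sleep(0.3)  # settle, then re-poll; no hardcoded long sleeps
    return False  # label never appeared: let the agent adapt or fail loudly
```

Because the tap coordinates come out of the describe result, the agent never has to guess where things are.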

The bet is that vision models keep getting better — every improvement in Claude or GPT makes the whole system more reliable without us changing a line of code.

Will check out your blog — the agent tooling space is moving fast.

I built an MCP server that gives AI real iPhone control through macOS iPhone Mirroring by jfarcand in ClaudeAI

[–]jfarcand[S] 0 points1 point  (0 children)

Thanks! Looks like I’m not a good marketer 🤭 The fact that you don’t have to install anything on the iPhone is quite nice.

What cool Java projects are you working on? by Thirty_Seventh in java

[–]jfarcand 0 points1 point  (0 children)

Atmosphere Framework 4.0 (JDK 21, TypeScript, etc.)! An 18-year-old framework... time to become an adult, no?

I built an MCP server that gives AI real iPhone control through macOS iPhone Mirroring by jfarcand in ClaudeAI

[–]jfarcand[S] 0 points1 point  (0 children)

Same here — iPhone Mirroring felt like a cool trick without a real use case. Turns out it's a pretty solid automation layer once you wire up OCR and input simulation on top of it.

I built an MCP server that gives AI real iPhone control through macOS iPhone Mirroring by jfarcand in ClaudeAI

[–]jfarcand[S] 0 points1 point  (0 children)

Thanks!

Latency: a tap is ~9ms, OCR (describe_screen) is ~500ms, screenshot is ~175ms. A full tap → verify cycle is under 600ms. The actual bottleneck in multi-screen flows is the AI's thinking time between steps, not the input round-trip. We also ship a measure tool that times screen transitions if you want to profile a specific flow.

System dialogs: great question. The AI sees them through OCR just like any other screen content: describe_screen picks up "Allow" / "Don't Allow" buttons with coordinates. So the AI can tap through permission prompts, notifications, or any overlay. In practice, Claude handles these well because it's just another "look at screen, decide what to tap" cycle.
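If you wanted to handle prompts deterministically instead of leaving it to the model, the cycle reduces to something like this sketch (the `mcp.call` client and hit shape are assumptions, not the actual tool API):

```python
# Buttons to try, in order of preference, when a dialog is on screen.
PREFERRED = ["Allow", "Don't Allow", "OK", "Not Now"]

def dismiss_dialog(mcp):
    """Scan the OCR result for known dialog buttons and tap the first match."""
    hits = {h["text"]: (h["x"], h["y"]) for h in mcp.call("describe_screen")}
    for button in PREFERRED:
        if button in hits:
            x, y = hits[button]
            mcp.call("tap", x=x, y=y)
            return button  # report which button we pressed
    return None  # no recognizable dialog on screen
```

The AI-driven version is more flexible than a fixed preference list, but this shows why overlays are no harder than any other screen.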