Duchan

Posted on May 29 • Edited on Jun 11

tapflow v0.3.x: Deeplinks, Keyboard Shortcuts, Screenshot API, and an Experimental MCP Server

#webdev #opensource #devtools #ios

tapflow started as a simple idea: stream iOS simulators and Android emulators to the browser so anyone on the team can do mobile QA without touching Xcode or Android Studio. v0.2.x got the core working — streaming, touch input, App Center, session recording.

v0.3.x is about filling in the gaps that matter during actual QA sessions. This post covers what shipped and ends with something we're still figuring out: an experimental MCP server that lets LLM agents control simulators directly.

Deeplink execution from the browser

This is the gap that came up most in real usage. Testers frequently need to trigger deeplinks to verify specific app states: product detail pages, notification payloads, OAuth redirects. The old workflow always involved a mobile developer, who either triggered the link on their own machine or built a debug menu inside the app specifically for this purpose.

As of v0.3.0, you can fire a deeplink directly from the QA session toolbar. Click the link icon (or press ⌘K), enter the URL, and it executes on the active device.

Under the hood it's a new open-url WebSocket message type that routes browser → relay → agent:

Browser ──open-url──► Relay ──open-url──► Mac Agent
                                              │
                           iOS: xcrun simctl openurl booted <url>
                           Android: adb shell am start -a android.intent.action.VIEW -d <url>
Browser ◄──open-url:done/error── Relay ◄──────┘

The DeviceAgent interface got a new openUrl(url) method, so both iOS and Android agents implement it symmetrically. The relay routes it and returns either open-url:done or open-url:error with the failure reason. The dashboard shows a toast either way.
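As a rough illustration of what an agent-side openUrl(url) might do, here is a minimal sketch that maps a platform and URL to the CLI invocation. Only the openUrl method name and the two underlying commands come from this post; the `Platform` and `Command` types and both helper functions are hypothetical, and the Android branch spells out the full intent action name.

```typescript
// Hypothetical helper: only the openUrl(url) method name and the two CLI
// tools (simctl, adb) come from the post; everything else is illustrative.
type Platform = "ios" | "android";

interface Command {
  file: string;   // executable to run on the Mac
  args: string[]; // argv, e.g. for child_process.execFile(file, args)
}

// Quote a value for the device-side shell that `adb shell` hands its arguments to.
function quoteForDeviceShell(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

function buildOpenUrlCommand(platform: Platform, url: string): Command {
  // Require a scheme such as "myapp:" or "https:" before touching any CLI.
  if (!/^[a-zA-Z][a-zA-Z0-9+.-]*:/.test(url)) {
    throw new Error("URL has no scheme: " + url);
  }
  if (platform === "ios") {
    return { file: "xcrun", args: ["simctl", "openurl", "booted", url] };
  }
  // adb joins its arguments and runs them through a shell on the device,
  // so characters like & in a query string need quoting.
  return {
    file: "adb",
    args: [
      "shell", "am", "start",
      "-a", "android.intent.action.VIEW",
      "-d", quoteForDeviceShell(url),
    ],
  };
}
```

Returning an argv array rather than a single command string keeps the URL out of the Mac's own shell. The extra quoting on the Android side is there because deeplinks with query strings often contain `&`.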

Keyboard shortcuts for simulator controls

QA sessions are repetitive. Reaching for the toolbar icons on every screenshot or rotation adds up. v0.3.0 adds keyboard shortcuts for all the common actions:

Shortcut	Action
`⌘K`	Open deeplink dialog
`⌘S`	Take screenshot
`⌘⇧Y`	Start / stop recording
`⌘⇧O`	Rotate simulator
`⌘⇧U`	iOS: press Home
`⌘⇧K`	iOS: toggle software keyboard

Tooltips now show the shortcut hint inline, so they're discoverable without reading docs. One implementation detail worth noting: key detection uses e.code instead of e.key. This matters for IME input — Korean, Japanese, and Chinese users composing text would otherwise trigger shortcuts mid-composition.
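A minimal sketch of matching on e.code, assuming a lookup table built from the shortcut list above. The action names and the `KeyLike` type are hypothetical (it mirrors the few `KeyboardEvent` fields used, so the sketch also runs outside a browser). The point is that e.code identifies the physical key, while e.key depends on the active layout and IME state.

```typescript
// Subset of KeyboardEvent used here; a real handler would receive the DOM event.
interface KeyLike {
  code: string;     // physical key, e.g. "KeyK", independent of layout or IME
  metaKey: boolean; // the ⌘ key on macOS
  shiftKey: boolean;
}

// Action names are hypothetical labels for the shortcuts listed in the post.
type Action =
  | "open-deeplink" | "screenshot" | "toggle-recording"
  | "rotate" | "home" | "toggle-keyboard";

const SHORTCUTS: Array<{ code: string; shift: boolean; action: Action }> = [
  { code: "KeyK", shift: false, action: "open-deeplink" },
  { code: "KeyS", shift: false, action: "screenshot" },
  { code: "KeyY", shift: true, action: "toggle-recording" },
  { code: "KeyO", shift: true, action: "rotate" },
  { code: "KeyU", shift: true, action: "home" },
  { code: "KeyK", shift: true, action: "toggle-keyboard" },
];

function matchShortcut(e: KeyLike): Action | null {
  if (!e.metaKey) return null; // every shortcut in the list requires ⌘
  for (const s of SHORTCUTS) {
    if (s.code === e.code && s.shift === e.shiftKey) return s.action;
  }
  return null;
}
```

In a browser this would typically be called from a keydown listener, followed by e.preventDefault() on a match so that ⌘S does not open the browser's save dialog.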

Screenshot REST endpoint

This one unlocks a new class of CI usage.

GET /api/v1/sessions/:sessionId/screenshot returns a PNG or JPEG of the current simulator screen. You can call it with a PAT from any CI step: before asserting a visual state, during an automated flow, or after a build install.
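For a CI step, the call might look like the sketch below. The endpoint path comes from this post. Sending the PAT as a Bearer token in the Authorization header is an assumption, as are the `screenshotRequest` helper and its parameter names, so check the docs for the actual auth scheme.

```typescript
interface ScreenshotRequest {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper that builds the request for the screenshot endpoint.
function screenshotRequest(baseUrl: string, sessionId: string, pat: string): ScreenshotRequest {
  const root = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: root + "/api/v1/sessions/" + encodeURIComponent(sessionId) + "/screenshot",
    // Assumption: the PAT is accepted as a Bearer token.
    headers: { Authorization: "Bearer " + pat },
  };
}

// Usage in a CI script (Node 18+ has a global fetch):
//   const req = screenshotRequest("http://localhost:4000", sessionId, pat);
//   const res = await fetch(req.url, { headers: req.headers });
//   const image = new Uint8Array(await res.arrayBuffer());
```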

The tricky part was bridging two communication patterns. The relay talks to agents over WebSocket (long-lived, multiplexed), while the REST endpoint is plain HTTP request/response. Screenshots are taken on the Mac, not the relay, so the HTTP handler has to wait for an asynchronous reply from the agent.

We introduced a requestId-based pending map: the relay generates a unique ID, sends a take-screenshot message to the agent over WebSocket, registers a promise keyed by requestId, and resolves it when screenshot:result comes back. The HTTP handler awaits that promise and sends the binary payload:

GET /api/v1/sessions/:id/screenshot
    │
    ▼
Relay: generate requestId, push to pending map
    │
    ├──screenshot-request──► Mac Agent
    │                            │ simctl io screenshot (iOS)
    │                            │ ADB screencap (Android)
    ◄──screenshot:result─────────┘
    │
    ▼
HTTP 200 (binary image)
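The pending-map pattern could be sketched roughly as follows. This is an illustration of the technique, not tapflow's relay code: the `PendingRequests` class, the `ScreenshotResult` shape, and the counter-based IDs are all assumptions, and only the requestId idea and the screenshot:result message name come from this post.

```typescript
// Assumed shape of the screenshot:result message; the real payload is not shown in the post.
interface ScreenshotResult {
  requestId: string;
  mime: "image/png" | "image/jpeg";
  data: Uint8Array;
}

interface PendingEntry {
  resolve: (result: ScreenshotResult) => void;
  reject: (error: Error) => void;
}

class PendingRequests {
  private pending = new Map<string, PendingEntry>();
  private counter = 0;

  // Register a promise and return the ID to embed in the outgoing WebSocket message.
  create(): { requestId: string; promise: Promise<ScreenshotResult> } {
    // A counter keeps the sketch deterministic; a real relay might use a UUID.
    const requestId = "req-" + (++this.counter);
    const promise = new Promise<ScreenshotResult>((resolve, reject) => {
      this.pending.set(requestId, { resolve, reject });
    });
    return { requestId, promise };
  }

  // Call when screenshot:result arrives. Returns false for unknown or already settled IDs.
  settle(result: ScreenshotResult): boolean {
    const entry = this.pending.get(result.requestId);
    if (!entry) return false;
    this.pending.delete(result.requestId);
    entry.resolve(result);
    return true;
  }

  // Call on agent error, disconnect, or timeout so the HTTP handler is not left hanging.
  fail(requestId: string, reason: string): boolean {
    const entry = this.pending.get(requestId);
    if (!entry) return false;
    this.pending.delete(requestId);
    entry.reject(new Error(reason));
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

The HTTP handler would call create(), send the request over the agent's WebSocket with the returned requestId, and await the promise. Deleting the entry before resolving means a duplicate or late result for the same ID is ignored. In practice each entry would also want a timeout that calls fail(), otherwise a crashed agent leaves the HTTP request open indefinitely.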

iOS supports both PNG and JPEG via --type. Android returns PNG regardless — ADB doesn't offer format selection at this layer.

PAT scope enforcement

Personal Access Tokens existed before v0.3.0, but the scope field wasn't actually enforced on API routes. A developer-scoped token could call any endpoint.

v0.3.0 adds proper scope checks to all builds endpoints. Scopes are now enforced at the middleware layer: a token issued for builds access can upload and manage builds, but can't touch team settings or session data. This makes it safe to issue narrow tokens for CI pipelines without giving them broader access than they need.
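A middleware-style scope check might reduce to something like this sketch. The scope names other than builds and the route prefixes other than /api/v1/sessions are hypothetical, as are both function names; the sketch only shows the shape of the check, with unmapped routes denied by default.

```typescript
// "builds" appears in the post; "sessions" and "team" are assumed names.
type Scope = "builds" | "sessions" | "team";

// Hypothetical route table mapping path prefixes to the scope they require.
const ROUTE_SCOPES: Array<{ prefix: string; scope: Scope }> = [
  { prefix: "/api/v1/builds", scope: "builds" },
  { prefix: "/api/v1/sessions", scope: "sessions" },
  { prefix: "/api/v1/team", scope: "team" },
];

function requiredScope(path: string): Scope | null {
  for (const route of ROUTE_SCOPES) {
    // Match the prefix itself or anything below it, but not "/api/v1/buildsx".
    if (path === route.prefix || path.indexOf(route.prefix + "/") === 0) {
      return route.scope;
    }
  }
  return null;
}

// Returns the HTTP status the middleware would respond with.
function checkScope(tokenScopes: Scope[], path: string): 200 | 403 {
  const needed = requiredScope(path);
  if (needed === null) return 403; // deny routes that have no scope mapping
  return tokenScopes.indexOf(needed) >= 0 ? 200 : 403;
}
```

With this table, a token holding only the builds scope passes for paths under /api/v1/builds and gets a 403 for session or team routes.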

Frame performance instrumentation

For anyone debugging streaming latency: v0.3.x adds per-frame hop timestamps via a binary header (TFFE — tapflow frame envelope). Each frame now carries the capture time, relay-received time, and client-received time in an 8-byte prefix before the JPEG/H.264 payload.

The dashboard can surface a live performance overlay showing frame latency broken down by segment (agent → relay, relay → browser). Useful when diagnosing whether a slowdown is in the network leg or the browser decode path.
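The per-segment breakdown is simple subtraction once the three timestamps are available. The sketch below assumes they have already been decoded into milliseconds; the field names are hypothetical and nothing here reflects the actual TFFE byte layout.

```typescript
// Hypothetical decoded form of the per-frame timestamps, in milliseconds.
interface FrameTimestamps {
  capturedAt: number;       // stamped by the agent
  relayReceivedAt: number;  // stamped by the relay
  clientReceivedAt: number; // stamped by the browser
}

interface LatencyBreakdown {
  agentToRelay: number;
  relayToBrowser: number;
  total: number;
}

function latencyBreakdown(t: FrameTimestamps): LatencyBreakdown {
  return {
    agentToRelay: t.relayReceivedAt - t.capturedAt,
    relayToBrowser: t.clientReceivedAt - t.relayReceivedAt,
    total: t.clientReceivedAt - t.capturedAt,
  };
}

const example = latencyBreakdown({ capturedAt: 1000, relayReceivedAt: 1012, clientReceivedAt: 1040 });
// example is { agentToRelay: 12, relayToBrowser: 28, total: 40 }
```

Each timestamp comes from a different machine's clock, so the segment values are only as accurate as the clock synchronization between agent, relay, and browser. Trends over time are usually more trustworthy than any single absolute number.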

Experimental: an MCP server

v0.3.x also ships @tapflowio/mcp-server (0.3.1-experimental.1) — it exposes tapflow's WebSocket/REST APIs as MCP tools so an LLM agent can drive a simulator the same way a human does in the browser: screenshot → reason → tap/type → screenshot again.

It's early (the experimental suffix is literal: consistency and error recovery still need work), and it's a big enough topic to have its own write-up. "Giving an LLM Eyes and Hands on a Mobile Simulator" covers the full tool list, the normalized-coordinate tap/swipe, and where this is headed (LLM-driven smoke tests in CI).

npm install -g @tapflowio/mcp-server@experimental

Try it

npm install -g tapflow
tapflow start
# http://localhost:4000

🔗 GitHub: https://github.com/jo-duchan/tapflow
📖 Docs: https://www.tapflow.dev
