DEV Community

Duchan


Your whole team can now run mobile QA from the browser. Here's how we built it.

If you work on a mobile product, you've probably seen this.

Physical devices are never enough. Covering every OS version is even harder — iOS doesn't support downgrading, so maintaining a range of versions means managing a pool of locked devices, which is overhead nobody wants.

But the bigger friction is access. Simulators and emulators typically live on a mobile developer's Mac, behind the Xcode and Android Studio toolchains. Anyone on the team who isn't a mobile developer has to ask one every single time they need to verify something:

Server / FE developer — "How do I install the sandbox build to check what was deployed?"

Product manager — "I keep having to install and remove different versions just to compare behavior."

Designer — "I need to check the layout across screen sizes, but I don't have the right devices."

Cloud simulator services exist. But uploading internal app builds to an external service — and paying monthly fees for simulators already running on Macs you own — was never something we wanted to do.

So we built tapflow: an open-source, self-hosted tool that streams iOS simulators and Android emulators to the browser. Anyone on your team opens the dashboard, picks a device, and starts interacting — no Xcode, no Android Studio, no setup.

npm install -g tapflow
tapflow start
# → http://localhost:4000

This post is about how we built it — specifically the parts that weren't obvious.


Why we didn't just use Appetize or BrowserStack

Both services solve the browser access problem. We evaluated them seriously. Before signing up, we hit two blockers:

  • Cost. Appetize starts at $59/month and scales with team size.
  • Data. Both require uploading your app binary to external servers. For anything with sensitive business logic, that's a non-starter.

We already had Macs in the office. So we built tapflow instead.


Architecture

Browser (your team)  ←─ WebSocket ─→  Relay Server  ←─ WebSocket (outbound) ─→  Mac Agent
                                     (Linux / Mac)                           (iOS · Android)

The Mac Agent connects outbound to the relay, so the Mac side needs no inbound firewall rules or NAT port forwarding. The relay can run on a small Linux server (a ~$5/month Fly.io instance handles it). App data never leaves infrastructure you control.
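
To make the routing concrete, here is a minimal in-memory sketch of how a relay could pair browsers with agents. It is an illustration under assumptions, not tapflow's actual code: the `Conn` interface, the `Relay` class, and the idea of keying connections by a device ID are all hypothetical, and a real relay would wrap WebSocket connections and handle disconnects and auth.

```typescript
// Hypothetical minimal connection shape; a real relay would wrap a WebSocket.
interface Conn {
  send(data: string): void;
}

// Pairs browser viewers with the agent that registered a given device.
class Relay {
  private agents = new Map<string, Conn>();       // deviceId -> agent connection
  private viewers = new Map<string, Set<Conn>>(); // deviceId -> attached browsers

  // Called when an agent's outbound connection announces a device.
  registerAgent(deviceId: string, agent: Conn): void {
    this.agents.set(deviceId, agent);
  }

  // Called when a browser picks a device. Returns false if no agent serves it.
  attachViewer(deviceId: string, viewer: Conn): boolean {
    if (!this.agents.has(deviceId)) return false;
    let set = this.viewers.get(deviceId);
    if (!set) {
      set = new Set<Conn>();
      this.viewers.set(deviceId, set);
    }
    set.add(viewer);
    return true;
  }

  // Frames flow from the agent to every attached viewer.
  fromAgent(deviceId: string, frame: string): void {
    const set = this.viewers.get(deviceId);
    if (!set) return;
    set.forEach((viewer) => viewer.send(frame));
  }

  // Input flows from a viewer back to the agent.
  fromViewer(deviceId: string, input: string): void {
    const agent = this.agents.get(deviceId);
    if (agent) agent.send(input);
  }
}
```

Because both sides dial in to the relay, the relay only ever forwards messages between connections it already holds. It never has to open a connection toward a Mac.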


iOS touch — without WebDriverAgent

WebDriverAgent was the obvious starting point. We didn't use it.

The problems: WDA breaks on Xcode updates, requires provisioning profiles, needs the app to be in the foreground, and adds a layer of process management complexity we didn't want to own.

Instead, we load CoreSimulator.framework dynamically via dlopen in a Swift binary (touch-helper), then inject HID events directly through SimDeviceLegacyHIDClient and IndigoHID:

// touch-helper — HID event injection into the simulator
let client = SimDeviceLegacyHIDClient(device: device)
let event = IndigoHIDEvent.touch(x: x, y: y, phase: .began)
client.send(event)

This bypasses WDA entirely. It works independently of the app lifecycle and, in our testing, has not broken across Xcode updates.

The tradeoff: these are private APIs. They've been stable across Xcode versions in our testing, but Apple could remove them. We think that's a better bet than WDA's reliability track record.
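
On the browser side, a pointer position on the streamed image has to be translated before it can be injected. The sketch below shows one way to do that, assuming the stream fills its element with no letterboxing and that the helper expects coordinates in the device's own coordinate space. `toDevicePoint`, `Rect`, and `Point` are hypothetical names for illustration only.

```typescript
interface Rect { left: number; top: number; width: number; height: number; }
interface Point { x: number; y: number; }

// Maps a pointer position in page coordinates to device coordinates.
// Positions outside the element are clamped to the nearest screen edge,
// so a drag that leaves the element still ends on screen.
function toDevicePoint(
  clientX: number,
  clientY: number,
  view: Rect,
  deviceWidth: number,
  deviceHeight: number
): Point {
  const nx = Math.min(1, Math.max(0, (clientX - view.left) / view.width));
  const ny = Math.min(1, Math.max(0, (clientY - view.top) / view.height));
  return { x: Math.round(nx * deviceWidth), y: Math.round(ny * deviceHeight) };
}
```

In a browser, `view` would typically come from `element.getBoundingClientRect()` and the pointer position from a `PointerEvent`'s `clientX` and `clientY`.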


iOS streaming — IOSurface

xcrun simctl io <device> screenshot <file> works, but it captures one still image per invocation, so the latency is too high for interactive use.

Instead, we access IOSurface directly through SimulatorKit, pulling frames straight from the simulator's GPU surface. Frames are JPEG-encoded on the Mac and streamed over WebSocket at ~30fps.

For slow clients, we drop frames rather than buffering — backpressure is handled at the WebSocket layer to prevent memory accumulation on the relay when a client can't keep up.
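
A drop-instead-of-buffer policy can be sketched in a few lines. This is an illustration, not tapflow's implementation: `FrameSink` is a hypothetical subset of a WebSocket, modelled on the `bufferedAmount` property that the browser WebSocket API and the Node `ws` package both expose, and the threshold value is an arbitrary assumption.

```typescript
// Hypothetical subset of a WebSocket that this sketch relies on.
interface FrameSink {
  readonly bufferedAmount: number; // bytes queued but not yet sent
  send(data: Uint8Array): void;
}

class DroppingSender {
  dropped = 0; // count of frames skipped because the client was behind

  constructor(private sink: FrameSink, private maxBufferedBytes: number) {}

  // Sends the frame only if the socket has drained enough; otherwise drops it.
  // Returns true when the frame was handed to the socket.
  offer(frame: Uint8Array): boolean {
    if (this.sink.bufferedAmount > this.maxBufferedBytes) {
      this.dropped++;
      return false;
    }
    this.sink.send(frame);
    return true;
  }
}
```

Dropping is safe here because each JPEG frame is self-contained, so a skipped frame costs smoothness but never corrupts the next one. The same trick does not carry over directly to an H.264 stream, where later frames depend on earlier ones.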


Android — scrcpy H.264 → WebGL

Android was cleaner. scrcpy already does the hard work of capturing the emulator display as an H.264 stream.

We receive the H.264 Annex B stream from scrcpy over a local TCP socket, relay it through WebSocket, then decode and render it in the browser using WebGL2.

scrcpy server (emulator)
    → TCP socket
    → Mac Agent
    → WebSocket
    → Browser (WebGL2)
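
Somewhere along that pipeline the Annex B byte stream has to be cut into NAL units before a decoder can use it. The sketch below shows the general technique of splitting on start codes; it is not taken from tapflow. `splitAnnexB` and `nalType` are hypothetical helper names, and the code assumes the buffer holds only complete NAL units.

```typescript
// Splits an H.264 Annex B buffer into NAL units by scanning for
// 00 00 01 start codes (a 4-byte 00 00 00 01 code ends with the same 3 bytes).
function splitAnnexB(data: Uint8Array): Uint8Array[] {
  const starts: number[] = []; // index of the first payload byte after each start code
  for (let i = 0; i + 2 < data.length; i++) {
    if (data[i] === 0 && data[i + 1] === 0 && data[i + 2] === 1) {
      starts.push(i + 3);
      i += 2; // skip the rest of this start code
    }
  }
  const units: Uint8Array[] = [];
  for (let k = 0; k < starts.length; k++) {
    let end = k + 1 < starts.length ? starts[k + 1] - 3 : data.length;
    // Trim zero bytes that belong to the next 4-byte start code or to padding.
    // A complete NAL unit never ends in 0x00, so this removes no payload.
    while (end > starts[k] && data[end - 1] === 0) end--;
    units.push(data.subarray(starts[k], end));
  }
  return units;
}

// The low 5 bits of the first NAL byte give the unit type
// (7 = SPS, 8 = PPS, 5 = IDR slice).
function nalType(unit: Uint8Array): number {
  return unit[0] & 0x1f;
}
```

A stream that arrives in arbitrary TCP or WebSocket chunks would also need a carry-over buffer for a NAL unit that straddles two chunks, which this sketch leaves out.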

Pinch gestures

scrcpy's INJECT_TOUCH_EVENT supports multiple pointer IDs. A pinch starts by sending two touch-down events with distinct pointer IDs, one per finger:

// ScrcpyControl: multi-touch injection
pinchStart(x1: number, y1: number, x2: number, y2: number): void {
  this.touchDown(0, x1, y1) // first finger, pointer ID 0
  this.touchDown(1, x2, y2) // second finger, pointer ID 1
}
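
The touch-down pair only starts the gesture; the two pointers then have to move. As a rough sketch of how the intermediate positions could be generated, the function below interpolates a horizontal pinch around a center point. `pinchSteps` and its types are hypothetical, and a real gesture would follow the user's actual pointer movement.

```typescript
interface Point { x: number; y: number; }
interface PinchStep { p0: Point; p1: Point; }

// Generates positions for a horizontal pinch around `center`, moving the two
// fingers from `startSpan` apart to `endSpan` apart in `steps` equal moves.
// The result has steps + 1 entries, including the starting position.
function pinchSteps(center: Point, startSpan: number, endSpan: number, steps: number): PinchStep[] {
  const out: PinchStep[] = [];
  for (let i = 0; i <= steps; i++) {
    const t = steps === 0 ? 1 : i / steps; // progress from 0 to 1
    const half = (startSpan + (endSpan - startSpan) * t) / 2;
    out.push({
      p0: { x: center.x - half, y: center.y }, // pointer ID 0
      p1: { x: center.x + half, y: center.y }, // pointer ID 1
    });
  }
  return out;
}
```

The first entry would feed the two touch-down calls, each later entry a pair of move events, and the last position a pair of touch-up events. The move and up methods are assumed here, since only `touchDown` appears in the snippet above.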

What's included

Beyond streaming and input:

  • App Center — upload .app.zip (iOS) or .apk (Android), manage build status (Backlog / In Progress / Done / Rejected), REST API + Personal Access Tokens for CI/CD integration
  • Session recording — record and share QA sessions, retained for 72 hours
  • Team management — invite links, role-based access (Admin / Developer / QA / Viewer)
  • Mac resource monitoring — CPU and RAM time-series charts per agent

Honest limitations

  • iOS simulators require macOS — Apple's constraint, not ours
  • One Mac typically handles 2–4 simultaneous simulators depending on RAM; connect multiple Macs to pool devices
  • Still v0.x — breaking changes may appear before v1.0
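
For the multi-Mac case, pooling comes down to choosing which agent boots the next simulator. The following is a hypothetical sketch of a least-loaded choice, not tapflow's scheduler; the `Agent` shape and its `capacity` field (for example 2 to 4, depending on RAM) are assumptions.

```typescript
interface Agent {
  id: string;
  capacity: number; // simulators this Mac can run at once
  active: number;   // simulators currently running
}

// Picks the agent with the most free simulator slots,
// or undefined when every agent is full.
function pickAgent(agents: Agent[]): Agent | undefined {
  let best: Agent | undefined;
  for (const agent of agents) {
    const free = agent.capacity - agent.active;
    if (free > 0 && (!best || free > best.capacity - best.active)) {
      best = agent;
    }
  }
  return best;
}
```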

Try it

tapflow is MIT licensed.

npm install -g tapflow
tapflow start
tapflow init  # create the first admin account

For team deployments with a shared relay:

# Relay server (Linux/macOS)
JWT_SECRET=$(openssl rand -hex 32) tapflow relay start

# Each Mac agent
tapflow agent start --relay wss://your-relay-url
