<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Duchan</title>
    <description>The latest articles on DEV Community by Duchan (@joduchan).</description>
    <link>https://dev.to/joduchan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3944002%2F0ff27416-cd10-41eb-b4fa-31490e25b7cc.png</url>
      <title>DEV Community: Duchan</title>
      <link>https://dev.to/joduchan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/joduchan"/>
    <language>en</language>
    <item>
      <title>We switched simulator streaming to H.264 and it felt worse. Here's how we fixed the latency.</title>
      <dc:creator>Duchan</dc:creator>
      <pubDate>Wed, 10 Jun 2026 07:06:31 +0000</pubDate>
      <link>https://dev.to/joduchan/we-switched-simulator-streaming-to-h264-and-it-felt-worse-heres-how-we-fixed-the-latency-pk9</link>
      <guid>https://dev.to/joduchan/we-switched-simulator-streaming-to-h264-and-it-felt-worse-heres-how-we-fixed-the-latency-pk9</guid>
      <description>&lt;p&gt;In an earlier post I described how &lt;a href="https://github.com/jo-duchan/tapflow" rel="noopener noreferrer"&gt;tapflow&lt;/a&gt; streams iOS simulators to the browser: pull frames off the simulator's &lt;code&gt;IOSurface&lt;/code&gt;, JPEG-encode them on the Mac, push them over WebSocket at ~30fps.&lt;/p&gt;

&lt;p&gt;JPEG has one great property for interactive streaming: every frame is independent and decodes instantly. There's no buffer, no inter-frame dependency. On localhost it feels like you're touching the simulator directly.&lt;/p&gt;

&lt;p&gt;It also has one terrible property: size. A full-frame JPEG of a scrolling screen is ~590KB. On a LAN that's 12–16 MB/s, and our relay started dropping 16–27 frames a second under backpressure — visible tearing.&lt;/p&gt;

&lt;p&gt;So we did the obvious thing and moved to H.264. Bandwidth dropped roughly 140× on a still screen and 5× while scrolling. Drops nearly vanished.&lt;/p&gt;

&lt;p&gt;And the stream felt &lt;em&gt;worse&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This post is about why, and the two fixes that got H.264 back to "feels like direct touch."&lt;/p&gt;




&lt;h2&gt;The bar: localhost JPEG&lt;/h2&gt;

&lt;p&gt;Before touching anything I needed a number, not a vibe. So I instrumented the pipeline end to end — a per-stage panel that reports &lt;code&gt;decode→present&lt;/code&gt; and &lt;code&gt;glass→glass&lt;/code&gt; (capture timestamp to on-screen) latencies live.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;One caveat I'll repeat throughout: &lt;code&gt;glass→glass&lt;/code&gt; absolute values are only valid on localhost, where capture and display share one clock. &lt;code&gt;decode→present&lt;/code&gt; is a same-machine delta and valid anywhere, so I'll lean on it for the cross-environment claims.&lt;/p&gt;
&lt;/blockquote&gt;
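&lt;p&gt;Each row in the tables below is a p50/p95 over a window of such samples. Here is a rough sketch of that bookkeeping. The &lt;code&gt;LatencyWindow&lt;/code&gt; name, the 300-sample window, and the nearest-rank method are assumptions, not tapflow's actual panel code. Each stage records a same-clock delta, such as present time minus decode time:&lt;/p&gt;

```typescript
// Hypothetical latency recorder: keeps the most recent `capacity` samples (ms)
// and reports nearest-rank percentiles over that window.
class LatencyWindow {
  private samples: number[] = [];

  constructor(private capacity = 300) {}

  // Record one same-clock delta, e.g. presentTime - decodeTime.
  record(ms: number): void {
    this.samples.push(ms);
    if (this.samples.length > this.capacity) this.samples.shift(); // drop the oldest
  }

  // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
  percentile(p: number): number {
    if (this.samples.length === 0) return NaN;
    const sorted = this.samples.slice().sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
    return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
  }
}
```

&lt;p&gt;Sorting a few hundred samples per refresh is cheap. A panel that updates about once a second doesn't need a streaming quantile estimator.&lt;/p&gt;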

&lt;p&gt;Here's the baseline that mattered, measured on localhost:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;decode→present p50/p95 (ms)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;JPEG still&lt;/td&gt;
&lt;td&gt;12.4 / 15.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JPEG scroll&lt;/td&gt;
&lt;td&gt;9.4 / 11.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;H.264 (WebCodecs) still&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;267 / 274&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;H.264 decode was &lt;strong&gt;~20× slower&lt;/strong&gt; than JPEG. On a hardware decoder. That made no sense — until I looked at what the decoder was actually doing.&lt;/p&gt;




&lt;h2&gt;Fix 1: the decoder was buffering 8 frames for no reason&lt;/h2&gt;

&lt;p&gt;The transport was clean (~1ms) and the input queue was empty. The latency was entirely inside the decoder: it was holding ~8 frames before emitting the first one.&lt;/p&gt;

&lt;p&gt;That's the DPB (decoded picture buffer). When B-frames are present, a decoder has to reorder: it may need frames that arrive later in decode order before it can output the current one in display order. If the stream doesn't say how much reordering to expect, the decoder has to assume the worst and buffer up to the level's maximum.&lt;/p&gt;

&lt;p&gt;But our encoder is &lt;strong&gt;baseline H.264, B-frames off&lt;/strong&gt;. There is no reordering. The actual reorder depth is zero. The decoder was buffering anyway because the bitstream never &lt;em&gt;told&lt;/em&gt; it the reorder depth was zero.&lt;/p&gt;

&lt;p&gt;The signal lives in the SPS (sequence parameter set), in the optional &lt;code&gt;bitstream_restriction&lt;/code&gt; fields of the VUI. Our VideoToolbox encoder wasn't writing them, so the decoder fell back to the worst case: a &lt;code&gt;max_dec_frame_buffering&lt;/code&gt; equal to the level's maximum DPB size, which at Level 5.0 and our stream's resolution came to ~8 frames.&lt;/p&gt;
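&lt;p&gt;How big is "the worst case for the level"? For a Baseline stream without &lt;code&gt;bitstream_restriction&lt;/code&gt;, H.264 infers &lt;code&gt;max_dec_frame_buffering&lt;/code&gt; as MaxDpbFrames. That is the level's &lt;code&gt;MaxDpbMbs&lt;/code&gt; limit (Table A-1 of the spec) divided by the frame size in macroblocks, capped at 16. Below is a small sketch of that arithmetic. It assumes progressive frames (&lt;code&gt;frame_mbs_only_flag = 1&lt;/code&gt;) and lists only a few levels. The function name is hypothetical:&lt;/p&gt;

```typescript
// MaxDpbMbs (DPB capacity in macroblocks) per level_idc, from H.264 Table A-1.
// Only a subset of levels is listed here.
const MAX_DPB_MBS: { [levelIdc: number]: number } = {
  40: 32768,  // Level 4.0
  41: 32768,  // Level 4.1
  42: 34816,  // Level 4.2
  50: 110400, // Level 5.0
  51: 184320, // Level 5.1
};

// Frames a decoder must be ready to hold when the SPS does not declare
// max_dec_frame_buffering: floor(MaxDpbMbs / frame size in MBs), capped at 16.
// Assumes progressive coding, so frame height in MBs is ceil(height / 16).
function maxDpbFrames(levelIdc: number, width: number, height: number): number {
  const maxDpbMbs = MAX_DPB_MBS[levelIdc];
  if (maxDpbMbs === undefined) throw new Error(`level_idc ${levelIdc} not in table`);
  const widthInMbs = Math.ceil(width / 16);
  const heightInMbs = Math.ceil(height / 16);
  return Math.min(Math.floor(maxDpbMbs / (widthInMbs * heightInMbs)), 16);
}
```

&lt;p&gt;With these numbers, a decoder that can't trust a 1080p stream has to plan for 4 frames at Level 4.0, but 13 frames if the same stream is labelled Level 5.0. The higher the advertised level, the deeper the worst case.&lt;/p&gt;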

&lt;p&gt;The fix is to rewrite the SPS and inject the missing declaration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
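&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;bitstream_restriction_flag&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;max_num_reorder_frames&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;0&lt;/span&gt;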
&lt;span class="py"&gt;max_dec_frame_buffering&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;num_ref_frames&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We do this in the agent, on the keyframe SPS, before the frame ever leaves the Mac — so &lt;em&gt;every&lt;/em&gt; decoder downstream benefits, not just one browser path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// agent-core/utils/sps.ts — rewrite the SPS to declare zero reordering&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;rewriteLowLatencySps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Uint8Array&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Uint8Array&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;BitstreamWriter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;parseSps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sps&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="nx"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vui&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bitstreamRestriction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vui&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxNumReorderFrames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="nx"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vui&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxDecFrameBuffering&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;numRefFrames&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
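&lt;p&gt;This needs a real rewrite rather than a byte patch. &lt;code&gt;max_num_reorder_frames&lt;/code&gt; and &lt;code&gt;max_dec_frame_buffering&lt;/code&gt; are coded as unsigned Exp-Golomb values (&lt;code&gt;ue(v)&lt;/code&gt;), which are variable-length. Adding or changing them changes the SPS's bit length, so everything after them, including the trailing stop bit and byte alignment, has to be re-packed. Below is a minimal sketch of the &lt;code&gt;ue(v)&lt;/code&gt; codec. It works on an array of 0/1 bits for clarity. These helpers are illustrative and are not the &lt;code&gt;BitstreamWriter&lt;/code&gt; used above:&lt;/p&gt;

```typescript
// Append value as ue(v): N zero bits, then (value + 1) in binary (N + 1 bits),
// where N + 1 is the bit length of value + 1.
function writeUe(bits: number[], value: number): void {
  const binary = (value + 1).toString(2); // 0 -> "1", 1 -> "10", 3 -> "100"
  for (let i = 1; binary.length > i; i++) bits.push(0); // N leading zeros
  binary.split("").forEach(ch => bits.push(ch === "1" ? 1 : 0));
}

// Read a ue(v) starting at bit position pos; returns the value and next position.
function readUe(bits: number[], pos: number): { value: number; next: number } {
  let zeros = 0;
  while (bits[pos + zeros] === 0) zeros++;
  let codeNum = 0;
  // Read the marker 1 bit plus the `zeros` info bits that follow it.
  for (let i = 0; zeros >= i; i++) codeNum = codeNum * 2 + bits[pos + zeros + i];
  return { value: codeNum - 1, next: pos + 2 * zeros + 1 };
}
```

&lt;p&gt;Declaring &lt;code&gt;max_num_reorder_frames = 0&lt;/code&gt; therefore costs a single bit. The expensive part is not the field itself but re-serializing the SPS around it.&lt;/p&gt;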



&lt;p&gt;Result on localhost:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;decode→present p50/p95 (ms)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;H.264 WebCodecs still (before)&lt;/td&gt;
&lt;td&gt;267 / 274&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;H.264 WebCodecs still (after)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.5 / 4&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;H.264 WebCodecs scroll (after)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.1 / 3.9&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;267 → 2.5ms&lt;/code&gt;, roughly 100×. The encoder was lying to the decoder by omission, and the decoder defended itself by buffering. One declaration fixed it.&lt;/p&gt;

&lt;p&gt;The browser confirms it's receiving the rewrite — the SPS now reports &lt;code&gt;bitstreamRestriction: true, maxNumReorderFrames: 0&lt;/code&gt;.&lt;/p&gt;
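&lt;p&gt;Checking that on the receiving side means finding the SPS in the stream and removing emulation-prevention bytes before reading any fields. The sketch below assumes the frames arrive as an H.264 Annex B byte stream (start-code delimited). If the transport uses length-prefixed AVCC instead, the splitting step differs. The function names are hypothetical:&lt;/p&gt;

```typescript
// True if bytes i..i+2 are the Annex B start code 00 00 01.
function isStartCode(buf: Uint8Array, i: number): boolean {
  return buf.subarray(i, i + 3).join(",") === "0,0,1";
}

// The last byte of a valid NAL unit is never 0x00, so trailing zeros belong to
// a 4-byte start code (00 00 00 01) or stream padding and can be trimmed.
function trimTrailingZeros(nal: Uint8Array): Uint8Array {
  let end = nal.length;
  while (end > 0) {
    if (nal[end - 1] !== 0) break;
    end--;
  }
  return nal.subarray(0, end);
}

// Split an Annex B stream into NAL units (without their start codes).
function splitAnnexB(buf: Uint8Array): Uint8Array[] {
  const nals: Uint8Array[] = [];
  let start = -1; // first byte after the most recent start code
  let i = 0;
  while (buf.length - 2 > i) {
    if (isStartCode(buf, i)) {
      if (start >= 0) nals.push(trimTrailingZeros(buf.subarray(start, i)));
      start = i + 3;
      i += 3;
    } else {
      i += 1;
    }
  }
  if (start >= 0) nals.push(trimTrailingZeros(buf.subarray(start)));
  return nals;
}

// nal_unit_type is the low 5 bits of the first byte (7 = SPS, 8 = PPS, 5 = IDR slice).
function nalType(nal: Uint8Array): number {
  return nal[0] % 32;
}

// Drop emulation-prevention bytes: a 0x03 following two 0x00 bytes was inserted
// by the encoder so payload data can't mimic a start code.
function ebspToRbsp(nal: Uint8Array): Uint8Array {
  const out: number[] = [];
  let zeros = 0;
  for (let i = 0; nal.length > i; i++) {
    const byte = nal[i];
    if (zeros >= 2) {
      if (byte === 0x03) {
        zeros = 0;
        continue;
      }
    }
    out.push(byte);
    zeros = byte === 0 ? zeros + 1 : 0;
  }
  return new Uint8Array(out);
}
```

&lt;p&gt;With the RBSP in hand, the SPS fields, including the VUI's &lt;code&gt;bitstream_restriction&lt;/code&gt; block, can be read bit by bit with a &lt;code&gt;ue(v)&lt;/code&gt; reader.&lt;/p&gt;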




&lt;h2&gt;Fix 2: MSE is a buffer you can't turn off&lt;/h2&gt;

&lt;p&gt;Fix 1 only pays off where the decoder's own reorder buffer is the bottleneck, which in practice meant the WebCodecs path. And WebCodecs has a hard constraint: it only runs in a secure context (HTTPS or localhost).&lt;/p&gt;

&lt;p&gt;A team using tapflow over their LAN opens it at plain &lt;code&gt;http://&amp;lt;mac-ip&amp;gt;:4000&lt;/code&gt;. That's a non-secure context, so the browser doesn't expose WebCodecs there. The fallback at the time was MSE (Media Source Extensions): feed the H.264 into a &lt;code&gt;&amp;lt;video&amp;gt;&lt;/code&gt; element through a muxer.&lt;/p&gt;

&lt;p&gt;The problem is that &lt;code&gt;&amp;lt;video&amp;gt;&lt;/code&gt; &lt;em&gt;is&lt;/em&gt; a buffer. It's designed for media playback, where a jitter buffer is a feature. For interactive streaming it's structural latency you can't remove. I measured it on localhost by forcing the MSE tier:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;decode→present p50/p95 (ms)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;H.264 MSE still&lt;/td&gt;
&lt;td&gt;239 / 254&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;H.264 MSE scroll&lt;/td&gt;
&lt;td&gt;229 / 244&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;~235ms, on the &lt;em&gt;same&lt;/em&gt; &lt;code&gt;reorder=0&lt;/code&gt; stream that WebCodecs decoded in 2.5ms. The SPS fix can't reach this — it's the media-element buffer, not the decoder's DPB. I'd already set the muxer's &lt;code&gt;flushingTime&lt;/code&gt; to 0. There was nothing left to shave.&lt;/p&gt;

&lt;p&gt;So I stopped trying to make MSE fast and removed it.&lt;/p&gt;

&lt;p&gt;The decoder layer is now two tiers, picked automatically per environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// pickDecoder — secure → WebCodecs, otherwise WASM&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;pickDecoder&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;Decoder&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;isSecureContext&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;VideoDecoder&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WebCodecsDecoder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;      &lt;span class="c1"&gt;// HW, lowest latency&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;webgl2Available&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;wasmSupported&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WASMDecoder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;           &lt;span class="c1"&gt;// tinyh264, zero-buffer&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;                          &lt;span class="c1"&gt;// → fall back to JPEG&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On non-secure LAN-HTTP, we decode H.264 in WASM (tinyh264). It's a software decoder, so it costs CPU — but it has &lt;strong&gt;no media-element buffer at all&lt;/strong&gt;. That's the whole point: it gives you JPEG's immediacy with H.264's bandwidth, on plain HTTP.&lt;/p&gt;
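&lt;p&gt;The &lt;code&gt;pickDecoder&lt;/code&gt; snippet leaves &lt;code&gt;webgl2Available&lt;/code&gt; and &lt;code&gt;wasmSupported&lt;/code&gt; undefined. The sketch below shows one way to compute them, with the decision split into a pure function so it can be tested outside a browser. The &lt;code&gt;Tier&lt;/code&gt; names and the &lt;code&gt;detectCaps&lt;/code&gt; helper are hypothetical. &lt;code&gt;isSecureContext&lt;/code&gt;, &lt;code&gt;VideoDecoder&lt;/code&gt;, &lt;code&gt;WebAssembly&lt;/code&gt;, and &lt;code&gt;getContext("webgl2")&lt;/code&gt; are standard browser APIs. The assumption that WebGL2 is used to render the WASM decoder's output is ours:&lt;/p&gt;

```typescript
type Tier = "webcodecs" | "wasm" | "jpeg";

interface Caps {
  secure: boolean;       // window.isSecureContext
  videoDecoder: boolean; // WebCodecs VideoDecoder is exposed
  webgl2: boolean;       // a WebGL2 context can be created (assumed: for rendering decoded frames)
  wasm: boolean;         // WebAssembly is available
}

// Same order as pickDecoder: WebCodecs first, then WASM, else the JPEG fallback.
function chooseTier(caps: Caps): Tier {
  if (caps.secure) {
    if (caps.videoDecoder) return "webcodecs";
  }
  if (caps.webgl2) {
    if (caps.wasm) return "wasm";
  }
  return "jpeg";
}

// Probe the current environment; each check degrades to false outside a browser.
function detectCaps(): Caps {
  const g = globalThis as any;
  let webgl2 = false;
  if (typeof g.document === "object") {
    try {
      webgl2 = g.document.createElement("canvas").getContext("webgl2") !== null;
    } catch {
      webgl2 = false;
    }
  }
  return {
    secure: g.isSecureContext === true,
    videoDecoder: typeof g.VideoDecoder === "function",
    webgl2,
    wasm: typeof g.WebAssembly === "object",
  };
}
```

&lt;p&gt;Keeping the probing separate from the choice means the tier order can be unit-tested with plain objects, and the browser-only probing stays in one place.&lt;/p&gt;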

&lt;p&gt;Measured on localhost (the worst case for CPU, since encoder and decoder share one Mac):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;decode→present p50/p95 (ms)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;H.264 WASM still&lt;/td&gt;
&lt;td&gt;8.7 / 30.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;H.264 WASM scroll&lt;/td&gt;
&lt;td&gt;14.3 / 37.9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The p50s are on par with the localhost-JPEG baseline (12.4 ms still, 9.4 ms scroll), the bar we set at the start, though the p95 tail runs roughly 2–3× longer. Removing MSE also let us drop the muxer dependency entirely.&lt;/p&gt;

&lt;p&gt;One constraint this introduces: tinyh264 only decodes baseline H.264. iOS already encodes baseline. For Android we pin scrcpy to baseline (&lt;code&gt;profile:int=1&lt;/code&gt;) so both platforms share the exact same HTTP→WASM path. High profile is still available on the WebCodecs (secure) tier.&lt;/p&gt;
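&lt;p&gt;Because the WASM tier only handles baseline, it's worth checking the profile before routing a stream there. In the SPS, the three bytes after the NAL header are &lt;code&gt;profile_idc&lt;/code&gt;, the constraint flags, and &lt;code&gt;level_idc&lt;/code&gt;. Those are the same values a WebCodecs &lt;code&gt;avc1.PPCCLL&lt;/code&gt; codec string encodes. A sketch follows; the function names are hypothetical:&lt;/p&gt;

```typescript
// profile_idc values: 66 = Baseline, 77 = Main, 100 = High.
const PROFILE_BASELINE = 66;

// Two uppercase hex digits, e.g. 30 -> "1E".
function hex2(n: number): string {
  return ("0" + n.toString(16)).slice(-2).toUpperCase();
}

// sps starts at the NAL header byte; bytes 1..3 are profile_idc, constraint
// flags, and level_idc. profile_idc and level_idc are never zero, so no
// emulation-prevention byte can land in this range.
function avcCodecString(sps: Uint8Array): string {
  return "avc1." + hex2(sps[1]) + hex2(sps[2]) + hex2(sps[3]);
}

// Baseline (including Constrained Baseline) is the profile the WASM tier accepts.
function isBaseline(sps: Uint8Array): boolean {
  return sps[1] === PROFILE_BASELINE;
}
```

&lt;p&gt;On the secure tier, the same string is what &lt;code&gt;VideoDecoder.configure({ codec })&lt;/code&gt; expects, so one SPS read can drive both the routing decision and the decoder configuration.&lt;/p&gt;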




&lt;h2&gt;One more thing: dropping H.264 isn't like dropping JPEG&lt;/h2&gt;

&lt;p&gt;There's a subtlety the switch exposed. With JPEG, every frame is a keyframe, so dropping a frame under backpressure is harmless: the next one stands alone. With H.264, if you drop a P-frame, every following P-frame up to the next IDR depends, directly or through its predecessors, on a frame the decoder never received. A zero-buffer decoder like WASM tinyh264 then shows smeared, corrupted frames until the next IDR arrives.&lt;/p&gt;

&lt;p&gt;So the relay had to become keyframe-aware: once it starts dropping under backpressure, it drops the whole GOP until the next keyframe, rather than handing the decoder a broken reference chain. The keyframe flag rides in our frame envelope, so this needs zero NAL parsing on the relay.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// relay — once dropping, drop until the next keyframe&lt;/span&gt;
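&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;backpressured&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;dropping&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;                     &lt;span class="c1"&gt;// this frame is lost, so the GOP is now broken&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dropping&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;isKeyframe&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;       &lt;span class="c1"&gt;// skip P-frames in a broken GOP&lt;/span&gt;
  &lt;span class="nx"&gt;dropping&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;                    &lt;span class="c1"&gt;// keyframe resets the chain&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;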
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
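&lt;p&gt;To see the policy end to end, here is a self-contained model that replays a short frame sequence through it. &lt;code&gt;GopDropper&lt;/code&gt; and the &lt;code&gt;Frame&lt;/code&gt; shape are hypothetical stand-ins for the relay's frame envelope. Backpressure is passed in per frame:&lt;/p&gt;

```typescript
interface Frame {
  seq: number;
  isKeyframe: boolean; // carried in the frame envelope, so no NAL parsing is needed
}

class GopDropper {
  private dropping = false;

  // Returns true if the frame should be forwarded to the viewer.
  admit(frame: Frame, backpressured: boolean): boolean {
    if (backpressured) {
      this.dropping = true; // this frame is lost, so later P-frames lose their reference
      return false;
    }
    if (this.dropping) {
      if (!frame.isKeyframe) return false; // still inside the broken GOP
      this.dropping = false;               // IDR: decodable without earlier frames
    }
    return true;
  }
}

// Replay K P P K P, with backpressure only while frame 1 is being sent.
const dropper = new GopDropper();
const frames: Frame[] = [0, 1, 2, 3, 4].map(seq => ({ seq, isKeyframe: seq % 3 === 0 }));
const forwarded: number[] = [];
for (const f of frames) {
  if (dropper.admit(f, f.seq === 1)) forwarded.push(f.seq);
}
console.log(forwarded.join(",")); // 0,3,4
```

&lt;p&gt;The design cost is visible in the trace: one congested moment can drop the remainder of a GOP, so the keyframe interval bounds how long the viewer can stay frozen.&lt;/p&gt;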






&lt;h2&gt;Honest limitations&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;WASM decode is CPU-bound.&lt;/strong&gt; At high resolution × fps it hits a CPU ceiling. We mitigate by downscaling the encode resolution — the display is small, so it's a triple win on bandwidth, CPU, and latency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The localhost numbers are best-case for latency and worst-case for CPU.&lt;/strong&gt; On a real LAN the decoder runs on a separate machine. In our cross-machine measurements, scroll p95 climbs to ~50ms on &lt;em&gt;both&lt;/em&gt; decoders — at that point the bottleneck is load/transport, not the codec. The &lt;code&gt;decode→present&lt;/code&gt; deltas above hold; the &lt;code&gt;glass→glass&lt;/code&gt; absolutes do not transfer across two clocks.&lt;/li&gt;
&lt;li&gt;Still v0.x. The decoder tiers and SPS rewrite are in &lt;code&gt;agent-core&lt;/code&gt;; expect them to keep moving.&lt;/li&gt;
&lt;/ul&gt;
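&lt;p&gt;The downscaling mitigation in the first bullet comes down to arithmetic. The sketch below picks an encode size under two assumptions: a cap on the long edge, and even dimensions, which 4:2:0 chroma subsampling requires. &lt;code&gt;fitEncodeSize&lt;/code&gt;, &lt;code&gt;pixelRate&lt;/code&gt;, and the 1280-pixel cap used in the note are illustrative, not tapflow's actual settings:&lt;/p&gt;

```typescript
interface Size {
  width: number;
  height: number;
}

// Scale so the long edge is at most maxLongEdge, keeping the aspect ratio,
// never upscaling, and rounding each side down to an even number of pixels.
function fitEncodeSize(width: number, height: number, maxLongEdge: number): Size {
  const longEdge = Math.max(width, height);
  const target = Math.min(maxLongEdge, longEdge);
  const scaleDim = (dim: number) => Math.floor((dim * target) / longEdge / 2) * 2;
  return { width: scaleDim(width), height: scaleDim(height) };
}

// Pixels per second the decoder must produce: a rough proxy for WASM decode cost.
function pixelRate(size: Size, fps: number): number {
  return size.width * size.height * fps;
}
```

&lt;p&gt;For example, capping a 1179×2556 capture at 1280 pixels on the long edge gives 590×1280. That is about a quarter of the pixels, so roughly a quarter of the per-frame decode work at the same frame rate.&lt;/p&gt;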




&lt;h2&gt;Takeaway&lt;/h2&gt;

&lt;p&gt;Two bugs, same symptom ("H.264 feels laggy"), completely different causes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The decoder's DPB buffered 8 frames because the SPS didn't declare &lt;code&gt;reorder=0&lt;/code&gt;. Fix: rewrite the SPS at the encoder.&lt;/li&gt;
&lt;li&gt;The media-element buffer in MSE added ~235ms that no encoder flag can reach. Fix: remove MSE, decode in WASM on non-secure contexts.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The lesson I keep relearning: when streaming feels slow, measure each stage before you change the codec. The codec usually isn't the problem — the buffer you didn't know you had is.&lt;/p&gt;




&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;tapflow is MIT licensed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; tapflow
tapflow start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;🔗 GitHub: &lt;a href="https://github.com/jo-duchan/tapflow" rel="noopener noreferrer"&gt;https://github.com/jo-duchan/tapflow&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 Docs: &lt;a href="https://www.tapflow.dev" rel="noopener noreferrer"&gt;https://www.tapflow.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webdev</category>
      <category>ios</category>
      <category>android</category>
      <category>performance</category>
    </item>
    <item>
      <title>Giving an LLM Eyes and Hands on a Mobile Simulator</title>
      <dc:creator>Duchan</dc:creator>
      <pubDate>Sat, 30 May 2026 08:23:03 +0000</pubDate>
      <link>https://dev.to/joduchan/-giving-an-llm-eyes-and-hands-on-a-mobile-simulator-5963</link>
      <guid>https://dev.to/joduchan/-giving-an-llm-eyes-and-hands-on-a-mobile-simulator-5963</guid>
      <description>&lt;p&gt;Mobile QA has a scaling problem.&lt;/p&gt;

&lt;p&gt;Unit tests and API tests run in CI automatically. But the thing that actually matters to most users — does tapping this button do the right thing, does this screen look right after this flow, does the deeplink open the correct state — none of that runs automatically. Someone has to open the simulator, walk through the steps, and verify. Every time.&lt;/p&gt;

&lt;p&gt;The usual answer is Appium or XCUITest. But those require engineers to write and maintain test code that mirrors the UI, breaks whenever the screen changes, and only runs against builds developers already have locally.&lt;/p&gt;

&lt;p&gt;We had a different idea. tapflow already lets humans control a simulator through a browser. What if we gave an LLM the same interface?&lt;/p&gt;




&lt;h2&gt;The interface a human uses&lt;/h2&gt;

&lt;p&gt;When a person does QA in tapflow, the loop is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Look at the simulator screen&lt;/li&gt;
&lt;li&gt;Decide what to do (tap, swipe, type)&lt;/li&gt;
&lt;li&gt;Do it&lt;/li&gt;
&lt;li&gt;Look again&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is exactly the perception-action loop that vision-capable LLMs are built for. The model sees a screenshot, reasons about what it shows, decides what action to take, and calls a tool to execute it.&lt;/p&gt;

&lt;p&gt;We didn't need to build a new automation layer. We just needed to expose tapflow's existing WebSocket and REST APIs as MCP tools.&lt;/p&gt;




&lt;h2&gt;What the MCP server does&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;@tapflowio/mcp-server&lt;/code&gt; connects to a running tapflow relay and registers 13 tools that any MCP-compatible client can call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;list_devices       — see all simulators registered on the relay
connect_device     — join a device session
boot_device        — boot a simulator (waits up to 30s for ready state)
screenshot         — capture the current screen
tap                — tap at a pixel coordinate
swipe              — swipe between two coordinates
type_text          — type into the focused field
press_key          — press a keyboard key (Return, Delete, Escape...)
press_button       — press a hardware button (home, lock)
install_app        — install a build from App Center
launch_app         — launch an installed app
list_builds        — list available builds on the relay
disconnect_device  — end the session
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
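&lt;p&gt;In MCP, each tool is advertised to the client with a name, a description, and a JSON Schema for its arguments (&lt;code&gt;inputSchema&lt;/code&gt;), and the model fills in those arguments. Here is a rough sketch of what the &lt;code&gt;tap&lt;/code&gt; entry could look like, plus a minimal argument check. The argument names are assumptions borrowed from the &lt;code&gt;tools.ts&lt;/code&gt; snippet later in this post, not the published schema:&lt;/p&gt;

```typescript
// Hypothetical tool descriptor in the shape an MCP tools/list response uses.
const tapTool = {
  name: "tap",
  description: "Tap at a pixel coordinate of the most recent screenshot",
  inputSchema: {
    type: "object",
    properties: {
      sessionId: { type: "string" },
      x: { type: "number", description: "Pixel x in the screenshot" },
      y: { type: "number", description: "Pixel y in the screenshot" },
      screenshotWidth: { type: "number" },
      screenshotHeight: { type: "number" },
    },
    required: ["sessionId", "x", "y", "screenshotWidth", "screenshotHeight"],
  },
};

// Minimal check: list required arguments that are missing or have the wrong type.
function invalidArgs(args: { [key: string]: unknown }): string[] {
  const props = tapTool.inputSchema.properties as { [key: string]: { type: string } };
  return tapTool.inputSchema.required.filter(key => typeof args[key] !== props[key].type);
}
```

&lt;p&gt;Requiring the screenshot dimensions in every call is what lets the server normalize coordinates without remembering which screenshot the model is looking at.&lt;/p&gt;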



&lt;p&gt;Setup is two environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TAPFLOW_RELAY_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;wss://your-relay-url
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TAPFLOW_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-pat-token
npx @tapflowio/mcp-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add it as an MCP server in your client config, and those tools appear in the model's tool list.&lt;/p&gt;




&lt;h2&gt;How the tools are implemented&lt;/h2&gt;

&lt;h3&gt;Screenshot: the model's eyes&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;screenshot&lt;/code&gt; tool calls the REST endpoint we added in v0.3.0 (&lt;code&gt;GET /api/v1/sessions/:id/screenshot&lt;/code&gt;), gets back a PNG or JPEG buffer, base64-encodes it, and returns it as MCP &lt;code&gt;image&lt;/code&gt; content alongside the pixel dimensions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;image&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nx"&gt;mimeType&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Screenshot saved: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; (&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;width&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;×&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;height&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;px)`&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model receives the actual image. It can read text on screen, identify UI elements, notice error states — the same things a human would.&lt;/p&gt;
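&lt;p&gt;The tool reports &lt;code&gt;width&lt;/code&gt; and &lt;code&gt;height&lt;/code&gt; alongside the image, but the post doesn't say where they come from. For a PNG, one cheap option is to read them straight from the header. After the 8-byte signature, the first chunk is always &lt;code&gt;IHDR&lt;/code&gt;, and its data starts with width and height as big-endian 32-bit integers at byte offsets 16 and 20. The sketch below works under that assumption. A JPEG would instead need a scan for its SOF marker, which isn't shown. &lt;code&gt;pngSize&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;

```typescript
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Read width/height from a PNG's IHDR chunk; returns null if it isn't a PNG.
function pngSize(buf: Uint8Array): { width: number; height: number } | null {
  if (24 > buf.length) return null;
  const isPng = PNG_SIGNATURE.every((byte, i) => buf[i] === byte);
  if (!isPng) return null;
  const chunkType = String.fromCharCode(buf[12], buf[13], buf[14], buf[15]);
  if (chunkType !== "IHDR") return null;
  // Respect byteOffset: a Node Buffer is often a view into a larger pooled ArrayBuffer.
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  // DataView.getUint32 reads big-endian by default, matching PNG's byte order.
  return { width: view.getUint32(16), height: view.getUint32(20) };
}
```

&lt;p&gt;A Node &lt;code&gt;Buffer&lt;/code&gt; is a &lt;code&gt;Uint8Array&lt;/code&gt;, so a buffer from the REST response can be passed in directly without decoding the image.&lt;/p&gt;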

&lt;h3&gt;Tap and swipe: normalized coordinates&lt;/h3&gt;

&lt;p&gt;Here's the part that took a few iterations to get right. The simulator's logical coordinate space is different from screenshot pixel coordinates, and it changes with screen resolution, device type, and scale factor.&lt;/p&gt;

&lt;p&gt;Rather than exposing logical coordinates (which the model can't reason about without device-specific knowledge), we have the model work entirely in screenshot pixel space. The &lt;code&gt;tap&lt;/code&gt; tool takes pixel coordinates plus the screenshot dimensions, then normalizes internally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// tools.ts&lt;/span&gt;
&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;screenshotWidth&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;y&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;screenshotHeight&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model calls &lt;code&gt;screenshot&lt;/code&gt; first, reads the dimensions from the response, then uses those same dimensions when calling &lt;code&gt;tap&lt;/code&gt;. This means the model can identify "the button is at roughly pixel 200, 450" from the image and tap it directly — no coordinate system translation required.&lt;/p&gt;
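&lt;p&gt;The arithmetic is a round trip through the unit square. The MCP server divides by the screenshot size, and the receiving end presumably multiplies by the device's logical size. The function names, the clamping, and the example device (1179×2556 px screenshot, 393×852 pt logical, 3× scale) are illustrative assumptions, not tapflow's code:&lt;/p&gt;

```typescript
interface Size {
  width: number;
  height: number;
}

const clamp01 = (v: number) => Math.min(1, Math.max(0, v));

// Screenshot pixels -> normalized [0, 1] coordinates (what the tap tool sends).
function normalize(x: number, y: number, shot: Size): { nx: number; ny: number } {
  return { nx: clamp01(x / shot.width), ny: clamp01(y / shot.height) };
}

// Normalized -> device logical points (assumed to happen on the receiving side).
function toLogical(nx: number, ny: number, logical: Size): { x: number; y: number } {
  return { x: nx * logical.width, y: ny * logical.height };
}

const shot: Size = { width: 1179, height: 2556 }; // screenshot pixels (3x scale)
const device: Size = { width: 393, height: 852 }; // logical points
const { nx, ny } = normalize(200, 450, shot);
const pt = toLogical(nx, ny, device);
console.log(pt.x.toFixed(2), pt.y.toFixed(2)); // 66.67 150.00
```

&lt;p&gt;Because the model only ever sees pixels, the same tool works unchanged across device sizes and scale factors. Only the final conversion needs to know the logical size. Clamping keeps a slightly-off guess at the screen edge from turning into an out-of-bounds touch.&lt;/p&gt;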

&lt;p&gt;Swipe works the same way. The gesture is split into 8 equal time steps, with an interpolated &lt;code&gt;touch:move&lt;/code&gt; event sent at each intermediate step to simulate a natural gesture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// client.ts — swipe interpolation&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;STEPS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;interval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;durationMs&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;STEPS&lt;/span&gt;

&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;input:touch:start&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;startX&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;startY&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;STEPS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;STEPS&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;input:touch:move&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;startX&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;endX&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;startX&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;startY&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;endY&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;startY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
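&lt;p&gt;The excerpt relies on a &lt;code&gt;delay&lt;/code&gt; helper it doesn't show, and it ends before the release event. One way to make the gesture self-contained and testable is to compute the touch sequence as plain data first, then replay it with timing. In this sketch, &lt;code&gt;planSwipe&lt;/code&gt;, &lt;code&gt;replay&lt;/code&gt;, and the closing &lt;code&gt;input:touch:end&lt;/code&gt; message type are assumptions, not tapflow's confirmed protocol:&lt;/p&gt;

```typescript
interface TouchStep {
  type: string; // "input:touch:start" | "input:touch:move" | "input:touch:end" (end is assumed)
  x: number;
  y: number;
  atMs: number; // offset from the start of the gesture
}

// Pure planning step: a start, steps - 1 interpolated moves, then an end at the target.
function planSwipe(startX: number, startY: number, endX: number, endY: number, durationMs: number, steps = 8): TouchStep[] {
  const plan: TouchStep[] = [{ type: "input:touch:start", x: startX, y: startY, atMs: 0 }];
  for (let i = 1; steps > i; i++) {
    const t = i / steps;
    plan.push({
      type: "input:touch:move",
      x: Math.round(startX + (endX - startX) * t),
      y: Math.round(startY + (endY - startY) * t),
      atMs: durationMs * t,
    });
  }
  plan.push({ type: "input:touch:end", x: endX, y: endY, atMs: durationMs });
  return plan;
}

const delay = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

// Replay a plan, waiting until each step's offset before sending it.
async function replay(plan: TouchStep[], send: (step: TouchStep) => void) {
  let elapsed = 0;
  for (const step of plan) {
    await delay(step.atMs - elapsed);
    elapsed = step.atMs;
    send(step);
  }
}
```

&lt;p&gt;With the plan as data, the gesture's shape can be unit-tested without timers, and the 8-step split is explicit: seven interpolated moves between the press and the release.&lt;/p&gt;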



&lt;h3&gt;Async operations over WebSocket&lt;/h3&gt;

&lt;p&gt;Several tools involve async operations — booting a device, installing an app — where the relay sends a confirmation back over WebSocket after the operation completes.&lt;/p&gt;

&lt;p&gt;The client uses a &lt;code&gt;waitFor&lt;/code&gt; pattern: register a predicate against incoming messages, return a promise that resolves when a matching message arrives, and reject if a timeout fires first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// client.ts — waitFor&lt;/span&gt;
&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nf"&gt;waitFor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;predicate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;RelayMsg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;timeoutMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;RelayMsg&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;timer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
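      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;waiters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findIndex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="c1"&gt;// findIndex returns -1 if the waiter was already removed; splice(-1, 1) would drop the last waiter instead&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;waiters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;splice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;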
      &lt;span class="nf"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Request timed out&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;timeoutMs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;waiters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;predicate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;timer&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
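
&lt;p&gt;The snippet above covers registration and the timeout; the other half is the handler that settles a waiter when a message arrives. A minimal sketch of both halves together, assuming a hypothetical &lt;code&gt;RelayMsg&lt;/code&gt; shape with a &lt;code&gt;type&lt;/code&gt; string and an optional &lt;code&gt;error&lt;/code&gt; field, and a hypothetical &lt;code&gt;RelayWaiters&lt;/code&gt; class (tapflow's actual client may be organized differently). A matching message that carries an &lt;code&gt;error&lt;/code&gt; rejects the waiter, which is one way to get the "rejects with the error payload" behavior described below.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical message shape; tapflow's real RelayMsg schema may differ.
type RelayMsg = { type: string; error?: string }

type Waiter = {
  predicate: (msg: RelayMsg) => boolean
  resolve: (msg: RelayMsg) => void
  reject: (err: Error) => void
  cancelTimer: () => void
}

// Hypothetical class pairing waitFor with the dispatch side.
class RelayWaiters {
  private waiters: Waiter[] = []

  // Registration side, equivalent to waitFor above.
  waitFor(predicate: (msg: RelayMsg) => boolean, timeoutMs: number) {
    return new Promise((resolve: (msg: RelayMsg) => void, reject: (err: Error) => void) => {
      const timer = setTimeout(() => {
        this.remove(waiter)
        reject(new Error('Request timed out'))
      }, timeoutMs)
      const waiter: Waiter = { predicate, resolve, reject, cancelTimer: () => clearTimeout(timer) }
      this.waiters.push(waiter)
    })
  }

  // Dispatch side: call for every message that arrives on the WebSocket.
  // The first matching waiter is removed and its timer cleared; it is
  // rejected if the message carries an error, resolved otherwise.
  handleMessage(msg: RelayMsg) {
    const waiter = this.waiters.find(w => w.predicate(msg))
    if (!waiter) return false
    this.remove(waiter)
    waiter.cancelTimer()
    if (msg.error) waiter.reject(new Error(msg.error))
    else waiter.resolve(msg)
    return true
  }

  get size() {
    return this.waiters.length
  }

  private remove(waiter: Waiter) {
    const i = this.waiters.indexOf(waiter)
    if (i !== -1) this.waiters.splice(i, 1)
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Storing a cancel closure instead of the raw timer handle keeps the &lt;code&gt;Waiter&lt;/code&gt; type identical in browsers and Node, where &lt;code&gt;setTimeout&lt;/code&gt; returns different handle types.&lt;/p&gt;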



&lt;p&gt;&lt;code&gt;boot_device&lt;/code&gt; waits up to 30 seconds. &lt;code&gt;install_app&lt;/code&gt; waits 60 seconds. Each resolves on the confirmation message or rejects with the error payload.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a session looks like
&lt;/h2&gt;

&lt;p&gt;A model running a login flow might do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. list_devices → pick a session
2. connect_device
3. list_builds → find the build to test
4. boot_device
5. install_app
6. launch_app
7. screenshot → see the login screen
8. tap(email field coordinates) → focus the input
9. type_text("test@example.com")
10. tap(password field coordinates)
11. type_text("password")
12. tap(login button coordinates)
13. screenshot → verify the home screen loaded
14. disconnect_device
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each screenshot gives the model a chance to verify state before proceeding. If step 13 shows an error message instead of the home screen, the model knows something went wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where we are: experimental
&lt;/h2&gt;

&lt;p&gt;The version says &lt;code&gt;0.3.1-experimental.1&lt;/code&gt; for a reason. The tools work, but the layer needs more hardening before we'd call it reliable.&lt;/p&gt;

&lt;p&gt;The core issue is consistency. The same sequence of tool calls should produce predictable behavior every time. Right now it doesn't always — there are timing edge cases where an action fires before the UI has fully settled, device state can drift between steps without the model noticing, and error recovery when something unexpected happens mid-flow is rough.&lt;/p&gt;

&lt;p&gt;These are solvable problems, but we want to solve them before presenting this as something teams should build pipelines on.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where we're going: CI/CD without a QA script
&lt;/h2&gt;

&lt;p&gt;The direction we're aiming at is using the MCP server as the foundation for LLM-driven smoke tests in CI.&lt;/p&gt;

&lt;p&gt;The scenario: a new build passes unit tests and gets uploaded to App Center. A CI step spins up the MCP server, points it at the relay, and gives a model a natural-language test spec:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Install the latest build. Log in with test credentials. Navigate to the cart, add an item, and confirm the checkout screen shows the correct total. Take a screenshot at each step."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model does the steps, captures evidence, and reports what it saw. No automation code to write. No selectors to maintain when the UI changes. The spec is just a description of what a human would do.&lt;/p&gt;

&lt;p&gt;This isn't production-ready yet. The stability work comes first. But the pieces — browser-controllable simulators, screenshot REST endpoint, MCP tool layer — are in place. The question is whether the model can run a flow reliably enough to be trusted in CI without a human verifying each run.&lt;/p&gt;

&lt;p&gt;We think it can. That's what we're building toward.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try the MCP server (experimental)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @tapflowio/mcp-server@experimental
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll need a running tapflow relay and a PAT with viewer scope. Configure it in your MCP client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tapflow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"@tapflowio/mcp-server"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"TAPFLOW_RELAY_URL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wss://your-relay-url"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"TAPFLOW_TOKEN"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-pat-token"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you try it and hit rough edges, open an issue — that feedback is exactly what's shaping the stability work.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔗 GitHub: &lt;a href="https://github.com/jo-duchan/tapflow" rel="noopener noreferrer"&gt;https://github.com/jo-duchan/tapflow&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 Docs: &lt;a href="https://www.tapflow.dev/guide/mcp-server" rel="noopener noreferrer"&gt;https://www.tapflow.dev/guide/mcp-server&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>opensource</category>
      <category>ios</category>
      <category>android</category>
      <category>mcp</category>
    </item>
    <item>
      <title>tapflow v0.3.x: Deeplinks, Keyboard Shortcuts, Screenshot API, and an Experimental MCP Server</title>
      <dc:creator>Duchan</dc:creator>
      <pubDate>Fri, 29 May 2026 07:38:22 +0000</pubDate>
      <link>https://dev.to/joduchan/tapflow-v03x-deeplinks-keyboard-shortcuts-screenshot-api-and-an-experimental-mcp-server-4lg1</link>
      <guid>https://dev.to/joduchan/tapflow-v03x-deeplinks-keyboard-shortcuts-screenshot-api-and-an-experimental-mcp-server-4lg1</guid>
      <description>&lt;p&gt;tapflow started as a simple idea: stream iOS simulators and Android emulators to the browser so anyone on the team can do mobile QA without touching Xcode or Android Studio. v0.2.x got the core working — streaming, touch input, App Center, session recording.&lt;/p&gt;

&lt;p&gt;v0.3.x is about filling in the gaps that matter during actual QA sessions. This post covers what shipped and ends with something we're still figuring out: an experimental MCP server that lets LLM agents control simulators directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deeplink execution from the browser
&lt;/h2&gt;

&lt;p&gt;The one that came up most in real usage: testers frequently need to trigger deeplinks to verify specific app states — product detail pages, notification payloads, OAuth redirects. The old workflow always involved a mobile developer — either having them trigger it on their machine or building a debug menu inside the app specifically for this purpose.&lt;/p&gt;

&lt;p&gt;In v0.3.0 you can now fire a deeplink directly from the QA session toolbar. Click the link icon (or &lt;code&gt;⌘K&lt;/code&gt;), enter the URL, and it executes on the active device.&lt;/p&gt;

&lt;p&gt;Under the hood it's a new &lt;code&gt;open-url&lt;/code&gt; WebSocket message type that routes browser → relay → agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser ──open-url──► Relay ──open-url──► Mac Agent
                                              │
                           iOS: xcrun simctl openurl booted &amp;lt;url&amp;gt;
                           Android: adb shell am start -a android.intent.action.VIEW -d &amp;lt;url&amp;gt;
Browser ◄──open-url:done/error── Relay ◄──────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;DeviceAgent&lt;/code&gt; interface got a new &lt;code&gt;openUrl(url)&lt;/code&gt; method, so both iOS and Android agents implement it symmetrically. The relay routes it and returns either &lt;code&gt;open-url:done&lt;/code&gt; or &lt;code&gt;open-url:error&lt;/code&gt; with the failure reason. The dashboard shows a toast either way.&lt;/p&gt;
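
&lt;p&gt;Both commands fit behind a single method. A minimal sketch of the argv an agent could build for it, using hypothetical names (&lt;code&gt;openUrlCommand&lt;/code&gt;, &lt;code&gt;shQuote&lt;/code&gt;) rather than tapflow's actual &lt;code&gt;DeviceAgent&lt;/code&gt; code. Passing an explicit device ID (a simulator UDID or an adb serial) avoids relying on &lt;code&gt;booted&lt;/code&gt;, which is ambiguous when several devices are running. The argv is meant for a spawner that bypasses the local shell, such as Node's &lt;code&gt;child_process.execFile&lt;/code&gt;. &lt;code&gt;adb shell&lt;/code&gt; is the exception: adb joins its arguments into one command line that the device's shell parses, so the URL is single-quoted for that remote shell; unquoted, a query string with several parameters would be split at the ampersand.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;type Platform = 'ios' | 'android'

// Quote a string for a POSIX shell: wrap it in single quotes and turn each
// embedded ' into '\'' (close the quote, add an escaped quote, reopen).
function shQuote(s: string) {
  return "'" + s.replace(/'/g, "'\\''") + "'"
}

// Hypothetical helper: argv for opening a deeplink on one device.
// deviceId is a simulator UDID (iOS) or an adb serial such as emulator-5554.
function openUrlCommand(platform: Platform, deviceId: string, url: string) {
  if (platform === 'ios') {
    return ['xcrun', 'simctl', 'openurl', deviceId, url]
  }
  // adb joins everything after "shell" into one command line for the
  // device's shell, so the URL is quoted for that remote shell.
  return [
    'adb', '-s', deviceId, 'shell',
    'am', 'start', '-a', 'android.intent.action.VIEW', '-d', shQuote(url),
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;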




&lt;h2&gt;
  
  
  Keyboard shortcuts for simulator controls
&lt;/h2&gt;

&lt;p&gt;QA sessions are repetitive. Reaching for the toolbar icons on every screenshot or rotation adds up. v0.3.0 adds keyboard shortcuts to all the common actions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Shortcut&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;⌘K&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Open deeplink dialog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;⌘S&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Take screenshot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;⌘⇧Y&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start / stop recording&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;⌘⇧O&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rotate simulator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;⌘⇧U&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;iOS: press Home&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;⌘⇧K&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;iOS: toggle software keyboard&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

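&lt;p&gt;Tooltips now show the shortcut hint inline, so the shortcuts are discoverable without reading docs. One implementation detail worth noting: key detection uses &lt;code&gt;e.code&lt;/code&gt; (the physical key, e.g. &lt;code&gt;KeyK&lt;/code&gt;) instead of &lt;code&gt;e.key&lt;/code&gt; (the character the key produces). This matters for IME input: while a Korean, Japanese, or Chinese user is composing text, &lt;code&gt;e.key&lt;/code&gt; can report a non-Latin character or the special value &lt;code&gt;Process&lt;/code&gt;, so character-based matching would misfire mid-composition.&lt;/p&gt;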
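
&lt;p&gt;A minimal sketch of that matching, written as a pure function over the &lt;code&gt;KeyboardEvent&lt;/code&gt; fields it needs so it can be tested outside a browser. The combo table mirrors the shortcut table above; the action names are hypothetical, and ⌘ is assumed to map to &lt;code&gt;metaKey&lt;/code&gt; (macOS). Besides matching on &lt;code&gt;e.code&lt;/code&gt;, it also skips events whose &lt;code&gt;isComposing&lt;/code&gt; flag is set, which browsers raise while an IME composition is in progress.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// The subset of KeyboardEvent fields the matcher needs.
type KeyLike = { code: string; metaKey: boolean; shiftKey: boolean; isComposing: boolean }

// Hypothetical action names, keyed by physical KeyboardEvent.code values.
const SHORTCUTS: { [combo: string]: string } = {
  'KeyK': 'open-deeplink',
  'KeyS': 'screenshot',
  'Shift+KeyY': 'toggle-recording',
  'Shift+KeyO': 'rotate',
  'Shift+KeyU': 'ios-home',
  'Shift+KeyK': 'ios-toggle-keyboard',
}

function matchShortcut(e: KeyLike) {
  // Ignore keystrokes that belong to an in-progress IME composition.
  if (e.isComposing) return null
  // Every shortcut in the table requires ⌘ (metaKey on macOS).
  if (!e.metaKey) return null
  const combo = (e.shiftKey ? 'Shift+' : '') + e.code
  return SHORTCUTS[combo] ?? null
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the page, a &lt;code&gt;keydown&lt;/code&gt; listener would pass the real event (which has all four fields) and call &lt;code&gt;preventDefault()&lt;/code&gt; on a match, so that ⌘S takes a screenshot instead of opening the browser's save dialog.&lt;/p&gt;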




&lt;h2&gt;
  
  
  Screenshot REST endpoint
&lt;/h2&gt;

&lt;p&gt;This one unlocks a new class of CI usage.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;GET /api/v1/sessions/:sessionId/screenshot&lt;/code&gt; returns a PNG or JPEG of the current simulator screen. You can call it with a PAT from any CI step: before asserting a visual state, during an automated flow, or after a build install.&lt;/p&gt;

&lt;p&gt;The tricky part was the request/response mismatch. The relay talks to agents over long-lived, multiplexed WebSocket connections, while HTTP is strictly request/response, and the screenshot is taken on the Mac, not on the relay. So each HTTP request has to become a WebSocket round trip to the right agent.&lt;/p&gt;

&lt;p&gt;We introduced a requestId-based pending map: the relay generates a unique ID, sends a &lt;code&gt;take-screenshot&lt;/code&gt; message to the agent over WebSocket, registers a promise keyed by requestId, and resolves it when &lt;code&gt;screenshot:result&lt;/code&gt; comes back. The HTTP handler awaits that promise and sends the binary payload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /api/v1/sessions/:id/screenshot
    │
    ▼
Relay: generate requestId, push to pending map
    │
    ├──screenshot-request──► Mac Agent
    │                            │ simctl io screenshot (iOS)
    │                            │ ADB screencap (Android)
    ◄──screenshot:result─────────┘
    │
    ▼
HTTP 200 (binary image)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;iOS supports both PNG and JPEG via &lt;code&gt;--type&lt;/code&gt;. Android returns PNG regardless — ADB doesn't offer format selection at this layer.&lt;/p&gt;
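
&lt;p&gt;The pending map is small enough to sketch in full. This is a minimal version under stated assumptions: &lt;code&gt;send&lt;/code&gt; stands in for writing to the agent's WebSocket, request IDs come from a simple counter, and the class and method names are hypothetical. Two details matter more than the rest: a timeout removes the entry so a lost reply can't leak memory or hang the HTTP request, and a reply that arrives after the timeout is ignored instead of settling a promise twice.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;type Pending = {
  resolve: (payload: unknown) => void
  reject: (err: Error) => void
  cancelTimer: () => void
}

// Hypothetical relay-side helper: correlates replies with requests by requestId.
class PendingRequests {
  private pending = new Map()
  private nextId = 0

  // Send a request; the promise settles when the reply with the same
  // requestId arrives, or rejects after timeoutMs.
  request(send: (msg: { type: string; requestId: string }) => void, type: string, timeoutMs: number) {
    const requestId = 'req-' + (++this.nextId)
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(requestId)
        reject(new Error('Agent did not reply within ' + timeoutMs + ' ms'))
      }, timeoutMs)
      const entry: Pending = { resolve, reject, cancelTimer: () => clearTimeout(timer) }
      this.pending.set(requestId, entry)
      try {
        send({ type, requestId })
      } catch (err) {
        // e.g. the agent's socket is already closed
        this.settle(requestId, undefined, String(err))
      }
    })
  }

  // Call from the WebSocket message handler when a reply arrives.
  settle(requestId: string, payload: unknown, error?: string) {
    const entry: Pending | undefined = this.pending.get(requestId)
    if (!entry) return false // unknown ID, or it already timed out
    this.pending.delete(requestId)
    entry.cancelTimer()
    if (error) entry.reject(new Error(error))
    else entry.resolve(payload)
    return true
  }

  get size() {
    return this.pending.size
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;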




&lt;h2&gt;
  
  
  PAT scope enforcement
&lt;/h2&gt;

&lt;p&gt;Personal Access Tokens existed before v0.3.0, but the scope field wasn't actually enforced on API routes: a &lt;code&gt;developer&lt;/code&gt;-scoped token could call any endpoint.&lt;/p&gt;

&lt;p&gt;v0.3.0 adds scope checks to all builds endpoints, enforced at the middleware layer: a token issued for &lt;code&gt;builds&lt;/code&gt; access can upload and manage builds, but can't touch team settings or session data. That makes it safe to issue narrow tokens to CI pipelines without granting more access than they need.&lt;/p&gt;
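
&lt;p&gt;A minimal sketch of a scope check at the middleware layer, assuming an Express-style &lt;code&gt;(req, res, next)&lt;/code&gt; signature, a PAT that earlier middleware has already authenticated and attached as &lt;code&gt;req.token&lt;/code&gt;, and a hypothetical &lt;code&gt;scopes&lt;/code&gt; array on it. The request and response types are declared locally so the sketch stands alone.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Minimal local stand-ins for the framework's request/response types.
type Token = { scopes: string[] }
type Req = { token?: Token }
type Res = { status: (code: number) => Res; json: (body: unknown) => void }
type Next = () => void

// Hypothetical middleware factory: reject the request unless the
// authenticated PAT carries the required scope.
function requireScope(scope: string) {
  return (req: Req, res: Res, next: Next) => {
    if (!req.token) {
      res.status(401).json({ error: 'missing token' })
      return
    }
    if (!req.token.scopes.includes(scope)) {
      // Authenticated, but not allowed to call this route.
      res.status(403).json({ error: 'token lacks scope: ' + scope })
      return
    }
    next()
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A route would then chain it before its handler, e.g. &lt;code&gt;router.post('/builds', requireScope('builds'), uploadBuild)&lt;/code&gt;, where the path and handler name are placeholders. Returning 403 rather than 401 tells the caller the token is valid but not permitted for this route.&lt;/p&gt;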




&lt;h2&gt;
  
  
  Frame performance instrumentation
&lt;/h2&gt;

&lt;p&gt;For anyone debugging streaming latency: v0.3.x adds per-frame hop timestamps via a binary header (&lt;code&gt;TFFE&lt;/code&gt; — tapflow frame envelope). Each frame now carries the capture time, relay-received time, and client-received time in an 8-byte prefix before the JPEG/H.264 payload.&lt;/p&gt;

&lt;p&gt;The dashboard can surface a live performance overlay showing frame latency broken down by segment (agent → relay, relay → browser). Useful when diagnosing whether a slowdown is in the network leg or the browser decode path.&lt;/p&gt;
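
&lt;p&gt;The per-segment breakdown is plain subtraction over the hop timestamps. A minimal sketch, assuming millisecond Unix timestamps and hypothetical field names (the TFFE byte layout isn't specified here). One caveat: each leg compares clocks on two different machines, so the numbers are only as accurate as the clock sync (e.g. NTP) between them, and a negative leg points to clock skew rather than a fast network.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical hop timestamps, in milliseconds since the Unix epoch.
type FrameHops = { capturedAt: number; relayReceivedAt: number; clientReceivedAt: number }

// Split end-to-end latency into the two network legs the overlay shows.
function latencySegments(h: FrameHops) {
  const agentToRelay = h.relayReceivedAt - h.capturedAt
  const relayToBrowser = h.clientReceivedAt - h.relayReceivedAt
  return {
    agentToRelay,
    relayToBrowser,
    total: h.clientReceivedAt - h.capturedAt,
    // Each leg compares two machines' clocks; a negative value means skew.
    clockSkewSuspected: 0 > Math.min(agentToRelay, relayToBrowser),
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;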




&lt;h2&gt;
  
  
  Experimental: an MCP server
&lt;/h2&gt;

&lt;p&gt;v0.3.x also ships &lt;code&gt;@tapflowio/mcp-server&lt;/code&gt; (&lt;code&gt;0.3.1-experimental.1&lt;/code&gt;) — it exposes tapflow's WebSocket/REST APIs as MCP tools so an LLM agent can drive a simulator the same way a human does in the browser: screenshot → reason → tap/type → screenshot again.&lt;/p&gt;

&lt;p&gt;It's early (the &lt;code&gt;experimental&lt;/code&gt; suffix is literal — consistency and error-recovery still need work), and it's a big enough topic to have its own write-up: &lt;strong&gt;&lt;a href="https://dev.to/joduchan/-giving-an-llm-eyes-and-hands-on-a-mobile-simulator-5963"&gt;Giving an LLM Eyes and Hands on a Mobile Simulator&lt;/a&gt;&lt;/strong&gt; covers the full tool list, the normalized-coordinate tap/swipe, and where this is headed (LLM-driven smoke tests in CI).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @tapflowio/mcp-server@experimental
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; tapflow
tapflow start
&lt;span class="c"&gt;# http://localhost:4000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;🔗 GitHub: &lt;a href="https://github.com/jo-duchan/tapflow" rel="noopener noreferrer"&gt;https://github.com/jo-duchan/tapflow&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 Docs: &lt;a href="https://www.tapflow.dev" rel="noopener noreferrer"&gt;https://www.tapflow.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>opensource</category>
      <category>ios</category>
      <category>android</category>
      <category>reactnative</category>
    </item>
    <item>
      <title>Your whole team can now run mobile QA from the browser. Here's how we built it.</title>
      <dc:creator>Duchan</dc:creator>
      <pubDate>Wed, 27 May 2026 05:08:18 +0000</pubDate>
      <link>https://dev.to/joduchan/your-whole-team-can-now-run-mobile-qa-from-the-browser-heres-how-we-built-it-3fmn</link>
      <guid>https://dev.to/joduchan/your-whole-team-can-now-run-mobile-qa-from-the-browser-heres-how-we-built-it-3fmn</guid>
      <description>&lt;p&gt;If you work on a mobile product, you've probably seen this.&lt;/p&gt;

&lt;p&gt;Physical devices are never enough. Covering every OS version is even harder — iOS doesn't support downgrading, so maintaining a range of versions means managing a pool of locked devices, which is overhead nobody wants.&lt;/p&gt;

&lt;p&gt;But the bigger friction is access. Simulators only run on a developer's Mac, behind complex toolchains. Anyone on the team who isn't a mobile developer has to ask one every single time they need to verify something:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Server / FE developer&lt;/strong&gt; — "How do I install the sandbox build to check what was deployed?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Product manager&lt;/strong&gt; — "I keep having to install and remove different versions just to compare behavior."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Designer&lt;/strong&gt; — "I need to check the layout across screen sizes, but I don't have the right devices."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Cloud simulator services exist. But uploading internal app builds to an external service — and paying monthly fees for simulators already running on Macs you own — was never something we wanted to do.&lt;/p&gt;

&lt;p&gt;So we built &lt;a href="https://github.com/jo-duchan/tapflow" rel="noopener noreferrer"&gt;tapflow&lt;/a&gt;: an open-source, self-hosted tool that streams iOS simulators and Android emulators to the browser. Anyone on your team opens the dashboard, picks a device, and starts interacting — no Xcode, no Android Studio, no setup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; tapflow
tapflow start
&lt;span class="c"&gt;# → http://localhost:4000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This post is about how we built it — specifically the parts that weren't obvious.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why we didn't just use Appetize or BrowserStack
&lt;/h2&gt;

&lt;p&gt;Both services solve the browser access problem. We evaluated them seriously. Before signing up, we hit two blockers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost.&lt;/strong&gt; Appetize starts at $59/month and scales with team size.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data.&lt;/strong&gt; Both require uploading your app binary to external servers. For anything with sensitive business logic, that's a non-starter.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We already had Macs in the office. So we built tapflow instead.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser (your team)  ←─ WebSocket ─→  Relay Server  ←─ WebSocket (outbound) ─→  Mac Agent
                                     (Linux / Mac)                           (iOS · Android)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Mac Agent connects &lt;strong&gt;outbound&lt;/strong&gt; to the relay — no firewall or NAT configuration needed. The relay can run on a small Linux server (a ~$5/month Fly.io instance handles it). App data never leaves your infrastructure.&lt;/p&gt;




&lt;h2&gt;
  
  
  iOS touch — without WebDriverAgent
&lt;/h2&gt;

&lt;p&gt;WebDriverAgent was the obvious starting point. We didn't use it.&lt;/p&gt;

&lt;p&gt;The problems: WDA breaks on Xcode updates, requires provisioning profiles, needs the app to be in the foreground, and adds a layer of process management complexity we didn't want to own.&lt;/p&gt;

&lt;p&gt;Instead, we load &lt;code&gt;CoreSimulator.framework&lt;/code&gt; dynamically via &lt;code&gt;dlopen&lt;/code&gt; in a Swift binary (&lt;code&gt;touch-helper&lt;/code&gt;), then inject HID events directly through &lt;code&gt;SimDeviceLegacyHIDClient&lt;/code&gt; and &lt;code&gt;IndigoHID&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="c1"&gt;// touch-helper — HID event injection into the simulator&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;SimDeviceLegacyHIDClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;device&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;IndigoHIDEvent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;touch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;phase&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;began&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This bypasses WDA entirely. It works regardless of which app is in the foreground, and so far it hasn't broken on Xcode updates.&lt;/p&gt;

&lt;p&gt;The tradeoff: these are private APIs. They've been stable across Xcode versions in our testing, but Apple could remove them. We think that's a better bet than WDA's reliability track record.&lt;/p&gt;




&lt;h2&gt;
  
  
  iOS streaming — IOSurface
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;xcrun simctl io screenshot&lt;/code&gt; works, but the latency is too high for interactive use.&lt;/p&gt;

&lt;p&gt;Instead, we access &lt;code&gt;IOSurface&lt;/code&gt; directly through SimulatorKit, pulling frames straight from the simulator's GPU surface. &lt;del&gt;Frames are JPEG-encoded on the Mac and streamed over WebSocket at ~30fps.&lt;/del&gt;&lt;/p&gt;

&lt;p&gt;For slow clients, we drop frames rather than buffering — backpressure is handled at the WebSocket layer to prevent memory accumulation on the relay when a client can't keep up.&lt;/p&gt;
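
&lt;p&gt;Dropping instead of buffering typically comes down to checking how much data is still queued on a client's socket before handing it the next frame. A minimal sketch, assuming a socket that exposes &lt;code&gt;bufferedAmount&lt;/code&gt; (the browser &lt;code&gt;WebSocket&lt;/code&gt; and the Node &lt;code&gt;ws&lt;/code&gt; package both do) and a hypothetical 1 MiB high-water mark; tapflow's real threshold and code may differ.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Minimal surface of a WebSocket needed here (browser WebSocket and ws both fit).
type FrameSink = { bufferedAmount: number; send: (data: Uint8Array) => void }

// Hypothetical high-water mark: if more than this is still queued for a
// client, it is not keeping up, so new frames are dropped for it.
const HIGH_WATER_BYTES = 1024 * 1024

// Returns true if the frame was sent, false if it was dropped.
function sendFrameOrDrop(sink: FrameSink, frame: Uint8Array) {
  if (sink.bufferedAmount > HIGH_WATER_BYTES) return false
  sink.send(frame)
  return true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is safe for JPEG because every frame is independent. With an inter-frame codec such as H.264, a dropped frame corrupts the frames that reference it until the next keyframe, so a drop policy there has to resume at a keyframe.&lt;/p&gt;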

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; JPEG was the first version. The default is now H.264 with a buffer-free 2-tier browser decoder (WebCodecs on secure contexts, WASM on plain HTTP). The full teardown — why H.264 first felt &lt;em&gt;worse&lt;/em&gt;, and the two fixes that solved it — is a separate post: &lt;a href="https://dev.to/joduchan/we-switched-simulator-streaming-to-h264-and-it-felt-worse-heres-how-we-fixed-the-latency-pk9"&gt;We switched simulator streaming to H.264 and it felt worse&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Android — scrcpy H.264 → WebGL
&lt;/h2&gt;

&lt;p&gt;Android was cleaner. scrcpy already does the hard work of capturing the emulator display as an H.264 stream.&lt;/p&gt;

&lt;p&gt;We receive the H.264 Annex B stream from scrcpy over a local TCP socket, relay it through WebSocket, then decode and render it in the browser. Android now shares the same buffer-free 2-tier decoder as iOS (see the update above).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scrcpy server (emulator)
    → TCP socket
    → Mac Agent
    → WebSocket
    → Browser (WebGL2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
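
&lt;p&gt;In an Annex B stream, NAL units are delimited by start codes (&lt;code&gt;00 00 01&lt;/code&gt;, or &lt;code&gt;00 00 00 01&lt;/code&gt;), so anything that needs unit boundaries, such as a decoder front end looking for SPS/PPS or keyframes, can find them by scanning. A minimal sketch over one fully buffered chunk; it is not tapflow's code, it assumes any framing the capture side adds around the stream has already been removed, and a streaming version would also need to carry a partial unit across chunk boundaries. The NAL type is the low 5 bits of the first byte after the start code (7 = SPS, 8 = PPS, 5 = IDR slice).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// One NAL unit: its bytes (without start code) and its H.264 nal_unit_type.
type NalUnit = { type: number; data: Uint8Array }

// True if buf has a 3-byte start code 00 00 01 at offset i.
function isStartCode(buf: Uint8Array, i: number) {
  if (buf[i] !== 0) return false
  if (buf[i + 1] !== 0) return false
  return buf[i + 2] === 1
}

function pushUnit(units: NalUnit[], data: Uint8Array) {
  if (data.length === 0) return
  // nal_unit_type is the low 5 bits of the header byte.
  units.push({ type: data[0] % 32, data })
}

// Hypothetical helper: split an Annex B buffer into NAL units.
function splitAnnexB(buf: Uint8Array) {
  const units: NalUnit[] = []
  let payloadStart = -1 // -1 until the first start code is seen
  let i = 0
  while (buf.length > i + 2) {
    if (!isStartCode(buf, i)) {
      i++
      continue
    }
    // A 4-byte start code is a 3-byte one preceded by one more zero.
    let codeStart = i
    if (i > 0) {
      if (buf[i - 1] === 0) codeStart = i - 1
    }
    if (payloadStart !== -1) pushUnit(units, buf.subarray(payloadStart, codeStart))
    payloadStart = i + 3
    i += 3
  }
  if (payloadStart !== -1) pushUnit(units, buf.subarray(payloadStart))
  return units
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;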



&lt;h3&gt;
  
  
  Pinch gestures
&lt;/h3&gt;

&lt;p&gt;scrcpy's &lt;code&gt;INJECT_TOUCH_EVENT&lt;/code&gt; supports multiple pointer IDs. Pinch is implemented by sending two simultaneous touch events:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ScrcpyControl — multi-touch injection&lt;/span&gt;
&lt;span class="nf"&gt;pinchStart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;y1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;x2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;y2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;touchDown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;y1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;touchDown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;x2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;y2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
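
&lt;p&gt;&lt;code&gt;pinchStart&lt;/code&gt; only places the two pointers; the zoom comes from moving them apart or together on the events that follow. A minimal sketch of that geometry, using a hypothetical &lt;code&gt;pinchPoints&lt;/code&gt; helper that keeps both pointers on a horizontal line through a fixed center and interpolates the distance between them; each pair it returns would feed move events for pointer IDs 0 and 1.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;type Point = { x: number; y: number }

// Hypothetical helper: positions of pointers 0 and 1 at progress t (0 to 1).
// Both stay on a horizontal line through (cx, cy); their distance moves
// linearly from startSpread to endSpread (growing spread = zoom in).
function pinchPoints(cx: number, cy: number, startSpread: number, endSpread: number, t: number) {
  const half = (startSpread + (endSpread - startSpread) * t) / 2
  const p0: Point = { x: Math.round(cx - half), y: cy }
  const p1: Point = { x: Math.round(cx + half), y: cy }
  return [p0, p1]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;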






&lt;h2&gt;
  
  
  What's included
&lt;/h2&gt;

&lt;p&gt;Beyond streaming and input:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;App Center&lt;/strong&gt; — upload &lt;code&gt;.app.zip&lt;/code&gt; (iOS) or &lt;code&gt;.apk&lt;/code&gt; (Android), manage build status (Backlog / In Progress / Done / Rejected), REST API + Personal Access Tokens for CI/CD integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session recording&lt;/strong&gt; — record and share QA sessions, kept for ~72 hours before automatic cleanup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team management&lt;/strong&gt; — invite links, role-based access (Admin / Developer / QA / Viewer)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mac resource monitoring&lt;/strong&gt; — CPU and RAM time-series charts per agent&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Honest limitations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;iOS simulators require macOS — Apple's constraint, not ours&lt;/li&gt;
&lt;li&gt;One Mac typically handles 2–4 simultaneous simulators depending on RAM; connect multiple Macs to pool devices&lt;/li&gt;
&lt;li&gt;Still v0.x — breaking changes may appear before v1.0&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;tapflow is MIT licensed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; tapflow
tapflow start
tapflow init  &lt;span class="c"&gt;# create the first admin account&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For team deployments with a shared relay:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Relay server (Linux/macOS)&lt;/span&gt;
&lt;span class="nv"&gt;JWT_SECRET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;openssl rand &lt;span class="nt"&gt;-hex&lt;/span&gt; 32&lt;span class="si"&gt;)&lt;/span&gt; tapflow relay start

&lt;span class="c"&gt;# Each Mac agent&lt;/span&gt;
tapflow agent start &lt;span class="nt"&gt;--relay&lt;/span&gt; wss://your-relay-url
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;🔗 GitHub: &lt;a href="https://github.com/jo-duchan/tapflow" rel="noopener noreferrer"&gt;https://github.com/jo-duchan/tapflow&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 Docs: &lt;a href="https://www.tapflow.dev" rel="noopener noreferrer"&gt;https://www.tapflow.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>opensource</category>
      <category>ios</category>
      <category>android</category>
      <category>reactnative</category>
    </item>
  </channel>
</rss>
