My on-device QA used to be an AI tapping my app by pixel coordinates. One regression walk logged 83 raw coordinate taps. It was slow, it was flaky, and when it failed I could never tell if the app broke or the tap landed two pixels off a button.
This is the story of replacing that. The same six launch-blocker flows now run in about 90 seconds, with no model in the loop, plus a Jest guard that makes the whole thing unregressable. The app is Origo, an Expo SDK 56 (React Native 0.85) astrology app I am shipping to the Play Store.
Why the AI fell back to pixel coordinates
I was driving the Pixel 3 with an accessibility-tree tool. Point it at an element, it reads the tree, finds the node, taps the node. On a native app this is fast and stable, because native widgets populate the accessibility tree for free.
React Native does not. A <Pressable> shows up as an anonymous view unless you give it a testID. My app had nine testIDs across 164 files. So the tree-matching found almost nothing, and the tool degraded to its fallback: tap by screen percentage. That is the 83 taps. Every one is a guess at where a button rendered, and every screen size or layout tweak silently invalidates it.
The lesson is boring and load-bearing: an RN app is only as queryable as its testIDs. Fixing the QA speed was downstream of fixing that.
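A rough way to see how queryable an app is: count the literal testID props per source file. The sketch below is illustrative only, not the audit I actually ran. `countTestIds` is a hypothetical helper and the file paths in the sample are made up; in a real repo you would fill `sources` by reading files from disk.

```typescript
// Hypothetical audit helper: counts testID props in source text.
// `sources` maps a file path to that file's contents.
function countTestIds(sources: Record<string, string>): {
  total: number;
  filesWithIds: number;
  perFile: Record<string, number>;
} {
  // Matches testID="x", testID='x', and testID={expr}.
  const pattern = /\btestID=(?:"[^"]+"|'[^']+'|\{[^}]+\})/g;
  const perFile: Record<string, number> = {};
  let total = 0;
  let filesWithIds = 0;

  for (const file of Object.keys(sources)) {
    const matches = sources[file].match(pattern);
    const count = matches ? matches.length : 0;
    perFile[file] = count;
    total += count;
    if (count > 0) filesWithIds += 1;
  }
  return { total, filesWithIds, perFile };
}

// Sample input with made-up paths; only the first file is addressable.
const sampleSources: Record<string, string> = {
  'app/a.tsx': '<Pressable testID="upgrade-cta" onPress={open} />',
  'app/b.tsx': '<Pressable onPress={close} />',
};
const audit = countTestIds(sampleSources);
console.log(audit.total, audit.filesWithIds);
```

Files that report zero are the screens where a tree-based tool has nothing to match and will fall back to coordinates.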
Step 1: testIDs on the flows that matter
I did not testID-everything. I picked the six flows that, if broken, block launch. Those are sign in, upgrade-to-pro opening the paywall, a PRO reading hitting the paywall, editing birth data, synastry (the relationship-compatibility screen) add-person refreshing the list, and sign out. Then I added stable testIDs to exactly the elements those flows touch. That took the count to 99 testIDs across 30 files, each one paying for itself.
Step 2: encode the flows in Maestro, self-contained
Maestro (2.6.0) runs YAML flows against a real device. The first version chained flows with clearState and back, and it was brittle: one flow would exit on the wrong screen and the next would start from a bad state.
The fix was to make every flow state-independent. Flow 1 clears state and tests the welcome-to-sign-in path itself. Flows 2 through 5 start with a plain launchApp and no state clear. On this app that preserves the signed-in Supabase session and drops you on the Today tab. So each flow re-navigates from a known state and never depends on the previous flow's exit screen.
# FLOW 2 - Upgrade-to-Pro opens the paywall (self-contained)
- launchApp # preserves session, lands on Today
- tapOn:
    point: "90%,92%" # You tab
- tapOn:
    id: "upgrade-cta"
- assertVisible: "Restore" # substring matches both paywall variants
That assertVisible: "Restore" is deliberate. Two different paywalls can render depending on whether RevenueCat's offering loaded at runtime, and "Restore" is a substring present in both. Assert the thing that is true in every valid state, not the exact string of one of them.
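One way to sanity-check an assertion string before it goes into a flow is to test it against the visible labels of every valid variant. This is a minimal sketch: `presentInEveryVariant` is a hypothetical helper, and the label lists are invented for illustration, not the real paywall copy.

```typescript
// True when `needle` appears inside at least one label of every variant.
function presentInEveryVariant(needle: string, variants: string[][]): boolean {
  return variants.every((labels) =>
    labels.some((label) => label.includes(needle)),
  );
}

// Invented labels standing in for the two paywall renderings.
const offeringLoadedLabels = ['Start free trial', 'Restore purchases'];
const fallbackLabels = ['Subscribe', 'Restore'];
const paywallVariants = [offeringLoadedLabels, fallbackLabels];

const restoreIsSafe = presentInEveryVariant('Restore', paywallVariants);
const subscribeIsSafe = presentInEveryVariant('Subscribe', paywallVariants);
console.log(restoreIsSafe, subscribeIsSafe);
```

A string that fails this check would make the flow pass or fail depending on which variant happened to render, which is exactly the flakiness the assertion is meant to avoid.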
The trap: an on-device suite can break with zero CI signal
Here is the part that bit me and that I have not seen written down.
Those testIDs only take effect after a new build. And a Maestro failure only surfaces on-device, when someone bothers to run the suite. So if I rename upgrade-cta to upgrade-button during a refactor, nothing fails. Tests pass. CI is green. The regression suite is quietly dead, and I find out the next time I happen to plug in the phone.
A device suite that can rot silently is worse than no suite, because it tells you that you are covered when you are not.
Step 3: a Jest contract test makes the testIDs unregressable
So I pushed the guarantee down to the unit level, where it runs on every commit with no device. A plain Jest test asserts two things: every testID the Maestro suite depends on still exists in the source file that renders it, and the Maestro YAML references only testIDs from that set.
import { readFileSync } from 'node:fs';
import { join } from 'node:path';

// Resolve a repo-relative path. Assumes Jest runs from the repo root.
const srcPath = (file: string): string => join(process.cwd(), file);

// testID -> source file that must declare it
const TESTID_SOURCES: Record<string, string> = {
  'onboarding-sign-in': 'app/onboarding/index.tsx',
  'upgrade-cta': 'astro/screens/you/SubscriptionCard.tsx',
  'paywall-purchase': 'astro/screens/onboarding/CustomPaywall.tsx',
  // ... one entry per testID the suite drives
};

it.each(Object.entries(TESTID_SOURCES))(
  'testID "%s" is declared in %s',
  (testId, file) => {
    const contents = readFileSync(srcPath(file), 'utf8');
    // Match the literal prop text, the same string the device tooling looks up.
    expect(contents.includes(`testID="${testId}"`)).toBe(true);
  }
);

it('the Maestro flow references only contract testIDs (no drift)', () => {
  const flow = readFileSync('.maestro/origo-regression.yaml', 'utf8');
  // Collect every id: "..." selector used in the YAML.
  const referenced = [...flow.matchAll(/\bid:\s*"([^"]+)"/g)].map((m) => m[1]);
  const contract = new Set(Object.keys(TESTID_SOURCES));
  expect(referenced.filter((id) => !contract.has(id))).toEqual([]);
});
It checks the source text, not the rendered output, on purpose. It is the literal testID="..." string the device tooling matches, so I want to assert that exact string survives, independent of render internals or RN mocks. Rename a testID and forget the YAML, and the contract test goes red on the next commit. Add a flow that needs a new testID, add the pair to the map, and the guard enforces it forever.
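The drift check can also be run in both directions. The sketch below is an optional extension, not part of the test above: `diffContract` is a hypothetical helper, and `renamed-button` is a made-up ID used to simulate drift. It reports IDs the YAML uses that the contract does not know, and contract IDs the YAML never touches.

```typescript
// Compare the contract's testIDs with the id: "..." selectors in a flow file.
function diffContract(
  contractIds: string[],
  flowYaml: string,
): { unknownInFlow: string[]; unusedInFlow: string[] } {
  const referenced = new Set<string>();
  const pattern = /\bid:\s*"([^"]+)"/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(flowYaml)) !== null) {
    referenced.add(match[1]);
  }
  const contract = new Set(contractIds);
  return {
    // Selectors in the YAML that no source file is pinned to.
    unknownInFlow: Array.from(referenced).filter((id) => !contract.has(id)),
    // Contract entries the YAML never exercises.
    unusedInFlow: contractIds.filter((id) => !referenced.has(id)),
  };
}

// Simulated drift: the flow taps an ID that is not in the contract.
const sampleFlow = [
  '- tapOn:',
  '    id: "upgrade-cta"',
  '- tapOn:',
  '    id: "renamed-button"',
].join('\n');
const drift = diffContract(['upgrade-cta', 'paywall-purchase'], sampleFlow);
console.log(drift);
```

Whether an unused contract entry should fail the build or only warn is a judgment call; an unknown ID in the flow is the case that silently kills the suite.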
Step 4: run the same flows against the release APK
The last gap: I was testing the dev client over Metro, not the artifact users install. So the runner got a second mode. SMOKE_TARGET=installed skips all the Metro and dev-launcher setup and drives a plain installed release APK. The flows' Metro-connect step is a conditional subflow, so it is a no-op there, and the exact same six flows validate the CI build.
EMAIL=... PASSWORD=... SMOKE_TARGET=installed \
./scripts/device-smoke.sh .maestro/origo-regression.yaml
Two gotchas that might save you some time: Maestro's GraalJS runtime has no Thread.sleep, so timed waits go through a small busy-wait helper. And tab-bar taps use a screen percentage rather than a testID, because on the Pixel 3 the tab testID bounds bleed into the Android nav bar and the tap misses. That is the one place coordinates are correct, because the position is fixed and the elements are not individually addressable.
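For reference, a busy-wait helper can be as small as the sketch below. This is an assumption about the general shape of such a helper, not the exact one in my runner, and `busyWait` is a name chosen for illustration. It is written in TypeScript here; in a Maestro script file the type annotations would be dropped, since those run as plain JavaScript.

```typescript
// Block the current thread for at least `ms` milliseconds by spinning
// on the clock. Only sensible for short waits where no sleep API exists.
function busyWait(ms: number): void {
  const deadline = Date.now() + ms;
  while (Date.now() < deadline) {
    // Intentionally empty: spin until the deadline passes.
  }
}

const startedAt = Date.now();
busyWait(20);
const elapsedMs = Date.now() - startedAt;
console.log(elapsedMs >= 20);
```

Spinning burns CPU for the whole wait, so it is worth keeping the durations short and preferring a built-in wait condition wherever the runner offers one.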
The takeaway
Speed was never the real problem. The real problem was determinism. Three changes fixed it: make the app queryable with testIDs on the flows that matter, encode those flows so they do not depend on each other, and then guard the fragile contract (testIDs that only exist after a build, failures that only show on-device) with a unit test that runs on every commit.
If you take one thing: an on-device test suite that can break without turning anything red is a liability. Find the part that can drift silently and pin it down a layer, where the cost of breaking it is a failed CI run, not a missed regression in production.
I am building Origo in the open. Next up: the paywall that renders two different ways depending on whether RevenueCat's offering loaded, and how I stopped guessing which one a user sees. Follow if that is your kind of problem.