Low-Budget Multi-Device QA: Automating 3 Platforms with Open Source Tools
Practical automation patterns for health apps across Android APK, WeChat Mini Program, and Web backend — using only open source tools and the hardware you already have.
The Problem
You have a medical app that ships on three surfaces:
- Android APK — the doctor's side, a uni-app WebView wrapper
- WeChat Mini Program — the patient's side, running inside WeChat's sandbox
- Web Backend — admin panel, Vue3 + Element Plus
You have two test phones: an Oppo PCKM00 and a Huawei ANA-AN00. Your budget for test infrastructure: zero. No BrowserStack, no Sauce Labs, no paid SaaS.
Oh, and the APK is a WebView wrapper — the app's core UI lives inside a WebView that's invisible to Android's UI dump (uiautomator2 can't see it). And WeChat's mini-program runtime intercepts standard automation primitives. And the two phones have different screen resolutions and keyboard heights. And you don't have sudo on the CI machine.
This is the problem deep-test was built to solve. Here's the playbook.
Architecture Overview
┌─────────────────────────────────────────────┐
│ deep-test (Hermes Agent) │
├─────────────────────────────────────────────┤
│ core/ │
│ ├── device.py → device registry + ADB │
│ ├── coords.py → multi-device scaling │
│ ├── locator.py → 3-layer self-healing │
│ ├── ocr.py → rapidocr wrapper │
│ ├── runner.py → retry + LLM fallback │
│ └── web-runner.cjs → Playwright + Vue3 fix │
├─────────────────────────────────────────────┤
│ projects/med-app/ │
│ ├── android/ → login, patient, chat │
│ ├── miniprogram/ → mini-program flows │
│ ├── web/ → admin panel (Playwright) │
│ └── scenarios/ → cross-platform orchestration│
├─────────────────────────────────────────────┤
│ reports/ (HTML + screenshots) │
└─────────────────────────────────────────────┘
Hardware cost: $0. Every tool is open source. The phones are existing hardware. The LLM fallback uses DeepSeek V4 API (pay-as-you-go, roughly a few dollars per month).
Pattern 1: The 3-Layer Self-Healing Locator
UI hierarchy (XML) dumps can't see WebView content. Pure coordinates break across devices. The solution: a cascade of three fallback strategies.
def locate(element_id, serial, device_alias):
    """Try each strategy in order. Fail fast, retry smart."""
    # Layer 1: uiautomator2 XML (fastest, works for native elements)
    try:
        # Assumes u2_session() returns a uiautomator2 Device;
        # bounds() gives (left, top, right, bottom)
        return u2_session(serial)(resourceId=element_id).bounds()
    except Exception:
        pass  # Element is in the WebView, so it is not in the XML
    # Layer 2: Coordinate map (device-aware, cached)
    try:
        return Coords[device_alias][element_id]
    except KeyError:
        pass  # Unknown element, fall through to OCR
    # Layer 3: OCR + LLM fallback (slowest but most resilient)
    screenshot = take_screenshot(serial)
    ocr_result = ocr(screenshot)
    # The LLM reads the OCR text and returns the element's center coordinates
    response = llm.ask(
        f"Screen shows: {ocr_result}. Find '{element_id}' and return its center coordinates."
    )
    return parse_coords(response)
What this solves:
- Coord-only tests work on Oppo but break on Huawei (different screen dimensions)
- uiautomator2 can't reach WebView content inside the uni-app shell
- OCR is slow but catches everything — acts as the safety net
Real-world numbers: Layer 1 handles ~30% of locators (native login buttons). Layer 2 handles ~50% (known UI elements in the mini-program). Layer 3 catches the remaining ~20% (dynamic content, confirmation dialogs). Average locate time with Layer 1: 200ms. Layer 3: 2-4 seconds.
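Layer 3 depends on turning a free-form LLM reply into a tap point. Here is a minimal sketch of what `parse_coords` might look like, assuming the model answers with a pair such as `(540, 720)`. The regex, the screen-bounds check, and the `LocateError` class are illustrative assumptions, not deep-test's actual code.

```python
import re


class LocateError(Exception):
    """Raised when no usable coordinates can be extracted (hypothetical)."""


def parse_coords(response, width=1080, height=2400):
    """Return the first 'x, y' integer pair in the reply that fits on screen."""
    for match in re.finditer(r"(\d{1,4})\s*,\s*(\d{1,4})", response):
        x, y = int(match.group(1)), int(match.group(2))
        # Reject pairs that cannot be a tap point on this screen
        if 0 <= x < width and 0 <= y < height:
            return x, y
    raise LocateError(f"no on-screen coordinates in: {response!r}")
```

Rejecting off-screen pairs is a cheap guard against the model echoing unrelated numbers from the OCR text.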
Pattern 2: The Keyboard Nightmare
This single bug ate more debug time than any other issue.
The Huawei ANA-AN00's stock IME doesn't play nicely with adb shell input text. The keyboard overlays the password field, and after typing, the "Login" button is hidden behind the keyboard.
The two devices have different keyboard heights — the Huawei IME panel is ~310px, roughly 100px taller than the Oppo's ~210px.
The fix sequence:
import shlex
import subprocess
import time

def type_and_submit(serial, device_alias, text):
    # Step 1: Type text in ONE adb shell call so the chain runs on the device
    # (anti-IME swallowing); shlex.quote protects shell metacharacters
    chain = " && ".join(
        f"input text {shlex.quote(ch)} && sleep 0.08"
        for ch in text
    )
    subprocess.run(["adb", "-s", serial, "shell", chain], timeout=60)
    # Step 2: Dismiss keyboard (CRITICAL)
    subprocess.run([
        "adb", "-s", serial,
        "shell", "input keyevent KEYCODE_BACK"
    ], timeout=5)
    time.sleep(2)
    # Step 3: Now the button is visible, so tap it
    # X is taken as recorded (both phones are 1080 px wide); only Y is scaled
    x = COORD_MAP["login_button"][0]
    y = Coords.scale_y(device_alias, "login_button")
    subprocess.run([
        "adb", "-s", serial,
        "shell", f"input tap {x} {y}"
    ], timeout=5)
Key insight: KEYCODE_BACK dismisses the keyboard without leaving the form. A second press would exit the activity — one press is the sweet spot.
Why not use uiautomator2's d(text="登录").click() (登录 is the button's "Login" label)? Because when the keyboard is up, it covers the click target. The tap lands on the keyboard overlay, not the button.
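Since a second BACK press would exit the activity, one way to make the dismissal safer is to press BACK only when the keyboard is actually reported as visible. The sketch below assumes `dumpsys input_method` prints an `mInputShown=true|false` field; that field shows up on many Android builds but is not a stable interface, so check it on your own devices. Both function names are hypothetical.

```python
import re
import subprocess


def keyboard_shown(dumpsys_output):
    """Return True if the dumpsys text reports the soft keyboard as visible."""
    match = re.search(r"mInputShown=(true|false)", dumpsys_output)
    return bool(match) and match.group(1) == "true"


def dismiss_keyboard_once(serial):
    """Send a single KEYCODE_BACK, but only if the IME is reported visible."""
    out = subprocess.run(
        ["adb", "-s", serial, "shell", "dumpsys", "input_method"],
        capture_output=True, text=True, timeout=10,
    ).stdout
    if keyboard_shown(out):
        subprocess.run(
            ["adb", "-s", serial, "shell", "input", "keyevent", "KEYCODE_BACK"],
            timeout=5,
        )
        return True
    return False  # keyboard already hidden: do not risk leaving the screen
```

If the field is missing from the output, `keyboard_shown` returns False and no key event is sent, which errs on the side of staying on the form.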
Pattern 3: Defeating the IME Input Hog
Both Baidu IME (Oppo) and Sogou IME (Huawei) have a nasty behavior: they swallow individual adb shell input text commands that arrive too fast.
Wrong approach (will lose characters):
for ch in id_number:
    adb_cmd(serial, f"shell input text {ch}")
The stock IME on Oppo drops ~1 in every 3 characters this way. The 18th digit of an ID number is almost always missing.
Right approach (chained with sleep):
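# Build the chain WITHOUT repeating "shell": it is one adb shell invocation
chain = " && ".join(
    f"input text {ch} && sleep 0.08"
    for ch in id_number
)
# Quoted so the whole chain reaches the device shell as one argument
# (assumes adb_cmd parses its command string shell-style)
adb_cmd(serial, f'shell "{chain}"')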
Each character gets 80ms of settling time. The entire 18-digit ID takes ~1.5s. Tested across 50+ runs: zero lost characters.
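For readers who call `subprocess` directly instead of going through an `adb_cmd` helper, the chained command can be built as an argument list. This is a sketch with hypothetical helper names (`build_input_argv`, `settle_time`). It also makes the timing arithmetic explicit: 18 characters × 0.08 s is about 1.44 s of pure sleep, which lines up with the ~1.5 s figure once command overhead is added.

```python
import shlex


def build_input_argv(serial, text, delay=0.08):
    """Build one adb invocation that types `text` character by character."""
    chain = " && ".join(
        f"input text {shlex.quote(ch)} && sleep {delay}" for ch in text
    )
    # "shell" appears once; everything after it runs on the device
    return ["adb", "-s", serial, "shell", chain]


def settle_time(text, delay=0.08):
    """Total sleep time in seconds contributed by the per-character delay."""
    return len(text) * delay
```

The returned list can be passed straight to `subprocess.run(...)` without `shell=True`, so the `&&` operators are interpreted on the device rather than on the host. `shlex.quote` only protects shell metacharacters; it does not change how the IME itself treats a character.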
Pattern 4: Cross-Device Coordinate Scaling
The Oppo is 1080×2400. The Huawei is 1080×2340. Every Y coordinate needs to be scaled.
class Coords:
    BASE_DEVICE = "oppo"  # All coordinates recorded here
    REFERENCE_HEIGHT = 2400

    @staticmethod
    def scale_y(device_alias, element_key):
        """Scale Y coordinate from reference device to target device."""
        base_y = COORD_MAP[element_key][1]
        target_height = DEVICE_REGISTRY[device_alias]["height"]
        scale_factor = target_height / Coords.REFERENCE_HEIGHT
        return int(base_y * scale_factor)
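With this, every interactable element has exactly one coordinate entry (recorded on Oppo), and all other devices auto-scale. Adding a Huawei Mate 60 or a Xiaomi 14 is a one-line config change for the Y axis; a device whose width is not 1080 px also needs X scaled the same way against a reference width.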
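As a worked example of the scaling: the factor from Oppo to Huawei is 2340 / 2400 = 0.975, so a point recorded at y = 1200 lands at y = 1170 on the Huawei. The sketch below generalizes this to both axes. `scale_point` is a hypothetical helper rather than part of the `Coords` class above, and it rounds instead of truncating.

```python
def scale_point(point, ref_size, target_size):
    """Scale an (x, y) point recorded on the reference screen to a target screen."""
    x, y = point
    ref_w, ref_h = ref_size
    tgt_w, tgt_h = target_size
    # Multiply before dividing so whole-number results stay exact
    return round(x * tgt_w / ref_w), round(y * tgt_h / ref_h)


OPPO = (1080, 2400)    # reference device (width, height)
HUAWEI = (1080, 2340)  # target device (width, height)

login_on_huawei = scale_point((540, 1200), OPPO, HUAWEI)
```

Proportional scaling assumes the layout stretches evenly with screen height. Fixed-height elements such as status bars or keyboards will not follow that rule, which is one reason the OCR layer stays in place as a fallback.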
Pattern 5: Playwright × Vue3 — The Synthetic Event Trap
On this Vue 3 + Element Plus admin panel, some Playwright clicks landed silently: page.click() returned without error, but the component's click handler never ran.
Doesn't work:
await page.click('.el-button--primary');
Works:
await page.evaluate(() => {
  document.querySelector('.el-button--primary').click();
});
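Why? The two calls take different paths. page.click() scrolls the element into view, runs actionability checks, and then sends real mouse input through CDP (Chrome DevTools Protocol) at the element's center. element.click() inside page.evaluate() dispatches a click event directly on the element from page script, with no hit-testing involved. Which step drops the event depends on the page (it only happens in certain configurations), so treat the evaluate() route as a workaround rather than a root-cause fix.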
Rule of thumb: If Playwright clicks land silently (no error, no action), wrap them in page.evaluate().
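The rule of thumb can be wrapped in a helper so the DOM click is used only when the normal click demonstrably did nothing. This is a sketch in Playwright's Python sync API (`page.click` and `page.evaluate`). The `took_effect` callback is a hypothetical caller-supplied check, for example "did the dialog open?", and the helper name is illustrative.

```python
def click_with_dom_fallback(page, selector, took_effect):
    """Try Playwright's click first; fall back to a DOM click if nothing changed.

    `took_effect` is a zero-argument callable returning True once the
    expected UI change is observable.
    """
    page.click(selector)
    if took_effect():
        return "playwright"
    # Dispatch the click on the element itself from inside the page
    page.evaluate("sel => document.querySelector(sel).click()", selector)
    return "dom" if took_effect() else "failed"
```

One caveat: if the first click was merely slow to take effect, the fallback clicks a second time, so `took_effect` should wait long enough before answering, and the helper is best reserved for actions that are safe to repeat.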
Pattern 6: The OCR-Based Dynamic Button Locator
When a UI element moves based on previous actions (e.g., "Add Patient" button scrolls down as more patients are added), coordinates become unreliable. OCR is the solution.
def find_button_y(serial, button_text, max_scrolls=3):
    """Scroll down until the button text appears, return its Y."""
    for attempt in range(max_scrolls):
        texts = take_ocr(serial, f"find_{button_text}")
        for text_bbox in texts:
            if button_text in text_bbox.text:
                return text_bbox.center_y
        # Not found, so scroll down
        subprocess.run([
            "adb", "-s", serial,
            "shell", "input swipe 540 1500 540 500 500"
        ], timeout=10)
        time.sleep(1.5)
    raise LocateError(f"'{button_text}' not found after {max_scrolls} scrolls")
This replaced a brittle coordinate system where the "Save" button Y shifted by ~48px per patient added. After 9 patients, it scrolled off-screen entirely.
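The `center_y` used above has to come from the OCR bounding box. The sketch below shows that step under the assumption that the OCR engine returns a list of `[box, text, score]` entries, where `box` is four `[x, y]` corner points. That is the shape rapidocr is commonly seen to return, but verify it against the version you install. `find_text_center` is a hypothetical helper.

```python
def find_text_center(ocr_result, needle):
    """Return the (x, y) center of the first OCR box whose text contains `needle`."""
    for box, text, _score in ocr_result or []:  # tolerate a None result
        if needle in text:
            xs = [point[0] for point in box]
            ys = [point[1] for point in box]
            # Average the four corners to get the box center
            return round(sum(xs) / len(xs)), round(sum(ys) / len(ys))
    return None


sample = [
    [[[100, 200], [300, 200], [300, 260], [100, 260]], "Add Patient", 0.98],
    [[[100, 400], [220, 400], [220, 450], [100, 450]], "Save", 0.95],
]
center = find_text_center(sample, "Add Patient")
```

The center is in screenshot pixels, so it can be fed to `input tap` directly only when the screenshot was captured at the device's native resolution.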
Pattern 7: The LLM Self-Healing Loop
When a test fails despite all the above layers, the system doesn't crash — it invokes the LLM.
Test Fails (e.g., Element 'start_consultation' not found)
│
├─ Layer 1 Retry (×2): Re-query uiautomator2 with longer wait
│ └─ Still failing? →
├─ Layer 2 Retry (×2): Refresh OCR with different threshold
│ └─ Still failing? →
└─ Layer 3: LLM Diagnosis
├─ Screenshot + error → LLM analyzes the screen
├─ LLM suggests: "A confirmation dialog 'Are you sure?' is blocking
│ the button. Click coordinate (540, 720) to dismiss it."
└─ Test applies the fix and retries
The LLM (DeepSeek V4 API, roughly a few dollars per month) reads the last screenshot and the error log, then suggests corrective actions. The script executes them and retries.
Real-world result: ~80% of "stuck" scenarios are recovered by Layer 3 without human intervention. The remaining ~20% generate a screenshot report for manual review.
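The escalation in the diagram can be expressed as a small loop over stages, each with its own retry budget. This is a sketch of the control flow only. The stage names, the `(name, attempts, fn)` tuple shape, and `run_recovery_ladder` are assumptions for illustration, and the lambdas stand in for the real uiautomator2, OCR, and LLM calls.

```python
def run_recovery_ladder(stages):
    """Run stages in order; each stage is (name, attempts, fn).

    fn() returns a result, or None on failure. Returns (result, log),
    where log records every attempt for the report.
    """
    log = []
    for name, attempts, fn in stages:
        for attempt in range(1, attempts + 1):
            result = fn()
            log.append((name, attempt, result is not None))
            if result is not None:
                return result, log  # stop at the first stage that succeeds
    return None, log  # nothing worked: hand off to manual review


# Stand-in callables: the first two stages fail, the LLM stage succeeds
stages = [
    ("u2_retry", 2, lambda: None),
    ("ocr_retry", 2, lambda: None),
    ("llm_diagnosis", 1, lambda: (540, 720)),
]
result, log = run_recovery_ladder(stages)
```

Keeping the attempt log alongside the result gives the HTML report a record of which stage recovered the test, or of the fact that none did.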
Results After 3 Months
| Metric | Before | After |
|---|---|---|
| Devices covered | 1 (manual) | 2 (automated, scalable) |
| Platforms per release | 2 (Android + Web) | 3 (+ WeChat Mini Program) |
| Test execution time | 4h manual | 45min automated |
| Flaky test rate | N/A (manual) | ~12% (self-healing catches ~80%) |
| Infrastructure cost | $200/mo (BrowserStack trial) | $0 hardware + a few $/mo API |
| Reports generated | Ad-hoc screenshots | 27+ structured HTML reports |
| New device onboarding | 2-3 days | ~2 hours (coordinate calibration + testing) |
The Tools
| Tool | Role | Cost |
|---|---|---|
| uiautomator2 | Android native element locator | Free, open source |
| ADB | Low-level device control | Free, Android SDK |
| Playwright | Web backend + limited mini-program | Free, open source |
| rapidocr | On-device OCR (no GPU needed) | Free, open source |
| pytest | Test runner | Free |
| Hermes Agent | LLM orchestration + self-healing | Free, open source |
| DeepSeek V4 API | LLM fallback (API call) | Pay-as-you-go (prepaid credits) |
Hardware cost: $0 (existing phones and computer). LLM API is pay-as-you-go, roughly a few dollars per month.
Lessons
Don't trust UI dump tools on WebView apps. Hierarchy-dump tools such as uiautomator2 (and Appium in its native context) often can't see inside WebView content. Plan for coordinate or OCR-based fallbacks from day one.
IME input swallowing will waste a week of your life. Test adb shell input text with long strings (18+ chars) early, across all target devices. If characters drop, chain the commands.
One KEYCODE_BACK press is never a bug; two is always a bug. Dismissing the keyboard after text input is mandatory, but doing it twice exits the screen. Always count your back presses.
Vue 3 + Playwright = use page.evaluate(). Don't debug why page.click() silently fails. Just wrap it in evaluate() and move on.
A 3-layer locator isn't overengineering. It's the difference between a test suite that breaks on every app update and one that survives for months with zero maintenance.
Low-budget infrastructure is achievable. With one Android phone, one computer, and a small API budget, you can build a self-healing test suite that absorbs device-specific weirdness.
This framework is actively maintained. If you're automating a health app, a WeChat ecosystem product, or anything with WebView + multi-device quirks, this playbook is built from the scars.
About open-sourcing deep-test: It's currently closed-source while we continue refining and stabilizing the architecture. Once it matures, we'll consider making it public. In the meantime, the tools mentioned here (uiautomator2 + ADB + rapidocr + Playwright) are all open source and free — the 7 Patterns in this playbook are enough to get you started.
About the author:
15 years in QA automation, creator of the deep-test framework. Building your own AI-powered test pipeline? You might find this useful:
👉 50 AI Testing Prompts for Web & Android — bilingual (EN/CN), $12, covering Web & Android testing scenarios.
Built with Hermes Agent on DeepSeek V4, one Oppo, one Huawei, and a QA engineer who refused to accept BrowserStack's $200/mo bill.