xulingfeng

Posted on May 22 • Edited on May 23

Low-Budget Multi-Device QA: Automating 3 Platforms with Open Source Tools

#ai #opensource #hermes #agents


Practical automation patterns for health apps across Android APK, WeChat Mini Program, and Web backend — using only open source tools and the hardware you already have.

The Problem

You have a medical app that ships on three surfaces:

Android APK — the doctor's side, a uni-app WebView wrapper
WeChat Mini Program — the patient's side, running inside WeChat's sandbox
Web Backend — admin panel, Vue3 + Element Plus

You have two test phones: an Oppo PCKM00 and a Huawei ANA-AN00. Your budget for test infrastructure: zero. No BrowserStack, no Sauce Labs, no paid SaaS.

Oh, and the APK is a WebView wrapper — the app's core UI lives inside a WebView that's invisible to Android's UI dump (uiautomator2 can't see it). And WeChat's mini-program runtime intercepts standard automation primitives. And the two phones have different screen resolutions and keyboard heights. And you don't have sudo on the CI machine.

This is the problem deep-test was built to solve. Here's the playbook.

Architecture Overview

┌──────────────────────────────────────────────────┐
│             deep-test (Hermes Agent)             │
├──────────────────────────────────────────────────┤
│  core/                                           │
│  ├── device.py   → device registry + ADB         │
│  ├── coords.py   → multi-device scaling          │
│  ├── locator.py  → 3-layer self-healing          │
│  ├── ocr.py      → rapidocr wrapper              │
│  ├── runner.py   → retry + LLM fallback          │
│  └── web-runner.cjs → Playwright + Vue3 fix      │
├──────────────────────────────────────────────────┤
│  projects/med-app/                               │
│  ├── android/     → login, patient, chat         │
│  ├── miniprogram/ → mini-program flows           │
│  ├── web/         → admin panel (Playwright)     │
│  └── scenarios/   → cross-platform orchestration │
├──────────────────────────────────────────────────┤
│  reports/ (HTML + screenshots)                   │
└──────────────────────────────────────────────────┘

Hardware cost: $0. Every tool is open source. The phones are existing hardware. The LLM fallback uses DeepSeek V4 API (pay-as-you-go, roughly a few dollars per month).
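The post doesn't show what core/device.py stores. Here is a minimal sketch of the registry half. It assumes a plain dict keyed by device alias (the DEVICE_REGISTRY name is borrowed from Pattern 4; the fields are hypothetical). It also includes a parser for `adb devices` output, so a run can refuse to start when a phone is missing or unauthorized:

```python
import subprocess

# Hypothetical registry layout: aliases and models from this post,
# resolutions from Pattern 4.
DEVICE_REGISTRY = {
    "oppo": {"model": "PCKM00", "width": 1080, "height": 2400},
    "huawei": {"model": "ANA-AN00", "width": 1080, "height": 2340},
}


def parse_adb_devices(output: str) -> list[str]:
    """Return serials that `adb devices` lists in the ready 'device' state.

    Header lines ('List of devices attached', '* daemon ...') and phones in
    'offline' or 'unauthorized' state are skipped because their second
    whitespace-separated field is not 'device'.
    """
    serials = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def connected_serials() -> list[str]:
    """Run `adb devices` on the host and return the ready serials."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, timeout=10, check=True
    )
    return parse_adb_devices(result.stdout)
```

Matching serials to aliases (for example via `adb -s SERIAL shell getprop ro.product.model`) is left out, since the post doesn't describe how deep-test does it.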

Pattern 1: The 3-Layer Self-Healing Locator

HTML dumps can't see WebView content. Pure coordinates break across devices. The solution: a cascade of three fallback strategies.

def locate(element_id, serial, device_alias):
    """Try each strategy in order, cheapest first. Returns (x, y) to tap."""

    # Layer 1: uiautomator2 XML (fastest, works for native elements).
    # u2_session(serial) is assumed to return a cached uiautomator2 Device.
    try:
        el = u2_session(serial)(resourceId=element_id)
        if el.exists:
            return el.center()  # Center of the element's bounds
    except Exception:
        pass  # Session/RPC error: fall through to the next layer
    # Not in the XML: the element most likely lives inside the WebView

    # Layer 2: Coordinate map (recorded on the base device, scaled per device)
    if element_id in COORD_MAP:
        x = COORD_MAP[element_id][0]  # X is shared: both phones are 1080px wide
        return x, Coords.scale_y(device_alias, element_id)
    # Unknown element: need OCR

    # Layer 3: OCR + LLM fallback (slowest but most resilient)
    screenshot = take_screenshot(serial)
    ocr_result = ocr(screenshot)

    # LLM reads the OCR text and returns the element's center coordinates
    response = llm.ask(
        f"Screen shows: {ocr_result}. Find '{element_id}' and return its center coordinates."
    )
    return parse_coords(response)

What this solves:

Coord-only tests work on Oppo but break on Huawei (different screen dimensions)
uiautomator2 can't reach WebView content inside the uni-app shell
OCR is slow but catches everything — acts as the safety net

Real-world numbers: Layer 1 handles ~30% of locators (native login buttons). Layer 2 handles ~50% (known UI elements in the mini-program). Layer 3 catches the remaining ~20% (dynamic content, confirmation dialogs). Average locate time is 200ms for Layer 1 and 2-4 seconds for Layer 3.
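Getting per-layer numbers like these means recording which layer answered each lookup. One way to do that is a generic cascade that times every layer and reports the winner. This is a sketch with hypothetical names, not the deep-test locator:

```python
import time
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[int, int]
# A strategy is a (layer name, zero-argument lookup) pair.
Strategy = Tuple[str, Callable[[], Optional[Point]]]


def locate_cascade(strategies: Sequence[Strategy]) -> Tuple[str, Point, float]:
    """Try strategies in order and return (layer name, point, elapsed ms).

    Each lookup returns (x, y) or None. An exception counts as a miss, so
    one broken layer cannot abort the whole lookup.
    """
    for name, lookup in strategies:
        start = time.perf_counter()
        try:
            point = lookup()
        except Exception:
            point = None
        elapsed_ms = (time.perf_counter() - start) * 1000
        if point is not None:
            return name, point, elapsed_ms
    raise LookupError("all locator layers missed")


layer, point, ms = locate_cascade([
    ("xml", lambda: None),             # Layer 1 miss: element is in the WebView
    ("coords", lambda: (540, 1170)),   # Layer 2 hit
    ("ocr_llm", lambda: (0, 0)),       # Never reached
])
```

Counting the returned layer names across a run gives a split like the ~30/50/20 above, and the elapsed time per hit gives the latency comparison.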

Pattern 2: The Keyboard Nightmare

This single bug ate more debug time than any other issue.

The Huawei ANA-AN00's stock IME doesn't play nicely with adb shell input text. The keyboard overlays the password field, and after typing, the "Login" button is hidden behind the keyboard.

The two devices have different keyboard heights — the Huawei IME panel is ~310px, roughly 100px taller than the Oppo's ~210px.
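The height difference matters because it moves the keyboard's top edge. A quick check, using the approximate IME heights above and a hypothetical login button at y = 2100px (the function name is illustrative):

```python
def hidden_by_keyboard(tap_y: int, screen_height: int, ime_height: int) -> bool:
    """True if tap_y falls inside the IME panel docked at the screen bottom."""
    keyboard_top = screen_height - ime_height
    return tap_y >= keyboard_top


# Hypothetical button position; IME heights are the approximations above.
huawei_hidden = hidden_by_keyboard(2100, screen_height=2340, ime_height=310)
oppo_hidden = hidden_by_keyboard(2100, screen_height=2400, ime_height=210)
```

With these numbers the button sits below the Huawei keyboard's top edge (2340 - 310 = 2030) but above the Oppo's (2400 - 210 = 2190), so the same tap succeeds on one phone and hits the keyboard on the other.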

The fix sequence:

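import shlex
import subprocess
import time


def type_and_submit(serial, device_alias, text):
    # Step 1: Type one character at a time with a short pause (anti-IME swallowing).
    # The whole chain runs in a single device-side shell, so "&&" and "sleep"
    # execute on the phone, not on the host.
    device_cmd = " && ".join(
        f"input text {shlex.quote(ch)} && sleep 0.08"
        for ch in text
    )
    subprocess.run(["adb", "-s", serial, "shell", device_cmd], timeout=60, check=True)

    # Step 2: Dismiss keyboard (CRITICAL): exactly one BACK press
    subprocess.run([
        "adb", "-s", serial,
        "shell", "input keyevent KEYCODE_BACK"
    ], timeout=5, check=True)
    time.sleep(2)

    # Step 3: Now the button is visible, so tap it
    x = COORD_MAP["login_button"][0]  # X is shared: both phones are 1080px wide
    y = Coords.scale_y(device_alias, "login_button")
    subprocess.run([
        "adb", "-s", serial,
        "shell", f"input tap {x} {y}"
    ], timeout=5, check=True)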

Key insight: KEYCODE_BACK dismisses the keyboard without leaving the form. A second press would exit the activity — one press is the sweet spot.

Why not use uiautomator2's d(text="登录").click() ("登录" means "Login")? Because while the keyboard is up, it covers the button: the tap lands on the keyboard overlay, not the button.
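The rule is exactly one BACK press. One way to enforce it is to press BACK only when the keyboard is actually showing. This sketch assumes `adb shell dumpsys input_method` prints `mInputShown=true` while the IME is visible. That field is common but not guaranteed across Android versions and vendor ROMs, so confirm it on each phone first. Function names are hypothetical:

```python
import subprocess


def keyboard_shown(dumpsys_output: str) -> bool:
    """Look for the IME-visible flag in `dumpsys input_method` output.

    Assumption: 'mInputShown=true' appears while the keyboard is up.
    Verify the field name per device before relying on it.
    """
    return "mInputShown=true" in dumpsys_output


def dismiss_keyboard_once(serial: str) -> bool:
    """Press BACK only if the keyboard is visible; return True if pressed."""
    out = subprocess.run(
        ["adb", "-s", serial, "shell", "dumpsys", "input_method"],
        capture_output=True, text=True, timeout=10, check=True,
    ).stdout
    if not keyboard_shown(out):
        return False  # Keyboard already gone: a BACK now would leave the screen
    subprocess.run(
        ["adb", "-s", serial, "shell", "input", "keyevent", "KEYCODE_BACK"],
        timeout=5, check=True,
    )
    return True
```

Returning whether BACK was pressed makes it easy to log and count back presses per step, which is where the "two presses" bugs show up.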

Pattern 3: Defeating the IME Input Hog

Both Baidu IME (Oppo) and Sogou IME (Huawei) have a nasty behavior: they swallow individual adb shell input text commands that arrive too fast.

Wrong approach (will lose characters):

for ch in id_number:
    adb_cmd(serial, f"shell input text {ch}")

The stock IME on Oppo drops ~1 in every 3 characters this way. The 18th digit of an ID number is almost always missing.

Right approach (chained with sleep):

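# One device-side shell command: every keystroke and pause runs on the phone
device_cmd = " && ".join(
    f"input text {ch} && sleep 0.08"
    for ch in id_number
)
subprocess.run(["adb", "-s", serial, "shell", device_cmd], timeout=60, check=True)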

Each character gets 80ms of settling time. The entire 18-digit ID takes ~1.5s. Tested across 50+ runs: zero lost characters.
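`input text` has quirks beyond speed. The device shell interprets metacharacters such as `&` or `;`, and spaces have to be sent as `%s`. Here is a sketch of a command builder that handles both. It assumes POSIX-style quoting on the device shell, and the function name is hypothetical:

```python
import shlex


def build_chained_input(text: str, delay_s: float = 0.08) -> str:
    """Build one device-side shell command that types `text` slowly.

    Each character becomes its own `input text` call followed by a sleep.
    shlex.quote protects shell metacharacters (&, ;, $, quotes); a space is
    sent as %s, which `input text` turns back into a space.
    """
    steps = []
    for ch in text:
        arg = "%s" if ch == " " else shlex.quote(ch)
        steps.append(f"input text {arg} && sleep {delay_s}")
    return " && ".join(steps)
```

Usage: `subprocess.run(["adb", "-s", serial, "shell", build_chained_input(id_number)], timeout=60, check=True)`. For an 18-digit ID, the sleeps alone add 18 × 0.08 = 1.44s, in line with the ~1.5s above.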

Pattern 4: Cross-Device Coordinate Scaling

The Oppo is 1080×2400. The Huawei is 1080×2340. Every Y coordinate needs to be scaled.

class Coords:
    BASE_DEVICE = "oppo"  # All coordinates recorded here
    REFERENCE_HEIGHT = 2400

    @staticmethod
    def scale_y(device_alias, element_key):
        """Scale Y coordinate from reference device to target device."""
        base_y = COORD_MAP[element_key][1]
        target_height = DEVICE_REGISTRY[device_alias]["height"]
        scale_factor = target_height / Coords.REFERENCE_HEIGHT
        return int(base_y * scale_factor)

With this, every interactable element has exactly one coordinate entry (recorded on Oppo), and all other devices auto-scale. Adding a Huawei Mate 60 or a Xiaomi 14 is a one-line config change.
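Pattern 4 only scales Y because both phones share a 1080px width. If a future device differs in width too, the same idea extends to X. Here is a sketch, assuming a registry that records both dimensions. The login button coordinate is made up for illustration:

```python
from typing import Dict, Tuple

# Resolutions from this post; the element coordinate is hypothetical.
DEVICE_REGISTRY: Dict[str, Dict[str, int]] = {
    "oppo": {"width": 1080, "height": 2400},
    "huawei": {"width": 1080, "height": 2340},
}
COORD_MAP: Dict[str, Tuple[int, int]] = {"login_button": (540, 1200)}
BASE_DEVICE = "oppo"  # All coordinates are recorded on this device


def scale_point(device_alias: str, element_key: str) -> Tuple[int, int]:
    """Scale a point recorded on BASE_DEVICE to the target's resolution."""
    base = DEVICE_REGISTRY[BASE_DEVICE]
    target = DEVICE_REGISTRY[device_alias]
    x, y = COORD_MAP[element_key]
    return (
        round(x * target["width"] / base["width"]),
        round(y * target["height"] / base["height"]),
    )
```

On the Huawei, y = 1200 becomes 1200 × 2340 / 2400 = 1170. Linear scaling assumes the layout stretches proportionally. Fixed-height elements such as toolbars or the IME panel won't follow it, which is why the OCR layers stay in the stack.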

Pattern 5: Playwright × Vue3 — The Synthetic Event Trap

In our Vue 3 + Element Plus admin panel, some Playwright page.click() calls had no effect: Playwright reported no error, but the handler Vue bound to the button never ran.

Doesn't work:

await page.click('.el-button--primary');

Works:

await page.evaluate(() => {
    document.querySelector('.el-button--primary').click();
});

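Why? Despite this pattern's name, the evaluate() path is the synthetic one. page.click() scrolls the element into view, waits for it to be actionable, and sends real mouse input through CDP (Chrome DevTools Protocol) at the element's center point, so the browser hit-tests that point and the resulting events are trusted. element.click() skips all of that and fires a click event directly on the element. Vue 3 attaches its listeners to the elements themselves rather than delegating, so a direct click always reaches the handler, while real pointer input can go astray in certain configurations (for example, when the target moves or gets covered between Playwright's actionability check and the click).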

Rule of thumb: If Playwright clicks land silently (no error, no action), wrap them in page.evaluate().
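The rule of thumb can be automated so that evaluate() is used only where it's needed. This sketch targets Playwright's Python sync API; the post's web runner is JavaScript, so treat it as an illustrative port. It assumes each click has a visible effect you can wait for, such as a dialog opening. The function name and `expect_selector` parameter are hypothetical:

```python
def click_with_dom_fallback(page, selector: str, expect_selector: str,
                            timeout_ms: int = 3000) -> str:
    """Click normally; if the expected effect never appears, click via the DOM.

    `page` is a Playwright sync-API Page. `expect_selector` should match
    something that appears after a successful click (a dialog, a toast).
    Returns which path produced the effect.
    """
    page.click(selector)
    try:
        page.wait_for_selector(expect_selector, timeout=timeout_ms)
        return "page.click"
    except Exception:  # playwright.sync_api.TimeoutError in practice
        pass
    # Silent miss: fire element.click() inside the page, as in Pattern 5
    page.locator(selector).evaluate("el => el.click()")
    page.wait_for_selector(expect_selector, timeout=timeout_ms)
    return "element.click"
```

Use it only for idempotent clicks. If the first click did work but its effect was slow to appear, the fallback clicks a second time.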

Pattern 6: The OCR-Based Dynamic Button Locator

When a UI element moves based on previous actions (e.g., "Add Patient" button scrolls down as more patients are added), coordinates become unreliable. OCR is the solution.

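import subprocess
import time


class LocateError(Exception):
    """Raised when a locator exhausts all of its attempts."""


def find_button_y(serial, button_text, max_scrolls=3):
    """Scroll down until the button text appears, return its Y."""
    # One extra pass so the screen after the final scroll is also checked
    for attempt in range(max_scrolls + 1):
        texts = take_ocr(serial, f"find_{button_text}")

        for text_bbox in texts:
            if button_text in text_bbox.text:
                return text_bbox.center_y

        if attempt == max_scrolls:
            break  # Out of scrolls

        # Not found: swipe up (y=1500 to y=500 over 500ms) to scroll the list down
        subprocess.run([
            "adb", "-s", serial,
            "shell", "input swipe 540 1500 540 500 500"
        ], timeout=10)
        time.sleep(1.5)  # Let the scroll settle before the next OCR pass

    raise LocateError(f"'{button_text}' not found after {max_scrolls} scrolls")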

This replaced a brittle coordinate system where the "Save" button Y shifted by ~48px per patient added. After 9 patients, it scrolled off-screen entirely.
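The post doesn't show take_ocr. Assuming it wraps rapidocr and exposes the raw hits, the matching step could look like the sketch below. The hit format (four corner points, text, score) is what `rapidocr_onnxruntime`'s `RapidOCR()` returns as a list; newer rapidocr releases wrap results in an object, so treat the format as an assumption and adapt it. The function name is hypothetical:

```python
from typing import List, Optional, Sequence, Tuple

# One OCR hit: (box, text, score); box holds four (x, y) corner points.
OcrHit = Tuple[Sequence[Sequence[float]], str, float]


def find_text_center(hits: List[OcrHit], target: str,
                     min_score: float = 0.5) -> Optional[Tuple[int, int]]:
    """Return the center of the first confident hit whose text contains target."""
    for box, text, score in hits:
        # float() also accepts scores that some versions return as strings
        if target in text and float(score) >= min_score:
            xs = [p[0] for p in box]
            ys = [p[1] for p in box]
            return round(sum(xs) / len(xs)), round(sum(ys) / len(ys))
    return None
```

The score threshold keeps a low-confidence misread from sending a tap to the wrong place; returning None lets the caller scroll and try again.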

Pattern 7: The LLM Self-Healing Loop

When a test fails despite all the above layers, the system doesn't crash — it invokes the LLM.

Test Fails (e.g., Element 'start_consultation' not found)
    │
    ├─ Layer 1 Retry (×2): Re-query uiautomator2 with longer wait
    │     └─ Still failing? →
    ├─ Layer 2 Retry (×2): Refresh OCR with different threshold
    │     └─ Still failing? →
    └─ Layer 3: LLM Diagnosis
          ├─ Screenshot + error → LLM analyzes the screen
          ├─ LLM suggests: "A confirmation dialog 'Are you sure?' is blocking
          │   the button. Click coordinate (540, 720) to dismiss it."
          └─ Test applies the fix and retries

The LLM (DeepSeek V4 API, roughly a few dollars per month) reads the last screenshot and the error log, then suggests corrective actions. The script executes them and retries.
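The parse_coords helper from Pattern 1 isn't shown. Here is a defensive version, assuming the model is asked to answer with an "(x, y)" pair like the example diagnosis above. The extra width and height parameters are a hypothetical signature:

```python
import re
from typing import Optional, Tuple

_COORD_RE = re.compile(r"\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_coords(reply: str, width: int, height: int) -> Optional[Tuple[int, int]]:
    """Pull the first '(x, y)' pair out of an LLM reply and bounds-check it.

    Returns None when no pair is found or it falls off-screen, so a
    malformed answer is treated as a miss instead of a blind tap.
    """
    match = _COORD_RE.search(reply)
    if match is None:
        return None
    x, y = int(match.group(1)), int(match.group(2))
    if not (0 <= x < width and 0 <= y < height):
        return None
    return x, y
```

Asking the model for a fixed JSON shape instead of free text is a sturdier option. Either way, validate before tapping.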

Real-world result: ~80% of "stuck" scenarios are recovered by Layer 3 without human intervention. The remaining ~20% generate a screenshot report for manual review.
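The diagram above is a bounded retry loop. Here is a minimal sketch of that control flow, with the LLM call abstracted into a hypothetical `diagnose` callback that returns a fix to apply. The cap is what turns a stuck scenario into a report instead of an endless loop:

```python
from typing import Callable, List


def run_with_healing(step: Callable[[], None],
                     diagnose: Callable[[Exception], Callable[[], None]],
                     max_heals: int = 2) -> List[str]:
    """Run step; on failure ask diagnose for a fix, apply it, and retry.

    After max_heals fixes the last error propagates, which is where the
    screenshot report for manual review would take over.
    Returns a log of attempts.
    """
    log: List[str] = []
    for attempt in range(max_heals + 1):
        try:
            step()
            log.append(f"attempt {attempt}: ok")
            return log
        except Exception as err:
            log.append(f"attempt {attempt}: {err}")
            if attempt == max_heals:
                raise
            fix = diagnose(err)  # e.g. screenshot + error -> LLM -> tap action
            fix()
    return log  # Reached only when max_heals < 0
```

Keeping the log lets the HTML report show what the model changed before a test passed, which matters when reviewing "recovered" runs.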

Results After 3 Months

Metric	Before	After
Devices covered	1 (manual)	2 (automated, scalable)
Platforms per release	2 (Android + Web)	3 (+ WeChat Mini Program)
Test execution time	4h manual	45min automated
Flaky test rate	N/A (manual)	~12% (self-healing catches ~80%)
Infrastructure cost	$200/mo (BrowserStack trial)	$0 hardware + a few $/mo API
Reports generated	Ad-hoc screenshots	27+ structured HTML reports
New device onboarding	2-3 days	~2 hours (coordinate calibration + testing)

The Tools

Tool	Role	Cost
uiautomator2	Android native element locator	Free, open source
ADB	Low-level device control	Free, Android SDK
Playwright	Web backend + limited mini-program	Free, open source
rapidocr	Local CPU-only OCR (no GPU needed)	Free, open source
pytest	Test runner	Free, open source
Hermes Agent	LLM orchestration + self-healing	Free, open source
DeepSeek V4 API	LLM fallback (API call)	Pay-as-you-go (prepaid credits)

Hardware cost: $0 (existing phones and computer). LLM API is pay-as-you-go, roughly a few dollars per month.

Lessons

Don't trust UI dump tools on WebView apps. uiautomator2, Appium, and their cousins can't see inside WebView content. Plan for coordinate or OCR-based fallbacks from day one.
IME input swallowing will waste a week of your life. Test adb shell input text with long strings (18+ chars) early, across all target devices. If characters drop, chain the commands.
One KEYCODE_BACK press is never a bug; two is always a bug. Dismissing the keyboard after text input is mandatory but doing it twice exits the screen. Always count your back presses.
Vue 3 + Playwright = use page.evaluate(). Don't debug why page.click() silently fails. Just wrap it in evaluate() and move on.
A 3-layer locator isn't overengineering. It's the difference between a test suite that breaks on every app update and one that survives for months with zero maintenance.
Low-budget infrastructure is achievable. With one Android phone, one computer, and a small API budget, you can build a self-healing test suite that absorbs device-specific weirdness.

If you're automating a health app, a WeChat ecosystem product, or anything with WebView + multi-device quirks, this playbook is built from the scars.

About open-sourcing deep-test: It's currently closed-source while we continue refining and stabilizing the architecture. Once it matures, we'll consider making it public. In the meantime, the tools mentioned here (uiautomator2 + ADB + rapidocr + Playwright) are all open source and free — the 7 Patterns in this playbook are enough to get you started.

About the author:
15 years in QA automation, creator of the deep-test framework. Building your own AI-powered test pipeline? You might find this useful:
👉 50 AI Testing Prompts for Web & Android — bilingual (EN/CN), $12, covering Web & Android testing scenarios.

Built with Hermes Agent on DeepSeek V4, one Oppo, one Huawei, and a QA engineer who refused to accept BrowserStack's $200/mo bill.
