How to Test Any App with AI in 30 Seconds
What if testing your app required zero test code?
Not "low-code testing." Not "AI-assisted test generation." Literally zero lines of test code — you describe what should happen, and AI does it.
That's what we built with flutter-skill: an open-source MCP server that gives AI agents eyes and hands inside any running app.
The Problem Nobody Talks About
E2E testing is universally hated for a reason:
// This breaks every time someone moves a button
final loginButton = find.byKey(Key('loginBtn'));
await tester.tap(loginButton);
await tester.pumpAndSettle();
final emailField = find.byKey(Key('emailField'));
await tester.enterText(emailField, 'test@example.com');
// ... 50 more lines of brittle selectors
You're not testing your app. You're maintaining a second codebase that mirrors your UI. Every refactor breaks it. Every design change means rewriting tests.
And it gets worse: every platform has its own testing framework. Flutter has integration_test. iOS has XCUITest. Android has Espresso. React Native has Detox. Web has Playwright. Each with its own API, its own quirks, its own debug cycle.
What if there was one tool for all of them?
The Idea: Let AI Be the User
Instead of writing robot instructions, what if we just... talked to the robot?
"Tap the login button, enter test@email.com as the email,
enter password123 as the password, tap submit,
and verify the dashboard loads."
That's a complete E2E test. No selectors. No framework-specific code. No maintenance when the UI changes — because AI understands what a "login button" looks like, regardless of its internal key.
This is what MCP (Model Context Protocol) makes possible. MCP lets AI tools like Claude, Cursor, and Windsurf connect to external services. flutter-skill is one of those services — it bridges AI to your running app's UI.
How It Works
The architecture is simple:
┌─────────────┐ MCP ┌────────────────┐ WebSocket ┌─────────────┐
│ AI Client │ ◄──────────► │ flutter-skill │ ◄────────────► │ Your App │
│ (Claude, │ JSON-RPC │ (MCP Server) │ JSON-RPC │ (any │
│ Cursor, │ │ │ on :18118 │ platform) │
│ Windsurf) │ └────────────────┘ └─────────────┘
└─────────────┘
- Your app includes a lightweight SDK (a few lines of code) that connects via WebSocket
- flutter-skill runs as an MCP server, translating AI commands into app interactions
- Your AI tool sends natural language instructions, which flutter-skill converts to precise UI operations
The SDK exposes your app's accessibility tree — every button, text field, label, and container — so the AI can see exactly what's on screen.
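To make that concrete, here is a minimal sketch of what such an element-tree payload might look like and how a client could index it. The node shape and field names here are illustrative assumptions, not flutter-skill's actual wire format:

```typescript
// Hypothetical element-tree payload as an SDK might send it over the
// WebSocket. Field names (type, label, rect) are illustrative, NOT the
// actual flutter-skill protocol.
interface UiNode {
  type: string;            // e.g. "Button", "TextField"
  label?: string;          // accessibility label, if present
  rect: { x: number; y: number; w: number; h: number };
  children?: UiNode[];
}

// Flatten the tree into an ordered list so an AI client can refer to
// elements by index or by label.
function flatten(node: UiNode, out: UiNode[] = []): UiNode[] {
  out.push(node);
  for (const child of node.children ?? []) flatten(child, out);
  return out;
}

const screen: UiNode = {
  type: "Scaffold",
  rect: { x: 0, y: 0, w: 390, h: 844 },
  children: [
    { type: "TextField", label: "Email field", rect: { x: 20, y: 200, w: 350, h: 48 } },
    { type: "Button", label: "Submit", rect: { x: 20, y: 300, w: 350, h: 48 } },
  ],
};

const elements = flatten(screen);
// elements[2].label === "Submit"
```

The AI never sees pixels alone; it sees this structured view, which is why it can find a "login button" without a widget key.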
Setup: Actually 30 Seconds
Step 1: Install (5 seconds)
npm i -g flutter-skill
Step 2: Initialize your project (10 seconds)
cd your-app
flutter-skill init
This auto-detects your project type and patches your entry point:
- `pubspec.yaml` → Flutter
- `Package.swift` → iOS native
- `build.gradle.kts` + `AndroidManifest.xml` → Android native
- `package.json` + `react-native` → React Native
- `index.html` → Web
- `package.json` + `electron` → Electron
- `Cargo.toml` + `tauri` → Tauri
- `build.gradle.kts` + `kotlin` → KMP
- `.csproj` + `Maui` → .NET MAUI
Step 3: Add to your AI tool (15 seconds)
Add to your MCP config (e.g., Claude Desktop claude_desktop_config.json):
{
  "mcpServers": {
    "flutter-skill": {
      "command": "flutter-skill"
    }
  }
}
That's it. Your AI can now see and interact with your app.
What AI Can Do
Once connected, your AI has access to 40+ tools:
Inspection
- `inspect` — See the full UI element tree (accessibility labels, types, positions)
- `get_element_details` — Deep-dive into any specific element
Interaction
- `tap` — Tap any element by description or index
- `enter_text` — Type into text fields
- `scroll` — Scroll in any direction
- `swipe` — Swipe gestures (e.g., dismiss, navigate)
- `long_press` — Long press for context menus
Verification
- `screenshot` — Capture what the app looks like right now
- `assert_exists` — Verify an element is on screen
- `get_text` — Read text content from any element
Navigation
- `go_back` — Navigate back
- `open_url` — Deep link to any route
Advanced
- `eval` — Execute platform-native code (Dart, JS, Swift, Kotlin)
- `get_logs` — Read app console output
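Under the hood, MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. A minimal sketch of the request envelope (the exact argument schema for each tool is an assumption here; `tools/call` itself is standard MCP):

```typescript
// Build an MCP "tools/call" JSON-RPC 2.0 request. Argument shapes
// (e.g. { element: ... }) are illustrative, not flutter-skill's schema.
let nextId = 0;

function toolsCall(name: string, args: Record<string, unknown>) {
  return {
    jsonrpc: "2.0" as const,
    id: ++nextId,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const req = toolsCall("tap", { element: "Submit" });
// req.method === "tools/call", req.params.name === "tap"
```

Your AI tool builds these requests for you; you only ever write the natural-language instruction.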
Real Example: Testing a Login Flow
Here's what happens when you tell Claude: "Test the login flow with invalid credentials and verify the error message."
Claude (via flutter-skill):
1. inspect() → sees the login screen with email field, password field, submit button
2. tap(element: "Email field")
3. enter_text(text: "bad@email.com")
4. tap(element: "Password field")
5. enter_text(text: "wrongpassword")
6. tap(element: "Submit")
7. screenshot() → captures the error state
8. assert_exists(element: "Invalid credentials") → ✅ verified
No test file created. No selectors maintained. If the UI changes tomorrow, the AI adapts — it looks for "Submit" by understanding the UI, not by memorizing a widget key.
The Numbers
We tested flutter-skill across 8 platforms with a comprehensive E2E test suite:
| Platform | Tests | Passing | Rate |
|---|---|---|---|
| Flutter iOS | 21 | 21 | 100% |
| Flutter Web | 20 | 20 | 100% |
| Electron | 24 | 24 | 100% |
| Android Native | 24 | 24 | 100% |
| KMP Desktop | 22 | 22 | 100% |
| React Native | 24 | 24 | 100% |
| Tauri | 24 | 23 | 95.8% |
| .NET MAUI | 24 | 23 | 95.8% |
| Total | 183 | 181 | 98.9% |
Every test is AI-driven. Zero hand-written test code.
Lessons from Building for 8 Platforms
Building SDKs for 8 platforms taught us things no tutorial covers:
Android: PNG Screenshots Kill WebSocket
Full-resolution PNG screenshots on Android are huge. Sending them over WebSocket caused timeouts. The fix: JPEG at 80% quality, downscaled to 720p. AI reads the UI just fine at lower resolution, and it saves ~90% bandwidth.
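The interesting part is the sizing rule: cap the short side at 720 pixels, preserve aspect ratio, and never upscale. A sketch of that logic (the real SDK does this in Kotlin with Bitmap APIs; the function name here is hypothetical):

```typescript
// Cap the screenshot's short side at 720px, preserving aspect ratio.
// Never upscale images that are already small enough.
function fitTo720p(width: number, height: number): { width: number; height: number } {
  const shortSide = Math.min(width, height);
  if (shortSide <= 720) return { width, height }; // already small enough
  const scale = 720 / shortSide;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

// A 1080x2400 phone screenshot becomes 720x1600 before JPEG encoding.
```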
Tauri: eval() Is Fire-and-Forget
Tauri v2's eval() function doesn't return values. It executes JavaScript in the webview and... that's it. No callback, no promise, no return.
Our solution: open a secondary WebSocket on port 18120. The JavaScript sends its result there, and Rust receives it via a oneshot channel. Three ports total: HTTP health (18118), WS commands (18119), WS results (18120).
We also had to add ws://127.0.0.1:* to Tauri's CSP, otherwise WebSocket connections from the tauri:// origin to localhost are silently blocked.
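A sketch of the webview half of that handshake: wrap the expression so the injected script opens its own WebSocket to the results port and ships the value back. The wrapper function and message shape are illustrative, not flutter-skill's exact code:

```typescript
// Since Tauri's eval() is fire-and-forget, wrap the expression in a script
// that sends its result to the results port itself. The { id, result, error }
// envelope is an assumed shape for illustration.
function wrapForEval(requestId: string, expression: string): string {
  return `
    (() => {
      const ws = new WebSocket("ws://127.0.0.1:18120");
      ws.onopen = () => {
        let result, error = null;
        try { result = eval(${JSON.stringify(expression)}); }
        catch (e) { error = String(e); }
        ws.send(JSON.stringify({ id: ${JSON.stringify(requestId)}, result, error }));
        ws.close();
      };
    })();
  `;
}
```

On the Rust side, a `oneshot` sender keyed by the request id waits on port 18120 and completes when this message arrives.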
React Native: Skip the Full Build
Building a full React Native project requires native modules, CocoaPods, Gradle — the works. For testing, we used a Node.js mock that implements the bridge protocol directly. Much faster, same result.
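The mock-bridge idea reduces to a command dispatcher that answers the SDK protocol from canned state. A minimal sketch, assuming an illustrative `{ id, method, params }` wire format (not the actual protocol):

```typescript
// A Node.js mock of the app side of the bridge: answers protocol commands
// from static state, no native build required. Wire format is illustrative.
type Command = { id: number; method: string; params?: Record<string, unknown> };

const mockTree = [{ index: 0, type: "Button", label: "Submit" }];

function handle(cmd: Command) {
  switch (cmd.method) {
    case "inspect":
      return { id: cmd.id, result: mockTree };                 // pretend UI state
    case "tap":
      return { id: cmd.id, result: { tapped: cmd.params?.element } };
    default:
      return { id: cmd.id, error: `unknown method: ${cmd.method}` };
  }
}
```

Serve this behind the SDK's WebSocket port and the MCP server can exercise the full command path without CocoaPods or Gradle.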
The go_back Race Condition
On Android, clearing currentActivity on onActivityPaused caused a race condition with go_back. The previous activity pauses before the new one resumes, leaving a brief window where the SDK thinks there's no activity. Fix: only clear on onActivityDestroyed.
When to Use This (and When Not To)
Use flutter-skill when:
- You want to test user-facing flows without writing test code
- You're building across multiple platforms and want one testing approach
- You're doing vibe coding and need AI to verify what it builds
- You want to prototype and test simultaneously
Don't use it for:
- Unit testing (that's a different problem)
- Performance benchmarking (AI interaction adds latency)
- Tests that need to run in < 1 second (AI thinking time)
Getting Started
# Install
npm i -g flutter-skill
# Auto-detect and configure your project
cd your-app
flutter-skill init
# Or try the built-in demo
flutter-skill demo
Add to your MCP config, and start talking to your app through AI.
GitHub: github.com/ai-dashboad/flutter-skill
MIT licensed. Contributions welcome.
flutter-skill supports Flutter, iOS, Android, Web, Electron, Tauri, KMP, React Native, and .NET MAUI. Works with Claude, Cursor, Windsurf, and any MCP-compatible AI tool.