How to Test Any App with AI in 30 Seconds
What if testing your app required zero test code?
Not "low-code testing." Not "AI-assisted test generation." Literally zero lines of test code — you describe what should happen, and AI does it.
That's what we built with flutter-skill: an open-source MCP server that gives AI agents eyes and hands inside any running app.
The Problem Nobody Talks About
E2E testing is universally hated for a reason:
// This breaks every time someone moves a button
final loginButton = find.byKey(Key('loginBtn'));
await tester.tap(loginButton);
await tester.pumpAndSettle();
final emailField = find.byKey(Key('emailField'));
await tester.enterText(emailField, 'test@example.com');
// ... 50 more lines of brittle selectors
You're not testing your app. You're maintaining a second codebase that mirrors your UI. Every refactor breaks it. Every design change means rewriting tests.
And it gets worse: every platform has its own testing framework. Flutter has integration_test. iOS has XCUITest. Android has Espresso. React Native has Detox. Web has Playwright. Each with its own API, its own quirks, its own debug cycle.
What if there was one tool for all of them?
The Idea: Let AI Be the User
Instead of writing robot instructions, what if we just... talked to the robot?
"Tap the login button, enter test@email.com as the email,
enter password123 as the password, tap submit,
and verify the dashboard loads."
That's a complete E2E test. No selectors. No framework-specific code. No maintenance when the UI changes — because AI understands what a "login button" looks like, regardless of its internal key.
This is what MCP (Model Context Protocol) makes possible. MCP lets AI tools like Claude, Cursor, and Windsurf connect to external services. flutter-skill is one of those services — it bridges AI to your running app's UI.
How It Works
The architecture is simple:
┌─────────────┐ MCP ┌────────────────┐ WebSocket ┌─────────────┐
│ AI Client │ ◄──────────► │ flutter-skill │ ◄────────────► │ Your App │
│ (Claude, │ JSON-RPC │ (MCP Server) │ JSON-RPC │ (any │
│ Cursor, │ │ │ on :18118 │ platform) │
│ Windsurf) │ └────────────────┘ └─────────────┘
└─────────────┘
- Your app includes a lightweight SDK (a few lines of code) that connects via WebSocket
- flutter-skill runs as an MCP server, translating AI commands into app interactions
- Your AI tool sends natural language instructions, which flutter-skill converts to precise UI operations
The SDK exposes your app's accessibility tree — every button, text field, label, and container — so the AI can see exactly what's on screen.
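To make that concrete, here is a minimal sketch of what such an element-tree payload might look like and how a client could index it. The node shape and field names here are illustrative assumptions, not flutter-skill's actual wire format:

```typescript
// Hypothetical element-tree payload as an SDK might send it over the
// WebSocket. Field names (type, label, rect) are illustrative, NOT the
// actual flutter-skill protocol.
interface UiNode {
  type: string;            // e.g. "Button", "TextField"
  label?: string;          // accessibility label, if present
  rect: { x: number; y: number; w: number; h: number };
  children?: UiNode[];
}

// Flatten the tree into an ordered list so an AI client can refer to
// elements by index or by label.
function flatten(node: UiNode, out: UiNode[] = []): UiNode[] {
  out.push(node);
  for (const child of node.children ?? []) flatten(child, out);
  return out;
}

const screen: UiNode = {
  type: "Scaffold",
  rect: { x: 0, y: 0, w: 390, h: 844 },
  children: [
    { type: "TextField", label: "Email field", rect: { x: 20, y: 200, w: 350, h: 48 } },
    { type: "Button", label: "Submit", rect: { x: 20, y: 300, w: 350, h: 48 } },
  ],
};

const elements = flatten(screen);
// elements[2].label === "Submit"
```

The AI never sees pixels alone; it sees this structured view, which is why it can find a "login button" without a widget key.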
Setup: Actually 30 Seconds
Step 1: Install (5 seconds)
npm i -g flutter-skill
Step 2: Initialize your project (10 seconds)
cd your-app
flutter-skill init
This auto-detects your project type and patches your entry point:
- `pubspec.yaml` → Flutter
- `Package.swift` → iOS native
- `build.gradle.kts` + `AndroidManifest.xml` → Android native
- `package.json` + `react-native` → React Native
- `index.html` → Web
- `package.json` + `electron` → Electron
- `Cargo.toml` + `tauri` → Tauri
- `build.gradle.kts` + `kotlin` → KMP
- `.csproj` + `Maui` → .NET MAUI
Step 3: Add to your AI tool (15 seconds)
Add to your MCP config (e.g., Claude Desktop claude_desktop_config.json):
{
  "mcpServers": {
    "flutter-skill": {
      "command": "flutter-skill"
    }
  }
}
That's it. Your AI can now see and interact with your app.
What AI Can Do
Once connected, your AI has access to 40+ tools:
Inspection
- `inspect` — See the full UI element tree (accessibility labels, types, positions)
- `get_element_details` — Deep-dive into any specific element
Interaction
- `tap` — Tap any element by description or index
- `enter_text` — Type into text fields
- `scroll` — Scroll in any direction
- `swipe` — Swipe gestures (e.g., dismiss, navigate)
- `long_press` — Long press for context menus
Verification
- `screenshot` — Capture what the app looks like right now
- `assert_exists` — Verify an element is on screen
- `get_text` — Read text content from any element
Navigation
- `go_back` — Navigate back
- `open_url` — Deep link to any route
Advanced
- `eval` — Execute platform-native code (Dart, JS, Swift, Kotlin)
- `get_logs` — Read app console output
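Under the hood, MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. A minimal sketch of the request envelope (the exact argument schema for each tool is an assumption here; `tools/call` itself is standard MCP):

```typescript
// Build an MCP "tools/call" JSON-RPC 2.0 request. Argument shapes
// (e.g. { element: ... }) are illustrative, not flutter-skill's schema.
let nextId = 0;

function toolsCall(name: string, args: Record<string, unknown>) {
  return {
    jsonrpc: "2.0" as const,
    id: ++nextId,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const req = toolsCall("tap", { element: "Submit" });
// req.method === "tools/call", req.params.name === "tap"
```

Your AI tool builds these requests for you; you only ever write the natural-language instruction.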
Real Example: Testing a Login Flow
Here's what happens when you tell Claude: "Test the login flow with invalid credentials and verify the error message."
Claude (via flutter-skill):
1. inspect() → sees the login screen with email field, password field, submit button
2. tap(element: "Email field")
3. enter_text(text: "bad@email.com")
4. tap(element: "Password field")
5. enter_text(text: "wrongpassword")
6. tap(element: "Submit")
7. screenshot() → captures the error state
8. assert_exists(element: "Invalid credentials") → ✅ verified
No test file created. No selectors maintained. If the UI changes tomorrow, the AI adapts — it looks for "Submit" by understanding the UI, not by memorizing a widget key.
The Numbers
We tested flutter-skill across 8 platforms with a comprehensive E2E test suite:
| Platform | Tests | Passing | Rate |
|---|---|---|---|
| Flutter iOS | 21 | 21 | 100% |
| Flutter Web | 20 | 20 | 100% |
| Electron | 24 | 24 | 100% |
| Android Native | 24 | 24 | 100% |
| KMP Desktop | 22 | 22 | 100% |
| React Native | 24 | 24 | 100% |
| Tauri | 24 | 23 | 95.8% |
| .NET MAUI | 24 | 23 | 95.8% |
| Total | 183 | 181 | 98.9% |
Every test is AI-driven. Zero hand-written test code.
Lessons from Building for 8 Platforms
Building SDKs for 8 platforms taught us things no tutorial covers:
Android: PNG Screenshots Kill WebSocket
Full-resolution PNG screenshots on Android are huge. Sending them over WebSocket caused timeouts. The fix: JPEG at 80% quality, downscaled to 720p. AI reads the UI just fine at lower resolution, and it saves ~90% bandwidth.
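The interesting part is the sizing rule: cap the short side at 720 pixels, preserve aspect ratio, and never upscale. A sketch of that logic (the real SDK does this in Kotlin with Bitmap APIs; the function name here is hypothetical):

```typescript
// Cap the screenshot's short side at 720px, preserving aspect ratio.
// Never upscale images that are already small enough.
function fitTo720p(width: number, height: number): { width: number; height: number } {
  const shortSide = Math.min(width, height);
  if (shortSide <= 720) return { width, height }; // already small enough
  const scale = 720 / shortSide;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

// A 1080x2400 phone screenshot becomes 720x1600 before JPEG encoding.
```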
Tauri: eval() Is Fire-and-Forget
Tauri v2's eval() function doesn't return values. It executes JavaScript in the webview and... that's it. No callback, no promise, no return.
Our solution: open a secondary WebSocket on port 18120. The JavaScript sends its result there, and Rust receives it via a oneshot channel. Three ports total: HTTP health (18118), WS commands (18119), WS results (18120).
We also had to add ws://127.0.0.1:* to Tauri's CSP, otherwise WebSocket connections from the tauri:// origin to localhost are silently blocked.
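A sketch of the webview half of that handshake: wrap the expression so the injected script opens its own WebSocket to the results port and ships the value back. The wrapper function and message shape are illustrative, not flutter-skill's exact code:

```typescript
// Since Tauri's eval() is fire-and-forget, wrap the expression in a script
// that sends its result to the results port itself. The { id, result, error }
// envelope is an assumed shape for illustration.
function wrapForEval(requestId: string, expression: string): string {
  return `
    (() => {
      const ws = new WebSocket("ws://127.0.0.1:18120");
      ws.onopen = () => {
        let result, error = null;
        try { result = eval(${JSON.stringify(expression)}); }
        catch (e) { error = String(e); }
        ws.send(JSON.stringify({ id: ${JSON.stringify(requestId)}, result, error }));
        ws.close();
      };
    })();
  `;
}
```

On the Rust side, a `oneshot` sender keyed by the request id waits on port 18120 and completes when this message arrives.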
React Native: Skip the Full Build
Building a full React Native project requires native modules, CocoaPods, Gradle — the works. For testing, we used a Node.js mock that implements the bridge protocol directly. Much faster, same result.
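The mock-bridge idea reduces to a command dispatcher that answers the SDK protocol from canned state. A minimal sketch, assuming an illustrative `{ id, method, params }` wire format (not the actual protocol):

```typescript
// A Node.js mock of the app side of the bridge: answers protocol commands
// from static state, no native build required. Wire format is illustrative.
type Command = { id: number; method: string; params?: Record<string, unknown> };

const mockTree = [{ index: 0, type: "Button", label: "Submit" }];

function handle(cmd: Command) {
  switch (cmd.method) {
    case "inspect":
      return { id: cmd.id, result: mockTree };                 // pretend UI state
    case "tap":
      return { id: cmd.id, result: { tapped: cmd.params?.element } };
    default:
      return { id: cmd.id, error: `unknown method: ${cmd.method}` };
  }
}
```

Serve this behind the SDK's WebSocket port and the MCP server can exercise the full command path without CocoaPods or Gradle.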
The go_back Race Condition
On Android, clearing currentActivity on onActivityPaused caused a race condition with go_back. The previous activity pauses before the new one resumes, leaving a brief window where the SDK thinks there's no activity. Fix: only clear on onActivityDestroyed.
When to Use This (and When Not To)
Use flutter-skill when:
- You want to test user-facing flows without writing test code
- You're building across multiple platforms and want one testing approach
- You're doing vibe coding and need AI to verify what it builds
- You want to prototype and test simultaneously
Don't use it for:
- Unit testing (that's a different problem)
- Performance benchmarking (AI interaction adds latency)
- Tests that need to run in < 1 second (AI thinking time)
Getting Started
# Install
npm i -g flutter-skill
# Auto-detect and configure your project
cd your-app
flutter-skill init
# Or try the built-in demo
flutter-skill demo
Add to your MCP config, and start talking to your app through AI.
GitHub: github.com/ai-dashboad/flutter-skill
MIT licensed. Contributions welcome.
flutter-skill supports Flutter, iOS, Android, Web, Electron, Tauri, KMP, React Native, and .NET MAUI. Works with Claude, Cursor, Windsurf, and any MCP-compatible AI tool.