Last week I shipped a cross-platform app and needed to test it on Flutter, React Native, iOS, Android, Electron, Tauri, and web. Writing separate test suites for each platform? No thanks.
Instead, I used an AI agent that could see my app and interact with it. Here is what the workflow looked like:
The Setup
I used flutter-skill, an open-source MCP server that gives AI agents eyes and hands inside running apps. It connects to your app via a lightweight bridge and exposes 253 tools the AI can use.
npm install -g flutter-skill
flutter-skill init ./my-app
flutter-skill launch ./my-app
Testing with Natural Language
Instead of writing test code, I just described what to test:
Test the login flow - enter test@example.com and password123, tap Login, verify the Dashboard appears
The AI agent automatically:
- Takes a screenshot to see the current state
- Discovers all interactive elements with semantic refs
- Taps, types, scrolls - just like a human
- Verifies the expected outcome
- Screenshots each step for evidence
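To make that loop concrete, here is a minimal sketch of the observe, act, verify cycle against an in-memory stand-in for an app. Everything in it (FakeApp, the ref strings like "button:Login", run_login_scenario) is hypothetical and invented for illustration; it is not flutter-skill's API. In a real run the agent chooses the steps itself rather than following a hard-coded list.

```python
from dataclasses import dataclass, field


@dataclass
class FakeApp:
    """Hypothetical in-memory stand-in for a running app (not flutter-skill's API)."""
    screen: str = "Login"
    fields: dict = field(default_factory=dict)
    evidence: list = field(default_factory=list)

    def snapshot(self) -> list:
        # Return the elements on the current screen as made-up semantic refs.
        if self.screen == "Login":
            return ["input:email", "input:password", "button:Login"]
        return ["text:Dashboard"]

    def type_text(self, ref: str, text: str) -> None:
        self.fields[ref] = text

    def tap(self, ref: str) -> None:
        # Only a filled-in login form navigates to the dashboard.
        filled = self.fields.get("input:email") and self.fields.get("input:password")
        if ref == "button:Login" and filled:
            self.screen = "Dashboard"


def run_login_scenario(app: FakeApp) -> bool:
    # Hard-coded steps for the sketch; a real agent plans these from the prompt.
    steps = [
        ("type", "input:email", "test@example.com"),
        ("type", "input:password", "password123"),
        ("tap", "button:Login", None),
    ]
    for action, ref, value in steps:
        refs = app.snapshot()                  # observe: discover elements
        if ref not in refs:
            return False                       # target missing, scenario fails
        if action == "type":
            app.type_text(ref, value)          # act: type into the field
        else:
            app.tap(ref)                       # act: tap the element
        app.evidence.append((action, ref, app.screen))  # record each step
    return "text:Dashboard" in app.snapshot()  # verify the expected outcome


app = FakeApp()
passed = run_login_scenario(app)
print(passed, len(app.evidence))
```

The point of the sketch is the shape of the loop: every action is preceded by a fresh look at the screen, and the final check reads the screen again instead of trusting that the tap worked.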
The Results
Across 8 platforms, the AI agent completed 562 out of 567 test scenarios (99.1% pass rate). All 5 failures were legitimate bugs that the agent uncovered.
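For anyone who wants to check the arithmetic, the pass rate follows directly from the two scenario counts above. The variable names are mine:

```python
# Scenario counts reported in this post.
passed_count, total = 562, 567

failed = total - passed_count
pass_rate = passed_count / total * 100

print(f"{passed_count}/{total} passed ({pass_rate:.1f}%), {failed} failed")
```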
What surprised me most:
- Zero test code written - everything was natural language
- Cross-platform for free - same test descriptions worked on iOS, Android, web, desktop
- Found real bugs - the AI explored edge cases I would not have thought of
- Snapshots are 99% more token-efficient than screenshots - the accessibility tree gives the AI structured data instead of pixels
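One way to read the token-efficiency figure, assuming it means a 99% reduction in tokens sent to the model, is that a snapshot costs roughly one hundredth of what a screenshot does. The sketch below shows that calculation. The function name is hypothetical, and the counts in the example call are round placeholders chosen to show the ratio, not measurements from my runs.

```python
def token_reduction_percent(screenshot_tokens: int, snapshot_tokens: int) -> float:
    """Percentage of tokens saved by sending a snapshot instead of a screenshot."""
    if screenshot_tokens <= 0:
        raise ValueError("screenshot_tokens must be positive")
    return (1 - snapshot_tokens / screenshot_tokens) * 100


# Placeholder counts (not measured): a 100x smaller payload is a 99% reduction.
reduction = token_reduction_percent(10_000, 100)
print(round(reduction, 1))
```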
When to Use This vs Traditional Automation
Use AI testing when:
- You need to test across multiple platforms quickly
- You want to explore edge cases without writing explicit tests
- Your team does not have dedicated SDET resources
- You need fast smoke tests during development
Stick with traditional automation when:
- You need deterministic, repeatable CI/CD tests
- You need performance benchmarking
- You are testing specific race conditions
Try It
flutter-skill is open source and free: github.com/ai-dashboad/flutter-skill
Works with Claude, GPT, Gemini, Cursor, Windsurf, and any MCP-compatible agent.
Would love to hear if anyone else is using AI agents for testing - what is working for you?