DEV Community

Mathieu

I built a CLI to test Tauri apps because nothing else worked

I spent a weekend trying to set up end-to-end testing for a Tauri v2 app. Two hours into configuring WebdriverIO, I still couldn't get it to connect to the WebView. The official docs show a minimal example that doesn't cover IPC testing. Playwright flat-out doesn't work because on Linux Tauri renders in WebKitGTK, not Chromium.

I gave up and wrote my own tool.

The actual problem

If you search "Tauri e2e testing" on GitHub, you'll find the same question asked over and over. The official docs split testing into two worlds: Rust unit tests for the backend, and WebDriver-based tests for the frontend. But nobody tells you how to verify that your Tauri IPC commands actually work from the user's perspective. You end up mocking window.__TAURI__ in your frontend tests and hoping production behaves the same way.

It doesn't always behave the same way.

What I wanted was simple: connect to a running Tauri app, inspect the UI, click buttons, fill forms, and check that things happened. Like Playwright, but for Tauri. No WebDriver binary. No Selenium. No 200-line config file.

What tauri-pilot does

tauri-pilot is a Rust CLI that talks to your Tauri app over a Unix socket. You add a plugin to your app (2 lines, debug builds only), install the CLI, and start testing.

$ tauri-pilot snapshot -i
- heading "PR Dashboard" [ref=e1]
- textbox "Search PRs" [ref=e2] value=""
- button "Refresh" [ref=e3]
- list "PR List" [ref=e4]
  - listitem "fix: resolve memory leak #142" [ref=e5]
  - listitem "feat: add workspace support #138" [ref=e6]
- button "Load More" [ref=e7]

The snapshot command walks the accessibility tree and gives every listed element a short ref, printed as [ref=e3] in the output and written as @e3 on the command line. You use those refs to interact:

$ tauri-pilot click @e3
ok

$ tauri-pilot fill @e2 "workspace"
ok

$ tauri-pilot assert text @e1 "PR Dashboard"
ok

$ tauri-pilot assert visible @e7
ok

Exit code 0 means pass. Exit code 1 means fail with a message. That's it. No test framework to learn.
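
Because pass/fail is just the exit code, a test runner can be very small. Here is a minimal sketch in Rust that runs a list of commands and stops at the first non-zero exit. The function name run_steps is made up for illustration, and since the article doesn't say which stream carries the failure message, the sketch keeps both stdout and stderr.

```rust
use std::process::Command;

/// Run each step as `program <args...>` and stop at the first non-zero exit.
/// Hypothetical helper, not part of tauri-pilot.
fn run_steps(program: &str, steps: &[Vec<&str>]) -> Result<(), String> {
    for args in steps {
        let output = Command::new(program)
            .args(args)
            .output()
            .map_err(|e| format!("could not start {}: {}", program, e))?;

        if !output.status.success() {
            // Assumption: the failure message may be on either stream,
            // so report both.
            let stdout = String::from_utf8_lossy(&output.stdout);
            let stderr = String::from_utf8_lossy(&output.stderr);
            return Err(format!(
                "`{} {}` failed: {}{}",
                program,
                args.join(" "),
                stdout.trim(),
                stderr.trim()
            ));
        }
    }
    Ok(())
}
```

With the commands shown above, a call would look like run_steps("tauri-pilot", &[vec!["click", "@e3"], vec!["assert", "text", "@e1", "PR Dashboard"]]). A plain bash script with set -e gets you the same behaviour in fewer lines; the Rust version is only useful if you want the checks inside cargo test.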

Setup takes 2 minutes

Add the plugin to your Tauri app:

// src-tauri/src/main.rs
fn main() {
    let mut builder = tauri::Builder::default();

    #[cfg(debug_assertions)]
    {
        builder = builder.plugin(tauri_plugin_pilot::init());
    }

    builder.run(tauri::generate_context!()).expect("error running app");
}

Install the CLI:

cargo install tauri-pilot

Run your app in dev mode. That's it. tauri-pilot ping should respond.

Because of the #[cfg(debug_assertions)] gate, the plugin is only registered in debug builds. It won't be active in your release binary.

Why not just use WebdriverIO?

I tried. Here's what the experience looks like:

With WebdriverIO, you need to install Node.js dependencies, configure wdio.conf.ts, point it at your built binary (not the dev server), make sure the WebDriver binary matches your WebKit version, write tests in JavaScript even though your app is in Rust, and deal with flaky selectors that break when your UI changes.

With tauri-pilot, you run cargo install tauri-pilot, add 2 lines to your app, and start writing shell commands. The accessibility tree refs (@e1, @e2) are stable across renders as long as the element exists. You can use CSS selectors too, or pixel coordinates for edge cases.

The diff command has no built-in counterpart in WebdriverIO:

$ tauri-pilot diff -i
+ button "New PR" [ref=e8]
~ list "PR List" [ref=e4]: children 2 → 3

It compares the current UI state to the previous snapshot and shows only what changed. When you're debugging a UI issue, that's worth more than any assertion framework.
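
To make the idea concrete, here is a rough sketch of a ref-keyed diff between two plain-text snapshots. This is not how tauri-pilot implements diff; it is a simplified illustration that assumes refs stay the same between snapshots and only detects added, removed, or relabelled lines. It does not compute the "children 2 → 3" summary shown above. The function names are hypothetical.

```rust
use std::collections::HashMap;

/// Extract the ref id from a snapshot line such as `- button "Refresh" [ref=e3]`.
fn ref_of(line: &str) -> Option<&str> {
    let start = line.find("[ref=")? + "[ref=".len();
    let end = line[start..].find(']')? + start;
    Some(&line[start..end])
}

/// Strip indentation and the leading "- " list marker.
fn label(line: &str) -> &str {
    let trimmed = line.trim();
    trimmed.strip_prefix("- ").unwrap_or(trimmed)
}

/// Map each ref to its label for quick lookup.
fn index(snapshot: &str) -> HashMap<&str, &str> {
    snapshot
        .lines()
        .filter_map(|l| ref_of(l).map(|r| (r, label(l))))
        .collect()
}

/// Report "+ " for new refs, "~ " for refs whose line changed,
/// and "- " for refs that disappeared.
fn diff_snapshots(previous: &str, current: &str) -> Vec<String> {
    let old = index(previous);
    let new = index(current);
    let mut out = Vec::new();

    for line in current.lines() {
        if let Some(r) = ref_of(line) {
            match old.get(r) {
                None => out.push(format!("+ {}", label(line))),
                Some(prev) if *prev != label(line) => out.push(format!("~ {}", label(line))),
                Some(_) => {}
            }
        }
    }
    for line in previous.lines() {
        if let Some(r) = ref_of(line) {
            if !new.contains_key(r) {
                out.push(format!("- {}", label(line)));
            }
        }
    }
    out
}
```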

It's built for AI agents

I initially built tauri-pilot so Claude Code could interact with my Tauri apps. The output format is designed for LLM consumption: compact accessibility tree, short refs, plain text responses.

An AI agent's workflow looks like:

  1. snapshot -i to understand the current page
  2. Pick a ref from the output and interact with it
  3. assert to verify the result
  4. diff -i to see what changed without re-reading the entire tree (saves tokens)

The --json flag on every command gives structured output for programmatic use.
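
If you'd rather work from the plain-text snapshot, the line format shown earlier is easy to parse. The sketch below is based only on the sample output in this post (role, quoted name, [ref=…], two spaces of indentation per level); the real format may have escaping rules or extra fields that it doesn't handle, and the Node type and function names are made up for illustration. For anything serious, --json is probably the safer input.

```rust
/// One parsed snapshot line. Hypothetical type, not tauri-pilot's own.
#[derive(Debug, PartialEq)]
struct Node {
    depth: usize,
    role: String,
    name: String,
    reference: String,
}

/// Parse a line like `  - listitem "fix: bug" [ref=e5]`.
/// Returns None for lines that don't match the assumed shape.
fn parse_line(line: &str) -> Option<Node> {
    let indent = line.len() - line.trim_start().len();
    let rest = line.trim_start().strip_prefix("- ")?;
    let (role, rest) = rest.split_once(' ')?;
    let rest = rest.strip_prefix('"')?;
    let (name, rest) = rest.split_once("\" [ref=")?;
    let (reference, _) = rest.split_once(']')?;
    Some(Node {
        depth: indent / 2, // assumption: two spaces per nesting level
        role: role.to_string(),
        name: name.to_string(),
        reference: reference.to_string(),
    })
}

/// Parse a whole snapshot, skipping lines that don't look like nodes.
fn parse_snapshot(text: &str) -> Vec<Node> {
    text.lines().filter_map(parse_line).collect()
}

/// Find the ref for a given role and accessible name, e.g. ("button", "Refresh").
fn find_ref<'a>(nodes: &'a [Node], role: &str, name: &str) -> Option<&'a str> {
    nodes
        .iter()
        .find(|n| n.role == role && n.name == name)
        .map(|n| n.reference.as_str())
}
```

Looking elements up by role and name instead of hard-coding @e3 keeps a script working if the numbering shifts between runs.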

But you don't need an AI agent to benefit from this. I use tauri-pilot directly in bash scripts and interactively while developing. It's faster than clicking through the UI manually when I'm testing a form flow for the fifth time.

What's in v0.2.0

The latest release (just shipped) adds:

  • Record/replay: tauri-pilot record start, interact with your app, record stop --output test.json, then replay test.json to rerun the whole session. Export to a shell script with replay --export sh.
  • Multi-window support: windows lists all windows, --window flag targets a specific one.
  • Form dump: forms grabs all form fields at once instead of calling value on each input.
  • localStorage/sessionStorage access: storage get "token", storage set "key" "value", storage list, storage clear.
  • Drag & drop: drag @e5 @e6 for sortable lists, drop @e3 --file ./image.png for file upload zones.
  • DOM watch: watch blocks until a DOM mutation happens, then prints a summary. Good for waiting on async updates.

The full changelog is on GitHub.

How it works under the hood

The plugin starts a Unix socket server when your app launches. The CLI connects to that socket and sends JSON-RPC messages. The plugin injects a JS bridge (window.__PILOT__) into the WebView that handles DOM inspection.

[Figure: tauri-pilot architecture diagram]
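
For a feel of the client side, here is a minimal sketch of sending one JSON-RPC 2.0 request over a Unix socket using only the standard library. Several things here are assumptions, not taken from tauri-pilot's source: the newline-delimited framing, the method name you pass in, and the socket path. The JSON is built by hand to keep the example dependency-free, so method must be a plain identifier and params_json must already be valid JSON.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Build a JSON-RPC 2.0 request object as a single line of text.
fn build_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"{}\",\"params\":{}}}",
        id, method, params_json
    )
}

/// Connect, send one request, and read one line back.
/// Assumption: one JSON message per line in each direction.
fn call(socket: &Path, id: u64, method: &str, params_json: &str) -> io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    let request = build_request(id, method, params_json);
    stream.write_all(request.as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()?;

    // Read the reply up to the first newline.
    let mut reader = BufReader::new(stream);
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}
```

A real client would parse the reply with a JSON library and match the id against the request, which this sketch leaves out.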

The tricky part was getting return values from webview.eval(). In Tauri v2, eval() is fire-and-forget. There's no way to get a result back directly. So every JS evaluation wraps the script in a try/catch, calls back into Rust via IPC (invoke('plugin:pilot|__callback', {id, result})), and the Rust side waits on a oneshot channel with a 10-second timeout.
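
The pattern is easier to see in code. Below is a simplified sketch of the "register an id, wait for the callback, give up after a timeout" part. It is not the plugin's actual code: it uses std::sync::mpsc as a stand-in for a oneshot channel so it runs without an async runtime, the Pending type and its methods are hypothetical, the shape of the error payload is my assumption, and the generated JS assumes invoke is in scope in the WebView.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::time::Duration;

type EvalResult = Result<String, String>;

#[derive(Default)]
struct PendingInner {
    next_id: u64,
    waiters: HashMap<u64, Sender<EvalResult>>,
}

/// Registry of evaluations still waiting for their JS callback.
#[derive(Clone, Default)]
struct Pending {
    inner: Arc<Mutex<PendingInner>>,
}

impl Pending {
    /// Reserve a fresh id and return it with the receiving end of its channel.
    fn register(&self) -> (u64, Receiver<EvalResult>) {
        let (tx, rx) = mpsc::channel();
        let mut inner = self.inner.lock().unwrap();
        inner.next_id += 1;
        let id = inner.next_id;
        inner.waiters.insert(id, tx);
        (id, rx)
    }

    /// Called by the callback command; false if the id is unknown or expired.
    fn resolve(&self, id: u64, result: EvalResult) -> bool {
        match self.inner.lock().unwrap().waiters.remove(&id) {
            Some(tx) => tx.send(result).is_ok(),
            None => false,
        }
    }

    /// Block until the callback arrives or the timeout passes.
    fn wait(&self, id: u64, rx: Receiver<EvalResult>, timeout: Duration) -> EvalResult {
        match rx.recv_timeout(timeout) {
            Ok(result) => result,
            Err(_) => {
                // Drop the stale entry so a late callback is rejected.
                self.inner.lock().unwrap().waiters.remove(&id);
                Err(format!("eval {} timed out", id))
            }
        }
    }
}

/// Wrap a script so both success and failure report back under the same id.
fn wrap_script(id: u64, script: &str) -> String {
    format!(
        "(async () => {{ \
           try {{ \
             const result = await (async () => {{ {script} }})(); \
             await invoke('plugin:pilot|__callback', {{ id: {id}, result }}); \
           }} catch (e) {{ \
             await invoke('plugin:pilot|__callback', {{ id: {id}, result: {{ error: String(e) }} }}); \
           }} \
         }})();",
        script = script,
        id = id
    )
}
```

The caller would register an id, pass wrap_script(id, script) to webview.eval(), then call wait with a 10-second Duration. Removing the entry on timeout matters: without it, the map would grow every time the WebView fails to answer.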

It works reliably. I've been running it daily for weeks.

Limitations

Linux only for now. Tauri uses WebKitGTK on Linux, WKWebView on macOS, and WebView2 on Windows. The socket and JS bridge approach should work on all platforms, but I haven't tested it. macOS and Windows support is planned.

The screenshot command uses html-to-image bundled into the JS bridge, which means it captures the WebView content but not native window decorations or system dialogs.

And it's pre-1.0. The API might change. I'm using semver, so minor bumps can include breaking changes until 1.0.

Try it

cargo install tauri-pilot

GitHub: github.com/mpiton/tauri-pilot
Docs: mpiton.github.io/tauri-pilot
crates.io: crates.io/crates/tauri-plugin-pilot

If you're testing a Tauri v2 app on Linux, give it a shot. Issues and PRs are open.
