When I started working on AI agent tooling, I hit a wall: there's no clean way to automate GUI interactions on Wayland. X11 had xdotool and DISPLAY=:99 — Wayland killed all of that, by design. No global input injection, no screen grabbing without portal authorization dialogs.
So I built kwin-mcp, an MCP server that gives AI agents full GUI automation capabilities on Linux desktops, running in a completely isolated KWin Wayland session.
The Problem
Wayland's security model blocks the automation patterns we took for granted on X11. There's no equivalent of pointing tools at a virtual display. XDG RemoteDesktop portals require interactive user authorization — useless for headless automation. And most Wayland compositors don't expose any input injection API at all.
The Solution: Triple Isolation
kwin-mcp creates three layers of isolation for each session:
- Private D-Bus bus (`dbus-run-session`): no interference with host services
- Virtual Wayland compositor (`kwin_wayland --virtual`): no windows on your display
- Scoped input injection via KWin's EIS D-Bus interface: input stays inside the session
The AI agent never touches your host desktop. You get a fully functional KDE Plasma desktop that exists only in memory.
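To make the first two layers concrete, here is a minimal Python sketch of how such a session could be launched. It is an illustration built only from the two commands named above, not kwin-mcp's actual startup code. The function names are hypothetical, and the third layer (EIS input injection) is left out because its D-Bus API is not described here.

```python
import shutil
import subprocess


def build_session_command(extra_kwin_args=()):
    """Return an argv that nests a virtual KWin inside a private D-Bus session."""
    # Layer 1: dbus-run-session starts a fresh session bus for its child process.
    command = ["dbus-run-session", "--"]
    # Layer 2: --virtual makes KWin render off-screen instead of to a real output.
    command += ["kwin_wayland", "--virtual"]
    # Any further KWin flags are the caller's responsibility (none assumed here).
    command += list(extra_kwin_args)
    return command


def start_session():
    """Spawn the isolated session; raises if the required binaries are missing."""
    missing = [tool for tool in ("dbus-run-session", "kwin_wayland")
               if shutil.which(tool) is None]
    if missing:
        raise RuntimeError(f"missing required tools: {', '.join(missing)}")
    return subprocess.Popen(build_session_command())


print(" ".join(build_session_command()))
```

Because KWin runs as a child of `dbus-run-session`, everything it starts inherits the private bus address rather than the host session's.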
What It Exposes
The server provides 29 MCP tools:
| Category | Tools |
|---|---|
| Mouse | click, drag, scroll, move, button down/up |
| Keyboard | type (ASCII), type unicode, key press, key down/up |
| Touch | tap, swipe, pinch, multi-finger swipe |
| Screen | screenshot, accessibility tree, find UI elements, wait for element |
| Session | start, stop, launch app, list/focus windows |
| System | clipboard get/set, D-Bus calls, Wayland protocol info |
Screenshot capture runs at ~30-70ms per frame via KWin's ScreenShot2 D-Bus interface. Any action tool accepts a screenshot_after_ms parameter for burst frame capture without extra round-trips.
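For a sense of what that looks like on the wire, the sketch below builds an MCP `tools/call` request (JSON-RPC 2.0) that carries `screenshot_after_ms` alongside the action's own arguments. The tool name `mouse_click`, the `x`/`y` argument names, and the list-of-delays value shape are assumptions for illustration; check the server's tool listing for the real schema.

```python
import json
from itertools import count

# Monotonic JSON-RPC request ids for this client.
_request_ids = count(1)


def build_tool_call(tool_name, arguments):
    """Wrap a tool invocation in an MCP tools/call JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    "mouse_click",  # hypothetical tool name
    {
        "x": 200,  # hypothetical argument names
        "y": 150,
        # Assumed shape: delays in milliseconds after the action completes.
        "screenshot_after_ms": [0, 50, 100],
    },
)
print(json.dumps(request, indent=2))
```

The point of the design is that one request performs the click and returns the follow-up frames, so the agent does not need a separate screenshot call for each frame.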
Why KWin Specifically?
KWin is (as far as I know) the only Wayland compositor that exposes an EIS (Emulated Input Server) D-Bus interface. This provides the only clean path for input injection without triggering XDG RemoteDesktop authorization dialogs. Since we own the isolated session, we bypass the portal entirely.
Limitations
- KDE Plasma 6+ only (no GNOME, no Sway)
- US QWERTY keyboard layout for direct typing (Unicode input via `wtype` works)
- AT-SPI2 accessibility coverage varies by application
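One way a client could work around the keyboard layout limitation is to route text by content: plain ASCII goes to the direct typing tool, and anything else goes to the Unicode tool. The sketch below shows that routing. Both tool names are hypothetical placeholders for the "type (ASCII)" and "type unicode" tools in the table above.

```python
ASCII_TOOL = "keyboard_type"            # hypothetical name for "type (ASCII)"
UNICODE_TOOL = "keyboard_type_unicode"  # hypothetical name for "type unicode"


def pick_typing_tool(text: str) -> str:
    """Choose the typing tool that can represent every character in text."""
    # str.isascii() is True only when all code points are below 128.
    return ASCII_TOOL if text.isascii() else UNICODE_TOOL


for sample in ("hello world", "naïve café", "日本語"):
    print(f"{sample!r} -> {pick_typing_tool(sample)}")
```

Note that ASCII text is only typed correctly by the direct path if the session really uses a US QWERTY layout, as stated in the limitation.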
Getting Started
    # Install
    pip install kwin-mcp

    # Or with uv
    uv tool install kwin-mcp
Add to your Claude Code MCP config:
    {
      "mcpServers": {
        "kwin-mcp": {
          "command": "kwin-mcp"
        }
      }
    }
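If you already have other MCP servers configured, the entry has to be merged rather than pasted over the file. A small sketch of that merge, working on an already-parsed config dict, follows. The helper name is hypothetical and the location of the config file is deliberately not assumed here.

```python
import json


def add_kwin_mcp(config: dict) -> dict:
    """Return a copy of config with a kwin-mcp entry under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # setdefault keeps an existing kwin-mcp entry (and its options) untouched.
    servers.setdefault("kwin-mcp", {"command": "kwin-mcp"})
    merged["mcpServers"] = servers
    return merged


existing = {"mcpServers": {"other-server": {"command": "other-server"}}}
print(json.dumps(add_kwin_mcp(existing), indent=2))
```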
The project is MIT licensed and available on GitHub and PyPI.
I'd love to hear if anyone has experience with Wayland automation or has ideas for extending this to other compositors. The EIS protocol is compositor-agnostic in principle, but as far as I know KWin is the only compositor that exposes it over D-Bus right now.