Byeonghoon Yoo

Building an MCP Server for Linux Desktop GUI Automation on Wayland

When I started working on AI agent tooling, I hit a wall: there's no clean way to automate GUI interactions on Wayland. On X11 you could point xdotool at a virtual display such as DISPLAY=:99; Wayland removed that by design. There is no global input injection, and no screen grabbing without portal authorization dialogs.

So I built kwin-mcp, an MCP server that gives AI agents full GUI automation capabilities on Linux desktops, running in a completely isolated KWin Wayland session.

The Problem

Wayland's security model blocks the automation patterns we took for granted on X11. There's no equivalent of pointing tools at a virtual display. XDG RemoteDesktop portals require interactive user authorization — useless for headless automation. And most Wayland compositors don't expose any input injection API at all.

The Solution: Triple Isolation

kwin-mcp creates three layers of isolation for each session:

  1. Private D-Bus bus (dbus-run-session) — no interference with host services
  2. Virtual Wayland compositor (kwin_wayland --virtual) — no windows on your display
  3. Scoped input injection via KWin's EIS D-Bus interface — input stays inside the session

The AI agent never touches your host desktop. You get a fully functional KDE Plasma desktop that exists only in memory.
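
To make the first two layers concrete, here is a minimal Python sketch of how such a session could be launched. It is not kwin-mcp's actual startup code: the function names and the socket name are made up for illustration, and the `--socket` argument is my assumption about how you would keep the session's Wayland socket separate from the host's. Only `dbus-run-session` and `kwin_wayland --virtual` come from the list above.

```python
import shutil
import subprocess


def build_session_command(socket_name: str = "wayland-kwin-mcp") -> list[str]:
    """Return the argv for a virtual KWin session on a private D-Bus bus."""
    return [
        "dbus-run-session",   # layer 1: private session bus
        "--",
        "kwin_wayland",
        "--virtual",          # layer 2: virtual backend, nothing on your display
        "--socket",           # assumption: dedicated Wayland socket name
        socket_name,
    ]


def start_session(socket_name: str = "wayland-kwin-mcp") -> subprocess.Popen:
    """Launch the isolated compositor. Needs dbus and KWin installed."""
    argv = build_session_command(socket_name)
    for binary in ("dbus-run-session", "kwin_wayland"):
        if shutil.which(binary) is None:
            raise RuntimeError(f"{binary} not found on PATH")
    return subprocess.Popen(argv)


if __name__ == "__main__":
    # Only print the command here; call start_session() to actually launch it.
    print(" ".join(build_session_command()))
```

The third layer (input injection over KWin's EIS D-Bus interface) is left out on purpose, since the exact interface calls are not covered in this post.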

What It Exposes

The server provides 29 MCP tools:

| Category | Tools |
|----------|-------|
| Mouse    | click, drag, scroll, move, button down/up |
| Keyboard | type (ASCII), type unicode, key press, key down/up |
| Touch    | tap, swipe, pinch, multi-finger swipe |
| Screen   | screenshot, accessibility tree, find UI elements, wait for element |
| Session  | start, stop, launch app, list/focus windows |
| System   | clipboard get/set, D-Bus calls, Wayland protocol info |

Screenshot capture runs at ~30-70ms per frame via KWin's ScreenShot2 D-Bus interface. Any action tool accepts a screenshot_after_ms parameter for burst frame capture without extra round-trips.
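
As a rough illustration of what that looks like on the wire, the sketch below builds an MCP `tools/call` request in Python with `screenshot_after_ms` attached. The JSON-RPC envelope follows the MCP specification, but the tool name `mouse_click`, its `x`/`y` arguments, and the list-of-delays value are placeholders I picked for the example; check the tool schemas the server reports for the real names and types.

```python
import json
from itertools import count
from typing import Any, Optional

_request_ids = count(1)  # simple increasing JSON-RPC request ids


def make_tool_call(
    name: str,
    arguments: dict,
    screenshot_after_ms: Optional[Any] = None,
) -> dict:
    """Build an MCP tools/call request, optionally asking for follow-up frames."""
    args = dict(arguments)  # copy so the caller's dict is not mutated
    if screenshot_after_ms is not None:
        # Assumption: the value shape (here a list of delays) is illustrative only.
        args["screenshot_after_ms"] = screenshot_after_ms
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": args},
    }


# Hypothetical tool name and arguments, for illustration only.
request = make_tool_call(
    "mouse_click",
    {"x": 200, "y": 150},
    screenshot_after_ms=[0, 100, 300],
)
print(json.dumps(request, indent=2))
```

The point of the parameter is that the action and the frames that follow it travel in one request/response pair, so the agent does not need a separate screenshot call to see the result of a click.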

Why KWin Specifically?

KWin is (as far as I know) the only Wayland compositor that exposes an EIS (Emulated Input Server) D-Bus interface. This provides the only clean path for input injection without triggering XDG RemoteDesktop authorization dialogs. Since we own the isolated session, we bypass the portal entirely.

Limitations

  • KDE Plasma 6+ only (no GNOME, no Sway)
  • US QWERTY keyboard layout for direct typing (Unicode input via wtype works)
  • AT-SPI2 accessibility coverage varies by application
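
The keyboard limitation is easy to work around on the client side. Here is a small sketch of the idea: route printable ASCII to the direct typing tool and everything else to the Unicode one. The tool names `keyboard_type` and `keyboard_type_unicode` are hypothetical stand-ins for the two typing tools listed earlier.

```python
def pick_typing_tool(text: str) -> str:
    """Choose a typing tool name based on the characters in the text."""
    # Printable ASCII is the range from space (0x20) to tilde (0x7E).
    printable_ascii = all(0x20 <= ord(ch) <= 0x7E for ch in text)
    # Hypothetical tool names; use whatever the server actually reports.
    return "keyboard_type" if printable_ascii else "keyboard_type_unicode"


for sample in ("hello world", "héllo", "안녕하세요"):
    print(sample, "->", pick_typing_tool(sample))
```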

Getting Started

# Install
pip install kwin-mcp

# Or with uv
uv tool install kwin-mcp

Add to your Claude Code MCP config:

{
  "mcpServers": {
    "kwin-mcp": {
      "command": "kwin-mcp"
    }
  }
}

The project is MIT licensed and available on GitHub and PyPI.


I'd love to hear from anyone who has experience with Wayland automation or ideas for extending this to other compositors. The EIS protocol is theoretically compositor-agnostic, but as far as I know KWin is the only compositor exposing it over D-Bus right now.