DEV Community

John Attebury

AX: A Design Discipline for the AI Agent Era

If you've built an MCP server, exposed tools for an AI agent, or crafted tool descriptions so the AI would call the right endpoint, you've been practicing Agent Experience — AX, a term Mathias Biilmann coined in early 2025. The current AX conversation tells you it matters. But knowing it matters and knowing how to evaluate it are different things. UX became a discipline when practitioners connected it to User-Centered Design and gave it testable principles and evaluation methods. AX needs the same grounding.


UX, DX, CX... AX

Every era of software recognizes that a particular consumer deserves intentional design. UX (end users), DX (developers), CX (customers) — each started informally, then someone named the discipline, principles emerged, and quality went up. AX — Agent Experience — follows the same arc. The new consumer is an AI agent: an LLM that reads your tool definitions, decides which to call, and orchestrates workflows. Biilmann and Netlify named it and showed why it matters. What's missing is a methodological foundation: a way to reason about why designs work and how to evaluate quality. This post grounds AX in the design traditions it inherits.


User-Centered Design

Every experience discipline rests on the same foundation: you're designing for a consumer who isn't you. That's User-Centered Design — refined over forty years. UX, DX, and CX are specializations of it. AX is the next one. It doesn't require a new methodology; it applies UCD to a new consumer: the agent.

In UX, you're not the user. You do research, test, iterate. In AX, you're not the agent. You can't assume the AI will "figure it out" because you know what you meant. The AI has no prior relationship with your system. It sees tool names, descriptions, and parameter schemas — and from that alone must decide what to call, when, and with what inputs. That's the entire interface. A developer has docs, tutorials, and time to learn. An AI agent has your tool signature and a few seconds of inference. The bar for clarity and self-description is dramatically higher.

Whether you use MCP, OpenAI function calling, or a custom framework, you're building an AX surface. Your tool definitions are the AI's entire interface to your system.
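To make that concrete, here is a minimal sketch of a tool definition in the OpenAI function-calling style (the tool name and domain are hypothetical, borrowed from the feedlot examples later in this post). The JSON below is everything the agent sees — there is no tutorial behind it.

```python
# A hypothetical tool definition in the OpenAI function-calling style.
# This schema is the agent's ENTIRE view of the capability: the name,
# description, and parameter schema must carry all the meaning.
get_animal_info = {
    "type": "function",
    "function": {
        "name": "get_animal_info",
        "description": (
            "Look up a single animal by its visual tag number. "
            "Returns current lot, weight history, and treatment records. "
            "Read-only: safe to call while exploring."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "tag_number": {
                    "type": "string",
                    "description": "Visual ear-tag number, e.g. '4521'",
                }
            },
            "required": ["tag_number"],
        },
    },
}
```

Everything a developer would learn from docs — what the tool returns, when it's safe to call, what a valid input looks like — has to be compressed into that one schema.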


Four Design Pressures That Shift

AX is to API design what UX was to graphic design — not new principles, but applying them for a specific consumer under specific constraints. Four pressures shift when the consumer is an agent:

Context budget. With REST APIs, 500 endpoints cost the developer cognitive load; they find the three they need and ignore the rest. With AI tools, every definition consumes tokens from a finite context window. Too many or too verbose, and you degrade the AI's reasoning about all of them. Bloated surfaces are a performance problem, not just ergonomics.
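You can put a rough number on this. The sketch below estimates the context cost of a tool surface using the crude ~4-characters-per-token heuristic for English text (an approximation, not a tokenizer):

```python
import json

# Rough sketch: estimate the context cost of a tool surface.
# The ~4 chars-per-token ratio is a crude heuristic for English-like
# text, not an exact tokenizer count.
def rough_token_cost(tool_defs):
    return len(json.dumps(tool_defs)) // 4

# Fifty tools with modestly verbose descriptions (hypothetical)...
tools = [{"name": f"tool_{i}", "description": "x" * 200} for i in range(50)]

# ...already consume thousands of tokens before the user says a word.
cost = rough_token_cost(tools)
```

Run that against your real tool definitions and you'll see the budget you're spending on every single conversation, whether the agent needs those tools or not.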

Probabilistic selection. A developer writes fetch('/api/animals/4521') — explicit. An AI matches "check on that animal" to get_animal_info through semantic inference. Similar descriptions → wrong tool. Naming is a correctness concern.

Starts from zero every time. A developer learns your API over time. An AI has no memory. Every conversation is fresh. Rename a tool and there's no migration path. Stability and naming consistency matter far more.

Emergent composition. In APIs you design orchestration explicitly. With AI, the agent decides the sequence at runtime. You provide building blocks; the AI assembles them. Convenience endpoints that chain workflows remove the agent's ability to handle partial flows. The design pressure runs the opposite direction.


The AX Discipline Map

AX needs a structure for evaluation — like Nielsen's heuristics for UX. Six concerns we've found useful:

  • Orchestration Design — Can the agent assemble your tools into coherent workflows, or are there gaps?
  • Tool Contract Design — Can the agent figure out what a tool does, when to use it, and how to call it from the signature alone?
  • Context Economics — Are you spending the agent's context budget wisely, or drowning it in noise?
  • Domain Legibility — Does the agent understand your business after reading your tool definitions?
  • Safety & Trust Design — Can the agent investigate freely without accidentally breaking things?
  • Error & Recovery Design — When something goes wrong, can the agent figure out what to try next?

How do you know if AX is working? Test it. Run twenty representative user prompts. Measure tool selection accuracy, task completion, and recovery rate. That's the AX equivalent of a usability study.
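A sketch of what that study can look like, assuming a `select_tool(prompt)` function that wraps your model call and returns the chosen tool name (all names here are hypothetical):

```python
# Hypothetical AX evaluation harness: run representative prompts through
# the agent and score tool-selection accuracy against expected tools.
test_cases = [
    ("check on animal 4521", "get_animal_info"),
    ("record a treatment for 4521", "save_cattle_treatment"),
    ("move lot 12 to pen 7", "save_cattle_moves"),
]

def evaluate(select_tool):
    """select_tool(prompt) -> name of the tool the agent chose."""
    hits = sum(1 for prompt, expected in test_cases
               if select_tool(prompt) == expected)
    return hits / len(test_cases)

# Stub agent for illustration: always picks the same tool,
# so it only gets the first case right.
accuracy = evaluate(lambda prompt: "get_animal_info")
```

In practice you'd replace the stub with a real model call, extend the cases to twenty or more, and track the accuracy number across schema changes — the same way you'd track a regression suite.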


Principles You Already Know

You don't need new principles — you need to apply existing ones with a different consumer in mind.

Single Responsibility. A tool should do one thing. A kitchen-sink tool with action: move | treat | ship | lookup forces the AI to reason combinatorially. Focused tools like get_animal_info, save_cattle_treatment, save_cattle_moves map cleanly to user intent.
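Side by side, the contrast looks like this (schemas abbreviated, names hypothetical):

```python
# Anti-pattern (hypothetical): one tool, many behaviors. The agent must
# reason about which 'action' value applies AND which payload fields each
# action expects -- a combinatorial contract.
kitchen_sink = {
    "name": "manage_cattle",
    "parameters": {
        "action": {"enum": ["move", "treat", "ship", "lookup"]},
        "payload": {"type": "object"},  # shape depends on 'action'
    },
}

# Focused tools (hypothetical): each name maps directly to one user intent,
# and each parameter schema is unambiguous.
focused = [
    {"name": "get_animal_info",
     "parameters": {"tag_number": {"type": "string"}}},
    {"name": "save_cattle_treatment",
     "parameters": {"tag_number": {"type": "string"},
                    "treatment_code": {"type": "string"}}},
    {"name": "save_cattle_moves",
     "parameters": {"tag_number": {"type": "string"},
                    "to_pen": {"type": "string"}}},
]
```

The focused set costs a few more definitions, but each one is a direct semantic match for a user intent — which is exactly what probabilistic tool selection rewards.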

Ubiquitous Language. The AI's only path to your domain is through tool and parameter names. In a feedlot system, operators say "calling feed" — so save_call_per_head with callPerHead in lbs matches naturally. Name it create_feed_record and the AI bridges a vocabulary gap that didn't need to exist. The AX surface is the documentation.
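A sketch of that tool, with the operators' vocabulary in the name and the units spelled out in the description (hypothetical schema):

```python
# Hypothetical sketch: name the tool in the operators' own vocabulary so
# the model's semantic match is direct, and state units explicitly --
# the schema is the only documentation the agent will ever read.
save_call_per_head = {
    "name": "save_call_per_head",
    "description": (
        "Record today's feed call for a pen. Operators call this "
        "'calling feed'; callPerHead is pounds of feed per head."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "pen_id": {"type": "string",
                       "description": "Pen identifier, e.g. 'N-12'"},
            "callPerHead": {"type": "number",
                            "description": "Feed call in lbs per head"},
        },
        "required": ["pen_id", "callPerHead"],
    },
}
```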

Composition over monoliths. Don't build performFullTreatmentWorkflow. Expose atomic steps: validate_treatment_tag, get_treatment_setup, save_cattle_treatment. The AI sequences them. Bury it in one tool and you lose partial workflows ("just check if 4521 is eligible") and graceful failure handling.
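A stubbed sketch of how that plays out at runtime — the server exposes atomic steps, and the agent (not the server) decides the sequence. The implementations below are placeholders, not real lookups:

```python
# Hypothetical atomic tools (stubbed implementations for illustration).
def validate_treatment_tag(tag):
    return {"tag": tag, "eligible": tag == "4521"}   # stubbed eligibility

def get_treatment_setup(tag):
    return {"tag": tag, "protocol": "LA-300"}        # stubbed setup

def save_cattle_treatment(tag, protocol):
    return {"saved": True, "tag": tag, "protocol": protocol}

# Full flow, as the agent might sequence it at runtime. A partial ask
# like "just check if 4521 is eligible" stops after step one -- something
# a monolithic performFullTreatmentWorkflow tool could never offer.
result = None
check = validate_treatment_tag("4521")
if check["eligible"]:
    setup = get_treatment_setup("4521")
    result = save_cattle_treatment("4521", setup["protocol"])
```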

Safety boundaries. The AI needs to know which tools are safe to explore versus which have side effects. get_animal_info and get_open_lots — read-safe. save_cattle_treatment and save_cattle_moves — the agent should confirm before calling. Make the read/write boundary visible in the tool contract.
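One way to make that boundary machine-readable is MCP-style tool annotations (the MCP spec defines hints such as `readOnlyHint` and `destructiveHint`; the exact serialization below is a simplified sketch, not the wire format):

```python
# Sketch using MCP-style tool annotations to mark the read/write boundary.
tools = [
    {"name": "get_animal_info",       "annotations": {"readOnlyHint": True}},
    {"name": "get_open_lots",         "annotations": {"readOnlyHint": True}},
    {"name": "save_cattle_treatment", "annotations": {"readOnlyHint": False}},
    {"name": "save_cattle_moves",     "annotations": {"readOnlyHint": False}},
]

# A simple policy an agent host might enforce: require confirmation
# before calling anything that isn't explicitly read-only.
def needs_confirmation(tool):
    return not tool["annotations"].get("readOnlyHint", False)

writes = [t["name"] for t in tools if needs_confirmation(t)]
```

With the boundary declared in the contract, the agent can explore the `get_` tools freely while the host gates every `save_` call behind a confirmation.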


What AX Gets You

Netlify reports over 1,000 sites deploying daily from ChatGPT through agent-optimized infrastructure — measurable business impact from AX investment. Our experience building a proof-of-concept MCP server for a feedlot system: applying AX principles (consistent get_/save_ naming, domain vocabulary, single-responsibility tools) eliminated compensatory system prompts. The AI selected the correct tool on first attempt for the vast majority of prompts — no few-shot examples, no elaborate tuning. One team's experience, not a controlled study, but the results were immediate.

Teams that treat AX as a discipline gain: better AI integrations with less prompt engineering, agent portability across Claude/GPT/Gemini, faster capability onboarding, lower token costs, and compounding returns as quality builds trust.


From Ad Hoc to Discipline

Most teams building tool servers today do it ad hoc — smart decisions without a shared framework. Start here: look at your tool definitions through the six concerns. Ask the question each one poses. The principles are the same ones you've practiced. They just have a new consumer.

AX will evolve as constraints shift — persistent memory, larger context windows. But the core challenge — making your systems legible for a reasoning consumer — persists whether the AI calls a tool, generates HTTP, or writes code. The mechanism is an implementation detail. The discipline is about the consumer.

In UX, you're not the user. In AX, you're not the agent. The craft of designing for someone else — that's the same skill it's always been.
