DEV Community

Dipen Bhikadya

Originally published at awaithuman.dev

PagerDuty MCP Unlocks AI Agent Awareness: The Case for Controlled Escalation


Here is the question most teams skip: once your agent can see everything and act on some of it, who decides which actions actually execute?


What Is PagerDuty MCP and Why It Changes the Game

PagerDuty MCP is the Model Context Protocol implementation that enables AI agents to interact directly with PagerDuty's incident management platform. It gives agents access to incidents, services, on-call schedules, and escalation policies without requiring custom API glue code, letting them pull incident details, check schedules, and trigger responses via natural language.

The Model Context Protocol, developed by Anthropic and released as an open standard, standardizes how AI models connect to external tools. The official open-source implementation is maintained by PagerDuty and written in Python, implementing the full MCP specification.

PagerDuty's expansion into the AI ecosystem reflects the industry shift toward agentic operations. The company added more than 30 AI partners across 11 categories to its integration directory, as announced in the PagerDuty newsroom, building on a foundation of over 700 traditional integrations.

From Read-Only Awareness to Action

The implementation exposes tools that fall into two categories. Read operations let agents list incidents, get incident details, check service status, and look up on-call schedules. Write operations let agents acknowledge incidents, add notes, and trigger escalations. That second category is where things get interesting.

A read-only agent is a dashboard you can talk to. A read-and-write agent is an operator who can change your system state. The difference between them is the difference between "tell me what is broken" and "go fix it." Most teams start with read-only, then enable write operations as they gain confidence. That is the right instinct, but it is not enough by itself.
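One way to make the read-only-first rollout explicit is an allowlist on the agent side. The following is a minimal sketch, not part of the PagerDuty implementation: the tool names come from the standard tool set listed later in this article, while `is_tool_allowed` and its `write_enabled` switch are hypothetical names introduced for illustration.

```python
# Hypothetical agent-side allowlist; not part of the PagerDuty MCP implementation.
READ_TOOLS = frozenset({
    "list_incidents", "get_incident", "list_services",
    "list_oncalls", "list_escalation_policies",
})
WRITE_TOOLS = frozenset({"acknowledge_incident", "add_note"})


def is_tool_allowed(tool_name: str, write_enabled: bool) -> bool:
    """Return True if the agent may call this tool in the current mode."""
    if tool_name in READ_TOOLS:
        return True
    if tool_name in WRITE_TOOLS:
        return write_enabled
    # Unknown tools are denied by default.
    return False
```

Denying unknown tools by default means a newly added write tool stays disabled until someone classifies it on purpose.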

The Gap MCP Alone Doesn't Solve

PagerDuty MCP does not handle the connection between your agent and the humans who need to verify its decisions. When an agent decides to acknowledge an incident, escalate a severity, or trigger a new page, the implementation simply executes the API call. There is no built-in approval step, no reasoning trace review, no way to say "show me why the agent thinks this is the right action before it happens."

That gap is the subject of this article. MCP is powerful infrastructure. But for production agentic workflows, awareness alone is not enough. You need controlled escalation.

The PagerDuty MCP Ecosystem: Components and Integrations

The PagerDuty MCP ecosystem now spans multiple deployment options, client integrations, and authentication patterns.

The Official Implementation

The core of the ecosystem is the official open-source implementation on GitHub. It is written in Python and implements the full MCP specification.

The implementation exposes a standardized tool set: list incidents, get incident details, list services, list on-call schedules, list escalation policies, acknowledge incidents, and add notes.

Client Integrations: Claude, Cursor, VS Code

The Cursor Plugin is particularly interesting for developer productivity. It installs in one step from the official Cursor Marketplace, giving developers access to incident context without leaving their editor. A developer can ask "what is the current on-call rotation for our payment service?" while reviewing a pull request and get an answer immediately.

Authentication Models Compared

Two authentication patterns exist. Local deployments use API tokens stored in environment variables or configuration files.

The OAuth path is more secure for production because tokens are short-lived and scoped to specific permissions. The API token path is simpler for local development and testing.

Panther Security: A Practical SOC Example

In Panther's implementation, analysts can check incident status, verify on-call coverage, and escalate issues directly from a chat interface. The integration demonstrates how MCP transforms operational workflows by reducing context switching.

But even in Panther's implementation, the agent remains a query tool. It does not autonomously acknowledge or escalate incidents. The human stays in the loop for every write operation. That design choice points directly to the pattern this article advocates.

Three Deployment Models for PagerDuty MCP

PagerDuty-Hosted Implementation

This model minimizes setup effort and ensures the implementation stays updated with new API capabilities.

For teams that want a fast integration with minimal overhead, this is the straightforward choice.

Self-Hosted Implementation

The self-hosted option runs the official open-source implementation on your own infrastructure.

This model gives you full control over the execution environment, network policies, and server configuration. It is the right choice for compliance-constrained environments where data cannot leave your network, or teams that need to extend the implementation with custom tools.

Embedded Client Integration

Some MCP clients, like Cursor and VS Code, support configuring MCP implementations directly within the client's settings. You provide the server command and credentials, and the client manages the server process. This model requires no separate server deployment.

The embedded approach works best for individual developer productivity. It does not scale to production agentic workflows where multiple agents need coordinated access to the same implementation instance.

How to Choose and Deploy Your PagerDuty MCP Integration

Choosing the right deployment model starts with your use case. A single developer wanting incident context in their editor needs a different setup than a production agent running on a server.

Assess Your Use Case

Ask three questions. First, who or what will be consuming the service: a human developer in their IDE, or a fully autonomous agent operating without supervision? Second, what actions will the consumer take: read-only queries, or write operations that change system state? Third, what are your security and compliance requirements? Can authentication tokens live in configuration files, or do you need OAuth with short-lived credentials?

Configure Authentication and Connect

For local testing, the API token path is fastest. For production, use the OAuth path.

Once the implementation is running, configure your client. If you are using Claude Desktop, add the server configuration to the MCP settings file. For VS Code, add it to the mcpServers section of your settings. For Cursor, install the official plugin from the marketplace.

Test and Iterate

Start with read queries. Ask your agent to list current incidents, check the on-call schedule, or pull details on a specific alert. Verify the responses are accurate and the agent understands the returned data structure. Only after you trust the read path should you enable write operations.

Add Human Oversight for Critical Actions

This is the step most deployment guides skip. Once your agent can acknowledge incidents, add notes, or trigger escalations, you need a layer that intercepts those actions before they reach PagerDuty. Approval gates act as a safety net: a human reviews the agent's proposed action and its reasoning before the action executes.

At AwaitHuman, we provide exactly this layer. Our drop-in approval queues integrate with any MCP workflow via a single webhook. When an agent decides to take a write action, AwaitHuman intercepts the request, presents the full reasoning trace to a human through our dashboard or an omnichannel alert, and only executes the action after approval. Check our pricing guide to understand how escalation-as-a-service compares to per-user incident management costs. Our service is free during the beta phase, making this accessible for teams testing production agentic workflows.

How PagerDuty MCP Works Under the Hood

Understanding the mechanism helps you debug issues and design better escalation policies.

JSON-RPC and the PagerDuty REST API

The implementation communicates with AI clients via JSON-RPC over standard I/O or HTTP. The client sends a structured request describing the tool call and parameters.

The client sends natural language to the LLM, which selects the appropriate tool, and the implementation handles the translation into PagerDuty REST API calls. This indirection is what makes MCP powerful. The client only needs to know what tools the implementation exposes.
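The structured request described above can be sketched as a JSON-RPC 2.0 message. The `tools/call` method and the `name`/`arguments` parameters follow the MCP specification; the `incident_id` argument name and the `PABC123` value are placeholders chosen for illustration, not taken from the PagerDuty implementation.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "incident_id" and "PABC123" are placeholders for illustration.
request = build_tool_call(1, "get_incident", {"incident_id": "PABC123"})
print(request)
```

Logging these messages at the transport boundary is a simple way to see exactly which tool the LLM selected and with what arguments.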

OAuth Versus API Token Authentication

With OAuth, the access token is short-lived and refreshed automatically, which reduces the risk of credential exposure.

API tokens are simpler but longer-lived. They work well for local development and embedded clients where the token lives in a controlled environment. Never commit API tokens to source control. Use environment variables or a secrets manager.
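As a minimal sketch of the environment-variable approach, the helper below builds request headers for direct PagerDuty REST API calls. The `Token token=` authorization scheme and the versioned `Accept` header are the ones the PagerDuty REST API documents for API keys; the variable name `PAGERDUTY_API_TOKEN` and the function `pagerduty_headers` are assumptions made for this example, and your MCP server may read a differently named variable.

```python
import os


def pagerduty_headers(env_var: str = "PAGERDUTY_API_TOKEN") -> dict:
    """Build PagerDuty REST API headers from a token held in the environment."""
    token = os.environ.get(env_var)
    if not token:
        # Fail fast instead of sending unauthenticated requests.
        raise RuntimeError(f"Set {env_var} before starting the agent")
    return {
        "Authorization": f"Token token={token}",
        "Accept": "application/vnd.pagerduty+json;version=2",
        "Content-Type": "application/json",
    }
```

Because the token is never written into the code or a tracked file, rotating it only requires updating the environment or the secrets manager entry.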

The Standard Tool Set

The implementation exposes these tools out of the box:

  • list_incidents: query incidents with filters for status, urgency, and team
  • get_incident: fetch full details for a specific incident
  • list_services: list all services and their status
  • list_oncalls: show the current on-call schedule
  • list_escalation_policies: query escalation policy definitions
  • acknowledge_incident: acknowledge an incident
  • add_note: add a note to an existing incident

Each tool includes parameter descriptions that help the LLM understand what values to provide. The quality of these descriptions directly affects how well your agent uses the tools.
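To show what a parameter description looks like in practice, here is an illustrative tool definition in the shape the MCP specification uses (`name`, `description`, and a JSON Schema under `inputSchema`). The parameter names and wording are assumptions, not copied from the PagerDuty implementation, and `undocumented_parameters` is a hypothetical helper for spotting parameters the LLM would have to guess at.

```python
# Illustrative MCP tool definition; parameter names and wording are
# assumptions, not copied from the PagerDuty implementation.
LIST_INCIDENTS_TOOL = {
    "name": "list_incidents",
    "description": "List PagerDuty incidents, optionally filtered by status and urgency.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "status": {
                "type": "string",
                "enum": ["triggered", "acknowledged", "resolved"],
                "description": "Only return incidents in this state.",
            },
            "urgency": {
                "type": "string",
                "enum": ["high", "low"],
                "description": "Only return incidents with this urgency.",
            },
        },
        "required": [],
    },
}


def undocumented_parameters(tool: dict) -> list:
    """Return parameter names that lack a description the LLM could read."""
    properties = tool["inputSchema"]["properties"]
    return [name for name, spec in properties.items() if not spec.get("description")]
```

A check like this can run against the tool list your server actually returns, so that custom tools added to a self-hosted deployment do not ship without descriptions.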

Context Preservation in Agentic Workflows

One of MCP's strengths is context preservation. When an agent calls list_incidents, the implementation returns the full incident data, which stays in the LLM's context. The agent can then call get_incident on a specific ID from the previous response, maintaining a coherent conversation thread.

This is valuable for debugging and auditability. You can trace exactly what data the agent saw and what decisions it made based on that data. But the trace lives in the LLM context, not in an external audit log. For compliance-sensitive workflows, you need an additional layer that captures this reasoning chain permanently.

Common Mistakes When Deploying PagerDuty MCP

Teams moving from proof-of-concept to production tend to hit the same issues. Here are the ones that cause the most damage.

Deploying Without Proper OAuth Security

The self-hosted implementation can run with an API token stored in a configuration file. That works for local testing.

Use OAuth for production deployments. The short-lived tokens and refresh flows reduce the blast radius of a credential leak. If you must use API tokens, rotate them frequently and restrict their permissions to the minimum required for the implementation's tool set.

Letting Agents Act Without Approval

The most expensive mistake is enabling write operations without an approval layer. An agent that can acknowledge incidents will eventually acknowledge the wrong incident. An agent that can trigger escalations will eventually escalate a low-severity alert to the wrong team.

The implementation has no built-in approval mechanism. When the agent calls acknowledge_incident, it executes immediately. You must add a human-in-the-loop layer yourself. Our approval queues are designed for exactly this purpose.

Ignoring API Rate Limits

The implementation does not add any rate limiting of its own. An agent in a tight loop, retrying a failed query or processing a batch of incidents, can exhaust your rate limit and block legitimate user traffic from other integrations.

Set up rate limiting on the implementation process or add a proxy layer that queues requests. Test your agent's behavior under load before deploying to production.
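One common way to implement this is a token bucket placed in front of the tool calls. The sketch below is a generic illustration, not something the implementation ships; the `TokenBucket` class and its parameters are hypothetical, and the right `rate` depends on the limits of your own PagerDuty account.

```python
import time


class TokenBucket:
    """Allow `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when the caller should wait."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

An agent wrapper would call `try_acquire()` before each tool call and back off or queue the request when it returns `False`, which keeps a retry loop from consuming the quota shared with other integrations.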

Skipping Audit Logging

The implementation produces logs, but these logs are not structured for compliance or incident forensics. When an agent makes a mistake, you need to know exactly what the agent saw, what decision logic it used, and what action it took.

Immutable audit trails capture this information permanently. At AwaitHuman, we record every escalation request with the full LLM reasoning trace and tool logs, so you can reconstruct any incident after the fact.

Adding Controlled Human Oversight to PagerDuty MCP Workflows

The pattern for production-grade deployments is straightforward: let the agent use the implementation for awareness and routine actions, but require human approval for critical operations.

Approval Queues as a Safety Net

An approval queue sits between your agent and the implementation. When the agent calls a write operation, the queue intercepts the request and routes it to a human for review. The human sees the agent's proposed action, the reasoning trace, and the context that led to the decision. They can approve, reject, or modify the action.

This pattern preserves the speed benefits of agentic workflows for safe operations while adding a safety net for high-risk actions. You configure the trigger conditions based on action type, service, severity, or any other dimension your workflow requires.
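The routing decision can be sketched in a few lines. This is a simplified illustration under stated assumptions, not AwaitHuman's implementation: `ProposedAction`, `needs_approval`, `dispatch`, and the example policy (gate high-urgency acknowledgements and anything touching a `payments` service) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    tool: str       # e.g. "acknowledge_incident"
    service: str    # service the incident belongs to
    urgency: str    # "high" or "low"
    reasoning: str  # the agent's explanation, shown to the reviewer


# Hypothetical policy: which actions must wait for a human.
GATED_TOOLS = {"acknowledge_incident"}
GATED_SERVICES = {"payments"}


def needs_approval(action: ProposedAction) -> bool:
    """Return True when the action should be routed to a human reviewer."""
    if action.tool in GATED_TOOLS and action.urgency == "high":
        return True
    return action.service in GATED_SERVICES


def dispatch(action: ProposedAction, execute, enqueue_for_review) -> str:
    """Run safe actions immediately; park risky ones until a human decides."""
    if needs_approval(action):
        enqueue_for_review(action)
        return "pending_approval"
    execute(action)
    return "executed"
```

Keeping the policy in data rather than scattered through agent prompts makes it reviewable: an operator can read the gated tools and services and know exactly what the agent can do unattended.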

Omnichannel Alerts and Full Reasoning Context

The human who reviews an approval request should not need to be watching a dashboard. Omnichannel alerts deliver the request to wherever the operator is: push notification, email, SMS, Telegram, or WhatsApp. Each alert includes enough context for the operator to make a decision without switching tools.

AwaitHuman sends the full reasoning trace with every alert. The operator sees the agent's chain of thought, the specific data that triggered the action, and the proposed response. No context switching, no guesswork.

Immutable Audit Trails

Every approval or rejection creates an immutable record. This record includes the agent's reasoning trace, the human's decision, the time the decision was made, and the final action that executed. For compliance teams, this replaces the "the agent said it did X, but we are not sure" conversation with a clear, verifiable log.
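One generic technique for making such records tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below illustrates that technique only; it is not a description of how AwaitHuman stores records, and `append_record` and `verify_chain` are hypothetical helpers. A hash chain detects edits but does not prevent them, so a real deployment would pair it with write-once storage.

```python
import hashlib
import json


def append_record(log: list, record: dict) -> dict:
    """Append a record whose hash covers its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = json.dumps(record, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
    entry = {"record": record, "prev_hash": prev_hash, "hash": entry_hash}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev_hash = "0" * 64
    for entry in log:
        body = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

Each record would carry the fields named above (reasoning trace, human decision, timestamp, final action), and a compliance reviewer can run the verification independently of whoever operates the log.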

The features that most teams running production AI agents discover too late include audit trails, reasoning preservation, and escalation triggers. Our platform builds these in from day one.

Why Your Agentic Workflow Needs Escalation-as-a-Service

PagerDuty MCP is excellent infrastructure for connecting LLMs to incident management. But awareness alone does not make a production-ready agentic workflow.

The Cost of Agent Errors at Scale

An agent that incorrectly acknowledges a critical incident delays the response by the time it takes someone to notice and re-trigger the alert. An agent that escalates a minor issue to the wrong on-call team wastes that engineer's focus and erodes trust in the escalation system. These errors compound as your agent handles more incidents.

The cost is operational, but it goes deeper than that. When an agent makes a visible mistake, the humans in your organization start working around it. They ignore agent-based escalations. They double-check every decision manually. The productivity gain you built the agent for evaporates.

Why Escalation-as-a-Service Wins

Escalation-as-a-service means your agent triggers escalation, but the escalation is managed, routed, and logged by a dedicated infrastructure layer. Your agent does not need to implement approval logic, manage notification routing, or store audit records. It calls a webhook, and the infrastructure handles everything else.
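From the agent's side, "calls a webhook" amounts to a single HTTP POST. The sketch below only prepares such a request with the standard library and does not send it. The URL and the payload fields (`action`, `reasoning`) are placeholders invented for this example; they are not AwaitHuman's actual endpoint or schema, which you should take from the service's own documentation.

```python
import json
import urllib.request


def build_escalation_request(url: str, action: str, reasoning: str) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST describing the proposed action."""
    payload = json.dumps({"action": action, "reasoning": reasoning}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The URL below is a placeholder, not a real endpoint.
request = build_escalation_request(
    "https://example.invalid/escalations",
    action="acknowledge_incident",
    reasoning="Alert matches a known duplicate.",
)
# Sending would be: urllib.request.urlopen(request, timeout=10)
```

The point of the design is that the agent's responsibility ends at this call; approval logic, notification routing, and record keeping live behind the endpoint.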

This pattern is what we built at AwaitHuman. AwaitHuman is escalation-as-a-service for agentic workflows: drop-in approval queues, omnichannel operator alerts, full audit trails, and intervention dashboards. Your agents use PagerDuty MCP for awareness and routine actions. For critical actions, they escalate through AwaitHuman. The combination gives you the speed of autonomous agents with the safety of human oversight.

Getting Started With AwaitHuman

Our platform is free during the beta phase. We integrate with any MCP-compatible agent via a single webhook. You add our webhook endpoint as a tool in your agent's tool set, configure your escalation conditions, and deploy.

No additional servers. No complex authentication flows. Two layers, one architecture, production-ready from day one.

PagerDuty MCP solves the awareness problem. We solve the control problem. Together, they give you the agentic workflow your team deserves.
