For a long time, my “performance engineering workflow” as a Tech Lead looked like this:
- Log into New Relic
- Run a handful of NRQL queries
- Inspect slow transactions and error traces
- Map issues back to Drupal code
- Estimate effort and impact
- Create JIRA tickets with enough context
- Post updates in Teams
It’s valuable work, but it is also repetitive, mechanical, and very interrupt‑friendly. It was quietly costing me 2–3 hours every week.
So I automated it.
This post walks through the workflow I built using Claude Code, the New Relic MCP, Jira, and Microsoft Teams to:
- Continuously analyze performance and error data
- Generate structured root cause analysis with code references
- Create and prioritize Jira tickets
- Notify the team with severity‑specific alerts
- Optionally draft pull requests for straightforward fixes
Why This Was Worth Automating
As a Tech Lead on a large Drupal platform, my time is best spent on:
- Architecture and design decisions
- Reviewing high‑impact changes
- Mentoring and unblocking engineers
- Shaping priorities with product and leadership
But performance issues don’t care about calendars.
Every incident or regression forced me back into the same manual loop: query New Relic, decipher traces, reverse‑engineer root causes, and turn them into actionable tickets. It was important, but it wasn’t leverage.
The workflow in this post exists to do one thing: remove the mechanical part of performance engineering while keeping the judgment and risk decisions in human hands.
High‑Level Architecture
At a high level, the system looks like this:
- New Relic collects APM metrics, traces, and error data from our Drupal application.
- Claude Code (via the New Relic MCP) pulls that data, analyzes it, and decides what’s worth acting on.
- Jira receives structured issues with metrics, root causes, effort estimates, and links back to New Relic.
- Microsoft Teams gets severity‑color‑coded notifications so the right people see the right issues at the right time.
- GitHub (optionally) receives draft pull requests for straightforward fixes the AI can safely propose.
This architecture keeps responsibilities clear:
- New Relic is the source of truth.
- AI is used for interpretation and orchestration.
- Jira and Teams are where work and communication actually happen.
- Humans stay firmly in the decision loop.
The Workflow End‑to‑End
Phase 1: Data Collection from New Relic
On demand (or via a scheduled script), the workflow starts with a single instruction, e.g.:
“Execute New Relic performance analysis for production last 1 hour”
Behind the scenes, Claude Code uses the New Relic MCP to run a focused set of NRQL queries:
- Slow transactions: endpoints with response time above a threshold
- Error rates: exception types, messages, and affected routes
- Database performance: slow queries and N+1‑style patterns
- Open incidents and application health: alerts, Apdex, throughput
The goal here is not to recreate the entire dashboard - it’s to pull just enough data for a useful decision.
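For a rough sense of what that query set looks like, here is an illustrative version of it in Python. The app name, thresholds, and time window are placeholders, not our real configuration:

```python
# Illustrative NRQL queries the analysis step runs via the New Relic MCP.
# 'my-drupal-app' and the numeric thresholds are placeholders.
NRQL_QUERIES = {
    "slow_transactions": """
        SELECT average(duration), percentile(duration, 95), count(*)
        FROM Transaction
        WHERE appName = 'my-drupal-app' AND duration > 2
        FACET name SINCE 1 hour ago
    """,
    "error_rates": """
        SELECT count(*)
        FROM TransactionError
        WHERE appName = 'my-drupal-app'
        FACET error.class, transactionName SINCE 1 hour ago
    """,
    "database_time": """
        SELECT average(databaseDuration), percentile(databaseDuration, 95)
        FROM Transaction
        WHERE appName = 'my-drupal-app'
        FACET name SINCE 1 hour ago
    """,
    "health": """
        SELECT apdex(duration, t: 0.5), rate(count(*), 1 minute)
        FROM Transaction
        WHERE appName = 'my-drupal-app' SINCE 1 hour ago
    """,
}
```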
Phase 2: AI‑Powered Root Cause Analysis
Claude Code then takes that raw telemetry and turns it into something my team can act on:
- Groups slow transactions into meaningful units (e.g., /reports/latest, specific admin pages)
- Connects issues to Drupal modules, controllers, or custom code paths
- Distinguishes between:
  - one‑off spikes vs. consistent degradation
  - user‑facing vs. admin‑only issues
  - backend jobs vs. interactive requests
- Hypothesizes root causes:
  - N+1 queries
  - missing cache tags/contexts
  - heavy external API calls
  - misconfigured database access patterns
The important part: this analysis is always explainable. If the suggestion is wrong, it is wrong in a way that is obvious when you read the ticket.
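For context, what this phase hands to the later phases is essentially one small structured record per issue. A sketch of that shape, with illustrative field names rather than our exact schema:

```python
from dataclasses import dataclass, field

@dataclass
class PerformanceFinding:
    """One analyzed issue, as passed from the analysis step to ticketing.

    Field names are illustrative; the real contract lives in the prompt/output
    agreement between Claude Code and the downstream scripts.
    """
    transaction: str              # e.g. "/reports/latest"
    pattern: str                  # "one-off spike" or "consistent degradation"
    audience: str                 # "user-facing", "admin-only", "background job"
    suspected_cause: str          # e.g. "N+1 query in a custom views handler"
    affected_code: list[str] = field(default_factory=list)  # modules / file paths
    evidence: dict = field(default_factory=dict)            # metrics backing the hypothesis
```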
Phase 3: Priority, Impact, and Story Points
Performance work always competes with feature work, so the system needs to express impact in a language the team understands.
The workflow classifies each issue as Critical, High, or Medium based on thresholds like:
- Response time ranges
- Error rate percentages
- Whether the issue is user‑visible
- Whether functionality is partially or fully degraded
From there, it estimates story points (1/2/3/5/8) using a simple heuristic:
- Complexity (single query tweak vs. cross‑module change)
- Scope (one endpoint vs. a subsystem)
- Risk (low‑risk cache change vs. behavior‑changing refactor)
- Effort (hours vs. days)
These are not perfect, but they are consistent – which is often more useful than “perfect but ad hoc”.
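A minimal sketch of that classification and sizing logic, with made‑up thresholds standing in for the real ones:

```python
def classify_priority(avg_response_s: float, error_rate_pct: float, user_visible: bool) -> str:
    """Map raw metrics to a priority. Thresholds here are illustrative only."""
    if error_rate_pct >= 5 or (user_visible and avg_response_s >= 5):
        return "Critical"
    if error_rate_pct >= 1 or avg_response_s >= 2:
        return "High"
    return "Medium"

def estimate_story_points(complexity: int, scope: int, risk: int) -> int:
    """Pick a 1/2/3/5/8 point value from rough 1 (low) to 3 (high) scores per axis."""
    score = complexity + scope + risk
    for points, ceiling in [(1, 3), (2, 4), (3, 6), (5, 8)]:
        if score <= ceiling:
            return points
    return 8
```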
Phase 4: Jira Ticket Generation
For every actionable issue, the workflow calls the Jira API to create a ticket in the current active sprint.
Each ticket includes:
- A descriptive title (e.g. [Performance] /reports/latest endpoint – 650% response time increase)
- A summary of the issue and affected environment
- Metrics:
  - average and p95 response time
  - error rate
  - timeframe and estimated impact
- Root cause analysis in plain language
- Affected code (modules, file paths, and line numbers where possible)
- Suggested fix (cache changes, query optimizations, config sync, etc.)
- Deep links back to the relevant New Relic views
This is the difference between “we should look into that spike” and “here is an actionable story ready for a sprint board”.
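Under the hood, ticket creation is a plain Jira Cloud REST API v3 call. A stripped‑down sketch, where the project key, priority names, and credential variables are placeholders for your own setup (story points usually live in an instance‑specific custom field, omitted here):

```python
import os
import requests

def create_performance_ticket(summary: str, description_text: str, priority: str) -> str:
    """Create a Jira issue via the Cloud REST API v3 and return its key.

    JIRA_BASE_URL / JIRA_EMAIL / JIRA_API_TOKEN and the "PERF" project key
    are placeholders; priority names must match your Jira priority scheme.
    """
    payload = {
        "fields": {
            "project": {"key": "PERF"},
            "issuetype": {"name": "Story"},
            "summary": summary,
            "priority": {"name": priority},
            # API v3 descriptions use Atlassian Document Format (ADF).
            "description": {
                "type": "doc",
                "version": 1,
                "content": [
                    {"type": "paragraph",
                     "content": [{"type": "text", "text": description_text}]}
                ],
            },
        }
    }
    resp = requests.post(
        f"{os.environ['JIRA_BASE_URL']}/rest/api/3/issue",
        json=payload,
        auth=(os.environ["JIRA_EMAIL"], os.environ["JIRA_API_TOKEN"]),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["key"]
```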
Phase 5: Teams Notifications and Optional PRs
Once tickets are created, the workflow posts an Adaptive Card to the right Teams channel.
Every card includes:
- Short description of the issue
- Key metrics (response time, error rate, environment)
- Link to the Jira ticket (and through that, to New Relic)
- Story points and priority
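Posting the card itself is a single call to a Teams incoming webhook. A minimal sketch, assuming a TEAMS_WEBHOOK_URL environment variable and a severity‑to‑color mapping of your choosing:

```python
import os
import requests

# Adaptive Card text colors used to signal severity; the mapping is a choice, not a standard.
SEVERITY_COLORS = {"Critical": "attention", "High": "warning", "Medium": "good"}

def notify_teams(title: str, metrics_line: str, jira_url: str, priority: str) -> None:
    """Post a severity-colored Adaptive Card to a Teams incoming webhook."""
    card = {
        "type": "AdaptiveCard",
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "version": "1.4",
        "body": [
            {"type": "TextBlock", "text": title, "weight": "Bolder",
             "color": SEVERITY_COLORS.get(priority, "default")},
            {"type": "TextBlock", "text": metrics_line, "wrap": True},
            {"type": "TextBlock", "text": f"Priority: {priority}", "wrap": True},
        ],
        "actions": [{"type": "Action.OpenUrl", "title": "Open Jira ticket", "url": jira_url}],
    }
    payload = {
        "type": "message",
        "attachments": [{
            "contentType": "application/vnd.microsoft.card.adaptive",
            "content": card,
        }],
    }
    requests.post(os.environ["TEAMS_WEBHOOK_URL"], json=payload, timeout=30).raise_for_status()
```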
For certain well‑scoped cases (like adding cache metadata or adjusting a specific query), there is also an optional step:
“Yes, attempt to fix this issue”
When I explicitly opt in, Claude Code will:
- Read the relevant files
- Propose a code change
- Run local commands/tests where available
- Open a draft PR linked to the Jira ticket
Nothing merges automatically. Manual review and CI are still required.
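Claude Code drives this step interactively, but the end result is an ordinary draft pull request. For reference, the equivalent GitHub REST call looks roughly like this (the repository, base branch, and token variable are placeholders):

```python
import os
import requests

def open_draft_pr(branch: str, jira_key: str, title: str, body: str) -> str:
    """Open a draft pull request against main and return its URL.

    "my-org/my-drupal-repo" and GITHUB_TOKEN are placeholders for your own setup.
    """
    resp = requests.post(
        "https://api.github.com/repos/my-org/my-drupal-repo/pulls",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        json={
            "title": f"[{jira_key}] {title}",
            "head": branch,
            "base": "main",
            "body": body,
            "draft": True,  # draft PRs require review before they can merge
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["html_url"]
```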
Guardrails and Risk Management
The part that made this usable in a real production environment was not “more automation”; it was more guardrails.
Some of the key ones:
- No confidential data in prompts or artifacts
  - No Jira IDs in public logs or documentation
  - No customer identifiers or business metrics
  - No secrets, URLs, or internal hostnames
- New Relic as source of truth
  - The AI analyzes existing metrics; it does not invent data
- Human‑in‑the‑loop by design
  - Every ticket is reviewed before the team sees it
  - PRs are suggestions, not actions
- Opinionated thresholds
  - Hard lines for Critical / High / Medium to avoid alert fatigue
  - The system prefers fewer, higher‑quality tickets to a noisy firehose
These are boring details, but they are also the difference between “cool demo” and “thing we actually trust”.
ROI: What This Changed in Practice
In terms of time:
- Manual performance checks and ticket creation went from ~2–3 hours/week down to about 15 minutes of review.
- Incident response is faster because there is less friction between “we saw something weird in New Relic” and “there is a ticket with a clear owner and plan”.
In terms of quality:
- We miss fewer issues, especially slow degradations.
- Tickets come with better context and suggested fixes.
- Standups focus more on trade‑offs (“Do we take this now or next sprint?”) and less on “What exactly is going on?”.
And in terms of team dynamics:
- Developers can pick up performance work without having to live in New Relic.
- The platform feels more “observed” without feeling more “surveilled”.
Implementation Notes
At a high level, this stack uses:
- Application: Drupal on Acquia
- Monitoring: New Relic APM, NRQL queries, incidents
- AI: Claude Code with the New Relic MCP
- Ticketing: Jira Cloud (REST API v3)
- Communication: Microsoft Teams (incoming webhooks)
- Version Control: GitHub (for optional PR automation)
From a configuration perspective, the main work is:
- Wiring the New Relic account and NRQL queries into the MCP
- Wiring Jira, Teams, and GitHub credentials through environment variables
- Defining thresholds and ticket templates that reflect your actual workflow
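As a rough picture, the glue around the MCP ends up reading configuration along these lines (variable names are mine, not a standard):

```python
import os

# Hypothetical environment variables the glue scripts read.
# Names are illustrative; use whatever your secrets management dictates.
CONFIG = {
    "new_relic_account_id": os.environ["NEW_RELIC_ACCOUNT_ID"],
    "new_relic_api_key": os.environ["NEW_RELIC_API_KEY"],
    "jira_base_url": os.environ["JIRA_BASE_URL"],
    "jira_email": os.environ["JIRA_EMAIL"],
    "jira_api_token": os.environ["JIRA_API_TOKEN"],
    "teams_webhook_url": os.environ["TEAMS_WEBHOOK_URL"],
    "github_token": os.environ.get("GITHUB_TOKEN", ""),  # only needed for optional PRs
    "slow_transaction_threshold_s": float(os.environ.get("SLOW_TXN_THRESHOLD_S", "2")),
    "error_rate_critical_pct": float(os.environ.get("ERROR_RATE_CRITICAL_PCT", "5")),
}
```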
The exact code in my setup is specific to our environment and not open‑sourced (yet), but the pattern is portable to any stack where you have:
- Structured telemetry
- A programmable AI agent
- A ticketing system
- A chat/notification channel
Credits
This workflow only exists because of the people around me.
- Shannon Lal, our CTO, pushed us to adopt Claude Code and gave me the runway to experiment with this in a real system.
- Maria Parra Pino and Ruslana Zagrai, two of the developers on the team, were the ones stress‑testing the idea, calling out edge cases, and helping refine it into something that is actually usable day‑to‑day.
And of course, thanks to New Relic for building a platform and MCP integration that made it possible to treat APM data as something to automate against, not just stare at.
When This Pattern Makes Sense
In my experience, this kind of automation works well when:
- The workflow is repeatable and well‑understood.
- The inputs are observable and reliable (like APM telemetry).
- The outputs can be expressed as structured work (tickets, PRs, notifications).
- You are willing to keep humans in the loop.
If you’re running New Relic and Jira already, the leap from “manual checks” to “AI‑assisted performance engineering” is more about design and guardrails than about exotic technology.
If you end up building a variant of this, I’d genuinely love to hear what worked, what broke, and what you did differently.

