Likhit Kumar V P

I Built an MCP Server That Lets AI Autonomously Debug Salesforce - Here's How

I built sf-log-mcp, an open-source MCP server that gives AI assistants (Claude, Copilot, Cursor) the ability to autonomously fetch, analyze, and manage Salesforce debug logs. It detects "silent failures": transactions that Salesforce marks as "Success" but that are actually broken. It is published on npm and ships with 9 tools, 7 parsers, and 101 tests.

GitHub: github.com/Likhit-Kumar/SF-Logs

npm: npx sf-log-mcp


The Problem Nobody Talks About

If you've ever debugged a Salesforce integration, you know the drill:

  1. Open Setup > Debug Logs

  2. Stare at a wall of logs showing Status = "Success"

  3. Manually download each .log file

  4. Ctrl+F through 50,000 lines looking for what went wrong

  5. Find out the "successful" callout actually returned {"error":"rate_limit_exceeded"} inside an HTTP 200

The Status field lies. In my experience, over 90% of real production issues are silent failures: the code didn't crash and Apex didn't throw an unhandled exception, but the right thing didn't happen.

Here's what "Success" actually hides:

  • HTTP 200 with {"error":"rate_limit"} in body - Integration silently failing

  • Exception caught by try-catch - Error swallowed, moved on

  • SOQL returned 0 rows - Wrong filter, no data processed

  • Governor limits at 95% - Works now, breaks at scale

  • Flow path skipped entirely - Expected automation never fired

Now here's the kicker: none of the existing MCP servers I looked at can even fetch these logs, let alone analyze them.


The Gap in the Ecosystem

I spent weeks researching the Salesforce MCP landscape. Here's what I found:

Certinia's @certinia/apex-log-mcp

A solid parser, great for performance profiling and bottleneck detection. But 3 out of 4 tools require a local .log file path as input. It cannot list, fetch, or download logs from an org. You still have to manually download them first.

Salesforce's @salesforce/mcp

The official MCP server with 60+ tools across metadata, data, testing, DevOps, and code analysis. Impressive scope. But: zero debug log tools. No list_logs, no fetch_log, no manage_trace_flags. Nothing.

The Workflow Today (Even With MCP)


You: "Something's wrong with our Vendor X integration"

AI: "I'd love to help! Can you download the debug log
and give me the file path?"

You: *opens Setup, clicks Debug Logs, downloads file,
saves to Desktop, types the path*

AI: "Thanks, here's the analysis..."


The AI is supposed to be autonomous. But it can't even get the logs it needs.


What I Built

sf-log-mcp, a standalone MCP server that fills the missing layer between your Salesforce org and your AI assistant.

The Philosophy

Don't filter. Fetch. Read. Reason.

Instead of filtering logs by status (which catches <10% of real issues), sf-log-mcp downloads the raw log content and lets the AI reason about what actually happened: callout responses, DML results, swallowed exceptions, governor limits, and flow paths.

The Workflow Now


You: "Something's off with our Vendor X integration — check the recent logs"

AI autonomously:

1. manage_trace_flags → checks if logging is active
2. fetch_latest_logs → downloads 5 most recent logs
3. analyze_log → health score: 35/100 (CRITICAL)
4. get_log_content → drills into callouts section
5. get_log_content → checks exceptions section

AI: "Found it. Log 07L... shows the callout to
api.vendorx.com/sync returned HTTP 200, but the
response body contains {"error":"rate_limit_exceeded"}.
This happened in 3 of the 5 recent logs. The
integration looks healthy from the Status field
but is actually being rate-limited."


Zero manual steps. The AI fetches, reads, reasons, and explains.


Architecture


You (natural language)
│
v
AI Assistant (Claude / Copilot / Cursor)
│ MCP tool calls via stdio
v
sf-log-mcp (this project)
│ Salesforce Tooling API (REST)
v
Your Salesforce Org (auth via SF CLI)


Key Design Decisions

1. Direct Tooling API, not CLI subprocesses

Instead of shelling out to sf apex list log (which spawns a subprocess, depends on the installed CLI version, and offers limited filtering), I use @salesforce/core to make direct REST calls to the Salesforce Tooling API. This gives fine-grained query control and eliminates subprocess overhead.
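To make "direct REST calls" a little more concrete, here is a minimal sketch of the two Tooling API URLs a log-fetching server needs: a SOQL query against the ApexLog object and the resource that returns one log's raw body. The helper names, the v60.0 API version, and the chosen fields are assumptions for illustration; sf-log-mcp itself goes through @salesforce/core rather than hand-built URLs.

```typescript
// Hypothetical helpers: names and the API version are illustrative assumptions.
const API_VERSION = "v60.0";

interface LogListOptions {
  limit: number;         // how many logs to return
  minSizeBytes?: number; // optional lower bound on LogLength
}

// Build a SOQL query against the ApexLog Tooling API object.
function buildLogListSoql(opts: LogListOptions): string {
  const limit = Math.max(1, Math.floor(opts.limit));
  let soql =
    "SELECT Id, LogUser.Name, Operation, Status, LogLength, StartTime FROM ApexLog";
  if (opts.minSizeBytes !== undefined) {
    soql += " WHERE LogLength >= " + Math.floor(opts.minSizeBytes);
  }
  return soql + " ORDER BY StartTime DESC LIMIT " + limit;
}

// URL that runs the query through the Tooling API query resource.
function toolingQueryUrl(instanceUrl: string, soql: string): string {
  return (
    instanceUrl + "/services/data/" + API_VERSION +
    "/tooling/query/?q=" + encodeURIComponent(soql)
  );
}

// URL that returns the raw text body of a single debug log.
function logBodyUrl(instanceUrl: string, logId: string): string {
  return (
    instanceUrl + "/services/data/" + API_VERSION +
    "/tooling/sobjects/ApexLog/" + encodeURIComponent(logId) + "/Body"
  );
}
```

Only numeric filters are shown here because they need no quoting; string filters such as a user name would have to be escaped before being placed in the query.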

2. Reuse SF CLI auth

No new credentials, no OAuth setup, no tokens to configure. If sf org list shows your org, sf-log-mcp can connect to it. It reads from ~/.sf/ - the same auth your Salesforce CLI already uses.

3. Standalone, not bundled

sf-log-mcp runs alongside Certinia's parser, not replacing it. You get the best of both: sf-log-mcp fetches and analyzes for silent failures, Certinia does deep performance profiling. The AI combines both results.


9 Tools, 4 Tiers

Tier 1: Log Acquisition

  • list_debug_logs - List logs with rich filtering (user, operation, date range, size)

  • fetch_debug_log - Download a specific log by ID

  • fetch_latest_logs - Batch-download the N most recent logs

Tier 2: Content Intelligence

  • get_log_content - Extract structured sections (callouts, exceptions, SOQL, DML, governor limits, flows, debug messages)

  • analyze_log - One-call health analysis with a 0-100 score

  • search_logs - Regex search across all downloaded logs

Tier 3: Lifecycle Management

  • manage_trace_flags - Create, list, update, delete trace flags

  • delete_debug_logs - Delete logs (with dry-run mode)

Tier 4: Cross-Log Intelligence

  • compare_logs - Side-by-side diff of two logs for regression detection
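The article describes compare_logs only as a side-by-side diff, so the following is one plausible approach rather than the tool's actual logic: count how often each event type appears in two logs and report the types whose counts changed. All names here are hypothetical.

```typescript
// Hypothetical sketch: count event types per log and report what changed.
type EventCounts = { [eventType: string]: number };

function countEvents(log: string): EventCounts {
  const counts: EventCounts = {};
  for (const line of log.split(/\r?\n/)) {
    // Event names are upper-case tokens such as SOQL_EXECUTE_BEGIN, either at
    // the start of the line or right after a timestamp field, followed by "|".
    const match = /(?:^|\|)([A-Z][A-Z_]+)\|/.exec(line);
    if (match) {
      counts[match[1]] = (counts[match[1]] || 0) + 1;
    }
  }
  return counts;
}

interface CountDelta {
  eventType: string;
  before: number;
  after: number;
}

// Return only the event types whose counts differ between the two logs.
function compareLogs(baseline: string, candidate: string): CountDelta[] {
  const a = countEvents(baseline);
  const b = countEvents(candidate);
  const seen: { [k: string]: boolean } = {};
  const deltas: CountDelta[] = [];
  for (const eventType of Object.keys(a).concat(Object.keys(b))) {
    if (seen[eventType]) continue;
    seen[eventType] = true;
    const before = a[eventType] || 0;
    const after = b[eventType] || 0;
    if (before !== after) deltas.push({ eventType, before, after });
  }
  return deltas;
}
```

A new EXCEPTION_THROWN or an extra SOQL_EXECUTE_BEGIN in the candidate log is the kind of regression signal this would surface.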

The Health Score: Diagnosing Logs in One Call

The analyze_log tool is the entry point for debugging. It returns a health score from 0-100:


Health Score: 65/100 - DEGRADED

Critical Issues:
- Silent callout failure: HTTP 200 with error in body (api.vendorx.com)

Warnings:
- 2 handled exceptions (verify error handling is correct)
- Governor limit: SOQL queries at 82% (approaching limit)
- Zero-row SOQL: SELECT Id FROM Account WHERE ExternalId__c = '...'


How it's calculated:


healthScore = 100
healthScore -= (critical issues × 20)
healthScore -= (warnings × 5)


Health Ratings:

  • HEALTHY (90-100) - No significant issues

  • WARNING (70-89) - Minor concerns worth checking

  • DEGRADED (50-69) - Multiple issues, needs attention

  • CRITICAL (0-49) - Serious failures detected
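The formula and rating bands above can be sketched as two small functions. The weights and thresholds come from this article; clamping at 0 is an assumption, added so that a log with many critical issues still lands in the 0-100 range. The function names are illustrative.

```typescript
type HealthRating = "HEALTHY" | "WARNING" | "DEGRADED" | "CRITICAL";

// Start at 100, subtract 20 per critical issue and 5 per warning.
function healthScore(criticalIssues: number, warnings: number): number {
  const raw = 100 - criticalIssues * 20 - warnings * 5;
  // Clamping at 0 is an assumption so the result stays within 0-100.
  return Math.max(0, raw);
}

// Map a score onto the four rating bands listed above.
function healthRating(score: number): HealthRating {
  if (score >= 90) return "HEALTHY";
  if (score >= 70) return "WARNING";
  if (score >= 50) return "DEGRADED";
  return "CRITICAL";
}
```

The sample report shown earlier has one critical issue and three warnings, which works out to 100 - 20 - 15 = 65, a DEGRADED rating.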

The AI uses this score to decide what to drill into next: callouts, exceptions, or governor limits. It's the triage step that makes the whole workflow efficient.


Detecting Silent Failures: The 7 Parsers

Each parser is purpose-built to extract and warn about a specific class of silent failure:

1. Callout Parser - The HTTP 200 Lie Detector


CALLOUT_REQUEST|[42]|System.HttpCallout[endpoint=https://api.vendor.com/sync]

CALLOUT_RESPONSE|[42]|System.HttpCallout[status=200, body={"error":"rate_limit_exceeded"}]


Most monitoring checks the HTTP status code. 200 = good, right? Wrong. The callout parser pairs every request with its response and scans the body for error keywords. This catches the most common class of integration failure.
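A minimal sketch of that pairing logic, written against the simplified line format in the sample above (real debug logs carry more fields). The keyword list and all names are assumptions for illustration, not the parser's actual implementation.

```typescript
interface CalloutFinding {
  endpoint: string;
  status: number;
  suspicious: boolean; // true when a 2xx response body mentions an error
}

// Keyword list is an illustrative assumption.
const ERROR_KEYWORDS = ["error", "fail", "exception", "rate_limit", "denied"];

function parseCallouts(log: string): CalloutFinding[] {
  const findings: CalloutFinding[] = [];
  let pendingEndpoint: string | null = null;
  for (const line of log.split(/\r?\n/)) {
    const req = /CALLOUT_REQUEST\|.*?endpoint=([^\],\s]+)/.exec(line);
    if (req) {
      pendingEndpoint = req[1]; // remember the request until its response shows up
      continue;
    }
    const res = /CALLOUT_RESPONSE\|.*?status=(\d+)(?:,\s*body=(.*))?\]\s*$/.exec(line);
    if (res && pendingEndpoint !== null) {
      const status = parseInt(res[1], 10);
      const body = (res[2] || "").toLowerCase();
      const mentionsError = ERROR_KEYWORDS.some((k) => body.indexOf(k) !== -1);
      findings.push({
        endpoint: pendingEndpoint,
        status: status,
        suspicious: status >= 200 && status < 300 && mentionsError,
      });
      pendingEndpoint = null;
    }
  }
  return findings;
}
```

Keyword matching is deliberately crude: a body such as {"errors":[]} would also be flagged, so a finding like this is a prompt for the AI to read the body, not a verdict.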

2. Exception Parser - Handled vs. Unhandled


EXCEPTION_THROWN|[15]|System.NullPointerException: Attempt to de-reference a null object


Salesforce only flags unhandled exceptions in the Status field. But most production code wraps everything in try-catch. The exception parser uses a 10-line lookahead: if EXCEPTION_THROWN is followed by FATAL_ERROR, it's unhandled. If followed by METHOD_EXIT, it was caught. Both are reported, because a caught NullPointerException is still a bug.
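The lookahead can be sketched roughly as follows. One simplification is an assumption on my part: an exception counts as handled whenever no FATAL_ERROR appears in the next 10 lines, rather than requiring an explicit METHOD_EXIT. Names are hypothetical.

```typescript
interface ExceptionFinding {
  lineIndex: number; // zero-based index into the log's lines
  message: string;
  handled: boolean;
}

const LOOKAHEAD_LINES = 10;
const EXCEPTION_MARKER = "EXCEPTION_THROWN|";

function parseExceptions(log: string): ExceptionFinding[] {
  const lines = log.split(/\r?\n/);
  const findings: ExceptionFinding[] = [];
  for (let i = 0; i < lines.length; i++) {
    const at = lines[i].indexOf(EXCEPTION_MARKER);
    if (at === -1) continue;
    // Drop the "[line]|" field that follows the event name, keep the message.
    const message = lines[i]
      .slice(at + EXCEPTION_MARKER.length)
      .replace(/^\[\d+\]\|/, "")
      .trim();
    // Assumption: no FATAL_ERROR inside the window means the exception was caught.
    let handled = true;
    const end = Math.min(lines.length, i + 1 + LOOKAHEAD_LINES);
    for (let j = i + 1; j < end; j++) {
      if (lines[j].indexOf("FATAL_ERROR") !== -1) {
        handled = false;
        break;
      }
    }
    findings.push({ lineIndex: i, message: message, handled: handled });
  }
  return findings;
}
```

Both outcomes end up in the result; the handled flag only changes how loudly the finding should be reported.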

3. SOQL Parser - The Zero-Row Detector


SOQL_EXECUTE_BEGIN|[23]|SELECT Id FROM Account WHERE ExternalId__c = 'VND-001'

SOQL_EXECUTE_END|[23]|Rows:0


A query that returns 0 rows isn't an error. But if your integration expects to find a matching record and doesn't, the entire downstream process silently does nothing. The SOQL parser flags zero-row results as data issues.
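As an illustration of the zero-row check, this sketch pairs each SOQL_EXECUTE_BEGIN with the SOQL_EXECUTE_END that follows it and flags results with Rows:0. It targets the line shape in the sample above and optionally skips an "Aggregations:n|" field before the query text; the names are hypothetical.

```typescript
interface SoqlFinding {
  query: string;
  rows: number;
  zeroRows: boolean;
}

function parseSoql(log: string): SoqlFinding[] {
  const findings: SoqlFinding[] = [];
  let pendingQuery: string | null = null;
  for (const line of log.split(/\r?\n/)) {
    const begin = /SOQL_EXECUTE_BEGIN\|\[\d+\]\|(?:Aggregations:\d+\|)?(.*)$/.exec(line);
    if (begin) {
      pendingQuery = begin[1].trim(); // hold the query text until its END event
      continue;
    }
    const end = /SOQL_EXECUTE_END\|\[\d+\]\|Rows:(\d+)/.exec(line);
    if (end && pendingQuery !== null) {
      const rows = parseInt(end[1], 10);
      findings.push({ query: pendingQuery, rows: rows, zeroRows: rows === 0 });
      pendingQuery = null;
    }
  }
  return findings;
}
```

Whether zero rows is a problem depends on the query, which is why this is surfaced as a warning for the AI to weigh rather than as an error.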

4. Governor Limits Parser - The Time Bomb Detector


Number of SOQL queries: 82 out of 100 (82%) → WARNING

Number of DML rows: 9,800 out of 10,000 (98%) → CRITICAL


At 95% of governor limits, everything works. At 101%, everything breaks. The governor parser calculates percentages and flags anything over 80% as a warning.
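The percentage check might look something like this. The 80% warning threshold is from the article; the 95% critical threshold is an assumption, chosen only because the sample flags 98% as CRITICAL. The sketch reads only "Number of ..." lines, so limits phrased differently would need their own pattern.

```typescript
type LimitSeverity = "OK" | "WARNING" | "CRITICAL";

interface LimitUsage {
  limitName: string;
  used: number;
  max: number;
  percent: number;
  severity: LimitSeverity;
}

const WARNING_PERCENT = 80;  // from the article: anything over 80% is a warning
const CRITICAL_PERCENT = 95; // assumption: the article does not state this cutoff

function parseGovernorLimits(log: string): LimitUsage[] {
  const usages: LimitUsage[] = [];
  for (const line of log.split(/\r?\n/)) {
    const m = /Number of ([^:]+): ([\d,]+) out of ([\d,]+)/.exec(line);
    if (!m) continue;
    // Strip thousands separators such as "9,800" before parsing.
    const used = parseInt(m[2].replace(/,/g, ""), 10);
    const max = parseInt(m[3].replace(/,/g, ""), 10);
    if (max <= 0) continue;
    const percent = Math.round((used / max) * 100);
    let severity: LimitSeverity = "OK";
    if (percent >= CRITICAL_PERCENT) severity = "CRITICAL";
    else if (percent > WARNING_PERCENT) severity = "WARNING";
    usages.push({ limitName: m[1].trim(), used, max, percent, severity });
  }
  return usages;
}
```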

5-7. DML, Flow, and Debug Message Parsers

  • DML Parser: Flags bulk operations (>200 rows) that might cause partial failures

  • Flow Parser: Tracks 16 flow event types, flags FLOW_ELEMENT_ERROR and FLOW_ELEMENT_FAULT

  • Debug Messages: Extracts System.debug() output where developers log errors the system doesn't track
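For the DML and Flow checks, a rough sketch under stated assumptions: DML lines are taken to look like DML_BEGIN|[line]|Op:...|Type:...|Rows:n, and only the two flow events named above are flagged. The finding shape and function name are hypothetical.

```typescript
interface RiskFinding {
  kind: "BULK_DML" | "FLOW_ERROR";
  detail: string;
}

const BULK_DML_ROWS = 200; // threshold from the article

function parseDmlAndFlow(log: string): RiskFinding[] {
  const findings: RiskFinding[] = [];
  for (const line of log.split(/\r?\n/)) {
    // Assumed DML line shape: DML_BEGIN|[12]|Op:Insert|Type:Account|Rows:250
    const dml = /DML_BEGIN\|\[\d+\]\|Op:(\w+)\|Type:(\w+)\|Rows:(\d+)/.exec(line);
    if (dml && parseInt(dml[3], 10) > BULK_DML_ROWS) {
      findings.push({
        kind: "BULK_DML",
        detail: dml[1] + " on " + dml[2] + " (" + dml[3] + " rows)",
      });
      continue;
    }
    // Flag the two flow events the article calls out; keep the rest of the line.
    const flow = /(FLOW_ELEMENT_ERROR|FLOW_ELEMENT_FAULT)\|(.*)$/.exec(line);
    if (flow) {
      findings.push({ kind: "FLOW_ERROR", detail: flow[1] + ": " + flow[2].trim() });
    }
  }
  return findings;
}
```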


Smart Error Handling

Salesforce API errors are notoriously cryptic. sf-log-mcp classifies them into 9 categories with actionable messages:

  • Session expired → "Re-authenticate with: sf org login web --alias <org>"

  • API limit exceeded → "Wait and retry, or check API usage in Setup"

  • Insufficient permissions → "User needs View All Data or Manage Users"

  • Entity already traced → "Use manage_trace_flags to find the existing flag"

No more googling Salesforce error codes.
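A classifier along these lines could be a small table of patterns mapped to advice. The four advice strings are the ones listed above; the match patterns and category names are assumptions based on common Salesforce error codes and messages, and the real server has nine categories where this sketch has four.

```typescript
interface ClassifiedError {
  category: string;
  advice: string;
}

interface ErrorRule {
  pattern: RegExp;
  category: string;
  advice: string;
}

// Patterns and category names are illustrative assumptions.
const ERROR_RULES: ErrorRule[] = [
  {
    pattern: /INVALID_SESSION_ID|session expired/i,
    category: "SESSION_EXPIRED",
    advice: "Re-authenticate with: sf org login web --alias <org>",
  },
  {
    pattern: /REQUEST_LIMIT_EXCEEDED/i,
    category: "API_LIMIT_EXCEEDED",
    advice: "Wait and retry, or check API usage in Setup",
  },
  {
    pattern: /INSUFFICIENT_ACCESS|insufficient privileges/i,
    category: "INSUFFICIENT_PERMISSIONS",
    advice: "User needs View All Data or Manage Users",
  },
  {
    pattern: /already being traced/i,
    category: "ENTITY_ALREADY_TRACED",
    advice: "Use manage_trace_flags to find the existing flag",
  },
];

// Return the first matching rule, or pass the raw message through unchanged.
function classifyError(message: string): ClassifiedError {
  for (const rule of ERROR_RULES) {
    if (rule.pattern.test(message)) {
      return { category: rule.category, advice: rule.advice };
    }
  }
  return { category: "UNKNOWN", advice: message };
}
```

Falling back to the raw message keeps unknown errors visible instead of hiding them behind a generic label.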


Security Model

  • No credentials stored - Reuses SF CLI auth from ~/.sf/

  • Org allowlist - --allowed-orgs restricts which orgs the server can access

  • Stdio transport - No HTTP server, no open ports

  • SOQL injection protection - All user inputs are escaped

  • Read-only by default - Only delete_debug_logs and manage_trace_flags modify state (and only debug infrastructure, not business data)
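Two of these guards are easy to sketch. The escaping function shows the usual way to make a string safe inside a SOQL literal (backslash first, then single quote), and the allowlist check treats ALLOW_ALL_ORGS as a wildcard. Only the ALLOW_ALL_ORGS token comes from the article; the function names and the way the allowlist is represented are assumptions.

```typescript
// Escape a value before placing it between single quotes in a SOQL query.
// Backslashes are escaped first so the quote escapes are not double-processed.
function escapeSoqlLiteral(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
}

// Hypothetical allowlist check: the list holds org aliases or usernames,
// or the single token ALLOW_ALL_ORGS to permit every authenticated org.
function isOrgAllowed(org: string, allowedOrgs: string[]): boolean {
  if (allowedOrgs.indexOf("ALLOW_ALL_ORGS") !== -1) return true;
  return allowedOrgs.indexOf(org) !== -1;
}

// Example: a user-name filter built from untrusted input.
function userFilterClause(userName: string): string {
  return "LogUser.Name = '" + escapeSoqlLiteral(userName) + "'";
}
```

With the escaping in place, an input such as O'Brien stays inside the quoted literal instead of terminating it.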


The Numbers

  • 9 MCP Tools
  • 7 Log Parsers
  • 2,488 Source Lines
  • 1,069 Test Lines
  • 101 Tests Passing
  • 15 Test Suites
  • 3 Production Dependencies
  • 44.8 KB npm Package Size
  • 3 Node.js Versions Supported (18, 20, 22)

Try It in 2 Minutes

Prerequisites

  • Node.js >= 18

  • Salesforce CLI (sf) authenticated to an org

Setup


npx sf-log-mcp --allowed-orgs ALLOW_ALL_ORGS


Configure Your AI Client

Claude Desktop:


{
    "mcpServers": {
        "sf-log-mcp": {
            "command": "npx",
            "args": ["-y", "sf-log-mcp", "--allowed-orgs", "ALLOW_ALL_ORGS"]
        }
    }
}


VS Code (for Cursor, use the mcpServers key as in the Claude Desktop config above):


{
    "servers": {
        "sf-log-mcp": {
            "command": "npx",
            "args": ["-y", "sf-log-mcp", "--allowed-orgs", "ALLOW_ALL_ORGS"]
        }
    }
}


Then ask your AI: "List my recent Salesforce debug logs"


Multi-Server Setup

sf-log-mcp is designed to complement, not replace:


AI Client (Claude Desktop / VS Code / Cursor)
│
├── sf-log-mcp (this project)
│ Fetch, analyze, search debug logs
│ Detect silent failures
│
├── @certinia/apex-log-mcp (optional)
│ Deep performance profiling
│ CPU bottleneck detection
│
└── @salesforce/mcp (optional)
    SOQL queries, metadata, test runs


sf-log-mcp fetches the log and saves it to disk. Certinia's tools read the same file for performance analysis. The AI combines both results -> silent failure detection + performance profiling in one conversation.


What I Learned Building This

1. The Status Field is a Lie

This was the core insight that shaped the entire architecture. Filtering by Status = 'Fatal Error' catches maybe 5-10% of real issues. The rest are silent: HTTP 200s with error bodies, caught exceptions, empty query results, skipped flow paths. The only way to find them is to read the actual log content.

2. MCP is the Right Abstraction

Before MCP, I would have built a CLI tool or a VS Code extension. MCP means I build once and it works everywhere (Claude Desktop, VS Code, Cursor, Windsurf, any future client). The AI decides when and how to use the tools. I just expose the capabilities.

3. Parsers Need to Be Opinionated

A generic parser that returns "here are all the events" is useless to an AI. The parsers need to warn - "this callout returned 200 but the body contains an error keyword." That opinion is what makes the AI's analysis actionable.

4. Health Scores Drive Efficient Debugging

Without the health score, the AI would analyze every section of every log. With it, the AI triages first: "This log is CRITICAL, let me check callouts and exceptions." It cuts the number of tool calls in half.

5. Three Dependencies is Enough

@modelcontextprotocol/sdk for MCP, @salesforce/core for auth + API, zod for validation. That's it. No express, no axios, no lodash. The entire package is 44.8 KB.


What's Next

  • Windsurf testing - Verifying compatibility with the Windsurf AI IDE
  • Real-time log tailing - Stream logs as they're generated (SSE transport)
  • Custom analysis rules - User-defined patterns for domain-specific silent failures
  • Certinia integration guide - Step-by-step workflow combining both servers

Links

If you debug Salesforce integrations, give it a try. If you've been burned by "Status = Success" hiding real failures, you know why this exists.

Star the repo if it helps. PRs welcome.


Tags

#salesforce #mcp #ai #debugging #typescript #opensource #claude #devtools #modelcontextprotocol #apexlogs
