DEV Community

KevinTen


MCP Monitoring: How I Added Production Monitoring to My MCP Server and What I Actually Caught


Honestly, I didn't think I needed monitoring for my MCP server.

Let me backtrack. I've been building Papers, my personal knowledge base, which I converted to an MCP server after 6 years and 1,800 hours of development with an ROI of roughly -99.4%. If you haven't been following along, I started with a complex custom AI search engine, threw almost all of it away, and rebuilt everything as a simple MCP server that just serves my knowledge to any AI client. It's been working great.

After 82 articles about every aspect of building a production MCP server — error handling, authentication, deployment, rate limiting, logging, CORS, you name it — I thought I was done. Monitoring felt like overkill for a personal project. "It's just me using it," I told myself. "If something breaks, I'll notice."

Well, I was wrong. And what I caught after adding monitoring surprised even me.

The "It's Just Me" Problem

Let's be real: when you're building something for personal use, monitoring feels like enterprise overkill. You don't need Grafana dashboards, Prometheus metrics, alerts going to your phone at 3 AM. That's for companies with users and money and SLAs.

But here's what I realized after three months of running my MCP server in production: even if it's "just me," I'm still using it every day. And some failures aren't obvious.

For example:

  • Your AI client might silently fail a tool call and just not tell you
  • Slow queries add up over time and you don't notice until everything feels sluggish
  • Rate limiting might be kicking in and you don't see why your results are getting cut off
  • Some requests are getting 404s because of wrong paths and you never check the logs

I'd been checking logs manually every few days, but that's not monitoring, that's fishing. I needed real metrics so I could see what was happening.

What I Built: Simple, Practical Monitoring for MCP

I didn't want to run a whole Kubernetes monitoring stack for my personal project. I just wanted:

  1. Count how many tool calls are happening
  2. Track response times per endpoint
  3. Monitor error rates by status code
  4. Keep history for 30 days so I can spot trends
  5. Something that doesn't cost $50/month

After experimenting with a few options, I ended up with a simple Spring Boot Actuator setup + Micrometer + Prometheus + Grafana running on the same VPS. Total memory usage? Around 200MB. Cost? Still under $5/month for my VPS. Perfect.

Here's the actual configuration I ended up with. You can drop this into your existing Spring Boot MCP server today.

Step 1: Add Dependencies

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

That's it. Two dependencies. Done.

Step 2: Configure Actuator

In application.properties:

# Enable actuator endpoints
management.endpoints.web.exposure.include=health,metrics,prometheus

# Enable all metrics
management.metrics.enable.all=true

# Track HTTP requests (Spring Boot 2.x property name; note the dot in request.autotime)
management.metrics.web.server.request.autotime.enabled=true

# Add common tags
management.metrics.tags.application=${spring.application.name:papers}

Step 3: Add Custom Metrics for MCP Specific Stuff

The default HTTP metrics are great, but MCP has specific concepts I wanted to track: tools/list vs tools/call, which tool is being called, how long each tool takes.

So I added a custom interceptor that captures MCP-specific metrics:

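// javax.servlet.* on Spring Boot 2.x; use jakarta.servlet.* on Spring Boot 3
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;
import org.springframework.web.util.ContentCachingRequestWrapper;
import org.springframework.web.util.WebUtils;

@Component
public class McpMetricsInterceptor implements HandlerInterceptor {
    private static final String START_ATTR = "mcp.startTime";
    private static final ObjectMapper MAPPER = new ObjectMapper(); // reuse, don't build one per request
    private final MeterRegistry meterRegistry;

    public McpMetricsInterceptor(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) {
        if (request.getRequestURI().endsWith("/mcp/tools/call")) {
            // Remember the start time; the duration is recorded in afterCompletion
            request.setAttribute(START_ATTR, System.nanoTime());
        }
        return true;
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) {
        String path = request.getRequestURI();

        // Track total request count by endpoint and status.
        // Raw paths are fine for a handful of endpoints; avoid this tag if paths contain IDs.
        Counter.builder("mcp_requests_total")
            .tag("path", path)
            .tag("status", String.valueOf(response.getStatus()))
            .register(meterRegistry)
            .increment();

        // Track timing for tools/call specifically
        if (path.endsWith("/mcp/tools/call")) {
            Long startTime = (Long) request.getAttribute(START_ATTR);
            if (startTime != null) {
                // The Prometheus registry exports timers in seconds and appends the unit itself,
                // so the metric name carries no "_ms" suffix
                Timer.builder("mcp_tool_call_duration")
                    .register(meterRegistry)
                    .record(System.nanoTime() - startTime, TimeUnit.NANOSECONDS);
            }

            // Try to extract which tool was called from the request body.
            // This is optional but really useful.
            try {
                String body = extractRequestBody(request);
                if (!body.isEmpty()) {
                    JsonNode json = MAPPER.readTree(body);
                    // Adjust the lookup if your payload nests the name
                    // (JSON-RPC style tools/call requests carry it under params.name)
                    String toolName = json.path("name").asText(null);
                    if (toolName != null) {
                        Counter.builder("mcp_tool_call_total")
                            .tag("tool", toolName)
                            .register(meterRegistry)
                            .increment();
                    }
                }
            } catch (Exception e) {
                // Ignore parsing errors, metrics still work without this
            }
        }
    }

    // The servlet input stream is already consumed by the time afterCompletion runs.
    // This reads the copy kept by ContentCachingRequestWrapper, which only exists if
    // a servlet filter wrapped the request earlier in the chain; otherwise returns "".
    private String extractRequestBody(HttpServletRequest request) {
        ContentCachingRequestWrapper wrapper =
            WebUtils.getNativeRequest(request, ContentCachingRequestWrapper.class);
        if (wrapper == null) {
            return "";
        }
        return new String(wrapper.getContentAsByteArray(), StandardCharsets.UTF_8);
    }
}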

Then register the interceptor:

@Configuration
public class WebConfig implements WebMvcConfigurer {
    private final McpMetricsInterceptor metricsInterceptor;

    public WebConfig(McpMetricsInterceptor metricsInterceptor) {
        this.metricsInterceptor = metricsInterceptor;
    }

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(metricsInterceptor).addPathPatterns("/**");
    }
}

This is nothing fancy. Just counting things and timing things. But that's 90% of what you actually need for monitoring.

Step 4: Prometheus Config

Here's my simple prometheus.yml:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'papers-mcp'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/actuator/prometheus'

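That's the whole config. Spin it up with Docker and you're done. One thing to watch: Prometheus keeps 15 days of data by default, so getting the 30 days of history from my wish list means starting it with --storage.tsdb.retention.time=30d.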

Step 5: Grafana Dashboard

I didn't build anything fancy. Just three panels:

  1. Request rate: Requests per minute by endpoint
  2. Response time: 95th percentile latency for /mcp/tools/call
  3. Error rate: Percentage of 4xx/5xx responses

Total time to set everything up? About an hour. Most of that was figuring out Docker volumes for persistent storage.
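For reference, the three panels can be sketched as PromQL queries along these lines. This is a sketch under assumptions, not my exact dashboard: it uses Spring Boot's built-in http_server_requests metrics rather than the custom ones, it assumes the uri label comes out as /mcp/tools/call, and the 5m window is arbitrary. The p95 query also needs histogram buckets, which Spring Boot publishes once you set management.metrics.distribution.percentiles-histogram.http.server.requests=true.

```promql
# 1. Request rate: requests per minute by endpoint
sum by (uri) (rate(http_server_requests_seconds_count[5m])) * 60

# 2. Response time: 95th percentile latency for /mcp/tools/call, in seconds
histogram_quantile(0.95,
  sum by (le) (rate(http_server_requests_seconds_bucket{uri="/mcp/tools/call"}[5m])))

# 3. Error rate: percentage of 4xx/5xx responses
100 * sum(rate(http_server_requests_seconds_count{status=~"[45].."}[5m]))
    / sum(rate(http_server_requests_seconds_count[5m]))
```

Each query goes into its own Grafana panel with Prometheus as the data source.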

What I Caught That I Didn't See Before

Okay, so I set up all this monitoring thinking "let's see what's here." I expected to find maybe a few slow queries, nothing major.

What I actually found blew my mind.

1. Someone Else Is Using My MCP Server

Wait, what? This is supposed to be my personal project. I haven't published it anywhere. How is anyone else using it?

Looking at the logs:

10.0.0.1 - - [25/Jun/2026:03:15:22 +0000] "POST /mcp/tools/call HTTP/1.1" 200 1245
10.0.0.1 - - [25/Jun/2026:03:15:45 +0000] "POST /mcp/tools/call HTTP/1.1" 200 2891
...

Over 50 tool calls in two days from an IP I don't recognize.

Turns out I shared the endpoint with a friend a month ago when I was showing them MCP, and they've actually been using it every day. They didn't tell me because "it just works" — which is great, but also meant I had no idea.

Monitoring told me something my manual log checking never caught because I wasn't looking.

2. My Most Used Tool Surprised Me

I added about 15 tools to my MCP server. I thought my most used tool would be search_knowledge — that's the whole point, right?

Wrong.

Here's the actual usage from my metrics:

Tool                  Usage Count   %
get_paper_content     127           42%
search_knowledge      83            27%
list_papers_by_tag    45            15%
find_recent_papers    28            9%
everything else       21            7%

I use get_paper_content way more than searching. Why? Because once AI finds the right paper in a search, it almost always needs to get the full content to actually answer the question. I knew that happened, but I didn't realize it was almost half of all tool calls.

That changes how I think about optimization. If get_paper_content is almost half your traffic, that's where you should spend your optimization effort. I'd been focusing on making search faster, but the real win is caching paper content.

3. The 95th Percentile Was Three Times Slower Than I Thought

I was averaging around 200ms per tool call, which felt fine. But when I looked at the 95th percentile, it was over 600ms.

Why? Because some papers are really big — like 10,000+ words. When AI requests the full content of a huge paper, that takes time. I never noticed because most of my queries are small, but 1 in 20 queries is taking over half a second.
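To see how an average can hide a slow tail like this, here is a minimal sketch with made-up latencies (hypothetical numbers, not my real data). It uses the nearest-rank definition of a percentile; Prometheus estimates percentiles from histogram buckets instead, so its numbers will differ slightly.

```java
import java.util.Arrays;

final class LatencyStats {
    /** Arithmetic mean of the samples, in milliseconds. */
    static double mean(long[] samplesMs) {
        return Arrays.stream(samplesMs).average().orElse(0.0);
    }

    /** Nearest-rank percentile: the smallest sample with at least p% of all samples at or below it. */
    static long percentile(long[] samplesMs, double p) {
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Hypothetical workload: 18 fast calls on small papers, 2 slow calls on huge ones. */
    static long[] sampleData() {
        long[] samples = new long[20];
        Arrays.fill(samples, 0, 18, 150L);
        samples[18] = 650L;
        samples[19] = 700L;
        return samples;
    }

    public static void main(String[] args) {
        long[] samples = sampleData();
        System.out.println("mean=" + mean(samples));           // prints mean=202.5
        System.out.println("p95=" + percentile(samples, 95));  // prints p95=650
    }
}
```

The mean looks healthy at about 200ms even though the slowest calls take more than three times that long, which is why the dashboard tracks the 95th percentile rather than the average.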

Is that the end of the world? No. But now that I see it, I can fix it. I added caching for paper content, and now 95th percentile is down to 180ms. That's a huge improvement that I never would have bothered with if I didn't see the numbers.
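The caching idea can be sketched as a small in-memory LRU cache. This is an illustration under assumptions rather than the code running in Papers: the class name PaperContentCache, the loader function, and the size limit are all hypothetical, and a real setup might use Spring's cache abstraction instead.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class PaperContentCache {
    private final Map<String, String> cache;
    private final Function<String, String> loader; // hypothetical: loads a paper from disk or a database

    PaperContentCache(int maxEntries, Function<String, String> loader) {
        this.loader = loader;
        // accessOrder=true keeps the least recently used entry first, so it is evicted first
        this.cache = Collections.synchronizedMap(
            new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > maxEntries;
                }
            });
    }

    /** Returns cached content, loading and storing it on a miss. */
    String get(String paperId) {
        String cached = cache.get(paperId);
        if (cached != null) {
            return cached;
        }
        // Two threads may both miss and load the same paper; that costs one extra load but is harmless
        String loaded = loader.apply(paperId);
        cache.put(paperId, loaded);
        return loaded;
    }
}
```

A size-bounded cache fits here because a few huge papers dominate the slow tail, and the bound keeps memory use predictable on a small VPS. It does not handle invalidation, so an edited paper would need the entry removed or the cache cleared.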

4. CORS Preflight Was Breaking Authentication

Remember how I wrote that whole article about MCP authentication, where you need to accept the API key in multiple places because different clients put it in different places?

Well, monitoring showed me something I missed: OPTIONS preflight requests weren't being passed through the auth filter correctly.

Every CORS preflight was returning 401 Unauthorized. So why didn't I notice? Browsers don't send the actual request after a failed preflight, so that wasn't it. What happened was that some clients never send a preflight at all (non-browser clients don't, and browsers skip it for simple requests), and I was testing with one of those. Other clients were failing completely, and I never knew because I only tested with my daily client.

The 401 count showed up immediately in my error rate metrics. I fixed it in 10 minutes by adding:

// In CorsConfig: the CORS mapping itself
public void addCorsMappings(CorsRegistry registry) {
    registry.addMapping("/**")
        .allowedOriginPatterns("*")
        .allowedMethods("*")
        .allowedHeaders("*")
        .allowCredentials(true)
        .exposedHeaders("*");
}

// In AuthFilter: this is the actual fix. Preflight (OPTIONS) requests never
// carry the API key, so let them through before the auth check runs.
if (HttpMethod.OPTIONS.matches(request.getMethod())) {
    chain.doFilter(request, response);
    return;
}

That's it. Fixed. But I never would have found it without monitoring because I wasn't testing with the clients that were failing.

5. My Rate Limiter Was Blocking Legitimate Requests

I wrote about rate limiting in my last article. I set it to 15 requests per minute per API key, which seemed reasonable for a personal server.

But here's what the metrics showed me: when Claude does multi-step reasoning with my knowledge base, it can easily fire off 15 requests in well under a minute. It's not abuse, it's just how Claude works when it's searching through my knowledge to answer a complex question.

So my rate limiter was actually blocking legitimate use by me. I never noticed because when it happens, Claude just says "I'm sorry, I couldn't complete that request" and I try again later. But the metrics showed it was happening multiple times per day.

I bumped it to 30 requests per minute. Problem solved.
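Here is a minimal sketch of why a burst trips a 15-per-minute limit but not a 30-per-minute one. It assumes a sliding-window limiter and a made-up burst of 20 tool calls spaced 2 seconds apart; my real limiter and real traffic may behave differently, and the class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class SlidingWindowLimiter {
    private final int maxRequests;
    private final long windowMs;
    private final Map<String, Deque<Long>> hitsByKey = new HashMap<>();

    SlidingWindowLimiter(int maxRequests, long windowMs) {
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
    }

    /** Returns true if the request is allowed. The clock is passed in so the example is deterministic. */
    synchronized boolean tryAcquire(String apiKey, long nowMs) {
        Deque<Long> hits = hitsByKey.computeIfAbsent(apiKey, k -> new ArrayDeque<>());
        // Drop timestamps that have fallen out of the window
        while (!hits.isEmpty() && nowMs - hits.peekFirst() >= windowMs) {
            hits.pollFirst();
        }
        if (hits.size() >= maxRequests) {
            return false;
        }
        hits.addLast(nowMs);
        return true;
    }

    /** Simulates a burst of evenly spaced calls from one key and counts the rejected ones. */
    static int countRejected(int limitPerMinute, int calls, long spacingMs) {
        SlidingWindowLimiter limiter = new SlidingWindowLimiter(limitPerMinute, 60_000L);
        int rejected = 0;
        for (int i = 0; i < calls; i++) {
            if (!limiter.tryAcquire("my-key", i * spacingMs)) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println(countRejected(15, 20, 2_000L)); // prints 5
        System.out.println(countRejected(30, 20, 2_000L)); // prints 0
    }
}
```

With the lower limit, the last five calls of a single reasoning chain get rejected even though nothing abusive is happening, which matches the failed requests I was seeing.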

Pros and Cons: Is Monitoring Worth It for a Personal MCP Server?

Let's be honest — adding monitoring isn't free. Even this simple setup adds some complexity and some resource usage. Was it worth it for my personal project?

Pros

You don't know what you don't know — I caught five issues that I didn't know existed, and three of them were actually affecting my daily use.

It's super simple to add — With Spring Boot Actuator + Micrometer, it's literally two dependencies and a bit of configuration. I was done in an hour.

Doesn't have to be expensive — This whole setup runs in 200MB of RAM on my existing VPS. No extra cost.

Guides your optimization efforts — Instead of optimizing what you think is slow, you optimize what is slow. That saves a lot of wasted effort.

Even personal projects can have unexpected users — My friend was using my server and I had no idea. That's cool, but I should probably know about it.

Cons

It's still extra complexity — You have to maintain Prometheus and Grafana along with your actual application. For a tiny personal project, that might be overkill.

You can get obsessed with metrics — It's easy to spend hours tweaking your dashboard instead of actually improving your application. I've done this. Don't be me.

Alerts can get annoying — If you set up alerts, you'll get paged at 2 AM for things that don't matter. I don't even set up alerts anymore — I just check the dashboard once a week.

My Recommendation

If you're building an MCP server for production — even if it's "just for you" — add basic monitoring. It doesn't have to be fancy. You don't need a whole Elastic stack. Just count requests, track response times, and monitor error rates.

That's it. That's 90% of the value with 10% of the effort.

Here's what I'd do differently if I was starting over: I'd add it earlier. I waited until I had "everything else" done, but monitoring should be one of the first things you add when you go to production. It guides every other decision you make.

What's Next?

After seeing what my metrics are telling me, I'm already planning the next changes:

  1. Keep tuning the cache for get_paper_content since it's 42% of my traffic
  2. Tune the rate limiter to be per-conversation instead of per-minute so complex AI reasoning doesn't get blocked
  3. Add a metric for how often AI actually finds what it's looking for vs having to search multiple times
  4. Maybe add a simple alert for when error rate goes above 5% — just to catch big issues early
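The alert in the last item boils down to one calculation, sketched below with hypothetical status counts. In a Prometheus setup this check would normally live in an alerting rule rather than in application code; the Java version just makes the arithmetic explicit, and the class and method names are made up for illustration.

```java
import java.util.Map;

final class ErrorRate {
    /** Percentage of responses with a 4xx or 5xx status, given request counts keyed by status code. */
    static double errorPercent(Map<Integer, Long> countsByStatus) {
        long total = 0;
        long errors = 0;
        for (Map.Entry<Integer, Long> entry : countsByStatus.entrySet()) {
            total += entry.getValue();
            if (entry.getKey() >= 400) {
                errors += entry.getValue();
            }
        }
        return total == 0 ? 0.0 : 100.0 * errors / total;
    }

    /** True when the error rate is above the threshold, for example 5.0 for 5%. */
    static boolean shouldAlert(Map<Integer, Long> countsByStatus, double thresholdPercent) {
        return errorPercent(countsByStatus) > thresholdPercent;
    }

    public static void main(String[] args) {
        // Hypothetical counts: 90 successes, 6 auth failures, 4 server errors
        Map<Integer, Long> counts = Map.of(200, 90L, 401, 6L, 500, 4L);
        System.out.println(errorPercent(counts));     // prints 10.0
        System.out.println(shouldAlert(counts, 5.0)); // prints true
    }
}
```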

But honestly, the biggest win wasn't even the technical changes — it was knowing what's actually happening with my server. For six months I was flying blind, and now I can see. It's a good feeling.


So what about you? Have you built an MCP server? Did you add monitoring, or do you think it's overkill for a personal project? I'd love to hear in the comments — what's the thing monitoring showed you that you didn't expect?
