Immanuel Gabriel

The part after shipping an MCP server: making it fail honestly

I thought the hard part was shipping the MCP server.

It was not.

The hard part started after the server was public, listed, installable, and testable outside my own machine.

FreshContext is my MCP project around freshness-aware retrieval. The basic idea is simple: agents should not treat stale information as current truth. Every retrieved signal gets wrapped with source, timestamp, retrieval time, and confidence metadata, then ranked with freshness decay before it reaches the model.
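To make the envelope idea concrete, here is a minimal sketch. The field names, the 0 to 100 scale, and the 72-hour half-life are assumptions for illustration, not FreshContext's actual schema or decay curve.

```typescript
// Hypothetical envelope shape; field names are illustrative only.
type Confidence = "high" | "medium" | "low";

interface Envelope {
  source: string;
  publishedAt: string; // ISO 8601 timestamp reported by the source
  retrievedAt: string; // ISO 8601 time the signal was fetched
  confidence: Confidence;
  content: string;
}

// Exponential decay: the score halves every `halfLifeHours` (assumed value).
function freshnessScore(env: Envelope, halfLifeHours = 72): number {
  const ageMs = Date.parse(env.retrievedAt) - Date.parse(env.publishedAt);
  // An unparseable timestamp gets no freshness credit at all.
  if (!Number.isFinite(ageMs)) return 0;
  const ageHours = Math.max(0, ageMs / 3600000);
  return Math.round(100 * Math.pow(0.5, ageHours / halfLifeHours));
}

// Freshest signals first; the input array is left untouched.
function rankByFreshness(envs: Envelope[], halfLifeHours = 72): Envelope[] {
  return [...envs].sort(
    (a, b) => freshnessScore(b, halfLifeHours) - freshnessScore(a, halfLifeHours),
  );
}
```

With these assumptions, a signal published exactly 72 hours before retrieval scores half of what a just-published one does.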

That was the first public story.

But once the site, npm package, MCP Registry entry, Agentic Market listing, and feed APIs were live, a different kind of work became more important:

Making the system fail honestly.

The bug that bothered me most

A read-only QA pass found several issues.

Some were normal integration problems:

  • one finance path was failing against Yahoo with a 401
  • the Hacker News tool rejected plain text even though the product copy implied search-like behavior
  • one HN scrape path produced a malformed timestamp
  • the local stdio package reported an old internal version

Annoying, but fixable.

The serious issue was different.

A failed finance call could still be wrapped as high-confidence output.

That is poisonous for a product about freshness and retrieval trust.

If a tool fails, it should not come back dressed as:

Confidence: high
Score: 100/100

That is worse than an obvious crash. A crash tells you something broke. A confident bad envelope hands the user wrong data with a straight face, and once they notice, they stop trusting the whole system.

So the next release was not about adding features.

It was about making failures honest.

The 0.3.17 release

The 0.3.17 patch focused only on validated problems.

No new verticals. No redesign. No “while we are here” expansion.

The finance adapter moved away from the broken Yahoo path and now uses Stooq as a no-key quote source.

Failure-looking adapter output is blocked from becoming high-confidence FreshContext output.
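One way such a guard could look, as a rough sketch: the `AdapterResult` shape, the failure heuristics, and the score thresholds are all assumptions for illustration, not the code that shipped in 0.3.17.

```typescript
type Confidence = "high" | "medium" | "low";

// Hypothetical adapter result shape.
interface AdapterResult {
  ok: boolean;     // did the adapter itself report success?
  status?: number; // upstream HTTP status, if there was one
  body: string;
}

interface Wrapped {
  confidence: Confidence;
  score: number;
  failed: boolean;
  body: string;
}

// Heuristic only: treats explicit failures, HTTP errors, empty bodies,
// and bodies that open with an error word as failure-looking.
function looksLikeFailure(r: AdapterResult): boolean {
  if (!r.ok) return true;
  if (r.status !== undefined && r.status >= 400) return true;
  if (r.body.trim() === "") return true;
  return /^\s*(error|failed)\b/i.test(r.body);
}

// A failure-looking result can never leave as high confidence.
function wrap(r: AdapterResult, freshness: number): Wrapped {
  if (looksLikeFailure(r)) {
    return { confidence: "low", score: 0, failed: true, body: r.body };
  }
  const confidence: Confidence =
    freshness >= 70 ? "high" : freshness >= 40 ? "medium" : "low";
  return { confidence, score: freshness, failed: false, body: r.body };
}
```

The point of the design is that the failure check runs before any scoring, so a 401 cannot inherit a freshness score computed from the retrieval time.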

The Hacker News tool now accepts both URLs and plain text queries while keeping the same field name for compatibility.
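A sketch of how one field can carry both kinds of input, assuming the tool decides by trying to parse the value as an http(s) URL first. The type and function names here are hypothetical.

```typescript
// Hypothetical result type: the same input field resolves to one of two modes.
type HnInput =
  | { kind: "url"; url: string }
  | { kind: "query"; query: string };

function classifyHnInput(raw: string): HnInput {
  const value = raw.trim();
  if (value === "") throw new Error("input must not be empty");
  try {
    const parsed = new URL(value);
    // Only http(s) counts as a URL; anything else is treated as search text.
    if (parsed.protocol === "http:" || parsed.protocol === "https:") {
      return { kind: "url", url: parsed.toString() };
    }
  } catch {
    // Not parseable as a URL: fall through to the plain text path.
  }
  return { kind: "query", query: value };
}
```

Existing callers that pass a URL keep working, and a plain phrase such as `mcp servers` becomes a query instead of a validation error.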

HN dates are normalized before freshness scoring.
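Normalization could look roughly like this. The accepted input forms (unix seconds or milliseconds, relative strings like "3 hours ago", and anything `Date.parse` understands) are assumptions about what a scrape might produce, not a description of the real adapter.

```typescript
const UNIT_MS: Record<string, number> = {
  minute: 60000,
  hour: 3600000,
  day: 86400000,
};

const MAX_DATE_MS = 8.64e15; // largest magnitude a JavaScript Date can hold

// Returns an ISO 8601 string, or null when the input cannot be trusted.
function normalizeTimestamp(raw: string | number, now: Date = new Date()): string | null {
  if (typeof raw === "number") {
    // Assumption: values below 1e12 are unix seconds, larger ones milliseconds.
    const ms = raw < 1e12 ? raw * 1000 : raw;
    if (!Number.isFinite(ms) || Math.abs(ms) > MAX_DATE_MS) return null;
    return new Date(ms).toISOString();
  }
  const text = raw.trim();
  const rel = /^(\d+)\s+(minute|hour|day)s?\s+ago$/i.exec(text);
  if (rel) {
    const ms = Number(rel[1]) * UNIT_MS[rel[2].toLowerCase()];
    const then = now.getTime() - ms;
    return Math.abs(then) > MAX_DATE_MS ? null : new Date(then).toISOString();
  }
  // Date.parse is only reliable for ISO 8601; other formats vary by runtime.
  const parsed = Date.parse(text);
  return Number.isNaN(parsed) ? null : new Date(parsed).toISOString();
}
```

Returning `null` instead of a guessed date lets the scorer treat a malformed timestamp as unknown rather than as fresh.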

The Worker version, npm package, stdio server info, and MCP Registry metadata were aligned to 0.3.17.

I also added a small stdio smoke test so the package can prove the basics before release:

  • initialize works
  • server version matches package version
  • tools/list returns 21 tools
  • key tools exist
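The checks above can be sketched as a pure function over the two MCP responses. `serverInfo.version` and `tools[].name` follow the MCP response shapes; the `SmokeExpectations` type and the function name are hypothetical, and the real test would also have to spawn the stdio server and exchange the JSON-RPC messages, which is left out here.

```typescript
// Only the fields this check needs from the MCP responses.
interface InitializeResult {
  serverInfo: { name: string; version: string };
}
interface ToolsListResult {
  tools: { name: string }[];
}

// Hypothetical expectations, typically read from package.json and a fixture.
interface SmokeExpectations {
  packageVersion: string;
  toolCount: number;
  requiredTools: string[];
}

// Returns a list of problems; an empty list means the smoke test passed.
function smokeCheck(
  init: InitializeResult,
  list: ToolsListResult,
  expected: SmokeExpectations,
): string[] {
  const problems: string[] = [];
  if (init.serverInfo.version !== expected.packageVersion) {
    problems.push(
      `version mismatch: server ${init.serverInfo.version}, package ${expected.packageVersion}`,
    );
  }
  if (list.tools.length !== expected.toolCount) {
    problems.push(`expected ${expected.toolCount} tools, got ${list.tools.length}`);
  }
  const names = new Set(list.tools.map((t) => t.name));
  for (const required of expected.requiredTools) {
    if (!names.has(required)) problems.push(`missing tool: ${required}`);
  }
  return problems;
}
```

Collecting every problem instead of stopping at the first one means a single run shows everything that drifted.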

None of this is glamorous.

It is the maintenance layer that makes the public product feel less fragile.

The release gate became boring on purpose

The release flow became:

build locally
smoke stdio
check Worker TypeScript
deploy Worker
smoke live MCP endpoint
verify npm registry install
update public metadata

That order matters.

I did not want to publish npm and then discover that the deployed Worker path behaved differently. I also did not want the registry, npm package, and website telling three different stories.
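As an illustration of why the order matters, here is a small gate runner that stops at the first failing step, so nothing after it runs. The `Step` type and the runner are hypothetical, and the steps are synchronous to keep the sketch short; real steps would shell out to build, deploy, and registry commands.

```typescript
// Hypothetical step: a name plus a check that reports success or failure.
interface Step {
  name: string;
  run: () => boolean;
}

interface GateResult {
  passed: string[];
  failedAt: string | null;
}

// Runs steps in order and stops at the first failure or thrown error.
function runGate(steps: Step[]): GateResult {
  const passed: string[] = [];
  for (const step of steps) {
    let ok = false;
    try {
      ok = step.run();
    } catch {
      ok = false; // a thrown error counts as a failed step
    }
    if (!ok) return { passed, failedAt: step.name };
    passed.push(step.name);
  }
  return { passed, failedAt: null };
}
```

If the live endpoint smoke fails, the npm and metadata steps never execute, so the public surfaces keep describing the last version that worked.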

The rule became:

Do not publish or update public surfaces until the live path works.

This is obvious in hindsight.

It is also exactly the kind of obvious thing that gets skipped when you are moving fast.

Why I built Ops Pulse

Around the same time, I built a small tool called FreshContext Ops Pulse.

It is not a dashboard. It is not a giant observability product. It is a local-first TypeScript CLI that reads Cloudflare metrics and turns them into a plain-English diagnosis.

FreshContext runs across Cloudflare Workers, D1, cron, and a few small feed services.

That creates a practical debugging problem.

If the Fresh HN Feed is healthy, I should not touch it while debugging the MCP worker.

If the Jobs Feed is idle because no cron was expected in that window, that is not an outage.

If the MCP core has runtime exceptions, I need to focus there instead of randomly changing healthy systems.

Ops Pulse gives me that separation.

The pipeline is simple:

Cloudflare metrics -> normalized snapshot -> diagnosis -> recommendation

That helped shift the project from “poke around until something changes” to “debug from evidence.”
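The last two stages of that pipeline might look something like this. The `Snapshot` fields, the diagnosis labels, and the wording of the recommendations are invented for illustration and are not Ops Pulse's actual model; fetching and normalizing the Cloudflare metrics is out of scope for the sketch.

```typescript
// Hypothetical normalized snapshot for one service over a time window.
interface Snapshot {
  service: string;
  requests: number;
  errors: number;
  cronExpected: boolean; // was a cron run expected in this window?
}

type Diagnosis = "healthy" | "idle" | "silent" | "failing";

function diagnose(s: Snapshot): Diagnosis {
  if (s.errors > 0) return "failing";
  // No traffic is only a problem if something was supposed to run.
  if (s.requests === 0) return s.cronExpected ? "silent" : "idle";
  return "healthy";
}

function recommend(s: Snapshot): string {
  switch (diagnose(s)) {
    case "failing":
      return `${s.service}: ${s.errors} errors in window, investigate here first.`;
    case "silent":
      return `${s.service}: cron was expected but nothing ran, check the trigger.`;
    case "idle":
      return `${s.service}: idle and no cron expected, not an outage.`;
    case "healthy":
      return `${s.service}: healthy, leave it alone.`;
  }
}
```

Separating "idle" from "silent" is what keeps an expected quiet window from being treated as an outage.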

That is a major relief when you are solo-building and every fix can accidentally create two more bugs.

The real lesson

The biggest lesson was not about MCP specifically.

It was about consistency.

If the website says 21 tools, the registry should say 21 tools.

If npm says 0.3.17, the Worker should report 0.3.17.

If a source has a timestamp, normalize it before scoring.

If a tool fails, make the failure visible.

If one service is healthy, do not disturb it while fixing another service.

This sounds boring because it is boring.

But boring is what makes a tool usable.

A lot of early product work feels like building features. Then the product becomes public and the work changes. Suddenly the important thing is not “can I add one more thing?”

It is:

Can I trust the thing that already shipped?

For FreshContext, that meant stopping feature work for a moment and stabilizing the public surface:

  • MCP Worker live
  • npm package verified
  • MCP Registry updated
  • Agentic Market listing accepted
  • Fresh HN Feed live
  • Fresh Jobs Feed live
  • Apify actors published
  • Ops Pulse available for operational diagnosis

There is still plenty to improve.

But the project is now past the “just an idea” phase. It has install paths, public listings, live feeds, and a small ops loop watching the system.

That feels like a better foundation than another rushed feature.

Links

FreshContext hub:

https://freshcontext-site.pages.dev/

Live demo:

https://freshcontext-mcp.gimmanuel73.workers.dev/demo

MCP endpoint:

https://freshcontext-mcp.gimmanuel73.workers.dev/mcp

GitHub:

https://github.com/PrinceGabriel-lgtm/freshcontext-mcp

Agentic Market:

https://agenticmarket.dev/princegabriel73/server

The next step is not more feature sprawl.

It is better distribution, evidence-led debugging, and watching which surfaces people actually use.
