Preetha

Posted on May 27

Inside an MCP-Native Content Workflow Engine — Here's What Actually Broke (and What Finally Made Sense)

#mcp #contentops #aiops #fastapi

I started this project thinking, "Let me try this MCP thing." I didn't expect to end up rebuilding how I think about workflow automation, AI integration, and what it really means to build infrastructure instead of just tools.

This isn't a tutorial; the ContentOps-MCP repo has a 7-day structured tutorial for that. This is the more honest story: the decisions I made, the ones that broke things, and the lessons that actually matter for your next project.

Think of it like a detective story: each phase of the build was a clue. Some led to dead ends. Some changed the whole direction. By the end, the architecture finally made sense.

The problem I was actually trying to solve

Content teams don't waste time writing. They waste time on the plumbing around the writing.

Checking if the meta description is there.
Pasting the draft link into Slack.
Remembering to send the newsletter email.
Discovering two hours after publishing that paragraph three has a broken link.

Most teams patch this with Zapier or n8n. Notion → WordPress → Slack. Done. It works — until it doesn't.

Until someone publishes a half-finished draft because the automation didn't care.

Until the SEO title is 90 characters because nothing checked.

Until the brand voice that took six months to establish gets eroded one auto-published post at a time.

The automation wasn't the problem. What it lacked was any sense of quality. It just moved things around.

So I started with one question:

What if the workflow itself could check whether the content was ready before it moved?

What I built

The project is called ContentOps MCP Orchestrator. The short version looks like this:

Notion draft → QA gate → WordPress → Slack → email

But the interesting part isn't the pipeline. It's the QA gate sitting in the middle of it, and the architecture underneath that makes the whole thing composable.

Phase 1: get the pipeline working

I started with a straightforward FastAPI app. Notion polling, a workflow runner that called step functions in order, SQLite for run history, and a basic static UI. Nothing fancy.

The point was to get the pipeline working end-to-end and understand what data needed to flow between steps.

Phase 2: make everything MCP-first

Next, I refactored every integration — WordPress, Slack, Resend, Notion — into its own MCP server. Each one exposed /tools and /use_tool endpoints. The orchestrator became an MCP client, dispatching tool calls instead of importing Python functions directly.
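To make the /tools and /use_tool split concrete, here is a minimal, framework-free sketch of what one integration server might return from those two endpoints. It is not code from the repo: the schema layout, the SERVER_NAME constant, and the stubbed create_draft handler are assumptions for illustration.

```python
from typing import Any, Callable

SERVER_NAME = "wordpress-mcp"
Handler = Callable[[dict[str, Any]], dict[str, Any]]

def create_draft(params: dict[str, Any]) -> dict[str, Any]:
    # Stub: a real server would call the WordPress REST API here.
    slug = params["title"].lower().replace(" ", "-")
    return {"slug": slug, "url": f"https://blog.example.com/{slug}", "status": "draft"}

# Tool name -> (schema advertised by GET /tools, handler used by POST /use_tool)
TOOLS: dict[str, tuple[dict[str, Any], Handler]] = {
    "create_draft": (
        {
            "name": "create_draft",
            "description": "Create a draft post",
            "input_schema": {
                "type": "object",
                "properties": {"title": {"type": "string"}, "content": {"type": "string"}},
                "required": ["title", "content"],
            },
        },
        create_draft,
    ),
}

def list_tools() -> list[dict[str, Any]]:
    """Response body for GET /tools: every tool schema this server exposes."""
    return [schema for schema, _ in TOOLS.values()]

def use_tool(payload: dict[str, Any]) -> dict[str, Any]:
    """Response body for POST /use_tool: validate, dispatch, and wrap the result."""
    name = payload["tool"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    schema, handler = TOOLS[name]
    params = payload.get("params", {})
    missing = [key for key in schema["input_schema"]["required"] if key not in params]
    if missing:
        raise ValueError(f"missing params: {missing}")
    return {"server": SERVER_NAME, "tool": name, "result": handler(params)}
```

In FastAPI, these two functions would become the bodies of a GET route and a POST route; the orchestrator only ever sees the JSON envelope.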

I added a ServerRegistry that tried the remote server first and fell back to local mock responses if it wasn't running.

Suddenly the system felt different. Adding a new integration didn't mean editing the runner. It meant writing a new server.

Phase 3: add the QA gate and registry

Then came the QA gate. It's an 11-agent local scoring engine that runs on every workflow before the publish step. It checks:

SEO title completeness
Meta description
Broken links
Readability
Brand voice consistency (against a rubric you define)

It returns a score, a pass/fail, and specific suggestions. The workflow either pauses for human review or continues automatically depending on the mode you configure.
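The pause-or-continue decision can be sketched as a small pure function. This assumes two hypothetical modes (REVIEW and AUTO) and assumes a failing draft always pauses regardless of mode; the post only says the behaviour depends on the configured mode.

```python
from dataclasses import dataclass, field
from enum import Enum

class Mode(Enum):
    REVIEW = "review"  # hypothetical: always stop for a human before publishing
    AUTO = "auto"      # hypothetical: publish automatically when the gate passes

@dataclass
class GateResult:
    score: float
    passed: bool
    suggestions: list[str] = field(default_factory=list)

def next_action(result: GateResult, mode: Mode) -> str:
    """Decide what the workflow does once the QA gate has scored a draft."""
    if not result.passed:
        return "pause_for_review"      # assumption: a failing draft never auto-publishes
    if mode is Mode.REVIEW:
        return "pause_for_review"      # passed, but a human still signs off
    return "continue_to_publish"
```

Keeping this decision separate from the scoring means you can change the review policy without touching any QA agent.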

I also added a curated registry of 14 content-stack MCP servers (Ghost, Beehiiv, Substack, WordPress, Webflow, Resend, Loops, Mailchimp, Linear, Notion, Coda, and more).

The architecture decision that changed everything

Here's the thing about building a workflow tool. The naive approach is to write a runner that knows how to call each integration:

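# Naive runner: the runner itself has to know every integration
if step.app == "wordpress":
    call_wordpress()
elif step.app == "slack":
    call_slack()
# ...and another elif for every integration you add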

It works. It's also a dead end, because every new integration means touching the runner.

The MCP model inverts this. The runner doesn't know anything about WordPress. The WordPress MCP server knows about WordPress. The runner just knows how to make a tool call and handle the result.

You can add Ghost, Beehiiv, Linear, anything — as long as it speaks the protocol, the runner doesn't care.

This sounds obvious in retrospect. It wasn't obvious when I was writing step 1.
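Here is a minimal sketch of a runner with that inversion applied. The call_tool signature and the step dict keys are assumptions; the point is that the loop contains no per-integration branch.

```python
from typing import Any, Callable

# Hypothetical signature: call_tool(server, tool, params) -> result dict.
ToolCaller = Callable[[str, str, dict[str, Any]], dict[str, Any]]

def run_workflow(steps: list[dict[str, Any]], call_tool: ToolCaller) -> list[dict[str, Any]]:
    """Run steps in order; nothing in this loop knows what any server does."""
    results: list[dict[str, Any]] = []
    for step in steps:
        # The step names a server and a tool; the runner just forwards the call.
        results.append(call_tool(step["server"], step["tool"], step.get("params", {})))
    return results
```

Adding Ghost would mean making a ghost-mcp server reachable behind call_tool; run_workflow itself does not change.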

🔍 Example: MCP tool call vs HTTP call

This is the turning point where the architecture finally clicked.

❌ Old way (HTTP / Python function):

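# Old runner: integration functions are imported and branched on directly
if step.app == "wordpress":
    create_draft(step.title, step.content)
elif step.app == "slack":
    post_message(step.text)
else:
    raise ValueError(f"No handler for app: {step.app}")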

Every new integration means touching this file. Every new step type means more branching logic. Hard to test. Hard to extend.

✅ New way (MCP tool call):

steps:
  - server: wordpress-mcp
    tool: create_draft
    input_map:
      title: "{trigger.pages[0].title}"
      content: "{trigger.pages[0].body}"
  - server: slack-mcp
    tool: post_message
    input_map:
      text: "New draft created: {steps[0].url}"

The orchestrator doesn't know what WordPress is. It just makes a tool call:

{
  "server": "wordpress-mcp",
  "tool": "create_draft",
  "params": {
    "title": "How to Build an MCP-Native Content Stack",
    "content": "..."
  }
}

Add Ghost? Add another server. Add Beehiiv? Add another server. The orchestrator stays the same. That's the inversion.
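The input_map templates in the workflow config imply a small resolver that turns {trigger.pages[0].title} into a value from the run context. This is one minimal way to do it, assuming the context is a dict with trigger (the Notion payload) and steps (a list holding each earlier step's result). The function names and the rule that a lone placeholder keeps its original type are assumptions, not the repo's implementation.

```python
import re
from typing import Any

# Matches placeholders such as {trigger.pages[0].title} or {steps[0].url}.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")
# Splits a path into names and list indices: "pages[0].title" -> pages, 0, title.
TOKEN = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")

def lookup(context: dict[str, Any], path: str) -> Any:
    """Walk the context one name or index at a time."""
    value: Any = context
    for name, index in TOKEN.findall(path):
        value = value[int(index)] if index else value[name]
    return value

def render(template: str, context: dict[str, Any]) -> Any:
    """Resolve every {path} in a template string against the run context."""
    match = PLACEHOLDER.fullmatch(template)
    if match:
        # A template that is exactly one placeholder keeps the value's original type.
        return lookup(context, match.group(1))
    return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), template)

def resolve_inputs(input_map: dict[str, str], context: dict[str, Any]) -> dict[str, Any]:
    """Build a tool call's params from a step's input_map."""
    return {key: render(template, context) for key, template in input_map.items()}
```

A missing key or index raises KeyError or IndexError, which is usually what you want: a step with an unresolvable input should fail loudly in the run trace rather than send an empty field.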

Building the QA gate: what I got wrong first

My first attempt at the QA gate was a single function that checked everything. One blob of logic, one result. It was fast to write and immediately painful to extend.

The problem is that content quality isn't one thing. SEO is a different concern from readability, and brand voice is different from link validation. Mixing them into one function meant that changing one check risked breaking another, and the output was a wall of undifferentiated feedback that didn't tell you what to fix first.

The second version split it into 11 specialized agents. Each agent inherits from BaseQAAgent, implements a check() method, and returns a QAResult with a severity level and specific issues. The scoring engine aggregates them with configurable weights.

You can add a new agent without touching any existing one.

This is the open/closed principle, yes. But the deeper lesson was about feedback granularity.

A QA check that says "this draft needs work" is useless.

A check that says "your meta description is missing, your third internal link returns 404, and your average sentence length is 28 words, which is long for a technical audience" is actionable.
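Here is a minimal sketch of the agent structure described above: a BaseQAAgent with a check() method returning a QAResult that carries a severity and specific issues, plus weighted aggregation. Only those three names come from the post. The two example agents, the 20-word threshold, the rule that only ERROR fails the gate, and the scoring formula (weighted share of passing agents) are assumptions.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum
from typing import Any

class Severity(Enum):
    INFO = 1
    WARNING = 2
    ERROR = 3

@dataclass
class QAResult:
    agent: str
    severity: Severity
    issues: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # Assumption: only ERROR fails the gate; WARNING is advisory.
        return self.severity is not Severity.ERROR

class BaseQAAgent(ABC):
    name = "base"

    @abstractmethod
    def check(self, draft: dict[str, Any]) -> QAResult:
        """Inspect one concern of the draft and report specific issues."""

class MetaDescriptionAgent(BaseQAAgent):
    name = "meta_description"

    def check(self, draft: dict[str, Any]) -> QAResult:
        if not draft.get("meta_description", "").strip():
            return QAResult(self.name, Severity.ERROR, ["Meta description is missing."])
        return QAResult(self.name, Severity.INFO)

class SentenceLengthAgent(BaseQAAgent):
    name = "readability"

    def __init__(self, max_avg_words: float = 20.0) -> None:
        self.max_avg_words = max_avg_words  # hypothetical threshold

    def check(self, draft: dict[str, Any]) -> QAResult:
        sentences = [s for s in re.split(r"[.!?]+", draft.get("content", "")) if s.strip()]
        if not sentences:
            return QAResult(self.name, Severity.ERROR, ["Draft has no sentences."])
        avg = sum(len(s.split()) for s in sentences) / len(sentences)
        if avg > self.max_avg_words:
            return QAResult(self.name, Severity.WARNING,
                            [f"Average sentence length is {avg:.0f} words."])
        return QAResult(self.name, Severity.INFO)

def run_gate(agents: list[BaseQAAgent], draft: dict[str, Any],
             weights: dict[str, float]) -> dict[str, Any]:
    """Run every agent, then aggregate: weighted share of passing agents as the score."""
    results = [agent.check(draft) for agent in agents]
    total = sum(weights.get(r.agent, 1.0) for r in results)
    earned = sum(weights.get(r.agent, 1.0) for r in results if r.passed)
    return {
        "pass": all(r.passed for r in results),
        "score": round(earned / total, 2) if total else 0.0,
        "issues": {r.agent: r.issues for r in results if r.issues},
    }
```

Adding a link checker or a brand-voice agent means writing one more subclass and listing it in agents; the existing agents and run_gate stay untouched.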

🔍 Example: QA gate — before vs after

This is where the QA gate actually does something useful.

Imagine this draft comes in from Notion:

title: "My Draft Post"
meta_description: ""
content: "Testing something. Not sure if this will work."

The QA gate checks:

Title: too vague → fails
Meta: missing → fails
Readability: too shallow for technical audience → fails
Brand voice: informal, unclear → fails

Result:

{
  "pass": false,
  "reasoning": "Title is generic and vague, meta description is missing, brand voice is not professional or technical, and readability is too shallow for the target audience.",
  "suggestions": [
    "Choose a more specific, technical title.",
    "Write a clear meta description summarizing the article.",
    "Add concrete examples and structure your content with sections."
  ]
}

The workflow pauses. The editor sees this reasoning and chooses to "retry after edit" instead of publishing a weak draft.

That's the difference between "automation moved things" and "automation checked quality first."

The fallback pattern: underrated engineering

One of the quieter decisions in this project was the ServerRegistry fallback. When the orchestrator needs to call wordpress-mcp::create_draft, it first tries the actual MCP server running on port 8002. If that's not running, it falls back to a local mock that returns a realistic response shape.

This sounds like a convenience feature for demos. It's actually a correctness feature for development.

Without fallback, developing the orchestrator logic requires all four MCP servers to be running simultaneously. With fallback, you can work on the executor, the QA gate, the UI, and the run trace logic independently — the system behaves consistently regardless of which servers are actually up.

The lesson:

Design your system to be testable in parts, not just as a whole.

The fallback pattern is one way to do that. It forces you to define what a "realistic response" looks like for each tool, which in turn forces you to think clearly about your data contracts. The mock becomes a kind of documentation, because it defines the shape of the real response.

🔍 Example: Fallback mock vs real server

This is why the fallback pattern is underrated.

Real server response (if running):

{
  "server": "wordpress-mcp",
  "tool": "create_draft",
  "result": {
    "slug": "mcp-native-content-stack",
    "url": "https://blog.example.com/mcp-native-content-stack",
    "status": "draft"
  }
}

Fallback mock (if server is not running):

{
  "server": "wordpress-mcp",
  "tool": "create_draft",
  "result": {
    "slug": "mcp-native-content-stack-mock",
    "url": "https://blog.example.com/mock-draft",
    "status": "draft"
  }
}

The orchestrator doesn't care which one it gets. The shape is the same. That means:

You can test the QA gate without WordPress running.
You can test the UI without Slack running.
You can test the run trace without any servers running.

The mock becomes documentation because it forces you to define what the real response should look like.
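The fallback can be sketched in a few lines of stdlib Python. This is not the repo's ServerRegistry: the constructor arguments, the mock table keyed by (server, tool), the injectable transport, and the POST payload sent to /use_tool are assumptions. It returns the same server/tool/result envelope whether the answer came from the live server or the mock.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable

Transport = Callable[[str, dict[str, Any]], dict[str, Any]]
MockFn = Callable[[dict[str, Any]], dict[str, Any]]

def http_transport(url: str, payload: dict[str, Any]) -> dict[str, Any]:
    """POST a JSON payload and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=2) as response:
        return json.loads(response.read().decode("utf-8"))

class ServerRegistry:
    """Try the live server first; fall back to a local mock with the same envelope."""

    def __init__(self, base_urls: dict[str, str], mocks: dict[tuple[str, str], MockFn],
                 transport: Transport = http_transport) -> None:
        self.base_urls = base_urls      # e.g. {"wordpress-mcp": "http://localhost:8002"}
        self.mocks = mocks
        self.transport = transport

    def call(self, server: str, tool: str, params: dict[str, Any]) -> dict[str, Any]:
        base_url = self.base_urls.get(server)
        if base_url is not None:
            try:
                return self.transport(f"{base_url}/use_tool", {"tool": tool, "params": params})
            except urllib.error.HTTPError:
                raise  # the server is up but rejected the call; don't mask that with a mock
            except OSError:
                pass   # connection refused, DNS failure, or timeout: treat the server as down
        mock = self.mocks.get((server, tool))
        if mock is None:
            raise LookupError(f"{server}::{tool} is unreachable and has no mock")
        return {"server": server, "tool": tool, "result": mock(params)}
```

Re-raising HTTPError is a deliberate choice in this sketch: falling back only when the server is unreachable keeps a real server's errors visible instead of silently replacing them with mock data.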

What MCP actually is (and why it matters right now)

If you haven't been following the MCP ecosystem, here's the quick version.

Model Context Protocol (MCP) is an open protocol, introduced by Anthropic, that standardizes how AI models interact with external tools and data sources. Instead of every LLM integration being a bespoke API adapter, MCP gives you a common interface:

Tools with schemas
Resources with URIs
A client/server model that works across implementations

The practical implication for builders:

You can write a tool once and have any MCP-compatible client call it.

Claude Desktop, your custom orchestrator, someone else's agent: they all speak the same protocol. This is a big deal for infrastructure builders because it means your MCP server has leverage beyond your own project.
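At the protocol level, MCP messages are JSON-RPC 2.0: a client discovers tools with the tools/list method and invokes one with tools/call, and each tool advertises a JSON Schema under inputSchema. The sketch below only builds those messages (no transport, no SDK) to show the shape a compliant client sends; the create_draft tool is borrowed from the workflow example above and is not part of the spec.

```python
import json
from typing import Any

def tool_definition() -> dict[str, Any]:
    # One entry of a tools/list result: name, description, and a JSON Schema for inputs.
    return {
        "name": "create_draft",
        "description": "Create a draft post",
        "inputSchema": {
            "type": "object",
            "properties": {"title": {"type": "string"}, "content": {"type": "string"}},
            "required": ["title", "content"],
        },
    }

def tools_call_request(request_id: int, name: str, arguments: dict[str, Any]) -> str:
    # A JSON-RPC 2.0 request invoking one tool by name with its arguments.
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })
```

Because every client sends this same shape, a server that answers it correctly works with any of them.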

I built contentops-mcp before the MCP ecosystem fully matured, which meant making some decisions based on where I thought the protocol was going rather than where it was. That's a risk. But it's also an opportunity.

Being early in an ecosystem means the registry of content-stack MCP servers I built has a real chance of becoming the reference catalog for this niche, simply because almost nothing else exists there yet.

Lessons worth keeping

These are the ones I'd tell myself on day one if I could.

1. Build the trace first.

I added detailed run tracing (per-step status, input/output capture, timestamps) later than I should have. Every bug I hit before that required reading logs and mentally reconstructing what happened. After I had the trace, debugging became straightforward. Build observability into the data model from the start, not as an afterthought.
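A minimal sketch of a trace record with exactly those fields, plus a wrapper that fills it in even when a step raises. StepTrace, traced_call, and the status strings are hypothetical names, not the project's data model.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class StepTrace:
    index: int
    address: str                 # "server::tool", e.g. "wordpress-mcp::create_draft"
    inputs: dict[str, Any]
    status: str = "running"
    output: Any = None
    error: Optional[str] = None
    started_at: float = 0.0
    finished_at: float = 0.0

def traced_call(trace: list[StepTrace], address: str, params: dict[str, Any],
                fn: Callable[[dict[str, Any]], Any]) -> Any:
    """Run one step, recording inputs, output or error, and timestamps, even on failure."""
    record = StepTrace(index=len(trace), address=address, inputs=dict(params),
                       started_at=time.time())
    trace.append(record)         # appended first, so a crash still leaves a record
    try:
        record.output = fn(params)
        record.status = "ok"
        return record.output
    except Exception as exc:
        record.status = "failed"
        record.error = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        record.finished_at = time.time()
```

Because the record is created before the step runs, the trace answers "which step failed, with what input" without anyone reading logs.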

2. Real numbers beat descriptions.

"11-agent quality checks" and "14-server content-stack registry" are specific. "AI-powered quality checks" and "growing registry" are not. Specificity signals that you actually built the thing and counted it. Use real numbers everywhere you can.

3. The mock is a contract.

Every time I wrote a fallback mock response, I was forced to decide what the real response would look like. That decision often uncovered inconsistencies in my data model before I'd written the real implementation. Write your mocks early and treat them as interface definitions.
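One way to treat the mock as an enforceable contract is a shape check that compares keys and value types (not values) between the mock and a real response. This is a hypothetical helper, not part of the project; it recurses into objects only and does not inspect list elements.

```python
from typing import Any

def shape_mismatches(expected: Any, actual: Any, path: str = "$") -> list[str]:
    """List places where actual differs from expected in keys or value types."""
    if isinstance(expected, dict):
        if not isinstance(actual, dict):
            return [f"{path}: expected object, got {type(actual).__name__}"]
        problems = [f"{path}.{key}: missing" for key in expected.keys() - actual.keys()]
        problems += [f"{path}.{key}: unexpected" for key in actual.keys() - expected.keys()]
        for key in expected.keys() & actual.keys():
            problems += shape_mismatches(expected[key], actual[key], f"{path}.{key}")
        return sorted(problems)
    if type(expected) is not type(actual):
        return [f"{path}: expected {type(expected).__name__}, got {type(actual).__name__}"]
    return []
```

Run against the mock and real create_draft responses shown earlier, it returns an empty list, because the two differ only in values. A test that asserts an empty list catches drift the moment either side changes shape.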

4. One entry point, one command.

The project went through a phase where running it required four terminal windows — one for each MCP server, one for the orchestrator. That's fine for production but terrible for a learner's first experience. The ServerRegistry fallback pattern collapsed it to one command. Always think about what the first five minutes look like for someone who just cloned the repo.

5. Separate the learning path from the feature list.

The 7-day tutorial structure forced me to think about which concepts depended on which other concepts, and in what order they should be introduced. That exercise — independently of any tutorial — made the architecture cleaner. If you can't explain the build order to a beginner, your dependencies probably aren't as clean as you think.

6. Don't wrap, invert.

The temptation when building on top of existing integrations is to wrap them. "I'll write a wrapper for the WordPress API." The better question is: who should know about WordPress? The answer is: the WordPress MCP server, and nothing else. Inversion of knowledge — pushing integration-specific logic to the edge — is what makes the core stay clean.

7. Naming is architecture.

qa-gate::run_check as a tool-call address tells you the server and the tool (the action) in one string. It's readable in logs. It's greppable. It matches your file structure. Using the server::tool convention for your MCP tool calls costs nothing and saves enormous cognitive load when you are reading a run trace at midnight trying to figure out why step 3 failed.
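Parsing that convention back into its parts takes only a few lines. This is a hypothetical helper showing how a runner or trace viewer might split and validate addresses.

```python
from typing import NamedTuple

class ToolAddress(NamedTuple):
    server: str
    tool: str

    def __str__(self) -> str:
        return f"{self.server}::{self.tool}"

def parse_address(address: str) -> ToolAddress:
    """Split 'server::tool' into its parts, rejecting anything else."""
    server, sep, tool = address.partition("::")
    if not sep or not server or not tool or "::" in tool:
        raise ValueError(f"expected 'server::tool', got {address!r}")
    return ToolAddress(server, tool)
```

Round-tripping through str() gives back the exact string that appears in logs, so grepping a trace for qa-gate::run_check finds every call to that tool.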

What's next

The project has a roadmap that goes in a direction I am genuinely excited about.

Phase 4: visual workflow editor and team-based approval gates — the product layer on top of the infrastructure.
Phase 5: hosted registry with premium MCP server adapters for enterprise stacks.

But the thing I am most interested in is the registry itself. There is no curated, tested, versioned catalog of content-stack MCP servers anywhere in the current ecosystem. The one in this project is a start — 14 servers across publishing, email, and ops categories.

Growing it to 50 servers, with real installation paths and verified tool schemas, would produce the kind of catalog that becomes a reference point for the whole content-ops vertical.

Where to find it

The project is on GitHub at ContentOps-MCP with a zero-to-running quick start, a 7-day tutorial, and full docs for the QA gate and registry.

The one-line pitch:

Zero Infra Cost — MCP-native ContentOps orchestrator for AI publishing workflows.

Notion → QA Gate → WordPress → Slack → Resend.

Built-in 11-agent quality checks, 14-server content-stack registry. Self-hosted.

If you are building in the MCP ecosystem, working on content tooling, or just want a concrete project to learn FastAPI and workflow orchestration from scratch — the tutorial is designed to take you from day one to a working demo in a week.

I didn't want to build another automation tool. I wanted to build a content system that could reason about quality before it published anything.

The case isn't closed. If you are working on anything in the MCP space, leave a breadcrumb in the comments — I want to know what tools people are wiring together right now before the trail goes cold.

DEV Community