A prequel to my three-part series on building an MCP server. This post stands on its own — no code, no codebase required. Just the idea that changed how we think about AI integration.
AI Has a Context Problem
Let's start with an uncomfortable truth: the AI you're chatting with right now doesn't know your application.
It doesn't know your database schema. It doesn't know which API version you're running in production. It doesn't know that your team renamed user_id to account_id six months ago, or that your FHIR implementation uses US Core 5.0.1, not 6.1.0, or that the Observation resource in your system carries a custom extension for lab accession numbers.
The AI knows a lot about the world in general. But it knows almost nothing about your world in particular.
And this isn't a failure of AI. It's a failure of plumbing.
The Way We Integrate AI Today Is Backwards
Think about how most teams add AI to their workflow today:
- Copy some context from your app (a schema, a log snippet, an error message).
- Paste it into an AI chat window.
- Hope the AI interprets it correctly.
- Read the response and mentally cross-reference it against reality.
- Repeat.
This is the human-as-middleware pattern. You are the integration layer between the AI and your application. You ferry data back and forth, translate context, and validate every response because the AI has no independent way to check its own answers.
It works. Kind of. But it doesn't scale. And in domains where precision matters — healthcare, finance, infrastructure, compliance — "kind of works" is a liability.
Consider what happens when a developer asks an AI assistant:
"What are the required fields in a FHIR R4 Patient resource?"
The AI might answer from its training data. Maybe it's right. Maybe it's describing R3 fields. Maybe it's mixing in elements from a US Core profile without saying so. Maybe it hallucinated a field that never existed. The developer has no way to tell without opening the specification themselves — which defeats the purpose of asking the AI in the first place.
Now imagine the AI could do this instead:
"Let me look that up."
(calls fhir.get_definition with version=R4, kind=StructureDefinition, name=Patient)
"Here's the Patient resource from the R4 specification. The required elements are..."
Same question. But now the answer is grounded in the actual specification, not a statistical approximation of it. The AI didn't guess — it looked it up. Just like you would.
That's what MCP enables.
What Is MCP, Actually?
MCP stands for Model Context Protocol. It's an open protocol — originally developed by Anthropic and now an open standard — that defines how AI models communicate with external tools, data sources, and services.
But that description buries the lead. Here's what MCP actually is in practice:
MCP is a contract between an AI and the systems it can interact with.
That contract has three parts:
1. Tools — "Here are functions you can call"
A tool is a typed function that the AI can invoke. You define the name, the inputs (with types and descriptions), and what it returns. The AI sees this contract and decides when to call the tool during a conversation.
Think of it like giving the AI an API client — but instead of REST endpoints with ambiguous documentation, each tool has a strict schema that the AI can reason about.
Example of what a tool means to the AI:
"I have a tool called fhir.search.
It takes a query string and optional filters.
It returns a list of matching FHIR resources.
I should use this when the user asks about FHIR resources
and I'm not sure of the exact name or want to explore."
The AI isn't reading documentation to figure this out. The tool's name, its input field names, its types — all of that is the documentation. The schema is the interface.
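To make that concrete, here is a rough sketch of the contract a server advertises for one tool. The field names (name, description, inputSchema) follow the shape of an MCP tool listing; the enum values and descriptions are illustrative, and the tool mirrors the fhir.get_definition example used elsewhere in this post.

```python
# Sketch of an MCP tool declaration, expressed as the JSON the AI sees.
# The field names follow the protocol's tool-listing shape; the enum
# values and descriptions are illustrative, not a prescribed API.
get_definition_tool = {
    "name": "fhir.get_definition",
    "description": "Return a definition from a specific FHIR version.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "version": {"type": "string", "enum": ["R4", "R4B", "R5"]},
            "kind": {"type": "string", "description": "e.g. 'StructureDefinition'"},
            "name": {"type": "string", "description": "e.g. 'Patient'"},
        },
        "required": ["version", "name"],
    },
}
```

Everything the AI needs to decide when and how to call the tool is in that one declaration, which is exactly the point.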
2. Resources — "Here is data you can read"
Resources are read-only data items identified by URIs. Unlike tools (which are actions), resources are data you can look at. The AI can request a resource by URI and get back structured content.
Think of resources as a filesystem the AI can browse:
fhir://R4/StructureDefinition/Patient → the Patient definition
fhir://R5/StructureDefinition/Observation → the Observation definition
uscore://5.0.1/StructureDefinition/us-core-patient → the US Core Patient profile
The AI doesn't need to know where these live on disk or how they're stored. It just requests a URI and gets data back.
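A minimal sketch of what might sit behind those URIs, assuming a simple in-memory index (a real server would back this with files, SQLite, or an API; the only contract that matters is URI in, structured content out):

```python
# Hypothetical resource resolver: URI in, structured content out.
# The tiny in-memory index stands in for whatever storage a real
# server uses; the AI never sees anything but the URI and the data.
SPEC_INDEX = {
    "fhir://R4/StructureDefinition/Patient": {
        "resourceType": "StructureDefinition",
        "name": "Patient",
        "fhirVersion": "4.0.1",
    },
}

def read_resource(uri: str) -> dict:
    """Return the content for a resource URI, or fail loudly if unknown."""
    try:
        return SPEC_INDEX[uri]
    except KeyError:
        raise ValueError(f"Unknown resource URI: {uri}")
```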
3. Prompts — "Here's how to approach a task"
Prompts are reusable templates that guide the AI on how to use tools and present results. They're the "playbook" that says: "When someone asks you to summarize a FHIR profile, here's the approach..."
Prompts are the least understood part of MCP, but they're important. They bridge the gap between raw tool output (structured data) and what the human actually needs (an explanation, a comparison, a recommendation).
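As a rough illustration (the name and wording below are invented for this post, not taken from a real server), a prompt is little more than a named, parameterized template the client can request:

```python
# Hypothetical MCP prompt: a reusable playbook the client can request
# by name. The wording is illustrative; a real prompt would be tuned
# to the tools the server actually exposes.
SUMMARIZE_PROFILE_PROMPT = """\
Summarize the FHIR profile '{profile_name}' for a developer audience.
1. Use the fhir.get_definition tool to fetch the profile first.
2. List the required elements and their cardinalities.
3. Call out must-support flags and terminology bindings.
4. Note anything that differs from the base resource.
"""

print(SUMMARIZE_PROFILE_PROMPT.format(profile_name="us-core-patient"))
```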
Why MCP Matters for Application Development
Here's the argument I want to make: every non-trivial application should eventually expose an MCP interface.
Not because it's trendy. Because the alternative — expecting AI to understand your application from general knowledge — will increasingly become a bottleneck.
Let me make the case through five observations.
Observation 1: AI is already in your team's workflow
Whether you've officially "adopted AI" or not, your developers are using Claude, ChatGPT, Copilot, or Cursor every day. They're asking it about your codebase, your APIs, your domain. And the AI is answering from general knowledge — which means it's getting your specifics wrong a non-trivial percentage of the time.
MCP lets you meet the AI where it already is. Instead of fighting the fact that developers use AI, you make the AI more useful by giving it access to your actual systems.
Observation 2: Context stuffing doesn't scale
The common workaround for AI's lack of context is to paste relevant information into the prompt. "Here's my schema. Here's the error log. Here's the config file." This is context stuffing, and it has hard limits:
- Context window limits. Even with 200K token models, you can't paste your entire codebase.
- Relevance filtering. The human has to decide what's relevant before asking the question, which assumes they already know the answer's shape.
- Staleness. The pasted context is a snapshot. If the schema changed yesterday and you pasted last week's version, the AI's answer is wrong.
MCP replaces context stuffing with context fetching. The AI asks for what it needs, when it needs it, from the live source. No human in the loop. No stale snapshots.
Observation 3: Structured tools beat unstructured context
There's a fundamental difference between giving an AI a blob of text and giving it a typed tool.
Unstructured context: "Here's a JSON file with 3,000 lines of FHIR StructureDefinitions. Somewhere in there is the information about the Patient resource."
Structured tool: "Call fhir.get_definition(version='R4', kind='StructureDefinition', name='Patient') and you'll get exactly the Patient definition with metadata."
The unstructured approach makes the AI do the work of parsing, searching, and disambiguating. The structured approach makes the server do that work — where it can use proper indexing, query optimization, and validation — and gives the AI a clean result.
This is the same lesson the industry learned with databases decades ago. You don't give users a flat file and tell them to grep for what they need. You give them a query interface. MCP is the query interface for AI.
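One way to see the difference: the structured version gives the server a place to validate and normalize inputs before any real work happens. Here is a sketch using Pydantic (the model and field names are illustrative, not the actual server's code):

```python
# Illustrative input model for a fhir.get_definition-style tool.
# The server validates the AI's arguments before touching the index,
# so a bad request fails fast with a clear error instead of producing
# a plausible-but-wrong answer.
from enum import Enum
from pydantic import BaseModel, Field

class FhirVersion(str, Enum):
    R4 = "R4"
    R4B = "R4B"
    R5 = "R5"

class GetDefinitionInput(BaseModel):
    version: FhirVersion
    kind: str = "StructureDefinition"
    name: str = Field(min_length=1, description="e.g. 'Patient'")

# The AI's tool call arrives as JSON; validating it is one line.
args = GetDefinitionInput.model_validate({"version": "R4", "name": "Patient"})
```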
Observation 4: AI clients are converging on MCP
Claude Desktop supports MCP natively. Cursor supports MCP. VS Code is adding MCP support. The ecosystem is converging on this protocol as the standard way for AI assistants to interact with external systems.
This means building an MCP server isn't a bet on one AI provider. It's an investment that works across every MCP-compatible client. Write once, work everywhere — the same server handles Claude, Cursor, and whatever comes next.
Observation 5: The best time to build an MCP server is before you need one
Here's a pattern we see:
1. A team starts using AI for development.
2. The AI gives wrong answers about the team's specific domain.
3. The team compensates with manual context stuffing and mental fact-checking.
4. Months pass. The workarounds become exhausting.
5. Someone says "we should build a tool for this."
The teams that build the MCP server at step 2 save months of accumulated friction. The ones that wait until step 5 have to retrofit it while already frustrated.
The Motivation Behind My Project
I work in healthcare interoperability. My domain is FHIR — the standard that governs how health data is structured and exchanged between systems. It's a specification that:
- Has hundreds of resource types (Patient, Observation, Condition, MedicationRequest, ...).
- Spans multiple versions (R4, R4B, R5) with subtle but important differences between them.
- Is extended by Implementation Guides (US Core, Da Vinci, mCODE, ...) that add constraints, profiles, and extensions.
- Is deeply structural — a StructureDefinition has elements, types, cardinality constraints, slicing rules, invariants, and bindings to terminology.
This is exactly the kind of domain where AI confidently gives almost-right answers. And in healthcare, almost-right is dangerous. A developer who implements a resource mapping based on a hallucinated field name creates a real interoperability bug — one that might not surface until clinical data flows through the wrong path.
We needed the AI to stop guessing and start looking things up.
But we also wanted something broader than a single-purpose tool. We wanted to validate an approach: can you take a complex, versioned, deeply structured specification and make it available to AI in a way that's fast, local, and useful?
The answer is yes. And the approach generalizes.
MCP Is Not Just for FHIR
Everything we built for FHIR could be applied to any domain with these characteristics:
Complex, versioned specifications
- OpenAPI/Swagger specs: An MCP server that lets AI look up your API endpoints, request/response schemas, and versioning — from the actual spec file, not from memory.
- Database schemas: An MCP server that queries your database metadata (tables, columns, types, relationships, indexes) so the AI can write correct SQL without you pasting the schema.
- Infrastructure-as-Code: An MCP server that reads your Terraform state, CloudFormation templates, or Kubernetes manifests so the AI understands your actual infrastructure, not a generic tutorial version.
Regulatory or compliance frameworks
- HIPAA, SOC2, GDPR: An MCP server that lets AI look up specific regulatory requirements, controls, and your organization's compliance status.
- Clinical terminology: SNOMED CT, LOINC, ICD-10 — enormous code systems that AI can't memorize but could search and retrieve through tools.
Internal knowledge
- Internal documentation: An MCP server that indexes your team's runbooks, architecture decision records, and onboarding guides.
- Configuration management: An MCP server that reads your application's feature flags, environment configs, and deployment status.
The pattern is always the same:
┌───────────────────────────────────────────────────┐
│               Your Domain Knowledge               │
│                                                   │
│  Specifications, schemas, configs, terminology,   │
│  documentation, compliance requirements, ...      │
└───────────────────┬───────────────────────────────┘
                    │
                    ▼
┌───────────────────────────────────────────────────┐
│                 Indexer / Loader                  │
│                                                   │
│  Extract, normalize, store in a searchable index  │
└───────────────────┬───────────────────────────────┘
                    │
                    ▼
┌───────────────────────────────────────────────────┐
│                    MCP Server                     │
│                                                   │
│  Tools:     lookup, search, compare, validate     │
│  Resources: addressable items via URIs            │
│  Prompts:   guidance on how to use the output     │
└───────────────────┬───────────────────────────────┘
                    │
                    ▼
┌───────────────────────────────────────────────────┐
│                     AI Client                     │
│                                                   │
│  Claude Desktop, Cursor, VS Code, custom apps...  │
│  Calls tools, reads resources, follows prompts    │
└───────────────────────────────────────────────────┘
What Changes When AI Can Look Things Up
When we shipped the first working version of our FHIR MCP server and plugged it into Claude Desktop, something shifted in how we worked.
Before MCP:
- "Claude, what elements are in FHIR R4 Patient?" → Read response, open spec to verify, correct two errors, paste corrections back
- "What's different about Observation between R4 and R5?" → Claude gives a plausible but unverifiable answer. Spend 20 minutes diffing specs manually.
- "Does US Core require Patient.identifier?" → Claude says yes confidently. Is it right? Open the IG, find the profile, check the cardinality. Claude was right this time, but you had to check.
After MCP:
- "Claude, what elements are in FHIR R4 Patient?" → Claude calls
fhir.get_definition, returns the actual definition, summarizes it. No need to verify — it's from the spec. - "What's different about Observation between R4 and R5?" → Claude calls
fhir.diff_versions, gets the actual differences, explains them. - "Does US Core require Patient.identifier?" → Claude calls
uscore.get_profile, reads the constraint, answers with the actual cardinality and must-support flag.
The mental overhead disappeared. Not partially — entirely. We stopped being the middleware between the AI and the specification. The AI handled it.
And here's the subtle thing: we started asking better questions. When you trust that the AI's answers are grounded, you ask more ambitious questions. You ask follow-ups. You explore edge cases. The conversation becomes collaborative instead of adversarial.
The Counterarguments (And Why We Disagree)
"Just use a bigger context window"
Context windows are getting larger, and some people argue that you should just dump everything into the prompt. But this misses several points:
- Bigger context ≠ better retrieval. Studies consistently show that models struggle to find specific information in very long contexts ("lost in the middle" problem). A targeted tool call beats a 200K-token haystack.
- Cost scales with context. Larger prompts cost more per request. A tool call that returns 500 tokens of targeted data is cheaper than pre-loading 50,000 tokens of "just in case" context.
- Latency scales with context. Time-to-first-token increases with prompt length. Small, focused tool calls keep the conversation snappy.
"Just use RAG"
RAG is great for unstructured documents. But when your data is structured — schemas, specifications, typed resources — RAG's embedding-and-chunk approach loses structural relationships. You can't meaningfully embed a 40KB JSON StructureDefinition and expect cosine similarity to find "the cardinality of Patient.identifier.system."
MCP tools can do targeted, structured queries. RAG can't. They're complementary, but for structured domains, MCP is the right tool.
"We'll wait for AI to get better"
AI will get better. Models will memorize more. But the long tail of domain-specific, versioned, organization-specific knowledge will always exceed what's in training data. Your database schema isn't in GPT-5's training set. Your FHIR IG published last month isn't either. MCP bridges this gap regardless of how smart the model gets.
"Building an MCP server is too much work"
Our first working version was ~500 lines of Python across the server, handlers, and transport. The indexer was ~100 lines. We used SQLite (ships with Python), Pydantic (one pip install), and JSON-RPC (a trivial protocol). No infrastructure. No cloud services. No frameworks.
If you can build a CLI tool, you can build an MCP server. The protocol is simpler than REST.
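For a sense of scale, here is a deliberately stripped-down sketch of the shape (not our actual server): read newline-delimited JSON-RPC requests from stdin, dispatch on the method name, write responses to stdout. Real MCP adds an initialization handshake, capability negotiation, and error handling on top, but the core loop really is this small.

```python
# Stripped-down sketch of a stdio JSON-RPC loop, not the real server.
# Each line on stdin is one request; each line on stdout is one response.
import json
import sys

def handle_get_definition(params: dict) -> dict:
    # Placeholder handler; a real one would query the SQLite index.
    return {"name": params.get("name"), "version": params.get("version")}

HANDLERS = {"fhir.get_definition": handle_get_definition}

for line in sys.stdin:
    request = json.loads(line)
    handler = HANDLERS.get(request.get("method", ""))
    result = handler(request.get("params", {})) if handler else None
    response = {"jsonrpc": "2.0", "id": request.get("id"), "result": result}
    sys.stdout.write(json.dumps(response) + "\n")
    sys.stdout.flush()
```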
How to Think About Your First MCP Server
If you're considering building an MCP server, here's the decision framework we'd recommend:
Step 1: Identify the "fact-checking tax"
Where does your team spend time verifying AI outputs against ground truth? Every time someone copies a schema into a prompt, checks an API response against documentation, or says "let me verify that" after reading an AI answer — that's the tax. The bigger the tax, the stronger the case for MCP.
Step 2: Identify the data source
What's the ground truth? A specification? A database? An API? A set of configuration files? This is what your MCP server will index or query.
Step 3: Identify the operations
What does the AI need to do with that data? Usually it's some combination of:
- Lookup: Get a specific item by identifier.
- Search: Find items matching a query.
- Compare: Diff two versions or configurations.
- Validate: Check if something conforms to a specification.
- List: Enumerate available items.
Each of these becomes an MCP tool.
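As a sketch of how those operations tend to map onto a tool surface (function names and parameters here are examples, not a prescribed API):

```python
# Illustrative tool surface for a spec-backed MCP server.
# Function names and parameters are examples, not a prescribed API.

def lookup(identifier: str) -> dict:
    """Get a specific item by identifier."""
    ...

def search(query: str, limit: int = 10) -> list[dict]:
    """Find items matching a query."""
    ...

def compare(identifier: str, version_a: str, version_b: str) -> dict:
    """Diff one item between two versions or configurations."""
    ...

def validate(payload: dict, against: str) -> dict:
    """Check whether a payload conforms to a specification or profile."""
    ...

def list_items(kind: str | None = None) -> list[str]:
    """Enumerate available items, optionally filtered by kind."""
    ...
```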
Step 4: Start with one tool
Don't build all of those tools on day one. Build the lookup tool. Get it working in Claude Desktop or Cursor. Use it for a week. You'll immediately discover what the second tool should be.
Step 5: Iterate based on what the AI gets wrong
Watch how the AI uses your tools. When it calls the wrong tool, that's a signal that your tool names or schemas need clarification. When it sends bad inputs, that's a signal that your input model needs better field names or defaults. When it presents the output poorly, that's a signal that you need a prompt.
MCP servers are living things. They improve through use.
Where This Is Headed
We believe MCP (or something like it) will become standard infrastructure for software teams. Not today, maybe not this year, but soon. The same way that APIs became standard for service-to-service communication, MCP will become standard for AI-to-application communication.
The teams that build MCP servers early will have a head start. They'll have cleaner tool interfaces, better prompt patterns, and more experience with AI-as-caller design. They'll also have developers who trust their AI assistants because those assistants actually give correct, grounded answers.
Our FHIR MCP server was a proof of concept. It works. It's useful. And it proved to us that the pattern generalizes. If your domain has complex, structured, versioned knowledge that AI gets wrong — and what domain doesn't? — building an MCP server is one of the highest-leverage investments you can make.
Once you understand MCP deeply, integrating new data sources, applications, and AI context becomes significantly easier.
If you'd like to connect, find me on LinkedIn or drop me a message — I'd love to explore how I can help drive your data/AI success!
This post is a prequel to our three-part implementation series:
Coming soon...