The Model Context Protocol (MCP) is an open standard enabling structured interaction between LLMs and external tools or data. It introduces a modular architecture comprising hosts, clients, and servers, each with well-defined responsibilities, facilitating secure and extensible AI workflows.
This post shows how to build a minimal MCP server for semantic search over local Markdown notes, focusing on core protocol features and running everything locally.
MCP Architecture Overview
- Host: The primary AI application (e.g., IDEs, assistants) managing LLM execution and client orchestration.
- Client: An isolated process that connects 1:1 with a server, handles bidirectional messaging, and negotiates capabilities.
- Server: A lightweight service exposing tools or data through MCP. It remains isolated and cannot access global context or other servers.
MCP uses JSON-RPC 2.0 for communication and includes a capability negotiation step during initialization, so each side knows which protocol features the other supports.
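For a sense of what that looks like on the wire, here is a rough sketch of the initialize request a client sends on connect, written as a Python dict. The field names follow the MCP specification; the concrete values are illustrative, not captured from a real session.

```python
# Sketch of the JSON-RPC 2.0 "initialize" request a client sends when it
# connects to a server. Field names follow the MCP spec; values here are
# illustrative.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",  # a published MCP protocol revision
        "capabilities": {},               # features this client supports
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
}
# The server answers with its own capabilities (e.g., that it exposes tools),
# so both sides agree on which protocol features they may use.
```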
Server Implementation
To demonstrate MCP in action, a lightweight server was implemented.
The MCP server's tools are defined by applying the Python decorator @server_name.tool() to plain functions; the function's signature and docstring describe the tool to clients.
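As a minimal sketch, assuming the FastMCP helper from the official MCP Python SDK (the mcp package), registering the two tools looks roughly like this; the server name is illustrative, and the bodies are sketched after the tool descriptions below.

```python
# Minimal server skeleton using FastMCP from the official MCP Python SDK.
# The server name "markdown-rag" is illustrative.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("markdown-rag")

@mcp.tool()
def index_documents(directory_path: str) -> str:
    """Index all Markdown files under directory_path into the vector store."""
    ...  # body sketched after the tool descriptions below

@mcp.tool()
def search(query: str) -> list[str]:
    """Return the stored chunks most semantically similar to the query."""
    ...

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```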
1. index_documents(directory_path)
- Reads all Markdown (.md) files within the specified directory.
- Chunks text based on structure (e.g., headings).
- Converts chunks into vector embeddings.
- Stores embeddings in a Milvus vector database.
2. search(query)
- Converts the input query into vector form.
- Queries the Milvus DB for semantically similar text chunks.
- Returns top-matching segments for later use.
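Putting the two descriptions together, here is a hedged sketch of both tool bodies using pymilvus (with Milvus Lite storing data in a local file) and sentence-transformers. The collection name, database file, chunking regex, and top-k limit are assumptions for illustration, not the exact code from the repo.

```python
# Continues the FastMCP skeleton above (reuses the `mcp` instance).
# Assumptions: pymilvus with Milvus Lite (local single-file storage),
# sentence-transformers for embeddings; collection name, db file, and
# limit=5 are illustrative.
import re
from pathlib import Path

from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-albert-small-v2")
client = MilvusClient("notes.db")  # Milvus Lite: a local single-file database
COLLECTION = "markdown_chunks"

def chunk_by_headings(text: str) -> list[str]:
    # Split on Markdown headings so each chunk covers one logical section.
    parts = re.split(r"(?m)^#{1,6}\s", text)
    return [p.strip() for p in parts if p.strip()]

@mcp.tool()
def index_documents(directory_path: str) -> str:
    chunks: list[str] = []
    for md_file in Path(directory_path).rglob("*.md"):
        chunks.extend(chunk_by_headings(md_file.read_text(encoding="utf-8")))
    if not chunks:
        return f"No Markdown content found under {directory_path}."
    vectors = model.encode(chunks)  # numpy array, one row per chunk
    if not client.has_collection(COLLECTION):
        client.create_collection(COLLECTION, dimension=vectors.shape[1])
    client.insert(COLLECTION, [
        {"id": i, "vector": vec.tolist(), "text": chunk}
        for i, (vec, chunk) in enumerate(zip(vectors, chunks))
    ])
    return f"Indexed {len(chunks)} chunks from {directory_path}."

@mcp.tool()
def search(query: str) -> list[str]:
    qvec = model.encode([query])[0].tolist()
    hits = client.search(COLLECTION, data=[qvec], limit=5,
                         output_fields=["text"])
    return [hit["entity"]["text"] for hit in hits[0]]
```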
The paraphrase-albert-small-v2 model was used for embeddings. At roughly 50 MB, it is small enough to run entirely on a local CPU, with accuracy trade-offs that are acceptable for lightweight tasks.
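If you want to verify the model locally, and confirm the embedding width used when creating the Milvus collection, a quick check like this works; the printed dimension should be 768 for this model, though that is worth verifying on your machine.

```python
from sentence_transformers import SentenceTransformer

# Downloads (~50 MB) on first use, then runs fully locally on CPU.
model = SentenceTransformer("paraphrase-albert-small-v2")
print(model.get_sentence_embedding_dimension())  # expected: 768
print(model.encode(["hello world"]).shape)       # expected: (1, 768)
```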
Query Flow
The protocol-driven flow of a semantic search query in an MCP-compatible setup is as follows:
- User input is submitted through the host application.
- The client forwards this input along with a list of available tools to the LLM.
- The LLM selects the appropriate tool and specifies parameters.
- The client sends a protocol message to the designated server.
- The server executes the tool function and returns structured output.
- The client forwards retrieved content to the LLM.
- The LLM synthesizes a final response using the provided context.
Each layer performs only its designated function, ensuring high modularity and isolation.
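For instance, the protocol message in step 4 (client to server) is a JSON-RPC tools/call request, sketched below as a Python dict; the method and parameter names follow the MCP spec, while the id and arguments are made up.

```python
# Illustrative shape of the step-4 message: the client asks the server to
# invoke the selected tool with the parameters the LLM chose.
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "search",
        "arguments": {"query": "notes about vector databases"},
    },
}
```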
Observations
- Chunking: Heading-based segmentation produced more meaningful retrieval than token-based methods.
- Performance: Local models require batching to avoid CPU strain during indexing (see the sketch after this list).
- Protocol Design: MCP’s modular structure and JSON-RPC communication simplify integration and debugging.
- Interoperability: Capability negotiation ensures only supported features are used, enhancing reliability and extensibility.
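The batching mentioned above is simple to add. A minimal sketch, assuming sentence-transformers' encode and a tunable batch size (32 is an illustrative default, not a measured optimum):

```python
import numpy as np

def embed_in_batches(model, chunks: list[str], batch_size: int = 32) -> np.ndarray:
    # Encode one slice at a time so indexing a large corpus doesn't spike
    # CPU and memory all at once.
    batches = [
        model.encode(chunks[i:i + batch_size])
        for i in range(0, len(chunks), batch_size)
    ]
    return np.vstack(batches)
```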
You can learn more about MCP in "Hands on Introduction to MCP".
Check out the GitHub repo: MCP-Markdown-RAG.
Conclusion
MCP offers a robust foundation for integrating LLMs with local tools via clean, composable interfaces. This experiment demonstrates its suitability for lightweight semantic search systems and highlights its potential in privacy-conscious, modular AI workflows.

