The Model Context Protocol (MCP) is an open standard enabling structured interaction between LLMs and external tools or data. It introduces a modular architecture comprising hosts, clients, and servers, each with well-defined responsibilities, facilitating secure and extensible AI workflows.
This post shows how to build a minimal MCP server for semantic search over local Markdown notes, focusing on core protocol features and running everything locally.
MCP Architecture Overview
- Host: The primary AI application (e.g., IDEs, assistants) managing LLM execution and client orchestration.
- Client: An isolated process that connects 1:1 with a server, handles bidirectional messaging, and negotiates capabilities.
- Server: A lightweight service exposing tools or data through MCP. It remains isolated and cannot access global context or other servers.
MCP uses JSON-RPC 2.0 for communication and includes a capability negotiation step during initialization, so each side knows which protocol features the other supports.
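For a sense of what that looks like on the wire, here is a rough sketch of the initialize request a client sends on connect, written as a Python dict. The field names follow the MCP specification; the concrete values are illustrative, not captured from a real session.

```python
# Sketch of the JSON-RPC 2.0 "initialize" request a client sends when it
# connects to a server. Field names follow the MCP spec; values here are
# illustrative.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",  # a published MCP protocol revision
        "capabilities": {},               # features this client supports
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
}
# The server answers with its own capabilities (e.g., that it exposes tools),
# so both sides agree on which protocol features they may use.
```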
Server Implementation
To demonstrate MCP in action, a lightweight server was implemented.
The MCP server's tools are defined by applying the Python decorator @server_name.tool() to plain functions; the function's signature and docstring describe the tool to clients.
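As a minimal sketch, assuming the FastMCP helper from the official MCP Python SDK (the mcp package), registering the two tools looks roughly like this; the server name is illustrative, and the bodies are sketched after the tool descriptions below.

```python
# Minimal server skeleton using FastMCP from the official MCP Python SDK.
# The server name "markdown-rag" is illustrative.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("markdown-rag")

@mcp.tool()
def index_documents(directory_path: str) -> str:
    """Index all Markdown files under directory_path into the vector store."""
    ...  # body sketched after the tool descriptions below

@mcp.tool()
def search(query: str) -> list[str]:
    """Return the stored chunks most semantically similar to the query."""
    ...

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```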
1. index_documents(directory_path)
- Reads all Markdown (.md) files within the specified directory.
- Chunks text based on structure (e.g., headings).
- Converts chunks into vector embeddings.
- Stores embeddings in a Milvus vector database.
2. search(query)
- Converts the input query into vector form.
- Queries the Milvus DB for semantically similar text chunks.
- Returns top-matching segments for later use.
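Putting the two descriptions together, here is a hedged sketch of both tool bodies using pymilvus (with Milvus Lite storing data in a local file) and sentence-transformers. The collection name, database file, chunking regex, and top-k limit are assumptions for illustration, not the exact code from the repo.

```python
# Continues the FastMCP skeleton above (reuses the `mcp` instance).
# Assumptions: pymilvus with Milvus Lite (local single-file storage),
# sentence-transformers for embeddings; collection name, db file, and
# limit=5 are illustrative.
import re
from pathlib import Path

from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-albert-small-v2")
client = MilvusClient("notes.db")  # Milvus Lite: a local single-file database
COLLECTION = "markdown_chunks"

def chunk_by_headings(text: str) -> list[str]:
    # Split on Markdown headings so each chunk covers one logical section.
    parts = re.split(r"(?m)^#{1,6}\s", text)
    return [p.strip() for p in parts if p.strip()]

@mcp.tool()
def index_documents(directory_path: str) -> str:
    chunks: list[str] = []
    for md_file in Path(directory_path).rglob("*.md"):
        chunks.extend(chunk_by_headings(md_file.read_text(encoding="utf-8")))
    if not chunks:
        return f"No Markdown content found under {directory_path}."
    vectors = model.encode(chunks)  # numpy array, one row per chunk
    if not client.has_collection(COLLECTION):
        client.create_collection(COLLECTION, dimension=vectors.shape[1])
    client.insert(COLLECTION, [
        {"id": i, "vector": vec.tolist(), "text": chunk}
        for i, (vec, chunk) in enumerate(zip(vectors, chunks))
    ])
    return f"Indexed {len(chunks)} chunks from {directory_path}."

@mcp.tool()
def search(query: str) -> list[str]:
    qvec = model.encode([query])[0].tolist()
    hits = client.search(COLLECTION, data=[qvec], limit=5,
                         output_fields=["text"])
    return [hit["entity"]["text"] for hit in hits[0]]
```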
The paraphrase-albert-small-v2 model was used for embeddings. At roughly 50 MB, it is small enough to run entirely on a local CPU, with accuracy trade-offs that are acceptable for lightweight tasks.
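If you want to verify the model locally, and confirm the embedding width used when creating the Milvus collection, a quick check like this works; the printed dimension should be 768 for this model, though that is worth verifying on your machine.

```python
from sentence_transformers import SentenceTransformer

# Downloads (~50 MB) on first use, then runs fully locally on CPU.
model = SentenceTransformer("paraphrase-albert-small-v2")
print(model.get_sentence_embedding_dimension())  # expected: 768
print(model.encode(["hello world"]).shape)       # expected: (1, 768)
```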
Query Flow
The protocol-driven flow of a semantic search query in an MCP-compatible setup is as follows:
- User input is submitted through the host application.
- The client forwards this input along with a list of available tools to the LLM.
- The LLM selects the appropriate tool and specifies parameters.
- The client sends a protocol message to the designated server.
- The server executes the tool function and returns structured output.
- The client forwards retrieved content to the LLM.
- The LLM synthesizes a final response using the provided context.
Each layer performs only its designated function, ensuring high modularity and isolation.
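For instance, the protocol message in step 4 (client to server) is a JSON-RPC tools/call request, sketched below as a Python dict; the method and parameter names follow the MCP spec, while the id and arguments are made up.

```python
# Illustrative shape of the step-4 message: the client asks the server to
# invoke the selected tool with the parameters the LLM chose.
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "search",
        "arguments": {"query": "notes about vector databases"},
    },
}
```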
Observations
- Chunking: Heading-based segmentation produced more meaningful retrieval than token-based methods.
- Performance: Local models require batching to avoid CPU strain during indexing (see the sketch after this list).
- Protocol Design: MCP’s modular structure and JSON-RPC communication simplify integration and debugging.
- Interoperability: Capability negotiation ensures only supported features are used, enhancing reliability and extensibility.
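The batching mentioned above is simple to add. A minimal sketch, assuming sentence-transformers' encode and a tunable batch size (32 is an illustrative default, not a measured optimum):

```python
import numpy as np

def embed_in_batches(model, chunks: list[str], batch_size: int = 32) -> np.ndarray:
    # Encode one slice at a time so indexing a large corpus doesn't spike
    # CPU and memory all at once.
    batches = [
        model.encode(chunks[i:i + batch_size])
        for i in range(0, len(chunks), batch_size)
    ]
    return np.vstack(batches)
```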
You can learn more about MCP in "Hands on Introduction to MCP".
Check out the GitHub repo: MCP-Markdown-RAG.
Conclusion
MCP offers a robust foundation for integrating LLMs with local tools via clean, composable interfaces. This experiment demonstrates its suitability for lightweight semantic search systems and highlights its potential in privacy-conscious, modular AI workflows.

