Mohammed Safvan

An Introduction to Model Context Protocol (MCP)

The Model Context Protocol (MCP) is an open standard enabling structured interaction between LLMs and external tools or data. It introduces a modular architecture comprising hosts, clients, and servers, each with well-defined responsibilities, facilitating secure and extensible AI workflows.

This blog shows how to build a minimal MCP server for semantic search over local Markdown notes, focusing on core protocol features and running everything locally.


MCP Architecture Overview

  • Host: The primary AI application (e.g., IDEs, assistants) managing LLM execution and client orchestration.
  • Client: An isolated process that connects 1:1 with a server, handles bidirectional messaging, and negotiates capabilities.
  • Server: A lightweight service exposing tools or data through MCP. It remains isolated and cannot access global context or other servers.

MCP uses JSON-RPC for communication and includes a capability negotiation step during initialization.
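To make the handshake concrete, here is a sketch of the initialization exchange, written as Python dictionaries. The method and field names follow the MCP specification; the specific values (ids, names, versions) are illustrative.

```python
# Sketch of the JSON-RPC handshake that opens an MCP session.
# Field names follow the MCP specification; concrete values are illustrative.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",  # protocol revision the client speaks
        "capabilities": {"tools": {}},    # features the client supports
        "clientInfo": {"name": "example-host", "version": "0.1.0"},
    },
}

# The server answers with the capabilities it actually supports, so both
# sides only rely on features negotiated here.
initialize_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": "2024-11-05",
        "capabilities": {"tools": {"listChanged": True}},
        "serverInfo": {"name": "markdown-rag-server", "version": "0.1.0"},
    },
}
```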


Server Implementation

To demonstrate MCP in action, a lightweight server was implemented. The server's tools are defined by applying the Python decorator @server_name.tool() to ordinary functions, as the skeleton below shows.
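A minimal skeleton might look like this, assuming the FastMCP class from the official MCP Python SDK (which matches the decorator pattern above; the server name is illustrative). The two tools it registers are described next.

```python
# Minimal MCP server skeleton using FastMCP from the official Python SDK.
# Tool signatures mirror the two tools described below; bodies come later.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("markdown-rag")  # illustrative server name

@mcp.tool()
def index_documents(directory_path: str) -> str:
    """Index all Markdown files under directory_path."""
    ...

@mcp.tool()
def search(query: str) -> list[str]:
    """Return the chunks most semantically similar to the query."""
    ...

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```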

1. index_documents(directory_path)

  • Reads all Markdown (.md) files within the specified directory.
  • Chunks text based on structure (e.g., headings).
  • Converts chunks into vector embeddings.
  • Stores embeddings in a Milvus vector database.

2. search(query)

  • Converts the input query into vector form.
  • Queries the Milvus DB for semantically similar text chunks.
  • Returns the top-matching segments for the LLM to use as context.

The paraphrase-albert-small-v2 model was used for embeddings. At ~50MB, it supports local execution with acceptable trade-offs for lightweight tasks.
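Putting the pieces together, the tool bodies might look like the following sketch, which fills in the placeholders from the skeleton above. It assumes the sentence-transformers and pymilvus (Milvus Lite) libraries; the collection name, batch size, and heading-based chunking heuristic are illustrative, and error handling is omitted.

```python
# Sketch of the two tool bodies, assuming sentence-transformers and
# pymilvus (Milvus Lite). Names and parameters are illustrative.
from pathlib import Path

from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-albert-small-v2")  # ~50MB, runs locally
client = MilvusClient("notes.db")                          # Milvus Lite: a local file
COLLECTION = "markdown_chunks"
if not client.has_collection(COLLECTION):
    client.create_collection(COLLECTION, dimension=768)    # model outputs 768-d vectors

def chunk_by_headings(text: str) -> list[str]:
    """Split Markdown into chunks at heading boundaries."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

@mcp.tool()
def index_documents(directory_path: str) -> str:
    """Read, chunk, embed, and store all Markdown files in a directory."""
    chunks = []
    for path in Path(directory_path).rglob("*.md"):
        chunks.extend(chunk_by_headings(path.read_text(encoding="utf-8")))
    vectors = model.encode(chunks, batch_size=16)  # batching keeps CPU load manageable
    client.insert(COLLECTION, [
        {"id": i, "vector": vec.tolist(), "text": chunk}
        for i, (vec, chunk) in enumerate(zip(vectors, chunks))
    ])
    return f"Indexed {len(chunks)} chunks."

@mcp.tool()
def search(query: str) -> list[str]:
    """Return the text of the chunks most similar to the query."""
    query_vec = model.encode([query])[0].tolist()
    hits = client.search(COLLECTION, data=[query_vec], limit=5,
                         output_fields=["text"])
    return [hit["entity"]["text"] for hit in hits[0]]
```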


Query Flow

A sample search query to the server

The protocol-driven flow of a semantic search query in an MCP-compatible setup is as follows:

  1. User Input is submitted through the host application.
  2. The client forwards this input along with a list of available tools to the LLM.
  3. The LLM selects the appropriate tool and specifies parameters.
  4. The client sends a protocol message (a tools/call request; see the example after this list) to the designated server.
  5. The server executes the tool function and returns structured output.
  6. The client forwards retrieved content to the LLM.
  7. The LLM synthesizes a final response using the provided context.

Each layer performs only its designated function, ensuring high modularity and isolation.
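To make steps 4 and 5 concrete, the messages on the wire are JSON-RPC tools/call requests and responses. The method and field names below follow the MCP specification; the id and argument values are illustrative.

```python
# Sketch of the JSON-RPC exchange behind steps 4-5 above.
tools_call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "search",
        "arguments": {"query": "notes about capability negotiation"},
    },
}

# The server wraps the tool's output as structured content.
tools_call_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "content": [{"type": "text", "text": "...top-matching chunks..."}],
        "isError": False,
    },
}
```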


Observations

  • Chunking: Heading-based segmentation produced more meaningful retrieval than token-based methods.
  • Performance: Local models require batching to avoid CPU strain during indexing.
  • Protocol Design: MCP’s modular structure and JSON-RPC communication simplify integration and debugging.
  • Interoperability: Capability negotiation ensures only supported features are used, enhancing reliability and extensibility.

You can learn more about MCP in “Hands on Introduction to MCP”.
Check out the GitHub repo: MCP-Markdown-RAG.


Conclusion

MCP offers a robust foundation for integrating LLMs with local tools via clean, composable interfaces. This experiment confirms its suitability for lightweight semantic search systems and highlights its potential in privacy-conscious, modular AI workflows.
