As a technical writer, my life revolves around corporate Writing Style Guides. The Writing Style Guide is my bible: at every company I work for, my writing follows their guide. For technical writers, the Writing Style Guide plays the most important role in documenting any technical content, APIs, or CLIs. But let me be honest: the style guide is almost always a massive PDF, and writers spend a lot of time figuring out what to use and when. For example: Should I write “can’t” or “cannot”? Do I need a comma before “and”? How should IP addresses be formatted? Is the term “slave” still approved for documentation?
Constantly searching through a 400+ page PDF breaks my flow. I explain code, write conceptual docs, and write API docs. I don't want to play "Ctrl+F" detective every 10 minutes.
So, I built a solution.
I created a custom MCP (Model Context Protocol) server that reads my corporate Writing Style Guide and answers my questions instantly—right inside Claude Desktop and VS Code.
Yes, you heard right! Writing Style Guide MCP Server.
Here is how I did it, and how you can too.
What is MCP?
Think of MCP as a Universal Serial Bus (USB) port for AI models. Just as USB devices are plug-and-play on a laptop, MCP makes tools plug-and-play for AI models.
Traditionally, if you wanted to connect Claude Desktop to your database, you would write a specific integration. If you wanted to connect it to VS Code, you would write another.
MCP changes that. You build a "Server" (like my Writing Style Guide tool) once, and any "Client" (Claude, VS Code, Zed Editor) can plug into it and use it.
The Architecture:
The implementation uses RAG (Retrieval-Augmented Generation), an AI framework that connects large language models (LLMs) to external data sources. RAG retrieves specific information from these sources before generating answers. In this case, the external data source is the writing-style-documentation.pdf file.
The architecture includes four components:
- Source file: The writing-style-documentation.pdf file contains the style guide content.
- Processor: Python scripts split the PDF into small, readable chunks for efficient processing.
- Vector store: FAISS indexes and retrieves relevant content. For example, when you ask about punctuation, the vector store finds the exact page that discusses commas.
- Large language model: The Groq API with the Llama 3 model generates responses based on the retrieved content.
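Here is a minimal sketch of how those four components fit together in code. It is only a preview of the rag.py file built in Step 2, condensed and without error handling; the ask() helper is illustrative and returns raw retrieved chunks rather than a generated answer.
# Preview sketch only: the full version with the LLM step is built in Step 2
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

pages = PyPDFLoader("writing-style-documentation.pdf").load()  # 1. source file
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(pages)  # 2. processor
index = FAISS.from_documents(chunks, HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"))  # 3. vector store

def ask(question: str):
    # 4. the LLM (Groq + Llama) would generate an answer from these retrieved chunks
    return index.similarity_search(question, k=5)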
Step-by-Step Implementation
Complete the following steps to create an MCP server for the writing style guide.
Prerequisites:
- Basic understanding of Python.
- Basic knowledge of Agentic AI.
Step 1: Set up the project
1 Create a new MCP-Styleguide folder in your local system for the Python workspace.
2 Open your editor, navigate to the MCP-Styleguide directory in the terminal and run the following command to create a new environment named .venv:
python3 -m venv .venv
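Then activate the environment so that the packages in the next step are installed into it (macOS/Linux shown; on Windows run .venv\Scripts\activate):
source .venv/bin/activate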
3 Create a new requirements.txt file with the following content:
mcp
langchain
langchain-community
langchain-groq
langchain-openai
sentence-transformers
langchain-huggingface
pypdf
faiss-cpu
python-dotenv
uvicorn
4 Install the packages by running the following command:
pip install -r requirements.txt
Key libraries:
- mcp: Enables communication with Claude and VS Code
- langchain: Manages PDF processing and AI logic
- langchain-community: Provides community integrations for LangChain
- langchain-groq: Integrates the Groq API with LangChain
- langchain-openai: Provides the ChatOpenAI interface used to call Groq's OpenAI-compatible endpoint
- sentence-transformers: Creates text embeddings for semantic search
- langchain-huggingface: Integrates Hugging Face models with LangChain
- pypdf: Extracts text from PDF files
- faiss-cpu: Performs vector similarity search
- python-dotenv: Loads environment variables from .env files
- uvicorn: Runs ASGI web applications
5 Create a new .env environment file with your Groq API key:
GROQ_API_KEY=gsk_OSV6lH3Mkelmsker45dlknWL......
Step 2: Set up the RAG engine
Complete the following steps to create the RAG engine that processes and queries the Writing Style Guide.
1. Create the RAG engine file
Create a new rag.py file in the MCP-Styleguide folder.
2. Import required libraries
Add the following import statements to the file:
import os
from typing import List, Optional
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from dotenv import load_dotenv
Purpose: Imports the necessary libraries for PDF processing, embeddings, vector storage, and language model integration.
Key libraries:
- os: Provides file path operations
- typing: Supplies type hints for function parameters and return values
- PyPDFLoader: Loads and extracts content from PDF files
- RecursiveCharacterTextSplitter: Divides documents into smaller chunks
- FAISS: Creates and manages the vector database for similarity search
- HuggingFaceEmbeddings: Generates text embeddings using HuggingFace models
- ChatOpenAI: Provides the interface to language models
- RetrievalQA: Creates question-answering chains with retrieval
- PromptTemplate: Formats prompts with variables
- load_dotenv: Loads environment variables from .env files
3. Load environment variables
Add the following code to load environment variables from the .env file in the MCP-Styleguide directory:
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
load_dotenv(os.path.join(BASE_DIR, ".env"))
Purpose: Configures the base directory and loads environment variables required for API authentication.
Key elements:
- os.path.abspath(__file__): Gets the absolute path of the current script file
- os.path.dirname(): Extracts the directory path from the absolute file path
- BASE_DIR: Stores the directory path where the script is located
- load_dotenv(): Reads environment variables from the .env file in the base directory
4. Define global variables and configuration
Add the following code to define the global variables, file path, and prompt template:
qa_chain = None
vector_store = None
PDF_PATH = os.path.join(BASE_DIR, "writing-style-documentation.pdf")
PROMPT_TEMPLATE = """You are an MCP server designed to answer questions about the Writing Style Guide.
You have access to the full Writing Style Guide PDF content. Use only this content to answer. Do not use external knowledge.
**When answering, follow these guidelines:**
- If the question is about a specific rule, term, or example, locate it in the guide.
- If multiple sections are relevant, synthesize the information.
- If the guide does not contain the answer, state that and suggest consulting the full guide.
- Provide examples from the guide when helpful.
- Cite the relevant section or page number when possible.
- Keep answers clear, concise, and professional.
**User Question:**
{question}
**Relevant Context from Writing Style Guide:**
{context}
**Answer:**
"""
Purpose: Defines the configuration elements that control how the RAG engine operates and responds to queries.
Key elements:
- qa_chain = None: Initializes the global variable that will store the question-answering chain instance
- vector_store = None: Initializes the global variable that will store the vector database instance
- PDF_PATH: Constructs the full file path to the Writing Style Guide PDF by combining the base directory with the filename
- PROMPT_TEMPLATE: Defines the template that structures responses from the MCP server with the following characteristics:
  - Instructs the model to use only content from the Writing Style Guide
  - Provides guidelines for answering different types of questions
  - Includes the placeholders {question} and {context} that are replaced with the actual user query and retrieved content
  - Ensures responses are clear, concise, and professional
Template variables (see the example after this list):
- {question}: Replaced with the user's question at runtime
- {context}: Replaced with relevant content retrieved from the Writing Style Guide
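To see how the placeholders are filled, here is a throwaway example using Python's built-in str.format(); the PromptTemplate created in a later step performs the same substitution automatically. The sample question and context are made up.
# Illustration only: shows how {question} and {context} are substituted
print(PROMPT_TEMPLATE.format(
    question="Should I write 'can't' or 'cannot'?",
    context="(text retrieved from the style guide would appear here)",
))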
5. Function declaration and PDF validation
Add the initialize_rag() function to set up the RAG pipeline. This function loads the PDF, creates embeddings, and initializes the question-answering chain.
def initialize_rag():
    """Initializes the RAG pipeline: loads PDF, creates embeddings, builds index."""
    global qa_chain, vector_store
    if not os.path.exists(PDF_PATH):
        raise FileNotFoundError(f"PDF file not found at {PDF_PATH}. Please add 'writing-style-documentation.pdf' to the project folder.")
Purpose: Declares the function and validates that the PDF file exists.
Key elements:
- global qa_chain, vector_store: Declares global variables to store the retrieval chain and vector store
- os.path.exists(PDF_PATH): Checks whether the PDF file exists at the specified path
- FileNotFoundError: Raised when the PDF file is not found
6. Load the PDF document
Add the following code to load the PDF document:
print(f"Loading PDF from {PDF_PATH}...")
loader = PyPDFLoader(PDF_PATH)
documents = loader.load()
Purpose: Loads the Writing Style Guide PDF and extracts its content.
Key elements:
- PyPDFLoader(PDF_PATH): Creates a PDF loader instance for the specified file
- loader.load(): Extracts all pages from the PDF and returns them as document objects
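As an optional sanity check, you can temporarily add the following lines at this point in initialize_rag(); each loaded item is a LangChain Document with the page text and metadata such as the page number.
    # Optional check: one Document per PDF page
    print(len(documents), "pages loaded")
    print(documents[0].metadata)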
7. Split the document into chunks
Add the following code to split the document into chunks:
print(f"Splitting {len(documents)} pages...")
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200,
add_start_index=True,
)
splits = text_splitter.split_documents(documents)
Purpose: Divides the PDF content into smaller, manageable chunks for efficient processing.
Parameters:
- chunk_size=1000: Sets the maximum size of each text chunk to 1000 characters
- chunk_overlap=200: Creates a 200-character overlap between consecutive chunks to preserve context across boundaries
- add_start_index=True: Tracks the starting position of each chunk in the original document for reference
Key elements:
- RecursiveCharacterTextSplitter: A text splitter that recursively divides documents at natural boundaries
- split_documents(documents): Applies the splitting logic to all loaded documents (see the chunk inspection example below)
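To inspect what a chunk looks like, you can temporarily add the following lines after the split; the start_index value in the metadata comes from add_start_index=True.
    # Optional check: preview the first chunk and its metadata
    print(splits[0].page_content[:200])
    print(splits[0].metadata)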
8. Create embeddings and vector store
Add the following code to generate text embeddings and store them in a vector database:
print("Creating embeddings and vector store (using HuggingFace embeddings)...")
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(splits, embeddings)
Purpose: Generates text embeddings and stores them in a vector database for similarity search.
Key elements:
- HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"): Uses a lightweight HuggingFace model to convert text into numerical vectors
- FAISS.from_documents(splits, embeddings): Creates a FAISS vector store that indexes all document chunks with their embeddings for fast retrieval
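At this point you can already test retrieval without the language model by temporarily adding a direct similarity search inside the function; the query string here is just an example.
    # Optional check: raw similarity search against the FAISS index
    for doc in vector_store.similarity_search("comma before 'and'", k=3):
        print(doc.metadata.get("page"), doc.page_content[:80])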
9. Configure the language model
Add the following code to configure the Groq-hosted language model for generating responses:
print("Setting up QA chain with Groq (via ChatOpenAI)...")
if "GROQ_API_KEY" not in os.environ:
raise ValueError("GROQ_API_KEY environment variable not found.")
llm = ChatOpenAI(
model_name="llama-3.3-70b-versatile",
temperature=0,
api_key=os.environ["GROQ_API_KEY"],
base_url="https://api.groq.com/openai/v1"
)
Purpose: Sets up the Groq-hosted language model for generating responses.
Parameters:
- model_name="llama-3.3-70b-versatile": Specifies the Llama 3.3 70B model for high-quality responses
- temperature=0: Sets deterministic output by eliminating randomness in responses
- api_key=os.environ["GROQ_API_KEY"]: Retrieves the API key from environment variables
- base_url="https://api.groq.com/openai/v1": Configures the Groq API endpoint
Key elements:
- Environment variable validation ensures the API key is set before proceeding
- ChatOpenAI: A LangChain wrapper that provides a consistent interface to the language model
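To confirm the Groq connection works before wiring up retrieval, you can temporarily call the model directly inside the function; the prompt is arbitrary.
    # Optional check: call the Groq-hosted model directly
    print(llm.invoke("Reply with the single word OK.").content)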
10. Initialize the QA chain
Add the following code to initialize the question-answering chain:
    PROMPT = PromptTemplate(
        template=PROMPT_TEMPLATE, input_variables=["context", "question"]
    )
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vector_store.as_retriever(search_kwargs={"k": 5}),
        return_source_documents=True,
        chain_type_kwargs={"prompt": PROMPT}
    )
    print("RAG initialization complete.")
Purpose: Creates the question-answering chain that combines retrieval and generation.
Key elements:
- PromptTemplate: Formats the prompt using the predefined template with context and question variables
- RetrievalQA.from_chain_type(): Creates a QA chain with the following configuration:
  - llm=llm: Uses the configured language model
  - chain_type="stuff": Uses the "stuff" method, which includes all retrieved documents in a single prompt
  - retriever=vector_store.as_retriever(search_kwargs={"k": 5}): Configures the retriever to fetch the top 5 most relevant chunks
  - return_source_documents=True: Includes source documents in the response for citation (see the example below)
  - chain_type_kwargs={"prompt": PROMPT}: Applies the custom prompt template
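Because return_source_documents=True, each response also carries the chunks it was generated from, which is how you can surface page numbers for citation. A small sketch, meant to be run after initialize_rag() has completed (for example from the test block in step 13):
# Optional: inspect which pages an answer was drawn from
result = qa_chain.invoke({"query": "What are the rules for using contractions?"})
print(result["result"])
for doc in result["source_documents"]:
    print("page", doc.metadata.get("page"))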
11. Function declaration and initialization check
Prerequisites: Ensure that the GROQ_API_KEY environment variable is set in your .env file before running this function.
Add the query_style_guide() function to handle user queries and the test code to verify the implementation.
def query_style_guide(question: str) -> str:
    """Queries the Writing Style Guide using the initialized RAG chain."""
    global qa_chain
    if qa_chain is None:
        try:
            initialize_rag()
        except Exception as e:
            return f"Error initializing RAG engine: {str(e)}"
Purpose: Declares the query function and ensures the RAG chain is initialized before processing questions.
Parameters:
- question: str: The user's question about the Writing Style Guide
- -> str: Returns the answer as a string
Key elements:
- global qa_chain: Accesses the global QA chain variable
- if qa_chain is None: Checks whether the RAG chain has been initialized
- initialize_rag(): Initializes the RAG pipeline if not already set up
- Error handling catches initialization failures and returns a descriptive error message
12. Process the query
Add the following code to send the question to the RAG chain:
    try:
        result = qa_chain.invoke({"query": question})
        return result["result"]
    except Exception as e:
        return f"Error processing query: {str(e)}"
Purpose: Sends the question to the RAG chain and retrieves the answer.
Key elements:
- qa_chain.invoke({"query": question}): Passes the user's question to the retrieval and generation pipeline
- result["result"]: Extracts the generated answer from the response dictionary
- Error handling catches query processing failures and returns a descriptive error message
Return value: The function returns either the generated answer or an error message if the query fails.
13. Test the implementation
if __name__ == "__main__":
# Test run
try:
print(query_style_guide("What are the rules for using contractions?"))
except Exception as e:
print(e)
Purpose: Tests the query function when the script runs directly.
Key elements:
- if __name__ == "__main__": Ensures the test code runs only when the script executes directly, not when imported as a module
- query_style_guide("What are the rules for using contractions?"): Tests the function with a sample question about contraction usage
- Error handling catches and displays any exceptions that occur during testing
Expected behavior: When you run the script with python rag.py (with the virtual environment activated), it should initialize the RAG pipeline and return the Writing Style Guide's rules for using contractions.
Step 3: The Server
This is the "face" of the application. Create a new server.py file in the MCP-Styleguide folder. Using the FastMCP library, it effectively says: "Hey Claude, I have a tool called ask_writing_style_guide. You can send me text, and I'll send you an answer."
from mcp.server.fastmcp import FastMCP
import rag

mcp = FastMCP("Writing Style Guide Expert")

@mcp.tool()
def ask_writing_style_guide(question: str) -> str:
    """Answers questions about the corporate Writing Style Guide."""
    return rag.query_style_guide(question)

if __name__ == "__main__":
    mcp.run()  # start the server over stdio so Claude Desktop can launch it
That’s it! Less than 30 lines of code for the server itself.
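Before connecting it to Claude Desktop, you can optionally test the server with the MCP Inspector. This assumes you install the CLI extras of the official MCP Python SDK (pip install "mcp[cli]"); check the SDK documentation if the command differs in your version.
mcp dev server.py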
Step 4: Connecting It to Claude Desktop
- In Claude Desktop, go to Settings → Developer → Edit Config.
- Add your MCP server entry, pointing command at the Python interpreter inside your virtual environment and args at the full path to server.py.
- Save and restart Claude Desktop.
{
"mcpServers": {
"writing-style-guide": {
"command": "/Users/siddhartha/Documents/thepath/MCP/venv/bin/python",
"args": [
"/Users/siddhartha/Documents/thepath/MCP/server.py"
]
}
}
}
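The post promises VS Code support too. Recent VS Code releases can read a workspace-level MCP configuration (commonly .vscode/mcp.json); the exact schema depends on your VS Code version, so treat this as a sketch and verify against the VS Code MCP documentation. Point command and args at your own Python interpreter and server.py paths, just as in the Claude Desktop config.
{
  "servers": {
    "writing-style-guide": {
      "type": "stdio",
      "command": "/Users/siddhartha/Documents/thepath/MCP/venv/bin/python",
      "args": ["/Users/siddhartha/Documents/thepath/MCP/server.py"]
    }
  }
}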

Step 5: Connecting It to My Workflow
This is where the magic happens. I don't use a terminal to query this. I use the tools I am already working in.
Claude Desktop
I added a simple config to my Claude Desktop settings. Now, when I chat with Claude, I can say:
"Can I use word Slave in my documentation?, check the writing style rules and inform."
Claude automatically calls my tool, reads the rule from the PDF: “No, you cannot use "slave" in your documentation according to the Writing Style Guide”
I never had to leave my editor.
Conclusion
Building an MCP server sounds intimidating, but it’s mostly glue code. The ability to give your AI assistants custom knowledge, whether it's a style guide, your internal API docs, or your project manifest, is a superpower.
Give it a try. Your Ctrl+F keys will thank you :)