As a technical writer, my life revolves around corporate Writing Style Guides. The Writing Style Guide is my bible: at every company I work for, my writing follows their guide. For technical writers, the Writing Style Guide plays the most important role in documenting any technical content, APIs, or CLIs. But let me be honest: the style guide is almost always a massive PDF, and writers spend a lot of time figuring out what to use and when. For example: Should I write “can’t” or “cannot”? Do I need a comma before “and”? How should IP addresses be formatted? Is the term “slave” still approved for documentation?
Constantly searching through a 400+ page PDF breaks my flow. I explain code, write conceptual docs, and write API docs. I don't want to play "Ctrl+F" detective every 10 minutes.
So, I built a solution.
I created a custom MCP (Model Context Protocol) server that reads my corporate Writing Style Guide and answers my questions instantly—right inside Claude Desktop and VS Code.
Yes, you heard right! Writing Style Guide MCP Server.
Here is how I did it, and how you can too.
What is MCP?
Think of MCP as a Universal Serial Bus (USB) port for AI models. Just as USB devices are plug-and-play on a laptop, MCP makes tools plug-and-play for AI models.
Traditionally, if you wanted to connect Claude Desktop to your database, you would write a specific integration. If you wanted to connect it to VS Code, you would write another.
MCP changes that. You build a "Server" (like my Writing Style Guide tool) once, and any "Client" (Claude, VS Code, Zed Editor) can plug into it and use it.
The Architecture:
The implementation uses RAG (Retrieval-Augmented Generation), an AI framework that connects large language models (LLMs) to external data sources. RAG retrieves specific information from these sources before generating answers. In this case, the external data source is the writing-style-documentation.pdf file.
The architecture includes four components:
- Source file: The writing-style-documentation.pdf file contains the style guide content.
- Processor: Python scripts split the PDF into small, readable chunks for efficient processing.
- Vector store: FAISS indexes and retrieves relevant content. For example, when you ask about punctuation, the vector store finds the exact page that discusses commas.
- Large language model: The Groq API with the Llama 3 model generates responses based on the retrieved content.
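Here is a minimal sketch of how those four components fit together in code. It is only a preview of the rag.py file built in Step 2, condensed and without error handling; the ask() helper is illustrative and returns raw retrieved chunks rather than a generated answer.
# Preview sketch only: the full version with the LLM step is built in Step 2
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

pages = PyPDFLoader("writing-style-documentation.pdf").load()  # 1. source file
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(pages)  # 2. processor
index = FAISS.from_documents(chunks, HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"))  # 3. vector store

def ask(question: str):
    # 4. the LLM (Groq + Llama) would generate an answer from these retrieved chunks
    return index.similarity_search(question, k=5)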
Step-by-Step Implementation
Complete the following steps to create an MCP server for the writing style guide.
Prerequisites:
- Basic understanding of Python.
- Basic knowledge of Agentic AI.
Step 1: Set up the project
1 Create a new MCP-Styleguide folder in your local system for the Python workspace.
2 Open your editor, navigate to the MCP-Styleguide directory in the terminal and run the following command to create a new environment named .venv:
python3 -m venv .venv
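Then activate the environment so that the packages in the next step are installed into it (macOS/Linux shown; on Windows run .venv\Scripts\activate):
source .venv/bin/activate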
3 Create a new requirements.txt file with the following content:
mcp
langchain
langchain-community
langchain-groq
langchain-openai
sentence-transformers
langchain-huggingface
pypdf
faiss-cpu
python-dotenv
uvicorn
4 Install the packages by running the following command:
pip install -r requirements.txt
Key libraries:
- mcp: Enables communication with Claude and VS Code
- langchain: Manages PDF processing and AI logic
- langchain-community: Provides community integrations for LangChain
- langchain-groq: Integrates the Groq API with LangChain
- langchain-openai: Provides the ChatOpenAI interface used to call Groq's OpenAI-compatible endpoint
- sentence-transformers: Creates text embeddings for semantic search
- langchain-huggingface: Integrates Hugging Face models with LangChain
- pypdf: Extracts text from PDF files
- faiss-cpu: Performs vector similarity search
- python-dotenv: Loads environment variables from .env files
- uvicorn: Runs ASGI web applications
5 Create a new .env environment file with your Groq API key:
GROQ_API_KEY=gsk_OSV6lH3Mkelmsker45dlknWL......
Step 2: Set up the RAG engine
Complete the following steps to create the RAG engine that processes and queries the Writing Style Guide.
1. Create the RAG engine file
Create a new rag.py file in the MCP-Styleguide folder.
2. Import required libraries
Add the following import statements to the file:
import os
from typing import List, Optional
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from dotenv import load_dotenv
Purpose: Imports the necessary libraries for PDF processing, embeddings, vector storage, and language model integration.
Key libraries:
- os: Provides file path operations
- typing: Supplies type hints for function parameters and return values
- PyPDFLoader: Loads and extracts content from PDF files
- RecursiveCharacterTextSplitter: Divides documents into smaller chunks
- FAISS: Creates and manages the vector database for similarity search
- HuggingFaceEmbeddings: Generates text embeddings using HuggingFace models
- ChatOpenAI: Provides the interface to language models
- RetrievalQA: Creates question-answering chains with retrieval
- PromptTemplate: Formats prompts with variables
- load_dotenv: Loads environment variables from .env files
3. Load environment variables
Add the following code to load environment variables from the .env file in the MCP-Styleguide directory:
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
load_dotenv(os.path.join(BASE_DIR, ".env"))
Purpose: Configures the base directory and loads environment variables required for API authentication.
Key elements:
- os.path.abspath(__file__): Gets the absolute path of the current script file
- os.path.dirname(): Extracts the directory path from the absolute file path
- BASE_DIR: Stores the directory path where the script is located
- load_dotenv(): Reads environment variables from the .env file in the base directory
4. Define global variables and configuration
Add the following code to define the global variables, file path, and prompt template:
qa_chain = None
vector_store = None
PDF_PATH = os.path.join(BASE_DIR, "writing-style-documentation.pdf")
PROMPT_TEMPLATE = """You are an MCP server designed to answer questions about the Writing Style Guide.
You have access to the full Writing Style Guide PDF content. Use only this content to answer. Do not use external knowledge.
**When answering, follow these guidelines:**
- If the question is about a specific rule, term, or example, locate it in the guide.
- If multiple sections are relevant, synthesize the information.
- If the guide does not contain the answer, state that and suggest consulting the full guide.
- Provide examples from the guide when helpful.
- Cite the relevant section or page number when possible.
- Keep answers clear, concise, and professional.
**User Question:**
{question}
**Relevant Context from Writing Style Guide:**
{context}
**Answer:**
"""
Purpose: Defines the configuration elements that control how the RAG engine operates and responds to queries.
Key elements:
- qa_chain = None: Initializes the global variable that will store the question-answering chain instance
- vector_store = None: Initializes the global variable that will store the vector database instance
- PDF_PATH: Constructs the full file path to the Writing Style Guide PDF by combining the base directory with the filename
- PROMPT_TEMPLATE: Defines the template that structures responses from the MCP server with the following characteristics:
  - Instructs the model to use only content from the Writing Style Guide
  - Provides guidelines for answering different types of questions
  - Includes the placeholders {question} and {context} that are replaced with the actual user query and retrieved content
  - Ensures responses are clear, concise, and professional
Template variables (see the example after this list):
- {question}: Replaced with the user's question at runtime
- {context}: Replaced with relevant content retrieved from the Writing Style Guide
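To see how the placeholders are filled, here is a throwaway example using Python's built-in str.format(); the PromptTemplate created in a later step performs the same substitution automatically. The sample question and context are made up.
# Illustration only: shows how {question} and {context} are substituted
print(PROMPT_TEMPLATE.format(
    question="Should I write 'can't' or 'cannot'?",
    context="(text retrieved from the style guide would appear here)",
))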
5. Function declaration and PDF validation
Add the initialize_rag() function to set up the RAG pipeline. This function loads the PDF, creates embeddings, and initializes the question-answering chain.
def initialize_rag():
    """Initializes the RAG pipeline: loads PDF, creates embeddings, builds index."""
    global qa_chain, vector_store
    if not os.path.exists(PDF_PATH):
        raise FileNotFoundError(f"PDF file not found at {PDF_PATH}. Please add 'writing-style-documentation.pdf' to the project folder.")
Purpose: Declares the function and validates that the PDF file exists.
Key elements:
- global qa_chain, vector_store: Declares global variables to store the retrieval chain and vector store
- os.path.exists(PDF_PATH): Checks whether the PDF file exists at the specified path
- FileNotFoundError: Raised when the PDF file is not found
6. Load the PDF document
Add the following code to load the PDF document:
print(f"Loading PDF from {PDF_PATH}...")
loader = PyPDFLoader(PDF_PATH)
documents = loader.load()
Purpose: Loads the Writing Style Guide PDF and extracts its content.
Key elements:
- PyPDFLoader(PDF_PATH): Creates a PDF loader instance for the specified file
- loader.load(): Extracts all pages from the PDF and returns them as document objects
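As an optional sanity check, you can temporarily add the following lines at this point in initialize_rag(); each loaded item is a LangChain Document with the page text and metadata such as the page number.
    # Optional check: one Document per PDF page
    print(len(documents), "pages loaded")
    print(documents[0].metadata)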
7. Split the document into chunks
Add the following code to split the document into chunks:
print(f"Splitting {len(documents)} pages...")
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200,
add_start_index=True,
)
splits = text_splitter.split_documents(documents)
Purpose: Divides the PDF content into smaller, manageable chunks for efficient processing.
Parameters:
- chunk_size=1000: Sets the maximum size of each text chunk to 1000 characters
- chunk_overlap=200: Creates a 200-character overlap between consecutive chunks to preserve context across boundaries
- add_start_index=True: Tracks the starting position of each chunk in the original document for reference
Key elements:
- RecursiveCharacterTextSplitter: A text splitter that recursively divides documents at natural boundaries
- split_documents(documents): Applies the splitting logic to all loaded documents (see the chunk inspection example below)
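To inspect what a chunk looks like, you can temporarily add the following lines after the split; the start_index value in the metadata comes from add_start_index=True.
    # Optional check: preview the first chunk and its metadata
    print(splits[0].page_content[:200])
    print(splits[0].metadata)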
8. Create embeddings and vector store
Add the following code to generate text embeddings and store them in a vector database:
print("Creating embeddings and vector store (using HuggingFace embeddings)...")
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(splits, embeddings)
Purpose: Generates text embeddings and stores them in a vector database for similarity search.
Key elements:
- HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"): Uses a lightweight HuggingFace model to convert text into numerical vectors
- FAISS.from_documents(splits, embeddings): Creates a FAISS vector store that indexes all document chunks with their embeddings for fast retrieval
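At this point you can already test retrieval without the language model by temporarily adding a direct similarity search inside the function; the query string here is just an example.
    # Optional check: raw similarity search against the FAISS index
    for doc in vector_store.similarity_search("comma before 'and'", k=3):
        print(doc.metadata.get("page"), doc.page_content[:80])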
9. Configure the language model
Add the following code to configure the Groq-hosted language model for generating responses:
print("Setting up QA chain with Groq (via ChatOpenAI)...")
if "GROQ_API_KEY" not in os.environ:
raise ValueError("GROQ_API_KEY environment variable not found.")
llm = ChatOpenAI(
model_name="llama-3.3-70b-versatile",
temperature=0,
api_key=os.environ["GROQ_API_KEY"],
base_url="https://api.groq.com/openai/v1"
)
Purpose: Sets up the Groq-hosted language model for generating responses.
Parameters:
- model_name="llama-3.3-70b-versatile": Specifies the Llama 3.3 70B model for high-quality responses
- temperature=0: Sets deterministic output by eliminating randomness in responses
- api_key=os.environ["GROQ_API_KEY"]: Retrieves the API key from environment variables
- base_url="https://api.groq.com/openai/v1": Configures the Groq API endpoint
Key elements:
- Environment variable validation ensures the API key is set before proceeding
- ChatOpenAI: A LangChain wrapper that provides a consistent interface to the language model
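To confirm the Groq connection works before wiring up retrieval, you can temporarily call the model directly inside the function; the prompt is arbitrary.
    # Optional check: call the Groq-hosted model directly
    print(llm.invoke("Reply with the single word OK.").content)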
10. Initialize the QA chain
Add the following code to initialize the question-answering chain:
    PROMPT = PromptTemplate(
        template=PROMPT_TEMPLATE, input_variables=["context", "question"]
    )
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vector_store.as_retriever(search_kwargs={"k": 5}),
        return_source_documents=True,
        chain_type_kwargs={"prompt": PROMPT}
    )
    print("RAG initialization complete.")
Purpose: Creates the question-answering chain that combines retrieval and generation.
Key elements:
- PromptTemplate: Formats the prompt using the predefined template with context and question variables
- RetrievalQA.from_chain_type(): Creates a QA chain with the following configuration:
  - llm=llm: Uses the configured language model
  - chain_type="stuff": Uses the "stuff" method, which includes all retrieved documents in a single prompt
  - retriever=vector_store.as_retriever(search_kwargs={"k": 5}): Configures the retriever to fetch the top 5 most relevant chunks
  - return_source_documents=True: Includes source documents in the response for citation (see the example below)
  - chain_type_kwargs={"prompt": PROMPT}: Applies the custom prompt template
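Because return_source_documents=True, each response also carries the chunks it was generated from, which is how you can surface page numbers for citation. A small sketch, meant to be run after initialize_rag() has completed (for example from the test block in step 13):
# Optional: inspect which pages an answer was drawn from
result = qa_chain.invoke({"query": "What are the rules for using contractions?"})
print(result["result"])
for doc in result["source_documents"]:
    print("page", doc.metadata.get("page"))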
11. Function declaration and initialization check
Prerequisites: Ensure that the GROQ_API_KEY environment variable is set in your .env file before running this function.
Add the query_style_guide() function to handle user queries and the test code to verify the implementation.
def query_style_guide(question: str) -> str:
    """Queries the Writing Style Guide using the initialized RAG chain."""
    global qa_chain
    if qa_chain is None:
        try:
            initialize_rag()
        except Exception as e:
            return f"Error initializing RAG engine: {str(e)}"
Purpose: Declares the query function and ensures the RAG chain is initialized before processing questions.
Parameters:
- question: str: The user's question about the Writing Style Guide
- -> str: Returns the answer as a string
Key elements:
- global qa_chain: Accesses the global QA chain variable
- if qa_chain is None: Checks whether the RAG chain has been initialized
- initialize_rag(): Initializes the RAG pipeline if not already set up
- Error handling catches initialization failures and returns a descriptive error message
12. Process the query
Add the following code to send the question to the RAG chain:
    try:
        result = qa_chain.invoke({"query": question})
        return result["result"]
    except Exception as e:
        return f"Error processing query: {str(e)}"
Purpose: Sends the question to the RAG chain and retrieves the answer.
Key elements:
- qa_chain.invoke({"query": question}): Passes the user's question to the retrieval and generation pipeline
- result["result"]: Extracts the generated answer from the response dictionary
- Error handling catches query processing failures and returns a descriptive error message
Return value: The function returns either the generated answer or an error message if the query fails.
13. Test the implementation
if __name__ == "__main__":
# Test run
try:
print(query_style_guide("What are the rules for using contractions?"))
except Exception as e:
print(e)
Purpose: Tests the query function when the script runs directly.
Key elements:
- if __name__ == "__main__": Ensures the test code runs only when the script executes directly, not when imported as a module
- query_style_guide("What are the rules for using contractions?"): Tests the function with a sample question about contraction usage
- Error handling catches and displays any exceptions that occur during testing
Expected behavior: When you run the script with python rag.py (with the virtual environment activated), it should initialize the RAG pipeline and return the Writing Style Guide's rules for using contractions.
Step 3: The Server
This is the "face" of the application. Create a new server.py file in the MCP-Styleguide folder. Using the FastMCP library, it effectively says: "Hey Claude, I have a tool called ask_writing_style_guide. You can send me text, and I'll send you an answer."
from mcp.server.fastmcp import FastMCP
import rag

mcp = FastMCP("Writing Style Guide Expert")

@mcp.tool()
def ask_writing_style_guide(question: str) -> str:
    """Answers questions about the corporate Writing Style Guide."""
    return rag.query_style_guide(question)

if __name__ == "__main__":
    mcp.run()  # start the server over stdio so Claude Desktop can launch it
That’s it! Less than 30 lines of code for the server itself.
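Before connecting it to Claude Desktop, you can optionally test the server with the MCP Inspector. This assumes you install the CLI extras of the official MCP Python SDK (pip install "mcp[cli]"); check the SDK documentation if the command differs in your version.
mcp dev server.py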
Step 4: Connecting It to Claude Desktop
- In Claude Desktop, go to Settings → Developer → Edit Config.
- Add your MCP server entry, pointing command at the Python interpreter inside your virtual environment and args at the full path to server.py.
- Save and restart Claude Desktop.
{
"mcpServers": {
"writing-style-guide": {
"command": "/Users/siddhartha/Documents/thepath/MCP/venv/bin/python",
"args": [
"/Users/siddhartha/Documents/thepath/MCP/server.py"
]
}
}
}
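The post promises VS Code support too. Recent VS Code releases can read a workspace-level MCP configuration (commonly .vscode/mcp.json); the exact schema depends on your VS Code version, so treat this as a sketch and verify against the VS Code MCP documentation. Point command and args at your own Python interpreter and server.py paths, just as in the Claude Desktop config.
{
  "servers": {
    "writing-style-guide": {
      "type": "stdio",
      "command": "/Users/siddhartha/Documents/thepath/MCP/venv/bin/python",
      "args": ["/Users/siddhartha/Documents/thepath/MCP/server.py"]
    }
  }
}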

Step 5: Connecting It to My Workflow
This is where the magic happens. I don't use a terminal to query this. I use the tools I am already working in.
Claude Desktop
I added a simple config to my Claude Desktop settings. Now, when I chat with Claude, I can say:
"Can I use word Slave in my documentation?, check the writing style rules and inform."
Claude automatically calls my tool, reads the rule from the PDF: “No, you cannot use "slave" in your documentation according to the Writing Style Guide”
I never had to leave my editor.
Conclusion
Building an MCP server sounds intimidating, but it’s mostly glue code. The ability to give your AI assistants custom knowledge, whether it's a style guide, your internal API docs, or your project manifest, is a superpower.
Give it a try. Your Ctrl+F keys will thank you :)