<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Siddhartha Mani</title>
    <description>The latest articles on DEV Community by Siddhartha Mani (@siddhartha_mani_03).</description>
    <link>https://dev.to/siddhartha_mani_03</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3556772%2Ff4c12fb5-71ce-4935-b30c-f66476813d74.jpg</url>
      <title>DEV Community: Siddhartha Mani</title>
      <link>https://dev.to/siddhartha_mani_03</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/siddhartha_mani_03"/>
    <language>en</language>
    <item>
      <title>If you are an Information Developer or Technical Writer, you can relate to this! #ai #documentation</title>
      <dc:creator>Siddhartha Mani</dc:creator>
      <pubDate>Thu, 19 Feb 2026 13:58:58 +0000</pubDate>
      <link>https://dev.to/siddhartha_mani_03/if-you-are-information-developer-or-technical-writer-you-can-relate-with-this-ai-documentation-5e32</link>
      <guid>https://dev.to/siddhartha_mani_03/if-you-are-information-developer-or-technical-writer-you-can-relate-with-this-ai-documentation-5e32</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/siddhartha_mani_03" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3556772%2Ff4c12fb5-71ce-4935-b30c-f66476813d74.jpg" alt="siddhartha_mani_03"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/siddhartha_mani_03/writing-for-humans-is-no-longer-enough-writing-for-ai-is-now-part-of-the-job-14h9" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Writing for humans is no longer enough. Writing for AI is now part of the job&lt;/h2&gt;
      &lt;h3&gt;Siddhartha Mani ・ Feb 12&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#documentation&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#testing&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#llm&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>documentation</category>
      <category>testing</category>
      <category>llm</category>
    </item>
    <item>
      <title>Writing for humans is no longer enough. Writing for AI is now part of the job</title>
      <dc:creator>Siddhartha Mani</dc:creator>
      <pubDate>Thu, 12 Feb 2026 16:33:58 +0000</pubDate>
      <link>https://dev.to/siddhartha_mani_03/writing-for-humans-is-no-longer-enough-writing-for-ai-is-now-part-of-the-job-14h9</link>
      <guid>https://dev.to/siddhartha_mani_03/writing-for-humans-is-no-longer-enough-writing-for-ai-is-now-part-of-the-job-14h9</guid>
      <description>&lt;p&gt;When my company integrated an AI assistant into our documentation platform, I was part of one of the early teams selected to test it. The goal was to see how well the AI could answer real user questions by using our existing documentation as the source of truth.&lt;br&gt;
What I did not expect was how much this exercise would change the way I think about technical writing.&lt;br&gt;
This was not just about AI accuracy. It was about how documentation itself behaves when consumed by large language models (LLMs).&lt;/p&gt;

&lt;p&gt;This article shares what that testing exposed and what it means for technical writers today.&lt;/p&gt;

&lt;h2&gt;Testing approach: real questions, real results&lt;/h2&gt;

&lt;p&gt;Testing started with questions that real Google Cloud users were already asking. Real support tickets, common search queries, and production environment questions were pulled directly using the QnA Analyzer tool.&lt;/p&gt;

&lt;p&gt;For each question, the team compared three outputs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The AI assistant answer.&lt;/li&gt;
&lt;li&gt;The documentation pages it retrieved.&lt;/li&gt;
&lt;li&gt;The response that a subject matter expert or support engineer would give.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The team tracked several quality signals, including answer correctness, follow-up prompts required, product information mixing, and missing prerequisites.&lt;/p&gt;

&lt;p&gt;When the AI assistant produced a weak or confusing answer, the first question was always whether the documentation was the cause, not the model. This framing changed the entire analysis.&lt;/p&gt;

&lt;h2&gt;The first surprise: the same question, different answers&lt;/h2&gt;

&lt;p&gt;During testing, the assistant sometimes gave different answers to the same question. This happened even when nothing changed: the question was the same, the documentation was the same, and the same AI model was running. At first, this looked like a reliability problem.&lt;/p&gt;

&lt;p&gt;On closer inspection, the behavior revealed something fundamental about how LLMs work. LLMs are probabilistic, not deterministic. Unlike a calculator that always returns the same result, an LLM generates answers based on token probabilities and semantic similarity search, weighing the available content and producing the most likely answer each time. Asking the same question twice is not guaranteed to produce identical output.&lt;/p&gt;
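The probabilistic behavior described above can be sketched with a toy next-token sampler. This is only an illustration: the vocabulary and probabilities below are invented, not taken from any real model.

```python
import random

# Toy next-token distribution for a prompt; the tokens and probabilities
# are invented for illustration, not taken from any real model.
NEXT_TOKEN_PROBS = {"instance": 0.55, "database": 0.30, "disk": 0.15}

def sample_next_token(rng):
    """Sample one token the way an LLM decodes at a nonzero temperature."""
    tokens = list(NEXT_TOKEN_PROBS)
    weights = [NEXT_TOKEN_PROBS[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Identical prompt, identical distribution -- different random states
# can still yield different completions.
first_run = sample_next_token(random.Random(1))
second_run = sample_next_token(random.Random(7))
```

The distribution never changes, yet two runs may produce different tokens; the flatter the distribution (the vaguer the retrieved content), the more the output varies.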

&lt;p&gt;The key factor is the quality of the content the AI model retrieves. When the content is clear and specific, the model has less room to vary and answers stay consistent. When the content is vague or incomplete, the model fills in the gaps, and those gaps introduce the variation that looks like unreliability.&lt;/p&gt;

&lt;h2&gt;Minimalist writing works for humans but not always for AI&lt;/h2&gt;

&lt;p&gt;Minimalist documentation (short pages, fewer words, fewer explanations) works well for experienced human readers who fill in context from prior knowledge. LLMs cannot do this. They need intent and scope to be written down.&lt;/p&gt;

&lt;p&gt;A minimalist page that opens directly with a task list gives the model almost nothing to use when deciding whether that page is relevant to a given query. Adding a short description of one to three sentences, covering the product, the scenario, and the user goal, gives the model a reliable signal to retrieve the right page. Without that context, the model guesses and produces unreliable responses.&lt;/p&gt;

&lt;p&gt;The principle is straightforward: when documentation leaves less to interpretation, the model varies less in its answers.&lt;/p&gt;

&lt;h2&gt;Multiple cross-references can hurt AI retrieval&lt;/h2&gt;

&lt;p&gt;Too many cross-references created a different kind of documentation problem. When pages were heavily linked to other topics, the AI assistant did not always stay focused on the main page. Instead, it sometimes pulled steps from linked pages, combined partial instructions from multiple locations, and lost the workflow context that the user actually needed. The result was answers that were fragmented and hard to follow.&lt;/p&gt;

&lt;p&gt;To address this issue, the approach was adjusted to treat cross-references as a last resort rather than a default. Primary workflows were kept self-contained so the model could find everything it needed on a single page. Circular linking patterns, where page A links to page B which links back to page A, were removed because they created retrieval loops with no clear resolution.&lt;/p&gt;
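Circular linking patterns of the kind just described can be found mechanically. A minimal sketch over an invented link graph (the page names are hypothetical):

```python
# Toy link graph between documentation pages; the page names are invented.
LINKS = {
    "backup-overview": ["create-backup"],
    "create-backup": ["backup-overview"],  # circular: A links to B links to A
    "restore-backup": ["backup-overview"],
}

def find_cycles(links):
    """Return the pages that can reach themselves by following links."""
    cycles = set()
    for start in links:
        seen = set()
        stack = [start]
        while stack:
            page = stack.pop()
            for target in links.get(page, []):
                if target == start:
                    cycles.add(start)
                elif target not in seen:
                    seen.add(target)
                    stack.append(target)
    return cycles
```

Running a check like this over a docset flags the loops worth untangling before a retriever ever encounters them.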

&lt;h2&gt;A real example: "How do I back up my instance?"&lt;/h2&gt;

&lt;p&gt;Users rarely specify which Google Cloud product or service they are using when they ask a question. The documentation needs to carry that context so the AI assistant can match the question to the right product. The question "How do I back up my instance?" is a good example of why this matters.&lt;/p&gt;

&lt;p&gt;Multiple Google Cloud services use the word "instance" to refer to different resources. Each service had its own backup process, its own console, and its own steps. However, the documentation pages for each service opened directly with steps and UI labels, with no introduction stating which service the page belonged to.&lt;/p&gt;

&lt;p&gt;When the AI assistant received this question, it retrieved pages from multiple services and combined their steps into a single answer. The response looked complete and confident. It was not. A user following those steps would be mixing actions from different products and different consoles, which would either fail or cause unintended changes.&lt;/p&gt;

&lt;p&gt;The root cause was not model hallucination. The documentation pages looked identical to the model because they all used the same term and none of them declared their scope upfront.&lt;/p&gt;

&lt;p&gt;The fix was straightforward. Each page introduction was updated to open with a clear statement, for example: "This topic describes how to back up an instance in Google Cloud Databases." That one sentence gave the model the signal it needed to retrieve the correct page for the correct service. After the update, the assistant stopped blending workflows and returned accurate, product-specific answers.&lt;/p&gt;
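The effect of that scope-declaring first sentence can be illustrated with a toy keyword-overlap retriever. The page texts below are invented: one opens with the product-scoped statement, the other opens directly with steps.

```python
# Toy keyword-overlap retrieval. The page texts are invented: one page
# declares its product scope in the first sentence, the other does not.
PAGES = {
    "databases-backup": (
        "This topic describes how to back up an instance "
        "in Google Cloud Databases."
    ),
    "vm-backup": "Click Create backup. Select the instance and confirm.",
}

def score(query, page_text):
    """Count query words that also appear in the page text."""
    query_words = set(query.lower().split())
    page_words = set(page_text.lower().replace(".", "").split())
    return len(query_words.intersection(page_words))

def retrieve(query):
    """Return the page whose text best overlaps the query."""
    return max(PAGES, key=lambda name: score(query, PAGES[name]))
```

Real retrieval uses embeddings rather than word counts, but the principle is the same: the scoped introduction gives the query something product-specific to match, while the steps-only page offers almost nothing beyond the ambiguous word "instance".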

&lt;h2&gt;The experiment results&lt;/h2&gt;

&lt;p&gt;The team ran a targeted experiment on a selected set of pages. The changes included adding concise descriptions at the beginning of each page, introducing short summaries for topics and subtopics, enhancing step-level descriptions for clarity, and reducing unnecessary cross-references to keep key workflows self-contained.&lt;/p&gt;

&lt;p&gt;The results were measurable. AI answers became more accurate, search relevance improved, mixed-product responses decreased, and trust in AI-generated answers increased.&lt;/p&gt;

&lt;h2&gt;Practical guidance&lt;/h2&gt;

&lt;p&gt;Apply these practices to make documentation more effective for AI retrieval:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add a concise description (one to three sentences) to every topic that states the product, scope, and scenario.&lt;/li&gt;
&lt;li&gt;State the product scope early in the page, particularly when similar products share terminology.&lt;/li&gt;
&lt;li&gt;Write preconditions and constraints explicitly rather than assuming the reader knows the context.&lt;/li&gt;
&lt;li&gt;Disambiguate common terms that appear across products by adding a brief clarifying phrase.&lt;/li&gt;
&lt;li&gt;Keep key workflows self-contained and limit cross-references to cases where they are necessary.&lt;/li&gt;
&lt;li&gt;Treat topic introductions as retrieval anchors that help the model decide when a page is relevant.&lt;/li&gt;
&lt;/ul&gt;
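The first two practices can even be checked mechanically. This is a minimal sketch of such a check; the product names and the helper name are invented for illustration, not part of any real toolchain.

```python
# A hypothetical lint check for product-scoped introductions. The product
# names and the helper name are invented for illustration.
PRODUCT_NAMES = ["Cloud Databases", "Compute Engine"]

def has_retrieval_anchor(page_text):
    """Check that a page opens with a one-to-three-sentence introduction
    that names a known product, so retrieval has a scope signal."""
    first_paragraph = page_text.strip().split("\n\n")[0]
    names_product = any(name in first_paragraph for name in PRODUCT_NAMES)
    sentences = [s for s in first_paragraph.split(".") if s.strip()]
    return names_product and len(sentences) in (1, 2, 3)
```

A check like this, run in CI over a docset, catches pages that open directly with a task list before they ever confuse a retriever.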

&lt;h2&gt;Final thought&lt;/h2&gt;

&lt;p&gt;LLMs don't replace technical writers. They amplify the quality of the documentation we create. When documentation is clear, scoped, and intentional, AI becomes a powerful assistant. When it isn't, AI simply reflects the confusion already present in the content.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>documentation</category>
      <category>testing</category>
      <category>llm</category>
    </item>
    <item>
      <title>MCP server for Style Guide</title>
      <dc:creator>Siddhartha Mani</dc:creator>
      <pubDate>Mon, 22 Dec 2025 18:38:31 +0000</pubDate>
      <link>https://dev.to/siddhartha_mani_03/-1iio</link>
      <guid>https://dev.to/siddhartha_mani_03/-1iio</guid>
      <description>
&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/siddhartha_mani_03/how-i-built-an-mcp-server-to-create-my-own-ai-writing-style-guide-expert-297m" class="crayons-story__hidden-navigation-link"&gt;How I Built an MCP Server to Create My Own AI Writing Style Guide Expert&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/siddhartha_mani_03" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3556772%2Ff4c12fb5-71ce-4935-b30c-f66476813d74.jpg" alt="siddhartha_mani_03 profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/siddhartha_mani_03" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Siddhartha Mani
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Siddhartha Mani
                
              
              &lt;div id="story-author-preview-content-3103822" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/siddhartha_mani_03" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3556772%2Ff4c12fb5-71ce-4935-b30c-f66476813d74.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Siddhartha Mani&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/siddhartha_mani_03/how-i-built-an-mcp-server-to-create-my-own-ai-writing-style-guide-expert-297m" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Dec 22 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/siddhartha_mani_03/how-i-built-an-mcp-server-to-create-my-own-ai-writing-style-guide-expert-297m" id="article-link-3103822"&gt;
          How I Built an MCP Server to Create My Own AI Writing Style Guide Expert
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/mcp"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;mcp&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/documentation"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;documentation&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/vscode"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;vscode&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/siddhartha_mani_03/how-i-built-an-mcp-server-to-create-my-own-ai-writing-style-guide-expert-297m" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt; reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/siddhartha_mani_03/how-i-built-an-mcp-server-to-create-my-own-ai-writing-style-guide-expert-297m#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            10 min read
          &lt;/small&gt;
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;




</description>
      <category>mcp</category>
      <category>documentation</category>
      <category>ai</category>
      <category>vscode</category>
    </item>
    <item>
      <title>How I Built an MCP Server to Create My Own AI Writing Style Guide Expert</title>
      <dc:creator>Siddhartha Mani</dc:creator>
      <pubDate>Mon, 22 Dec 2025 18:10:14 +0000</pubDate>
      <link>https://dev.to/siddhartha_mani_03/how-i-built-an-mcp-server-to-create-my-own-ai-writing-style-guide-expert-297m</link>
      <guid>https://dev.to/siddhartha_mani_03/how-i-built-an-mcp-server-to-create-my-own-ai-writing-style-guide-expert-297m</guid>
      <description>&lt;p&gt;As a technical writer, my life revolves around corporate &lt;strong&gt;Writing Style Guides&lt;/strong&gt;. The &lt;strong&gt;Writing Style Guide&lt;/strong&gt; is my bible for me, every company I work for, my writing pattern follows their Writing Style Guide. For technical writers, the &lt;strong&gt;Writing Style Guide&lt;/strong&gt; plays the most important role in documenting any Technical Content, APIs or CLIs. But today I want to be honest: the writing style guide in general is always a massive PDF. Writers spend a lot of time figuring out what to use and when to use it. For example: Should I write “can’t” or “cannot”? Do I need a comma before “and”? How should IP addresses be formatted? Is the term “Slave” still approved for documentation?&lt;/p&gt;

&lt;p&gt;Constantly searching through a 400+ page PDF breaks my flow. I explain code, I write conceptual docs, I write API docs, I don't want to play "Ctrl+F" detective every 10 minutes.&lt;/p&gt;

&lt;p&gt;So, I built a solution.&lt;/p&gt;

&lt;p&gt;I created a custom &lt;strong&gt;MCP (Model Context Protocol) server&lt;/strong&gt; that reads my corporate Writing Style Guide and answers my questions instantly—right inside &lt;strong&gt;Claude Desktop&lt;/strong&gt; and &lt;strong&gt;VS Code&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Yes, you heard right! Writing Style Guide MCP Server.&lt;/p&gt;

&lt;p&gt;Here is how I did it, and how you can too.&lt;/p&gt;

&lt;h2&gt;What is MCP?&lt;/h2&gt;

&lt;p&gt;Think of &lt;strong&gt;MCP&lt;/strong&gt; as a universal port for AI models, much like a Universal Serial Bus (USB) port on a laptop. Just as USB devices are plug-and-play, MCP gives AI models the same plug-and-play access to tools.&lt;br&gt;
Traditionally, if you wanted to connect Claude Desktop to your database, you would write a specific integration. If you wanted to connect it to VS Code, you would write another.&lt;/p&gt;

&lt;p&gt;MCP changes that. You build a "Server" (like my Writing Style Guide tool) once, and any "Client" (Claude, VS Code, Zed Editor) can plug into it and use it.&lt;/p&gt;
&lt;h2&gt;The Architecture&lt;/h2&gt;

&lt;p&gt;The implementation uses RAG (Retrieval-Augmented Generation), an AI framework that connects large language models (LLMs) to external data sources. RAG retrieves specific information from these sources before generating answers. In this case, the external data source is the &lt;code&gt;writing-style-documentation.pdf&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;The architecture includes four components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;Source file&lt;/code&gt;: The &lt;code&gt;writing-style-documentation.pdf&lt;/code&gt; file contains the style guide content.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Processor&lt;/code&gt;: Python scripts split the PDF into small, readable chunks for efficient processing.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Vector store&lt;/code&gt;: FAISS indexes and retrieves relevant content. For example, when you ask about punctuation, the vector store finds the exact page that discusses commas.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Large language model&lt;/code&gt;: The Groq API with the Llama 3 model generates responses based on the retrieved content.&lt;/li&gt;
&lt;/ol&gt;
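Before the step-by-step build, the four components can be sketched end to end with standard-library stand-ins. This is only an illustration of the data flow: the toy text, the word-overlap index (in place of FAISS embeddings), and the prompt-assembly stub (in place of the Groq call) are all invented.

```python
# Standard-library stand-ins for the four components. Real embeddings
# (FAISS) and the Groq call are replaced by toy versions for illustration.
SOURCE_TEXT = (
    "Use serial commas in lists of three or more items. "
    "Write cannot instead of can't in formal documentation."
)  # stands in for writing-style-documentation.pdf

def chunk(text, size=60):
    """Processor: split the source into chunks of roughly `size` characters."""
    chunks, current = [], ""
    for word in text.split():
        candidate = (current + " " + word).strip()
        if len(candidate) > size:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def retrieve(query, chunks):
    """Vector-store stand-in: rank chunks by word overlap with the query."""
    query_words = set(query.lower().split())
    return max(chunks, key=lambda c: len(query_words.intersection(c.lower().split())))

def answer(query, chunks):
    """LLM stand-in: assemble the prompt a real model would receive."""
    context = retrieve(query, chunks)
    return "Question: " + query + "\nContext: " + context
```

The real implementation below swaps each stand-in for the production piece: PyPDFLoader for the source text, RecursiveCharacterTextSplitter for `chunk`, FAISS for `retrieve`, and the Groq-backed model for `answer`.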


&lt;h2&gt;Step-by-Step Implementation&lt;/h2&gt;

&lt;p&gt;Complete the following steps to create an MCP server for the writing style guide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic understanding of Python.&lt;/li&gt;
&lt;li&gt;Basic knowledge of Agentic AI.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Step 1: Set up the project&lt;/h3&gt;

&lt;p&gt;1. Create a new &lt;code&gt;MCP-Styleguide&lt;/code&gt; folder in your local system for the Python workspace.&lt;br&gt;
2. Open your editor, navigate to the &lt;code&gt;MCP-Styleguide&lt;/code&gt; directory in the terminal, and run the following command to create a new virtual environment named &lt;code&gt;.venv&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 -m venv .venv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3. Create a new &lt;code&gt;requirements.txt&lt;/code&gt; file with the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mcp
langchain
langchain-community
langchain-groq
langchain-openai
sentence-transformers
langchain-huggingface
pypdf
faiss-cpu
python-dotenv
uvicorn 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;4. Install the packages by running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -r requirements.txt`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key libraries:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;mcp&lt;/code&gt;: Enables communication with Claude and VS Code&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;langchain&lt;/code&gt;: Manages PDF processing and AI logic&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;langchain-community&lt;/code&gt;: Provides community integrations for LangChain&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;langchain-groq&lt;/code&gt;: Integrates Groq API with LangChain&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;sentence-transformers&lt;/code&gt;: Creates text embeddings for semantic search&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;langchain-huggingface&lt;/code&gt;: Integrates Hugging Face models with LangChain&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;pypdf&lt;/code&gt;: Extracts text from PDF files&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;faiss-cpu&lt;/code&gt;: Performs vector similarity search&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;python-dotenv&lt;/code&gt;: Loads environment variables from &lt;code&gt;.env&lt;/code&gt; files&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;uvicorn&lt;/code&gt;: Runs ASGI web applications

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;5. Create a new &lt;code&gt;.env&lt;/code&gt; environment file containing your Groq API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GROQ_API_KEY=gsk_OSV6lH3Mkelmsker45dlknWL......
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Step 2: Set up the RAG engine&lt;/h3&gt;

&lt;p&gt;Complete the following steps to create the RAG engine that processes and queries the Writing Style Guide.&lt;/p&gt;

&lt;h4&gt;1. Create the RAG engine file&lt;/h4&gt;

&lt;p&gt;Create a new &lt;code&gt;rag.py&lt;/code&gt; file in the &lt;code&gt;MCP-Styleguide&lt;/code&gt; folder.&lt;/p&gt;

&lt;h4&gt;2. Import required libraries&lt;/h4&gt;

&lt;p&gt;Add the following import statements to the file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PyPDFLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_text_splitters&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.embeddings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HuggingFaceEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RetrievalQA&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Imports the necessary libraries for PDF processing, embeddings, vector storage, and language model integration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key libraries&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;os&lt;/code&gt;: Provides file path operations&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;typing&lt;/code&gt;: Supplies type hints for function parameters and return values&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PyPDFLoader&lt;/code&gt;: Loads and extracts content from PDF files&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt;: Divides documents into smaller chunks&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;FAISS&lt;/code&gt;: Creates and manages the vector database for similarity search&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;HuggingFaceEmbeddings&lt;/code&gt;: Generates text embeddings using HuggingFace models&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ChatOpenAI&lt;/code&gt;: Provides the interface to language models&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RetrievalQA&lt;/code&gt;: Creates question-answering chains with retrieval&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PromptTemplate&lt;/code&gt;: Formats prompts with variables&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;load_dotenv&lt;/code&gt;: Loads environment variables from &lt;code&gt;.env&lt;/code&gt; files&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;3. Load environment variables&lt;/h4&gt;

&lt;p&gt;Add the following code to load environment variables from the &lt;code&gt;.env&lt;/code&gt; file in the &lt;code&gt;MCP-Styleguide&lt;/code&gt; directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;BASE_DIR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abspath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__file__&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BASE_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.env&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Configures the base directory and loads environment variables required for API authentication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key elements&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;os.path.abspath(__file__)&lt;/code&gt;: Gets the absolute path of the current script file&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;os.path.dirname()&lt;/code&gt;: Extracts the directory path from the absolute file path&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;BASE_DIR&lt;/code&gt;: Stores the directory path where the script is located&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;load_dotenv()&lt;/code&gt;: Reads environment variables from the &lt;code&gt;.env&lt;/code&gt; file in the base directory&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;4. Define global variables and configuration&lt;/h4&gt;

&lt;p&gt;Add the following code to define the global variables, file path, and prompt template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;qa_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;vector_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;PDF_PATH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BASE_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;writing-style-documentation.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;PROMPT_TEMPLATE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are an MCP server designed to answer questions about the Writing Style Guide.

You have access to the full Writing Style Guide PDF content. Use only this content to answer. Do not use external knowledge.

**When answering, follow these guidelines:**
- If the question is about a specific rule, term, or example, locate it in the guide.
- If multiple sections are relevant, synthesize the information.
- If the guide does not contain the answer, state that and suggest consulting the full guide.
- Provide examples from the guide when helpful.
- Cite the relevant section or page number when possible.
- Keep answers clear, concise, and professional.

**User Question:**
{question}

**Relevant Context from Writing Style Guide:**
{context}

**Answer:**
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Defines the configuration elements that control how the RAG engine operates and responds to queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key elements&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;qa_chain = None&lt;/code&gt;: Initializes the global variable that will store the question-answering chain instance&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vector_store = None&lt;/code&gt;: Initializes the global variable that will store the vector database instance&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PDF_PATH&lt;/code&gt;: Constructs the full file path to the Writing Style Guide PDF by combining the base directory with the filename&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PROMPT_TEMPLATE&lt;/code&gt;: Defines the template that structures responses from the MCP server with the following characteristics:

&lt;ul&gt;
&lt;li&gt;Instructs the model to use only content from the Writing Style Guide&lt;/li&gt;
&lt;li&gt;Provides guidelines for answering different types of questions&lt;/li&gt;
&lt;li&gt;Includes placeholders &lt;code&gt;{question}&lt;/code&gt; and &lt;code&gt;{context}&lt;/code&gt; that will be replaced with actual user queries and retrieved content&lt;/li&gt;
&lt;li&gt;Ensures responses are clear, concise, and professional&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Template variables&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;{question}&lt;/code&gt;: Replaced with the user's question at runtime&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;{context}&lt;/code&gt;: Replaced with relevant content retrieved from the Writing Style Guide&lt;/li&gt;
&lt;/ul&gt;
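&lt;p&gt;At runtime, LangChain fills these two placeholders much like Python's own &lt;code&gt;str.format&lt;/code&gt; would. A quick illustration with a shortened version of the template (the question and context strings are made-up examples, not real retrieved content):&lt;/p&gt;

```python
# Shortened version of the template, with the same two placeholders
PROMPT_TEMPLATE = """Use only this content to answer.

**User Question:**
{question}

**Relevant Context from Writing Style Guide:**
{context}

**Answer:**
"""

# Both slots are substituted before the prompt reaches the model
prompt = PROMPT_TEMPLATE.format(
    question="When should I use contractions?",
    context="Section 3.2: Use contractions to keep the tone conversational.",
)
print(prompt)
```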

&lt;h4&gt;
  
  
  5. Function declaration and PDF validation
&lt;/h4&gt;

&lt;p&gt;Add the &lt;code&gt;initialize_rag()&lt;/code&gt; function to set up the RAG pipeline. This function loads the PDF, creates embeddings, and initializes the question-answering chain.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;initialize_rag&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Initializes the RAG pipeline: loads PDF, creates embeddings, builds index.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;qa_chain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PDF_PATH&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;FileNotFoundError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PDF file not found at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PDF_PATH&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Please upload &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ibm-style-documentation.pdf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Declares the function and validates that the PDF file exists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key elements&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;global qa_chain, vector_store&lt;/code&gt;: Declares global variables to store the retrieval chain and vector store&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;os.path.exists(PDF_PATH)&lt;/code&gt;: Checks whether the PDF file exists at the specified path&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;FileNotFoundError&lt;/code&gt;: Raised when the PDF file is not found&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  6. Load the PDF document
&lt;/h4&gt;

&lt;p&gt;Add the following code to load the PDF document:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loading PDF from &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PDF_PATH&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PyPDFLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PDF_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Loads the Writing Style Guide PDF and extracts its content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key elements&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;PyPDFLoader(PDF_PATH)&lt;/code&gt;: Creates a PDF loader instance for the specified file&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;loader.load()&lt;/code&gt;: Extracts all pages from the PDF and returns them as document objects&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  7.  Split the document into chunks
&lt;/h4&gt;

&lt;p&gt;Add the following code to split the document into chunks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Splitting &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; pages...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text_splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;add_start_index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;splits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text_splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Divides the PDF content into smaller, manageable chunks for efficient processing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parameters&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;chunk_size=1000&lt;/code&gt;: Sets the maximum size of each text chunk to 1000 characters&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;chunk_overlap=200&lt;/code&gt;: Creates a 200-character overlap between consecutive chunks to preserve context across boundaries&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;add_start_index=True&lt;/code&gt;: Tracks the starting position of each chunk in the original document for reference&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key elements&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt;: A text splitter that recursively divides documents at natural boundaries&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;split_documents(documents)&lt;/code&gt;: Applies the splitting logic to all loaded documents&lt;/li&gt;
&lt;/ul&gt;
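&lt;p&gt;The effect of &lt;code&gt;chunk_size&lt;/code&gt; and &lt;code&gt;chunk_overlap&lt;/code&gt; can be sketched without LangChain. The naive character-based splitter below shows the overlap mechanic only; the real &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt; is smarter and prefers to break at paragraph and sentence boundaries:&lt;/p&gt;

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Naive fixed-size splitter: each chunk repeats the tail of the previous one."""
    step = chunk_size - chunk_overlap  # advance by 800 chars per chunk here
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "".join(str(i % 10) for i in range(2500))
chunks = split_text(text, chunk_size=1000, chunk_overlap=200)
# The last 200 characters of each chunk reappear at the start of the next,
# so a sentence cut at a boundary still has full context on one side.
```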

&lt;h4&gt;
  
  
  8. Create embeddings and vector store
&lt;/h4&gt;

&lt;p&gt;Add the following code to generate text embeddings and store them in a vector database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Creating embeddings and vector store (using HuggingFace embeddings)...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;vector_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;splits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Generates text embeddings and stores them in a vector database for similarity search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key elements&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")&lt;/code&gt;: Uses a lightweight HuggingFace model to convert text into numerical vectors&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;FAISS.from_documents(splits, embeddings)&lt;/code&gt;: Creates a FAISS vector store that indexes all document chunks with their embeddings for fast retrieval&lt;/li&gt;
&lt;/ul&gt;
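&lt;p&gt;Under the hood, the vector store ranks chunks by how similar their embeddings are to the query's embedding. A minimal sketch of that idea with hand-made 3-dimensional vectors and cosine similarity (real all-MiniLM-L6-v2 embeddings have 384 dimensions, and FAISS uses optimized index structures rather than a linear scan):&lt;/p&gt;

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy "embeddings" for two chunks and one query
chunk_vectors = {
    "contractions rule": [0.9, 0.1, 0.0],
    "page layout rule": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # points in nearly the same direction as the first chunk

best = max(chunk_vectors, key=lambda k: cosine_similarity(query, chunk_vectors[k]))
# best == "contractions rule": retrieval surfaces the chunk nearest the query
```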

&lt;h4&gt;
  
  
  9. Configure the language model
&lt;/h4&gt;

&lt;p&gt;Add the following code to configure the Groq language model for generating responses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Setting up QA chain with Groq (via ChatOpenAI)...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GROQ_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
         &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GROQ_API_KEY environment variable not found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-3.3-70b-versatile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GROQ_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.groq.com/openai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Sets up the Groq-hosted language model for generating responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parameters&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;model_name="llama-3.3-70b-versatile"&lt;/code&gt;: Specifies the Llama 3.3 70B model for high-quality responses&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;temperature=0&lt;/code&gt;: Sets deterministic output by eliminating randomness in responses&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;api_key=os.environ["GROQ_API_KEY"]&lt;/code&gt;: Retrieves the API key from environment variables&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;base_url="https://api.groq.com/openai/v1"&lt;/code&gt;: Configures the Groq API endpoint&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key elements&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Environment variable validation ensures the API key is set before proceeding&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ChatOpenAI&lt;/code&gt;: A LangChain wrapper that provides a consistent interface to the language model&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  10. Initialize the QA chain
&lt;/h4&gt;

&lt;p&gt;Add the following code to initialize the question-answering chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="n"&gt;PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PROMPT_TEMPLATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_variables&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;qa_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RetrievalQA&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_chain_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;chain_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stuff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="n"&gt;return_source_documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;chain_type_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PROMPT&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RAG initialization complete.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Creates the question-answering chain that combines retrieval and generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key elements&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;PromptTemplate&lt;/code&gt;: Formats the prompt using the predefined template with context and question variables&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RetrievalQA.from_chain_type()&lt;/code&gt;: Creates a QA chain with the following configuration:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;llm=llm&lt;/code&gt;: Uses the configured language model&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;chain_type="stuff"&lt;/code&gt;: Uses the "stuff" method, which includes all retrieved documents in a single prompt&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;retriever=vector_store.as_retriever(search_kwargs={"k": 5})&lt;/code&gt;: Configures the retriever to fetch the top 5 most relevant chunks&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;return_source_documents=True&lt;/code&gt;: Includes source documents in the response for citation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;chain_type_kwargs={"prompt": PROMPT}&lt;/code&gt;: Applies the custom prompt template&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
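&lt;p&gt;Stripped of LangChain, the "stuff" flow is simple to see: retrieve the top-k chunks, concatenate ("stuff") them into the &lt;code&gt;{context}&lt;/code&gt; slot, and make a single model call. The sketch below uses hypothetical &lt;code&gt;retrieve&lt;/code&gt; and &lt;code&gt;llm&lt;/code&gt; stand-ins, not the real retriever or Groq client:&lt;/p&gt;

```python
def answer(question: str, retrieve, llm, template: str, k: int = 5) -> str:
    """Rough shape of a 'stuff' RetrievalQA call: retrieve, stuff, generate."""
    chunks = retrieve(question)[:k]        # top-k relevant chunks
    context = "\n\n".join(chunks)          # "stuff" them into one string
    prompt = template.format(question=question, context=context)
    return llm(prompt)                     # single model call with everything inline

# Stand-ins, for illustration only:
fake_retrieve = lambda q: ["Rule: avoid contractions in UI text.", "Rule: cite sections."]
fake_llm = lambda prompt: f"Answered using {prompt.count('Rule:')} retrieved rules."

result = answer("Can I use contractions?", fake_retrieve, fake_llm,
                "Q: {question}\nContext:\n{context}\nA:")
# result == "Answered using 2 retrieved rules."
```

The trade-off of "stuff" is that all k chunks must fit in the model's context window at once; with k=5 chunks of at most 1000 characters each, that is comfortably within range here.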

&lt;h4&gt;
  
  
  11. Function declaration and initialization check
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;: Ensure that the &lt;code&gt;GROQ_API_KEY&lt;/code&gt; environment variable is set in your &lt;code&gt;.env&lt;/code&gt; file before running this function.&lt;/p&gt;

&lt;p&gt;Add the &lt;code&gt;query_style_guide()&lt;/code&gt; function to handle user queries and the test code to verify the implementation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_style_guide&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Queries the Writing Style Guide using the initialized RAG chain.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;qa_chain&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;qa_chain&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;initialize_rag&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error initializing RAG engine: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Declares the query function and ensures the RAG chain is initialized before processing questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parameters&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;question: str&lt;/code&gt;: The user's question about the Writing Style Guide&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-&amp;gt; str&lt;/code&gt;: Returns the answer as a string&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key elements&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;global qa_chain&lt;/code&gt;: Accesses the global QA chain variable&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;if qa_chain is None&lt;/code&gt;: Checks whether the RAG chain has been initialized&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;initialize_rag()&lt;/code&gt;: Initializes the RAG pipeline if not already set up&lt;/li&gt;
&lt;li&gt;Error handling catches initialization failures and returns a descriptive error message&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  12. Process the query
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qa_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error processing query: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Sends the question to the RAG chain and retrieves the answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key elements&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;qa_chain.invoke({"query": question})&lt;/code&gt;: Passes the user's question to the retrieval and generation pipeline&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;result["result"]&lt;/code&gt;: Extracts the generated answer from the response dictionary&lt;/li&gt;
&lt;li&gt;Error handling catches query processing failures and returns a descriptive error message&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Return value&lt;/strong&gt;: The function returns either the generated answer or an error message if the query fails.&lt;/p&gt;
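&lt;p&gt;Because the chain was built with &lt;code&gt;return_source_documents=True&lt;/code&gt;, the result dict carries more than the answer text. A sketch of its shape and how the sources could be used for citations (the document entries here are simplified stand-ins for LangChain &lt;code&gt;Document&lt;/code&gt; objects, and the page numbers are made up):&lt;/p&gt;

```python
# Simplified shape of the dict a RetrievalQA "stuff" chain returns
result = {
    "query": "What are the rules for using contractions?",
    "result": "Use contractions sparingly; see section 2.1.",
    "source_documents": [
        {"page_content": "Section 2.1: Contractions...", "metadata": {"page": 14}},
    ],
}

answer = result["result"]                       # what query_style_guide() returns
pages = [doc["metadata"]["page"] for doc in result["source_documents"]]
# pages lists where each retrieved chunk came from, useful for citing the guide
```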

&lt;h4&gt;
  
  
  13. Test the implementation
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Test run
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;query_style_guide&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are the rules for using contractions?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Tests the query function when the script runs directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key elements&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;if __name__ == "__main__"&lt;/code&gt;: Ensures the test code runs only when the script executes directly, not when imported as a module&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;query_style_guide("What are the rules for using contractions?")&lt;/code&gt;: Tests the function with a sample question about contraction usage&lt;/li&gt;
&lt;li&gt;Error handling catches and displays any exceptions that occur during testing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Expected behavior&lt;/strong&gt;: When you run the script, it should initialize the RAG pipeline and return the Writing Style Guide's rules for using contractions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: The Server
&lt;/h3&gt;

&lt;p&gt;This is the "Face" of the application. using the &lt;code&gt;FastMCP&lt;/code&gt; library, it effectively says: &lt;em&gt;"Hey Claude, I have a tool called &lt;code&gt;ask_writing_style_guide&lt;/code&gt;. You can send me text, and I'll send you an answer."&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server.fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;rag&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Writing Style Guide Expert&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ask_writing_style_guide&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_style_guide&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it! Less than 30 lines of code for the server itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Connecting It to Claude Desktop
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;In Claude Desktop, go to &lt;strong&gt;Settings&lt;/strong&gt; → &lt;strong&gt;Developer&lt;/strong&gt; → &lt;strong&gt;Edit Config&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Add your MCP server entry&lt;/li&gt;
&lt;li&gt;Save and restart Claude Desktop&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"writing-style-guide"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/Users/siddhartha/Documents/thepath/MCP/venv/bin/python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"/Users/siddhartha/Documents/thepath/MCP/server.py"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;h3&gt;
  
  
  Step 5: Connecting It to My Workflow
&lt;/h3&gt;

&lt;p&gt;This is where the magic happens. I don't use a terminal to query this. I use the tools I am already working in.&lt;/p&gt;

&lt;h4&gt;
  
  
  Claude Desktop
&lt;/h4&gt;

&lt;p&gt;I added a simple config to my Claude Desktop settings. Now, when I chat with Claude, I can say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Can I use word &lt;strong&gt;&lt;em&gt;Slave&lt;/em&gt;&lt;/strong&gt; in my documentation?, check the writing style rules and inform."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude &lt;strong&gt;automatically&lt;/strong&gt; calls my tool, reads the rule from the PDF, and responds: &lt;em&gt;“No, you cannot use "slave" in your documentation according to the Writing Style Guide.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I never had to leave my editor.&lt;/p&gt;
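&lt;p&gt;As a rough sketch of what a tool like this wraps: the names, rule data, and lookup logic below are hypothetical stand-ins, not the actual &lt;code&gt;server.py&lt;/code&gt; from this setup. A real server would load the rules from the style-guide PDF and expose the check as an MCP tool via the Python SDK; the core of it is just a lookup:&lt;/p&gt;

```python
# Hedged sketch: the kind of rule lookup an MCP "style guide" tool wraps.
# STYLE_RULES and check_term are hypothetical; a real server would parse
# the rules out of the style-guide PDF instead of hard-coding them.

STYLE_RULES = {
    "slave": "Do not use. Prefer 'secondary', 'replica', or 'worker'.",
    "master": "Avoid where possible. Prefer 'primary' or 'main'.",
    "blacklist": "Do not use. Prefer 'blocklist' or 'denylist'.",
}

def check_term(term: str) -> str:
    """Return the style-guide verdict for a term (case-insensitive)."""
    rule = STYLE_RULES.get(term.strip().lower())
    if rule is None:
        return f"No rule found for '{term}'; it is not flagged."
    return f"'{term}': {rule}"

print(check_term("Slave"))
```

&lt;p&gt;Registering that function as an MCP tool is what lets Claude call it on its own whenever a question about wording comes up.&lt;/p&gt;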

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building an MCP server sounds intimidating, but it’s mostly glue code. The ability to give your AI assistants custom knowledge, whether it's a style guide, your internal API docs, or your project manifest, is a superpower.&lt;/p&gt;

&lt;p&gt;Give it a try. Your &lt;code&gt;Ctrl+F&lt;/code&gt; keys will thank you :)&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>documentation</category>
      <category>ai</category>
      <category>vscode</category>
    </item>
    <item>
      <title>The Era of CPU, GPU, TPU and LPU</title>
      <dc:creator>Siddhartha Mani</dc:creator>
      <pubDate>Thu, 30 Oct 2025 15:16:01 +0000</pubDate>
      <link>https://dev.to/siddhartha_mani_03/the-era-of-cpu-gpu-tpu-and-lpu-1d4j</link>
      <guid>https://dev.to/siddhartha_mani_03/the-era-of-cpu-gpu-tpu-and-lpu-1d4j</guid>
      <description>&lt;p&gt;It is 2025, and we have officially entered the age of artificial intelligence. After years of research and development, humans have created AI systems that can help find information faster, automate repetitive tasks, accelerate medical research, and make learning technology more accessible to everyone.&lt;/p&gt;

&lt;p&gt;Although we are now in the AI era, the full potential of AI and its impact on the world remain uncertain. Industry analysts anticipate that robots may soon walk the streets, science‑fiction‑style vehicles may appear, and futuristic devices that do not yet exist could become a reality.&lt;/p&gt;

&lt;p&gt;When discussing AI, the focus is often on software—the algorithms and models. Equally important is the hardware, the “brain” of AI, which includes the chips that perform the processing behind the scenes. Software intelligence depends entirely on hardware intelligence.&lt;/p&gt;

&lt;p&gt;This article highlights the primary types of chips that drive AI: CPU, GPU, TPU, and LPU. Whether the reader is a developer, a technology enthusiast, or simply curious about the future, understanding these chips is essential because they are shaping the technology of tomorrow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chip Comparison: CPU, GPU, TPU, LPU
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Feature&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;CPU&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;GPU&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;TPU&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;LPU&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Full Name&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Central Processing Unit&lt;/td&gt;
&lt;td&gt;Graphics Processing Unit&lt;/td&gt;
&lt;td&gt;Tensor Processing Unit&lt;/td&gt;
&lt;td&gt;Language Processing Unit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary Function&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;General-purpose computing&lt;/td&gt;
&lt;td&gt;Parallel processing and graphics&lt;/td&gt;
&lt;td&gt;AI/ML acceleration&lt;/td&gt;
&lt;td&gt;Language model inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Few complex cores (2–64)&lt;/td&gt;
&lt;td&gt;Thousands of simple cores&lt;/td&gt;
&lt;td&gt;Matrix multiplication units&lt;/td&gt;
&lt;td&gt;Deterministic execution core&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Processing Style&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sequential tasks&lt;/td&gt;
&lt;td&gt;Massively parallel&lt;/td&gt;
&lt;td&gt;Matrix operations&lt;/td&gt;
&lt;td&gt;Sequential token processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best For AI Training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Poor&lt;/td&gt;
&lt;td&gt;Excellent (Industry standard)&lt;/td&gt;
&lt;td&gt;Good (Google ecosystem)&lt;/td&gt;
&lt;td&gt;Not designed for&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best For AI Inference&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Basic models only&lt;/td&gt;
&lt;td&gt;Excellent and versatile&lt;/td&gt;
&lt;td&gt;Excellent and efficient&lt;/td&gt;
&lt;td&gt;Revolutionary for LLMs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Power Consumption&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate (15–250W)&lt;/td&gt;
&lt;td&gt;Very high (250–700W+)&lt;/td&gt;
&lt;td&gt;High but efficient&lt;/td&gt;
&lt;td&gt;Highly efficient&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost (Data Center)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$500 – $10,000+&lt;/td&gt;
&lt;td&gt;$10,000 – $40,000+&lt;/td&gt;
&lt;td&gt;Cloud service only&lt;/td&gt;
&lt;td&gt;Inference API service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Key Strength&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Versatility and quick decisions&lt;/td&gt;
&lt;td&gt;Massive parallel throughput&lt;/td&gt;
&lt;td&gt;AI inference efficiency&lt;/td&gt;
&lt;td&gt;Ultra-low latency for text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Key Weakness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Poor parallel performance&lt;/td&gt;
&lt;td&gt;High power and cost&lt;/td&gt;
&lt;td&gt;Limited flexibility&lt;/td&gt;
&lt;td&gt;Hyper-specialized&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory Approach&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hierarchical cache&lt;/td&gt;
&lt;td&gt;GDDR/HBM memory&lt;/td&gt;
&lt;td&gt;On-chip memory&lt;/td&gt;
&lt;td&gt;Massive single-thread bandwidth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Programming&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;C++, Python, Java&lt;/td&gt;
&lt;td&gt;CUDA, OpenCL&lt;/td&gt;
&lt;td&gt;TensorFlow&lt;/td&gt;
&lt;td&gt;Groq API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Leading Examples&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Intel Core, AMD Ryzen, Apple M-series&lt;/td&gt;
&lt;td&gt;NVIDIA A100/H100, RTX series&lt;/td&gt;
&lt;td&gt;Google TPU v4/v5&lt;/td&gt;
&lt;td&gt;GroqChip LPU-1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use Case Example&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Running operating system, web browser&lt;/td&gt;
&lt;td&gt;Training AI models, gaming&lt;/td&gt;
&lt;td&gt;Google Search, Translate&lt;/td&gt;
&lt;td&gt;AI chatbots, text generation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
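&lt;p&gt;To make the "matrix operations" rows above concrete, here is a tiny pure-Python sketch of the multiply-accumulate workload behind AI chips. A CPU works through these nested loops largely one step at a time, while a GPU spreads the independent output cells across thousands of cores and a TPU feeds them through dedicated matrix units. (This is an illustrative toy, not how production libraries implement it.)&lt;/p&gt;

```python
# Toy matrix multiplication: the core workload GPUs and TPUs accelerate.
def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):           # each output row...
        for j in range(cols):       # ...and column is independent work
            for k in range(inner):  # the multiply-accumulate chain
                out[i][j] += a[i][k] * b[k][j]
    return out

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

&lt;p&gt;Because every &lt;code&gt;out[i][j]&lt;/code&gt; cell can be computed independently, the work parallelizes almost perfectly, which is exactly why chips built around this pattern leave general-purpose CPUs behind for AI workloads.&lt;/p&gt;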

&lt;h3&gt;
  
  
  Which Chip is For You?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Everyday User:&lt;/strong&gt; You have a CPU (and often a small built-in GPU) in your laptop and phone. It's perfect for web browsing, emails, and documents. You don't need to think about the rest!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gamer or Video Editor:&lt;/strong&gt; You need a powerful GPU (like from NVIDIA or AMD) in your computer. It will render your games and videos beautifully.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Company Training a New AI Model:&lt;/strong&gt; You will buy or rent thousands of GPUs (like NVIDIA's H100). They are the most versatile for the heavy lifting of training.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Company Running a Chatbot Service (like ChatGPT):&lt;/strong&gt; You have two great choices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use GPUs (versatile and powerful).&lt;/li&gt;
&lt;li&gt;Use LPUs (if your main goal is raw, unbeatable speed and responsiveness for text generation).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Large-Scale Company like Google:&lt;/strong&gt; Google runs its own TPUs by the millions to power Search, Photos, and Translate, because they are hyper-efficient for its specific, massive-scale needs.&lt;/p&gt;

&lt;p&gt;If you do not understand the terms in the table above, do not worry. In my next blog, I will explain each term so that you can understand the technical details. Until then, happy reading.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gpu</category>
      <category>cpu</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
