Gergely Szerovay for This is Learning

Originally published at aiboosted.dev


Building Your Own RAG System: Enhancing Claude with Your Documentation

In the previous article, Getting Better LLM Responses Using AI-Friendly Documentation, we explored how creating AI-friendly documentation can improve the quality of responses from Large Language Models (LLMs). We saw how properly structured Angular documentation helped ChatGPT provide more accurate answers about framework features. Let's advance our documentation strategy by implementing our own Retrieval-Augmented Generation (RAG) system, powered by Qdrant and designed to work hand-in-hand with Claude Desktop.

Introduction: Taking RAG from Concept to Implementation

Let's quickly revisit what makes documentation "AI-friendly." Remember those key principles? Clear headers, comprehensive single files, and focused feature-specific content. These organization techniques aren't just theoretical—they make a real difference in how LLMs understand our technical docs.

While the file upload approach we explored previously works reasonably well, it comes with limitations. Most notably, you're constrained by context window sizes, and you need to manually select which documentation to include in each conversation. This creates a frustrating cycle of starting new conversations whenever you need to reference different documentation sections.

Custom RAG systems solve these problems by automatically retrieving just the most relevant documentation fragments for any given question. Think of it as having a research assistant who instantly pulls exactly the right pages from your technical library without you having to specify which books to check.

In this guide, I'll walk you through:

  • Setting up your own Qdrant vector database for documentation storage
  • Processing and indexing your AI-friendly Angular documentation
  • Configuring an MCP server that connects Qdrant to Claude
  • Building a RAG pipeline for documentation retrieval

By the end, you'll have a RAG system that significantly enhances Claude's ability to assist with software development.

Understanding RAG Components for Documentation

Before diving into implementation details, let's explore the key components that make up our RAG system.

Vector Embeddings

At the heart of any RAG system are vector embeddings. These are numerical representations of text that capture semantic meaning in a way that computers can process. When we convert text to vectors, we're essentially mapping the content into a mathematical space where similar concepts cluster together.

For example, the phrases "component initialization" and "component creation" would have vector representations that are close to each other in this space, even though they use different words. This allows us to find relevant documentation based on meaning rather than just keyword matching.

We create these vector embeddings using specialized embedding models, which transform text into these high-dimensional mathematical representations while preserving semantic meaning.
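To make this concrete, here's a minimal sketch that measures how close two phrases sit in embedding space. It assumes the same Xenova/all-MiniLM-L6-v2 model we'll use later in this article, and the monolithic "llamaindex" package layout:

import { HuggingFaceEmbedding } from "llamaindex";

const embedModel = new HuggingFaceEmbedding({
  modelType: "Xenova/all-MiniLM-L6-v2",
});

// Cosine similarity: values near 1 mean the vectors point in the same direction.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

const [v1, v2] = await Promise.all([
  embedModel.getTextEmbedding("component initialization"),
  embedModel.getTextEmbedding("component creation"),
]);

// Expect a high score: the phrases share meaning, not keywords.
console.log(cosineSimilarity(v1, v2));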

Qdrant is a vector database optimized for storing and searching these embeddings: it indexes them so that nearest-neighbor queries stay fast even as a collection grows to thousands of documentation chunks.

MCP: Standardized Tool Communication

The Model Context Protocol (MCP) is a standardized interface that allows external tools like our RAG system to communicate with Claude and other LLMs. Developed by Anthropic, MCP defines how Claude can interact with our custom qdrant-retrieve server, which in turn connects to our Qdrant vector database.

When you ask a question, the qdrant-retrieve MCP server:

  1. Receives the query from Claude
  2. Translates it into a vector search for Qdrant
  3. Processes the search results
  4. Returns the most relevant documentation fragments to Claude

This happens seamlessly in the background, giving Claude access to your entire documentation set without overloading its context window.
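Under the hood, MCP messages are JSON-RPC. A tool call from Claude to the server looks roughly like the sketch below; the exact argument names are illustrative, not the server's documented schema:

{
  "method": "tools/call",
  "params": {
    "name": "qdrant_retrieve",
    "arguments": {
      "collection": "angular-19.2.3",
      "query": "Are standalone components default in Angular?",
      "limit": 3
    }
  }
}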

Architecture Overview

Our complete solution works like this:

  1. Documentation Processing Pipeline:
    • AI-friendly documentation is divided into semantic chunks
    • Each chunk is converted to vector embeddings
    • Embeddings are stored in Qdrant collections
  2. Retrieval Process:
    • User asks a question in Claude Desktop
    • Question is sent to the qdrant-retrieve MCP server
    • The MCP server converts the question to a vector
    • Qdrant retrieves the most similar documentation chunks
    • Relevant chunks are sent back to Claude
    • Claude generates an answer using the retrieved context

This architecture gives us the flexibility to handle complex questions while maintaining high accuracy and context relevance.

Setting Up Your RAG Environment

Now that we understand the theory, let's get our hands dirty and set up the environment. Don't worry if you're new to these tools — I'll guide you through each step.

Installing and Configuring Qdrant

Qdrant can be run as a Docker container, which makes setup relatively straightforward:

# Download the latest Qdrant image from Docker Hub
docker pull qdrant/qdrant

# Run the service
docker run -p 6333:6333 -p 6334:6334 \
    -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
    qdrant/qdrant

This configuration stores all data in the ./qdrant_storage directory on your machine. Using this shared directory gives you two big advantages: your data persists even if you restart the container, and you can directly access your vector database files during development.

Once Qdrant is running, you'll have access to:

  • The REST API at http://localhost:6333
  • The gRPC API at http://localhost:6334
  • The built-in web dashboard at http://localhost:6333/dashboard

If you prefer not to run Qdrant locally, you can also use Qdrant Cloud, which offers a free tier with 1GB of storage, more than enough for our documentation use case. Check out the official guides for more detailed setup instructions: Local Quickstart or Cloud Quickstart.
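For a local instance, a quick way to confirm everything is reachable is to hit the REST API; this lists your collections, which will be empty on a fresh install:

curl http://localhost:6333/collections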

Preparing Angular Documentation for Qdrant with LlamaIndex.TS

Now that we have Qdrant running, let's create our vector collection. In Qdrant, a collection is a named container for related vector embeddings - like a database table but optimized for similarity search. We'll use LlamaIndex.TS, a powerful TypeScript framework that simplifies the entire RAG pipeline from document processing to retrieval.
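Under the hood, creating a collection boils down to a single REST call. LlamaIndex handles this for us, so you won't need to run it yourself; it's shown here only to demystify what a collection is. The size of 384 matches the output dimension of the all-MiniLM-L6-v2 embedding model we use:

curl -X PUT http://localhost:6333/collections/angular-19.2.3 \
    -H 'Content-Type: application/json' \
    -d '{"vectors": {"size": 384, "distance": "Cosine"}}'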

To create your own collection of Angular documentation embeddings, I've prepared a ready-to-use project that handles all the heavy lifting. Here's how to get started:

# Clone the repository
git clone https://github.com/gergelyszerovay/ai-friendly-docs
cd ai-friendly-docs

# Install dependencies
pnpm install

# Generate Angular documentation and embeddings
pnpm run generate:angular

This script performs three key operations:

  1. Repository Cloning: Automatically downloads the specific version of Angular documentation from GitHub
  2. Documentation Generation: Transforms the official Angular docs into AI-friendly formats
  3. Vector Embedding Creation: Creates and stores vector embeddings in your local Qdrant instance running at http://localhost:6333

The AI-Friendly Documentation project includes a function that builds collections from the Angular documentation. Let's look at how this tool works to understand the underlying process.

The Collection Pipeline

Here's the high-level flow for creating a searchable vector collection from our AI-friendly Angular docs:

// These imports assume the monolithic "llamaindex" package layout;
// newer releases split them into scoped packages such as @llamaindex/huggingface.
import { Settings, HuggingFaceEmbedding } from "llamaindex";

async function generateQdrantCollection() {
  // 1. Configure your embedding model
  Settings.embedModel = new HuggingFaceEmbedding({
    modelType: "Xenova/all-MiniLM-L6-v2",
  });

  // 2. Locate your documentation files
  const docsDir = "./ai-friendly-docs/angular-19.2.3/sections";
  const files = findMarkdownFiles(docsDir);

  // 3. Create the vector collection
  await createCollectionFromMarkdownFiles("angular-19.2.3", files);
}

Let's take a closer look at the code that processes our Angular documentation and creates embeddings:

import fs from "node:fs";
import path from "node:path";
// As above, these imports assume the monolithic "llamaindex" package.
import {
  Document,
  MarkdownNodeParser,
  QdrantVectorStore,
  VectorStoreIndex,
} from "llamaindex";

async function createCollectionFromMarkdownFiles(collectionName, files) {
  // Initialize parser for smart document chunking
  const markdownNodeParser = new MarkdownNodeParser();
  let allNodes = [];

  // Process each documentation file
  for (const filePath of files) {
    const text = fs.readFileSync(filePath, "utf-8");
    const document = new Document({
      text,
      metadata: { originalFile: path.basename(filePath) },
    });

    // Convert document to semantic chunks
    const nodes = markdownNodeParser.getNodesFromDocuments([document]);
    allNodes = [...allNodes, ...nodes];
  }

  // Connect to Qdrant and store the embeddings
  const vectorStore = new QdrantVectorStore({
    url: "http://localhost:6333",
    collectionName,
  });

  // Create and store the embeddings
  return await VectorStoreIndex.fromDocuments(allNodes, {
    vectorStores: { TEXT: vectorStore },
  });
}

The real power comes from how LlamaIndex processes our documentation. Rather than treating files as monolithic blocks or arbitrarily splitting text by character count, it intelligently chunks the content based on semantic structure.

The MarkdownNodeParser is what makes this possible. It analyzes the hierarchy of headers in our Markdown files (# for top-level, ## for second-level, etc.) and creates natural boundaries between content sections. For example:

  • A section about component inputs under ## Inputs becomes its own chunk
  • This chunk is separate from the ## Outputs section that might follow it
  • Both chunks maintain metadata connecting them to their parent document

This semantic chunking means when Claude searches for information about "component inputs," it gets precisely the relevant sections, not the entire components guide.
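As a hypothetical illustration, a components guide structured like this:

# Components

## Inputs
Inputs let a parent component pass data into a child component...

## Outputs
Outputs let a child component notify its parent when something happens...

would be split into two chunks, one per second-level section, each carrying its header path and source file in its metadata.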

Why This Matters for Retrieval Quality

The quality of document chunking directly impacts how relevant your search results will be. Each chunk in our system:

  • Preserves its hierarchical context (parent headers, document source)
  • Contains conceptually complete information (not cutting off mid-explanation)
  • Has the right granularity for targeted retrieval

When LlamaIndex creates embeddings for these chunks and stores them in Qdrant, it maintains all this contextual information. This means when Claude asks Qdrant for the most similar documents to a query, it gets well-formed, semantically meaningful content – not random text fragments.

This approach works particularly well for technical documentation with consistent header structures, which is exactly what we created in our AI-friendly Angular docs.

Setting Up the MCP Server

The MCP server is the component that connects Claude to Qdrant. For our implementation, we'll use the @gergelyszerovay/mcp-server-qdrant-retrieve package, which provides a ready-to-use MCP server for Qdrant retrieval integration.

To integrate the MCP server with Claude Desktop, add the following to your claude_desktop_config.json file:

{
  "mcpServers": {
    "qdrant": {
      "command": "npx",
      "args": ["-y", "@gergelyszerovay/mcp-server-qdrant-retrieve"],
      "env": {
        "QDRANT_API_KEY": "your_api_key_here"
      }
    }
  }
}

Note that for a basic local Qdrant setup without authentication enabled, you can simply use an empty env object. If you're using Qdrant Cloud or have configured authentication on your local instance, you'll need to add your API key as an environment variable.
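For the local, no-auth setup described above, the entry reduces to:

{
  "mcpServers": {
    "qdrant": {
      "command": "npx",
      "args": ["-y", "@gergelyszerovay/mcp-server-qdrant-retrieve"],
      "env": {}
    }
  }
}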

The server exposes a qdrant_retrieve tool that Claude can call to search for relevant documentation fragments based on your questions.
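If you'd like to sanity-check retrieval without involving Claude at all, you can query the same collection directly with LlamaIndex.TS. This is a sketch under the same package assumptions as the indexing script:

import {
  HuggingFaceEmbedding,
  QdrantVectorStore,
  Settings,
  VectorStoreIndex,
} from "llamaindex";

// Use the same embedding model the collection was built with.
Settings.embedModel = new HuggingFaceEmbedding({
  modelType: "Xenova/all-MiniLM-L6-v2",
});

const vectorStore = new QdrantVectorStore({
  url: "http://localhost:6333",
  collectionName: "angular-19.2.3",
});

// Open the existing collection and retrieve the top 3 matches.
const index = await VectorStoreIndex.fromVectorStore(vectorStore);
const retriever = index.asRetriever({ similarityTopK: 3 });

const results = await retriever.retrieve(
  "Are standalone components default in Angular?"
);

for (const r of results) {
  console.log(r.score, r.node.metadata.originalFile);
}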

A Quick Test: Your First Retrieval

Before wrapping up, let's do a quick test to make sure everything is working. With your Qdrant server running and MCP configured, ask Claude this simple question about Angular:

Use qdrant_retrieve on the angular-19.2.3 collection, retrieve documents matching the question: "Are standalone components default in Angular?". Retrieve 3 documents.

If everything is set up correctly, Claude should make a request to your MCP server, which then searches Qdrant and returns the most relevant documentation fragments. You should see Claude incorporate information about standalone components becoming the default in Angular 19.0.0.

Note that the first vector database retrieval might be a bit slower, as the MCP server needs to download the required embedding model. Subsequent queries will be much faster.

This quick test confirms that:

  1. Your Claude Desktop can communicate with the MCP server
  2. The MCP server can successfully query your Qdrant collection
  3. The retrieved documents contain meaningful, relevant information

Now that we've confirmed the basic functionality, we're ready to explore more sophisticated retrieval techniques in the next article.

Conclusion and Next Steps

In this first part of our RAG journey, we've laid the groundwork for a powerful documentation retrieval system. We've:

  1. Explored the key components of a RAG system (vector embeddings, vector databases, and MCP)
  2. Set up Qdrant for vector storage and understood how to create collections
  3. Processed AI-friendly Angular documentation into semantically meaningful chunks
  4. Configured the MCP server to connect Claude Desktop with our vector database

With these fundamentals in place, you now have a functioning RAG system ready to enhance your interactions with Claude. The infrastructure we've built provides a flexible foundation that can work with any documentation set, not just Angular.

In the next part of this series, we'll explore how to effectively use this system, including basic and advanced retrieval techniques that will significantly improve Claude's ability to answer technical questions. We'll walk through real-world examples and develop increasingly sophisticated approaches to get the most accurate and relevant information from our documentation.

Stay tuned for "RAG in Action: Advanced Retrieval Techniques with Claude Desktop" where we'll put this system to work!

About the Author

My name is Gergely Szerovay. I've worked as a data scientist and full-stack developer for many years, and I've been a frontend tech lead focusing on Angular development for the past three years. To closely follow the evolution of AI-assisted software development, I've decided to start building AI tools in public and publish my progress on AIBoosted.dev.

Follow me on Substack (Angular Addicts), Substack (AIBoosted.dev), Medium, Dev.to, X or LinkedIn to learn more about Angular and how to build AI apps with AI, TypeScript, React, and Angular!
