In the previous article, Getting Better LLM Responses Using AI-Friendly Documentation, we explored how creating AI-friendly documentation can improve the quality of responses from Large Language Models (LLMs). We saw how properly structured Angular documentation helped ChatGPT provide more accurate answers about framework features. Let's advance our documentation strategy by implementing our own Retrieval-Augmented Generation (RAG) system, powered by Qdrant and designed to work hand-in-hand with Claude Desktop.
Introduction: Taking RAG from Concept to Implementation
Let's quickly revisit what makes documentation "AI-friendly." Remember those key principles? Clear headers, comprehensive single files, and focused feature-specific content. These organization techniques aren't just theoretical—they make a real difference in how LLMs understand our technical docs.
While the file upload approach we explored previously works reasonably well, it comes with limitations. Most notably, you're constrained by context window sizes, and you need to manually select which documentation to include in each conversation. This creates a frustrating cycle of starting new conversations whenever you need to reference different documentation sections.
Custom RAG systems solve these problems by automatically retrieving just the most relevant documentation fragments for any given question. Think of it as having a research assistant who instantly pulls exactly the right pages from your technical library without you having to specify which books to check.
In this guide, I'll walk you through:
- Setting up your own Qdrant vector database for documentation storage
- Processing and indexing your AI-friendly Angular documentation
- Configuring an MCP server that connects Qdrant to Claude
- Building a RAG pipeline for documentation retrieval
By the end, you'll have a RAG system that significantly enhances Claude's ability to assist with software development.
Understanding RAG Components for Documentation
Before diving into implementation details, let's explore the key components that make up our RAG system.
Vector Embeddings
At the heart of any RAG system are vector embeddings. These are numerical representations of text that capture semantic meaning in a way that computers can process. When we convert text to vectors, we're essentially mapping the content into a mathematical space where similar concepts cluster together.
For example, the phrases "component initialization" and "component creation" would have vector representations that are close to each other in this space, even though they use different words. This allows us to find relevant documentation based on meaning rather than just keyword matching.
We create these vector embeddings using specialized embedding models, which transform text into these high-dimensional mathematical representations while preserving semantic meaning.
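To make this concrete, here is a minimal sketch in plain TypeScript of how similarity between embedded phrases is typically measured with cosine similarity. The three-dimensional vectors are made up for illustration; real embedding models produce hundreds of dimensions.
function cosineSimilarity(a: number[], b: number[]): number {
  // Cosine similarity: close to 1 means very similar, close to 0 means unrelated
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const norm = (v: number[]) =>
    Math.sqrt(v.reduce((sum, value) => sum + value * value, 0));
  return dot / (norm(a) * norm(b));
}
// Toy vectors standing in for real embeddings of these phrases
const componentInitialization = [0.82, 0.11, 0.54];
const componentCreation = [0.79, 0.15, 0.58];
const routingGuards = [0.05, 0.91, 0.12];
console.log(cosineSimilarity(componentInitialization, componentCreation)); // ~0.99, very similar
console.log(cosineSimilarity(componentInitialization, routingGuards)); // much lower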
Qdrant is a vector database optimized for storing and searching these embeddings.
MCP: Standardized Tool Communication
The Model Context Protocol (MCP) is a standardized interface that allows external tools like our RAG system to communicate with Claude and other LLMs. Developed by Anthropic, MCP defines how Claude can interact with our custom qdrant-retrieve server, which in turn connects to our Qdrant vector database.
When you ask a question, the qdrant-retrieve MCP server:
- Receives the query from Claude
- Translates it into a vector search for Qdrant
- Processes the search results
- Returns the most relevant documentation fragments to Claude
This happens seamlessly in the background, giving Claude access to your entire documentation set without overloading its context window.
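The exact internals of the server are an implementation detail, but the core search step looks roughly like the sketch below, which calls Qdrant's REST search endpoint directly. It assumes the question has already been converted to a vector by the same embedding model used during indexing, and that Node.js 18+ provides the built-in fetch.
// Minimal sketch of the vector search step
// (Qdrant REST API: POST /collections/{name}/points/search)
async function searchDocs(queryVector: number[], collection: string, limit = 3) {
  const response = await fetch(
    `http://localhost:6333/collections/${collection}/points/search`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ vector: queryVector, limit, with_payload: true }),
    }
  );
  const { result } = await response.json();
  // Each hit carries a similarity score plus the stored chunk text and metadata
  return result.map((hit: { score: number; payload: unknown }) => ({
    score: hit.score,
    payload: hit.payload,
  }));
}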
Architecture Overview
Our complete solution works like this:
- Documentation Processing Pipeline:
  - AI-friendly documentation is divided into semantic chunks
  - Each chunk is converted to vector embeddings
  - Embeddings are stored in Qdrant collections
- Retrieval Process:
  - User asks a question in Claude Desktop
  - Question is sent to the qdrant-retrieve MCP server
  - MCP server converts the question to a vector
  - Qdrant retrieves the most similar documentation chunks
  - Relevant chunks are sent back to Claude
  - Claude generates an answer using the retrieved context
This architecture gives us the flexibility to handle complex questions while maintaining high accuracy and context relevance.
Setting Up Your RAG Environment
Now that we understand the theory, let's get our hands dirty and set up the environment. Don't worry if you're new to these tools — I'll guide you through each step.
Installing and Configuring Qdrant
Qdrant can be run as a Docker container, which makes setup relatively straightforward:
# Download the latest Qdrant image from Docker Hub
docker pull qdrant/qdrant
# Run the service
docker run -p 6333:6333 -p 6334:6334 \
-v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
qdrant/qdrant
This configuration stores all data in the ./qdrant_storage directory on your machine. Using this shared directory gives you two big advantages: your data persists even if you restart the container, and you can directly access your vector database files during development.
Once Qdrant is running, you'll have access to:
- REST API: http://localhost:6333 for programmatic interactions
- Web UI: http://localhost:6333/dashboard for visually exploring and managing your collections
If you prefer not to run Qdrant locally, you can also use Qdrant Cloud, which offers a free tier with 1GB of storage - more than enough for our documentation use case. Check out the official guides for more detailed setup instructions: Local Quickstart or Cloud Quickstart.
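If you want a quick sanity check that the instance is reachable, you can list its collections through the REST API. This small sketch uses the built-in fetch available in Node.js 18+; the list will be empty right after the first start.
// List all collections in the local Qdrant instance (GET /collections)
async function listCollections(): Promise<string[]> {
  const response = await fetch("http://localhost:6333/collections");
  const { result } = await response.json();
  return result.collections.map((collection: { name: string }) => collection.name);
}
listCollections().then((names) => console.log("Qdrant collections:", names));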
Preparing Angular Documentation for Qdrant with LlamaIndex.TS
Now that we have Qdrant running, let's create our vector collection. In Qdrant, a collection is a named container for related vector embeddings - like a database table but optimized for similarity search. We'll use LlamaIndex.TS, a powerful TypeScript framework that simplifies the entire RAG pipeline from document processing to retrieval.
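LlamaIndex.TS will create the collection for us automatically during indexing, so the sketch below is purely illustrative: it shows what the equivalent manual call to Qdrant's REST API looks like. The vector size of 384 matches the all-MiniLM-L6-v2 embedding model we'll configure in a moment.
// Illustrative only: create a collection by hand (PUT /collections/{name});
// LlamaIndex.TS performs an equivalent call for us during indexing
async function createCollection(name: string) {
  await fetch(`http://localhost:6333/collections/${name}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      vectors: {
        size: 384, // dimensionality of the all-MiniLM-L6-v2 embeddings
        distance: "Cosine", // similarity metric used for search
      },
    }),
  });
}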
To create your own collection of Angular documentation embeddings, I've prepared a ready-to-use project that handles all the heavy lifting. Here's how to get started:
# Clone the repository
git clone https://github.com/gergelyszerovay/ai-friendly-docs
cd ai-friendly-docs
# Install dependencies
pnpm install
# Generate Angular documentation and embeddings
pnpm run generate:angular
This script performs three key operations:
- Repository Cloning: Automatically downloads the specific version of Angular documentation from GitHub
- Documentation Generation: Transforms the official Angular docs into AI-friendly formats
- Vector Embedding Creation: Creates and stores vector embeddings in your local Qdrant instance running at http://localhost:6333
The AI-Friendly Documentation project includes a function that builds collections from the Angular documentation. Let's look at how this tool works to understand the underlying process.
The Collection Pipeline
Here's the high-level flow for creating a searchable vector collection from our AI-friendly Angular docs:
async function generateQdrantCollection() {
// 1. Configure your embedding model
Settings.embedModel = new HuggingFaceEmbedding({
modelType: "Xenova/all-MiniLM-L6-v2",
});
// 2. Locate your documentation files
const docsDir = "./ai-friendly-docs/angular-19.2.3/sections";
const files = findMarkdownFiles(docsDir);
// 3. Create the vector collection
await createCollectionFromMarkdownFiles("angular-19.2.3", files);
}
Let's take a closer look at the code that processes our Angular documentation and creates embeddings:
async function createCollectionFromMarkdownFiles(collectionName, files) {
// Initialize parser for smart document chunking
const markdownNodeParser = new MarkdownNodeParser();
let allNodes = [];
// Process each documentation file
for (const filePath of files) {
const text = fs.readFileSync(filePath, "utf-8");
const document = new Document({
text,
metadata: { originalFile: path.basename(filePath) },
});
// Convert document to semantic chunks
const nodes = markdownNodeParser([document]);
allNodes = [...allNodes, ...nodes];
}
// Connect to Qdrant and store the embeddings
const vectorStore = new QdrantVectorStore({
url: "http://localhost:6333",
collectionName,
});
// Create and store the embeddings
return await VectorStoreIndex.fromDocuments(allNodes, {
vectorStores: { TEXT: vectorStore },
});
}
The real power comes from how LlamaIndex processes our documentation. Rather than treating files as monolithic blocks or arbitrarily splitting text by character count, it intelligently chunks the content based on semantic structure.
The MarkdownNodeParser is what makes this possible. It analyzes the hierarchy of headers in our Markdown files (# for top-level, ## for second-level, etc.) and creates natural boundaries between content sections. For example:
- A section about component inputs under ## Inputs becomes its own chunk
- This chunk is separate from the ## Outputs section that might follow it
- Both chunks maintain metadata connecting them to their parent document
This semantic chunking means when Claude searches for information about "component inputs," it gets precisely the relevant sections, not the entire components guide.
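Here is a small sketch of that behaviour, following the same LlamaIndex.TS building blocks and call pattern as the project code above; the sample markdown content is made up for illustration.
import { Document, MarkdownNodeParser } from "llamaindex";
const sampleDoc = new Document({
  text: [
    "# Components",
    "## Inputs",
    "Inputs let a parent component bind values to a child component.",
    "## Outputs",
    "Outputs let a child component emit events to its parent.",
  ].join("\n"),
  metadata: { originalFile: "components.md" },
});
// Same call pattern as in createCollectionFromMarkdownFiles above:
// the parser splits the document at header boundaries
const parser = new MarkdownNodeParser();
const nodes = parser([sampleDoc]);
nodes.forEach((node, i) => {
  // Each node is one header-delimited chunk that keeps a link back to
  // its source document through its metadata
  console.log(`chunk ${i}:`, node.metadata);
});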
Why This Matters for Retrieval Quality
The quality of document chunking directly impacts how relevant your search results will be. Each chunk in our system:
- Preserves its hierarchical context (parent headers, document source)
- Contains conceptually complete information (not cutting off mid-explanation)
- Has the right granularity for targeted retrieval
When LlamaIndex creates embeddings for these chunks and stores them in Qdrant, it maintains all this contextual information. This means when Claude asks Qdrant for the most similar documents to a query, it gets well-formed, semantically meaningful content – not random text fragments.
This approach works particularly well for technical documentation with consistent header structures, which is exactly what we created in our AI-friendly Angular docs.
Setting Up the MCP Server
The MCP server is the component that connects Claude to Qdrant. For our implementation, we'll use the @gergelyszerovay/mcp-server-qdrant-retrieve package, which provides a ready-to-use MCP server for Qdrant retrieval integration.
To integrate the MCP server with Claude Desktop, add the following to your claude_desktop_config.json file:
{
"mcpServers": {
"qdrant": {
"command": "npx",
"args": ["-y", "@gergelyszerovay/mcp-server-qdrant-retrieve"],
"env": {
"QDRANT_API_KEY": "your_api_key_here"
}
}
}
}
Note that for a basic local Qdrant setup without authentication enabled, you can simply use an empty env object. If you're using Qdrant Cloud or have configured authentication on your local instance, you'll need to add your API key as an environment variable.
The server exposes a qdrant_retrieve tool that Claude can call to search for relevant documentation fragments based on your questions.
A Quick Test: Your First Retrieval
Before wrapping up, let's do a quick test to make sure everything is working. With your Qdrant server running and MCP configured, ask Claude this simple question about Angular:
Use qdrant_retrieve on the angular-19.2.3 collection to retrieve documents matching the question: "Are standalone components default in Angular?". Retrieve 3 documents.
If everything is set up correctly, Claude should make a request to your MCP server, which then searches Qdrant and returns the most relevant documentation fragments. You should see Claude incorporate information about standalone components becoming the default in Angular 19.0.0.
Note that the first vector database retrieval might be a bit slower, as the MCP server needs to download the required embedding model. Subsequent queries will be much faster.
This quick test confirms that:
- Your Claude Desktop can communicate with the MCP server
- The MCP server can successfully query your Qdrant collection
- The retrieved documents contain meaningful, relevant information
Now that we've confirmed the basic functionality, we're ready to explore more sophisticated retrieval techniques in the next article.
Conclusion and Next Steps
In this first part of our RAG journey, we've laid the groundwork for a powerful documentation retrieval system. We've:
- Explored the key components of a RAG system (vector embeddings, vector databases, and MCP)
- Set up Qdrant for vector storage and understood how to create collections
- Processed AI-friendly Angular documentation into semantically meaningful chunks
- Configured the MCP server to connect Claude Desktop with our vector database
With these fundamentals in place, you now have a functioning RAG system ready to enhance your interactions with Claude. The infrastructure we've built provides a flexible foundation that can work with any documentation set, not just Angular.
In the next part of this series, we'll explore how to effectively use this system, including basic and advanced retrieval techniques that will significantly improve Claude's ability to answer technical questions. We'll walk through real-world examples and develop increasingly sophisticated approaches to get the most accurate and relevant information from our documentation.
Stay tuned for "RAG in Action: Advanced Retrieval Techniques with Claude Desktop" where we'll put this system to work!
About the Author
My name is Gergely Szerovay. I've worked as a data scientist and full-stack developer for many years, and I've been a frontend tech lead focusing on Angular development for the past three years. To closely follow the evolution of AI-assisted software development, I've decided to start building AI tools in public and publish my progress on AIBoosted.dev.
Follow me on Substack (Angular Addicts), Substack (AIBoosted.dev), Medium, Dev.to, X or LinkedIn to learn more about Angular and how to build AI apps with AI, TypeScript, React, and Angular!