Building a RAG System from Scratch — MCP: Exposing pgvector as a Reusable Tool Server

#mcp #rag #python #llm

In the previous article, we built AI Agents that autonomously search our pgvector database. One limitation remained: the tools were hardcoded inside our Python scripts. Only our code could use them.

MCP (Model Context Protocol) fixes this. It turns our search functions into a standalone server that any LLM client can connect to — Claude Desktop, Gemini agents, or any future client.

Tool Use vs MCP

Tool Use (what we built):
  Python script → hardcoded functions → Gemini API
  Reusable by: this script only

MCP Server (what we're building):
  Any LLM client → MCP protocol → our server → pgvector
  Reusable by: Claude Desktop, any agent, any language

The tools themselves don't change. What changes is where they live and how they're accessed.

MCP's Three Primitives

Primitive	Role	Our implementation
Tools	Functions the LLM can call	`search_documents`, `search_by_category`, `list_categories`
Resources	Data the LLM can read	`db://categories` (category list)
Prompts	Reusable prompt templates	`search_prompt(topic)`

Installing FastMCP

pip install fastmcp
pip freeze > requirements.txt

Step 1: MCP Server — `mcp_server/server.py`

# mcp_server/server.py
import psycopg2
from google import genai
from google.genai import types as genai_types
from fastmcp import FastMCP
from dotenv import load_dotenv
import os

load_dotenv()

mcp = FastMCP(
    name="pgvector-search",
    instructions="Document search server using pgvector. "
                 "Covers machine learning, Python, and cloud topics.",
)

gemini_client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))

conn = psycopg2.connect(
    host=os.getenv("DB_HOST"), port=os.getenv("DB_PORT"),
    dbname=os.getenv("DB_NAME"), user=os.getenv("DB_USER"),
    password=os.getenv("DB_PASSWORD"),
)
cur = conn.cursor()


def get_embedding(text: str) -> list[float]:
    result = gemini_client.models.embed_content(
        model="gemini-embedding-001",
        contents=text,
        config=genai_types.EmbedContentConfig(
            task_type="RETRIEVAL_QUERY",
            output_dimensionality=768,
        ),
    )
    return result.embeddings[0].values


# ── Tools ─────────────────────────────────────────────────────
# The @mcp.tool decorator replaces FunctionDeclaration(...) entirely.
# Type hints + docstrings generate the schema automatically.

@mcp.tool
def search_documents(query: str, top_k: int = 3) -> list[dict]:
    """
    Search all document categories for a given query.
    Use when the category is unknown or the question spans multiple categories.

    Args:
        query: Search query
        top_k: Number of documents to retrieve (default: 3)
    """
    q = get_embedding(query)
    cur.execute("""
        SELECT title, body, category,
               1 - (embedding <=> %s::vector) AS similarity
        FROM documents ORDER BY embedding <=> %s::vector LIMIT %s;
    """, (q, q, top_k))
    return [
        {"title": r[0], "body": r[1], "category": r[2], "similarity": round(r[3], 4)}
        for r in cur.fetchall()
    ]


@mcp.tool
def search_by_category(query: str, category: str, top_k: int = 3) -> list[dict]:
    """
    Search within a specific category (ML, Python, or Cloud).
    Use when the category is explicitly mentioned in the question.

    Args:
        query: Search query
        category: Category name — ML, Python, or Cloud
        top_k: Number of documents to retrieve (default: 3)
    """
    q = get_embedding(query)
    cur.execute("""
        SELECT title, body, category,
               1 - (embedding <=> %s::vector) AS similarity
        FROM documents WHERE category = %s
        ORDER BY embedding <=> %s::vector LIMIT %s;
    """, (q, category, q, top_k))
    return [
        {"title": r[0], "body": r[1], "category": r[2], "similarity": round(r[3], 4)}
        for r in cur.fetchall()
    ]


@mcp.tool
def list_categories() -> list[dict]:
    """
    Return all available categories and their document counts.
    Use this first to understand what data is available.
    """
    cur.execute("""
        SELECT category, COUNT(*) as count
        FROM documents GROUP BY category ORDER BY count DESC;
    """)
    return [{"category": r[0], "count": r[1]} for r in cur.fetchall()]


# ── Resources ─────────────────────────────────────────────────
# Resources are read-only data the LLM can access directly.

@mcp.resource("db://categories")
def get_categories_resource() -> str:
    cur.execute("""
        SELECT category, COUNT(*) as count
        FROM documents GROUP BY category ORDER BY count DESC;
    """)
    lines = [f"- {r[0]}: {r[1]} documents" for r in cur.fetchall()]
    return "Available categories:\n" + "\n".join(lines)


# ── Prompts ───────────────────────────────────────────────────
# Reusable prompt templates.

@mcp.prompt
def search_prompt(topic: str) -> str:
    """Generate a structured search prompt for a given topic."""
    return f"""Research the following topic using the available tools:

Topic: {topic}

Steps:
1. Call list_categories to see what data is available
2. If a relevant category exists, use search_by_category
3. Otherwise use search_documents for a broad search
4. Synthesize the results into a clear answer"""


# ── Entry point ───────────────────────────────────────────────
if __name__ == "__main__":
    mcp.run()  # stdio mode — standard for Claude Desktop

mkdir mcp_server
touch mcp_server/__init__.py

Step 2: Test the Server — `mcp_server/client_test.py`

# mcp_server/client_test.py
import asyncio
from fastmcp import Client

async def test_server():
    async with Client("mcp_server/server.py") as client:

        # List available tools
        tools = await client.list_tools()
        print("=== Available tools ===")
        for tool in tools:
            print(f"  - {tool.name}: {tool.description[:50]}...")

        # List resources
        resources = await client.list_resources()
        print("\n=== Available resources ===")
        for r in resources:
            print(f"  - {r.uri}")

        # Call a tool
        print("\n=== list_categories ===")
        result = await client.call_tool("list_categories", {})
        print(result)

        print("\n=== search_documents ===")
        result = await client.call_tool(
            "search_documents",
            {"query": "ML evaluation metrics", "top_k": 2}
        )
        print(result)

        # Read a resource
        print("\n=== db://categories resource ===")
        content = await client.read_resource("db://categories")
        print(content)

if __name__ == "__main__":
    asyncio.run(test_server())

python mcp_server/client_test.py
# === Available tools ===
#   - search_documents: Search all document categories for a given...
#   - search_by_category: Search within a specific category...
#   - list_categories: Return all available categories...
#
# === list_categories ===
# [{'category': 'ML', 'count': 2}, {'category': 'Cloud', 'count': 2}, ...]

Step 3: Agent via MCP — `12_mcp_agent.py`

The biggest difference: tool definitions come from the server, not from hardcoded FunctionDeclaration objects.

# 12_mcp_agent.py
import asyncio
from google import genai
from google.genai import types
from fastmcp import Client
from dotenv import load_dotenv
import os
import time

load_dotenv()
gemini_client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))


async def run_agent(task: str):
    print(f"\nTask: {task}")
    print("=" * 60)

    async with Client("mcp_server/server.py") as mcp_client:

        # Fetch tool definitions from the server automatically
        mcp_tools = await mcp_client.list_tools()

        # Convert MCP tool definitions to Gemini format
        gemini_tools = types.Tool(
            function_declarations=[
                types.FunctionDeclaration(
                    name=tool.name,
                    description=tool.description or "",
                    parameters=types.Schema(
                        type=types.Type.OBJECT,
                        properties={
                            name: types.Schema(
                                type=types.Type.STRING
                                if schema.get("type") == "string"
                                else types.Type.INTEGER
                                if schema.get("type") == "integer"
                                else types.Type.STRING,
                                description=schema.get("description", ""),
                            )
                            for name, schema in
                            (tool.inputSchema.get("properties") or {}).items()
                        },
                        required=tool.inputSchema.get("required", []),
                    ),
                )
                for tool in mcp_tools
            ]
        )

        print(f"Loaded {len(mcp_tools)} tools from MCP server")

        contents = [types.Content(role="user", parts=[types.Part(text=task)])]

        for step in range(8):
            print(f"\n[Step {step + 1}]")

            for attempt in range(5):
                try:
                    response = gemini_client.models.generate_content(
                        model="gemini-2.5-flash",
                        contents=contents,
                        config=types.GenerateContentConfig(tools=[gemini_tools]),
                    )
                    break
                except Exception as e:
                    if ("503" in str(e) or "429" in str(e)) and attempt < 4:
                        time.sleep((attempt + 1) * 10)
                    else:
                        raise

            candidates = response.candidates
            if not candidates or not candidates[0].content.parts:
                break

            part = candidates[0].content.parts[0]

            if part.function_call:
                func_name = part.function_call.name
                func_args = dict(part.function_call.args)
                print(f"  → {func_name}({func_args})")

                # Execute via MCP server instead of calling locally
                result = await mcp_client.call_tool(func_name, func_args)
                print(f"  → {len(result) if isinstance(result, list) else result} results")

                contents.append(
                    types.Content(role="model", parts=[types.Part(function_call=part.function_call)])
                )
                contents.append(
                    types.Content(
                        role="user",
                        parts=[types.Part(
                            function_response=types.FunctionResponse(
                                name=func_name,
                                response={"result": result},
                            )
                        )]
                    )
                )
            else:
                text_parts = [
                    p.text for p in candidates[0].content.parts
                    if hasattr(p, 'text') and p.text
                ]
                print(f"\n[Done in {step + 1} steps]")
                return "\n".join(text_parts)

    return "Max steps reached."


async def main():
    result = await run_agent(
        "Check the available categories, then explain ML evaluation metrics in detail."
    )
    print(f"\nFinal answer:\n{result}")

if __name__ == "__main__":
    asyncio.run(main())

python 12_mcp_agent.py
# Loaded 3 tools from MCP server
# [Step 1]
#   → list_categories({})
# [Step 2]
#   → search_by_category({'query': 'evaluation metrics', 'category': 'ML'})
# [Done in 3 steps]

Step 4: Connect to Claude Desktop

If you have Claude Desktop installed, add this to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "pgvector-search": {
      "command": "/path/to/your/project/.venv/bin/python",
      "args": ["/path/to/your/project/mcp_server/server.py"],
      "env": {
        "GEMINI_API_KEY": "AIza...",
        "DB_HOST": "localhost",
        "DB_PORT": "5432",
        "DB_NAME": "vectordb",
        "DB_USER": "postgres",
        "DB_PASSWORD": "password"
      }
    }
  }
}

Restart Claude Desktop. Now you can type "search the pgvector DB for ML evaluation metrics" directly in Claude's chat interface.

Note: Use the full path to your .venv/bin/python, not just python. Claude Desktop doesn't activate virtual environments automatically.

Note: Claude Desktop currently only supports stdio transport, not HTTP. Use server.py (not server_http.py) in the config.

Tool Use vs MCP: The Key Difference

# Tool Use — tools defined in code
tools = types.Tool(function_declarations=[
    types.FunctionDeclaration(name="search_documents", ...)  # handwritten
])
result = search_documents(query)  # called directly

# MCP — tools fetched from server
mcp_tools = await mcp_client.list_tools()    # fetched dynamically
result = await mcp_client.call_tool(name, args)  # executed on server

The tools are identical. The difference is where they live. MCP makes them a shared infrastructure component rather than a per-project implementation.

In the final article of this series, we'll deploy the MCP server to Render and the pgvector database to Supabase — making everything accessible from anywhere.

Full source code: github.com/qameqame/pgvector-tutorial

Top comments (3)

Alex Shev • Jun 28

pgvector as an MCP tool is a nice boundary because retrieval becomes a reusable capability instead of being buried inside one app. I would make the retrieval contract explicit: corpus, filters, scoring, freshness, and why the returned chunks are safe to use for the current task.

Hiroki Kameyama • Jun 28 • Edited

Great point — "retrieval contract" is exactly the right framing.

In this tutorial I kept the interface minimal to focus on the MCP mechanics, but you're right that a production retrieval tool should make its contract explicit. The fields you listed map well to concrete implementation:

corpus: which document collection or tenant is being searched
filters: category, date range, metadata constraints
scoring: similarity threshold, reranking applied or not
freshness: index lag, last-updated timestamp
safety: whether the retrieved chunks passed security validation

Making these explicit in the tool's return schema — rather than leaving them implicit — also makes Evals much more tractable. You can assert on each dimension independently.

I'll incorporate this into a future revision. Thanks for the precise framing.

Alex Shev • Jun 30

That split would make the tool much easier to trust. The eval point is especially important: once corpus, filters, freshness, and safety are separate fields, failures stop looking like vague “bad retrieval” and become diagnosable contract violations.

That is also useful for agents because they can decide when to ask for another retrieval pass instead of blindly reasoning over weak context.