DEV Community

gentic news

Posted on • Originally published at gentic.news

Expose pgvector as an MCP Server: From Hardcoded RAG to Reusable Tool Server

Wrap pgvector search in FastMCP to create a reusable MCP server. Any LLM client—including Claude Code—can then query your vector database without hardcoded integrations.

What Changed — pgvector Search Becomes an MCP Server

You've built a RAG system. Your pgvector database is full of embeddings. Your search functions work perfectly—but only inside your Python script. No other tool can touch them.

MCP (Model Context Protocol) breaks that wall. Instead of hardcoding search_documents() inside a single script, you expose it as a standalone server that any LLM client can connect to. Claude Desktop, Claude Code, Gemini agents, or any future MCP-compatible client can then use your vector search without client-specific integration code; each one only needs a config entry that points at the server.

The source article walks through building exactly this: taking a pgvector-backed search system and wrapping it in FastMCP. The result is a reusable tool server that any MCP client can discover and call.
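To make "discover and call" concrete: MCP frames its messages as JSON-RPC 2.0, with a tools/list request for discovery and a tools/call request for invocation. The sketch below only builds those two request bodies as plain dictionaries. The helper name make_request and the request ids are illustrative assumptions, and a real client (Claude Code, or a FastMCP client) builds and sends these for you after an initialization handshake that is omitted here.

```python
import json
from typing import Optional


def make_request(request_id: int, method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope (hypothetical helper for illustration)."""
    msg = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Discovery: ask the server which tools it offers.
list_tools = make_request(1, "tools/list")

# Invocation: call one tool by name, passing its arguments as JSON.
call_tool = make_request(
    2,
    "tools/call",
    {
        "name": "search_documents",
        "arguments": {"query": "transformer architectures", "top_k": 3},
    },
)

print(json.dumps(list_tools, indent=2))
print(json.dumps(call_tool, indent=2))
```

Because every client sends the same two message shapes, the server does not need to know which client is on the other end.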

What It Means For You — Concrete Impact on Claude Code Usage

If you use Claude Code, this is immediately useful. Instead of:

  • Copy-pasting search results into Claude Code
  • Writing custom scripts to query your vector database
  • Maintaining separate integrations for each tool

You run one MCP server. Claude Code connects to it over MCP. You can then ask Claude Code: "Find documents about transformer architectures" and it can call your pgvector search tool to answer.

This is the pattern: write once, connect everywhere.

Try It Now — Build Your pgvector MCP Server

1. Install FastMCP

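pip install fastmcp
# The server below also imports these; skip this line if your RAG project already has them
pip install psycopg2-binary google-genai python-dotenv
pip freeze > requirements.txt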

2. Create the server (mcp_server/server.py)

import psycopg2
from google import genai
from google.genai import types as genai_types
from fastmcp import FastMCP
from dotenv import load_dotenv
import os

load_dotenv()

mcp = FastMCP(
    name="pgvector-search",
    instructions="Document search server using pgvector. "
                 "Covers machine learning, Python, and cloud topics.",
)

gemini_client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))

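# One long-lived connection for the whole server process.
# Autocommit keeps a failed query from leaving the shared connection
# stuck in an aborted transaction.
conn = psycopg2.connect(
    host=os.getenv("DB_HOST"), port=os.getenv("DB_PORT"),
    dbname=os.getenv("DB_NAME"), user=os.getenv("DB_USER"),
    password=os.getenv("DB_PASSWORD"),
)
conn.autocommit = True
cur = conn.cursor()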

def get_embedding(text: str) -> list[float]:
    result = gemini_client.models.embed_content(
        model="gemini-embedding-001",
        contents=text,
        config=genai_types.EmbedContentConfig(
            task_type="RETRIEVAL_QUERY",
            output_dimensionality=768,
        ),
    )
    return result.embeddings[0].values

@mcp.tool
def search_documents(query: str, top_k: int = 3) -> list[dict]:
    """
    Search all document categories for a given query.
    Use when the category is unknown or the question spans multiple categories.
    """
    q = get_embedding(query)
    cur.execute("""
        SELECT title, body, category,
               1 - (embedding <=> %s::vector) AS similarity
        FROM documents ORDER BY embedding <=> %s::vector LIMIT %s;
    """, (q, q, top_k))
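    return [
        {"title": r[0], "body": r[1], "category": r[2], "similarity": round(r[3], 4)}
        for r in cur.fetchall()
    ]

# Keep this at the very bottom of the file, below every tool, resource, and prompt.
# Without it the script only defines the server and exits; run() defaults to stdio.
if __name__ == "__main__":
    mcp.run()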
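The query relies on pgvector's <=> operator, which returns cosine distance, so 1 - (embedding <=> query) is cosine similarity. The pure-Python sketch below reproduces that arithmetic on tiny hand-made vectors so the similarity field is easier to interpret. The two-dimensional vectors are made up for illustration; real embeddings in this setup have 768 dimensions.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def similarity(a: list[float], b: list[float]) -> float:
    """Mirror the SQL expression: 1 - distance, rounded to 4 places."""
    return round(1.0 - cosine_distance(a, b), 4)


query = [1.0, 0.0]            # illustrative query vector
same_direction = [2.0, 0.0]   # same direction, different length
orthogonal = [0.0, 3.0]       # unrelated direction
opposite = [-1.0, 0.0]        # opposite direction

print(similarity(query, same_direction))  # 1.0
print(similarity(query, orthogonal))      # 0.0
print(similarity(query, opposite))        # -1.0
```

Vector length does not affect the score, only direction does. ORDER BY embedding <=> query sorts ascending by distance, which is why the most similar rows come first.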

3. Run the server and connect Claude Code

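Run the server once by hand to confirm it starts without import or database connection errors. With the stdio config below, the client launches the server process itself, so this manual run is only a smoke test: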

python mcp_server/server.py

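Then configure your client to launch it. For Claude Code, put the JSON below in a .mcp.json file at the project root, or register the server from the command line with claude mcp add pgvector-search -- python mcp_server/server.py. Claude Desktop reads the same mcpServers structure from claude_desktop_config.json, where the script path should generally be absolute: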

{
  "mcpServers": {
    "pgvector-search": {
      "command": "python",
      "args": ["mcp_server/server.py"]
    }
  }
}

Now in Claude Code, you can ask: "Search for documents about attention mechanisms" and it calls your pgvector MCP server automatically.
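One practical wrinkle with the config above: a relative script path resolves against whatever directory the client starts the process in, and a bare "python" resolves against the client's PATH, which may not be the virtualenv where fastmcp is installed. A minimal sketch of generating the same structure with absolute paths follows. The function name build_mcp_config is a hypothetical helper, and where you save the output depends on your client.

```python
import json
import sys
from pathlib import Path


def build_mcp_config(server_script: Path, python_exe: str = sys.executable) -> dict:
    """Return an mcpServers entry that uses absolute paths (hypothetical helper)."""
    return {
        "mcpServers": {
            "pgvector-search": {
                # Interpreter running this script, e.g. the virtualenv's python.
                "command": python_exe,
                # Absolute path, so the client's working directory does not matter.
                "args": [str(server_script.resolve())],
            }
        }
    }


config = build_mcp_config(Path("mcp_server/server.py"))
print(json.dumps(config, indent=2))
```

Run it from the project root inside the virtualenv, then paste the printed JSON into the client's config file.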

4. Add Resources and Prompts (Optional)

Resources expose data the LLM can read. Prompts are reusable templates:

@mcp.resource("db://categories")
def get_categories() -> str:
    cur.execute("SELECT DISTINCT category FROM documents ORDER BY category")
    return "\n".join(r[0] for r in cur.fetchall())

@mcp.prompt
def search_prompt(topic: str) -> str:
    return f"Search our document database for information about {topic}. Use the search_documents tool."
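To see what a client actually receives from these two functions without a running Postgres instance, the same category query can be exercised against an in-memory SQLite table. This is only a stand-in for illustration: the table contents are made-up sample rows, sqlite3 replaces psycopg2, and the FastMCP decorators are left off so the functions can be called directly.

```python
import sqlite3

# In-memory stand-in for the documents table (sample rows are invented).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE documents (title TEXT, category TEXT)")
cur.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [
        ("Attention basics", "machine learning"),
        ("Decorators", "python"),
        ("Object storage", "cloud"),
        ("Generators", "python"),
    ],
)


def get_categories() -> str:
    """Same query as the resource: one category per line, deduplicated and sorted."""
    cur.execute("SELECT DISTINCT category FROM documents ORDER BY category")
    return "\n".join(r[0] for r in cur.fetchall())


def search_prompt(topic: str) -> str:
    """Same template as the prompt: the topic is interpolated into a fixed instruction."""
    return f"Search our document database for information about {topic}. Use the search_documents tool."


print(get_categories())
print(search_prompt("attention mechanisms"))
```

The resource returns plain text that a client can read to learn which categories exist, and the prompt returns a ready-made instruction that steers the model toward the search tool.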

Why This Works — Token Economics and Reusability

The magic is in the protocol. MCP standardizes how tools are described and called. FastMCP generates the schema from your Python type hints and docstrings automatically—no manual FunctionDeclaration blocks. This means:

  • Zero schema maintenance: Change a function signature, the schema updates
  • Any client: Claude Desktop, Claude Code, Gemini, or custom agents all speak MCP
  • No code duplication: One server, many consumers
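The idea behind schema generation from type hints can be sketched with the standard library alone. The code below is not FastMCP's implementation, only an illustration of the principle: it inspects a function's signature, type hints, and docstring and emits a tool description with the name, description, and inputSchema fields that MCP tool listings use. The describe_tool helper and the JSON_TYPES mapping are assumptions made for this example.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python types to JSON Schema type names (illustrative only).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func) -> dict:
    """Derive a tool description from a function's signature and docstring."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        prop = {"type": JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the caller must supply it
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def search_documents(query: str, top_k: int = 3) -> list:
    """Search all document categories for a given query."""
    return []


schema = describe_tool(search_documents)
print(schema)
```

Changing the signature, for example adding a category: str parameter, changes the generated schema on the next run, which is the "zero schema maintenance" property described above.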

When To Use This

  • You have a pgvector database with embeddings and want Claude Code to query it
  • You're building RAG systems that multiple agents or tools need to access
  • You want to decouple your search logic from your application code
  • You're teaching MCP and want a concrete, working example

The Bigger Picture

For the author of the source article, this was a warm-up project on the way from classic software engineering into AI engineering, built before tackling more advanced MCP courses from Anthropic and Hugging Face. The pattern scales: from pgvector to any data source, from one client to many.

Claude Code users who adopt this pattern stop writing one-off scripts and start building reusable infrastructure. Your vector database becomes a service, not a script dependency.


Source: dev.to

