Fortune Ndlovu

Build Your Own MCP Server: A Repo-Agnostic File Search Tool for AI Assistants

I often find that the results from AI tools are opinionated. You ask Claude or Cursor to find something in your codebase and it gives you a best guess, or it uses its own heuristics to decide what's relevant. Sometimes it misses files entirely. You could just grep, but let's be honest, piping grep through awk and sorting results across 500 files isn't exactly a great developer experience.

What I wanted was simple: give my AI assistant the ability to actually search my files properly. Not guess. Not summarize. Just find things, accurately, across every file in whatever repo I'm working in.

So I built a local MCP server. It's repo-agnostic; point it at any directory, and it gives Claude Code, Cursor, or any MCP-compatible client full search, scan, read, and write capabilities over that directory. No config changes when you switch repos. It just works.

In this article, I'll walk you through building it yourself.


What You'll Build

A Python MCP server with 7 tools:

| Tool | What It Does |
| --- | --- |
| search | Full-text grep across all files (uses ripgrep for speed) |
| scan | Detect file changes since last scan via git status |
| read_file | Read any file with optional line ranges |
| write_file | Create or overwrite files with atomic writes |
| patch_file | Find-and-replace edits (like Claude's Edit tool) |
| list_files | Glob-based directory listing with filters |
| file_stats | Repo summary: file counts, sizes, git info |

Once it's running, you can ask your AI assistant things like "search for authentication across all Python files" and it'll use your MCP server to return exact matches with file paths and line numbers. No guessing.


Prerequisites

  • Python 3.10+ (I'm using 3.14, any modern version works)
  • Claude Code and/or Cursor installed
  • ripgrep (rg) installed for fast search (optional but recommended, falls back to grep)

Install the MCP Python SDK:

python3 -m pip install mcp

Check ripgrep is available (optional):

rg --version

If you don't have ripgrep, the server falls back to system grep, then pure Python. It'll still work, just slower on large repos.


Step 1: Create the Server Directory

Pick a location for your server. I keep mine alongside my other tools, but it can live anywhere:

mkdir -p ~/.local/share/local-files-server
cd ~/.local/share/local-files-server

Create the dependencies file:

echo "mcp>=1.2.0" > requirements.txt
python3 -m pip install -r requirements.txt

Step 2: Write the Server

Create server.py. This is the entire server in one file:

#!/usr/bin/env python3
"""
local-files MCP Server

Repo-agnostic file search, scan, and update server.
Point LOCAL_FILES_ROOT at any directory and go.
"""

import json
import logging
import os
import re
import shutil
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

from mcp.server.fastmcp import FastMCP

logging.basicConfig(level=logging.INFO, stream=sys.stderr)
logger = logging.getLogger("local-files")

ROOT = Path(os.environ.get("LOCAL_FILES_ROOT", ".")).resolve()
STATE_FILE = Path(__file__).parent / ".scan-state.json"

IGNORED_DIRS = {".git", "node_modules", "__pycache__", ".venv", "venv", "env", ".obsidian"}
EXTRA_IGNORE = os.environ.get("LOCAL_FILES_IGNORE", "")
if EXTRA_IGNORE:
    IGNORED_DIRS.update(d.strip() for d in EXTRA_IGNORE.split(",") if d.strip())

mcp = FastMCP("local-files")


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

def safe_resolve(path_str: str) -> Path:
    """Resolve a path ensuring it stays within ROOT."""
    if os.path.isabs(path_str):
        resolved = Path(path_str).resolve()
    else:
        resolved = (ROOT / path_str).resolve()
    # Use is_relative_to, not a string prefix check: a prefix test would
    # wrongly accept e.g. "/srv/repo2" when ROOT is "/srv/repo".
    if not resolved.is_relative_to(ROOT):
        raise ValueError(f"Path escapes root directory: {path_str}")
    return resolved


def is_binary(filepath: Path, sample: int = 8192) -> bool:
    """Heuristic: file is binary if the first sample bytes contain a null."""
    try:
        with open(filepath, "rb") as f:
            chunk = f.read(sample)
        return b"\x00" in chunk
    except OSError:
        return True


def human_size(nbytes: int) -> str:
    for unit in ("B", "KB", "MB", "GB"):
        if abs(nbytes) < 1024:
            return f"{nbytes:.1f} {unit}"
        nbytes /= 1024
    return f"{nbytes:.1f} TB"


def should_ignore(path: Path) -> bool:
    return any(part in IGNORED_DIRS for part in path.parts)


def walk_files(root: Path, extensions: Optional[list[str]] = None):
    """Yield Path objects for non-ignored, non-binary files under root."""
    for p in root.rglob("*"):
        if not p.is_file():
            continue
        rel = p.relative_to(root)
        if should_ignore(rel):
            continue
        if extensions and p.suffix not in extensions:
            continue
        yield p


def extract_frontmatter(text: str) -> tuple[Optional[str], str]:
    """Split YAML frontmatter from body. Returns (frontmatter_raw, body)."""
    if not text.startswith("---"):
        return None, text
    end = text.find("\n---", 3)
    if end == -1:
        return None, text
    fm = text[3:end].strip()
    body = text[end + 4:].lstrip("\n")
    return fm, body


def git_run(*args: str) -> Optional[str]:
    """Run a git command in ROOT, return stdout or None on failure."""
    try:
        result = subprocess.run(
            ["git", *args],
            cwd=ROOT,
            capture_output=True,
            text=True,
            timeout=10,
        )
        if result.returncode == 0:
            return result.stdout
        return None
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None


def is_git_repo() -> bool:
    return git_run("rev-parse", "--is-inside-work-tree") is not None


def atomic_write(filepath: Path, content: str) -> None:
    """Write content atomically via temp file + rename."""
    filepath.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=filepath.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        os.replace(tmp, filepath)
    except BaseException:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise


# ---------------------------------------------------------------------------
# Tools
# ---------------------------------------------------------------------------

@mcp.tool()
def search(
    query: str,
    file_extensions: Optional[list[str]] = None,
    max_results: int = 50,
    use_regex: bool = False,
) -> str:
    """Full-text search across all files in the repo.

    Args:
        query: Search string (plain text or regex if use_regex=True)
        file_extensions: Filter by extensions e.g. [".md", ".py"]. None = all files.
        max_results: Maximum matching lines to return (default 50)
        use_regex: Treat query as a regex pattern
    """
    matches = []
    truncated = False

    rg = shutil.which("rg")
    grep = shutil.which("grep")

    if rg or grep:
        cmd: list[str] = []
        if rg:
            cmd = ["rg", "--no-heading", "--line-number", "--color=never",
                   "--with-filename", "-C1", "--max-count=5"]
            if not use_regex:
                cmd.append("--fixed-strings")
            if file_extensions:
                for ext in file_extensions:
                    cmd.extend(["-g", f"*{ext}"])
            for d in IGNORED_DIRS:
                cmd.extend(["-g", f"!{d}/"])
            cmd.extend([query, str(ROOT)])
        else:
            cmd = ["grep", "-rn", "--color=never", "-C1",
                   "-m5", "-I"]
            if not use_regex:
                cmd.append("-F")
            if file_extensions:
                for ext in file_extensions:
                    cmd.extend(["--include", f"*{ext}"])
            for d in IGNORED_DIRS:
                cmd.extend(["--exclude-dir", d])
            cmd.extend([query, str(ROOT)])

        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=30,
            )
            raw = result.stdout
        except (subprocess.TimeoutExpired, OSError):
            raw = ""

        current_match: Optional[dict] = None
        for line in raw.splitlines():
            if line == "--":
                if current_match:
                    matches.append(current_match)
                    current_match = None
                    if len(matches) >= max_results:
                        truncated = True
                        break
                continue

            m = re.match(r"^(.+?):(\d+):(.*)$", line)
            if not m:
                m = re.match(r"^(.+?)-(\d+)-(.*)$", line)
                if not m:
                    continue
                is_context = True
            else:
                is_context = False
            filepath, lineno, content = m.group(1), int(m.group(2)), m.group(3)
            try:
                rel = str(Path(filepath).relative_to(ROOT))
            except ValueError:
                rel = filepath

            if not is_context:
                if current_match:
                    matches.append(current_match)
                    if len(matches) >= max_results:
                        truncated = True
                        break
                current_match = {"file": rel, "line": lineno, "content": content}
            elif current_match:
                current_match.setdefault("context", []).append(content)

        if current_match and len(matches) < max_results:
            matches.append(current_match)

    else:
        # pure python fallback
        pattern = re.compile(query if use_regex else re.escape(query), re.IGNORECASE)
        for fp in walk_files(ROOT, file_extensions):
            if is_binary(fp):
                continue
            try:
                lines = fp.read_text(errors="replace").splitlines()
            except OSError:
                continue
            for i, line in enumerate(lines):
                if pattern.search(line):
                    rel = str(fp.relative_to(ROOT))
                    ctx = []
                    if i > 0:
                        ctx.append(lines[i - 1])
                    if i < len(lines) - 1:
                        ctx.append(lines[i + 1])
                    matches.append({"file": rel, "line": i + 1, "content": line, "context": ctx})
                    if len(matches) >= max_results:
                        truncated = True
                        break
            if truncated:
                break

    return json.dumps({"matches": matches, "total_matches": len(matches), "truncated": truncated}, indent=2)


@mcp.tool()
def scan() -> str:
    """Detect file changes since last scan.

    Uses git status for git repos, mtime tracking otherwise.
    Returns new, modified, and deleted files.
    """
    if is_git_repo():
        out = git_run("status", "--porcelain") or ""
        new, modified, deleted = [], [], []
        for line in out.splitlines():
            if len(line) < 4:
                continue
            status = line[:2].strip()
            filepath = line[3:].strip().strip('"')
            if status == "??":
                new.append(filepath)
            elif "D" in status:
                deleted.append(filepath)
            else:
                modified.append(filepath)
        return json.dumps({
            "method": "git",
            "new": new,
            "modified": modified,
            "deleted": deleted,
            "total_changes": len(new) + len(modified) + len(deleted),
        }, indent=2)

    # mtime fallback
    prev_state: dict[str, float] = {}
    if STATE_FILE.exists():
        try:
            prev_state = json.loads(STATE_FILE.read_text())
        except (json.JSONDecodeError, OSError):
            pass

    current_state: dict[str, float] = {}
    new, modified = [], []
    for fp in walk_files(ROOT):
        rel = str(fp.relative_to(ROOT))
        mtime = fp.stat().st_mtime
        current_state[rel] = mtime
        if rel not in prev_state:
            new.append(rel)
        elif prev_state[rel] != mtime:
            modified.append(rel)

    deleted = [p for p in prev_state if p not in current_state]

    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(json.dumps(current_state))

    return json.dumps({
        "method": "mtime",
        "new": new,
        "modified": modified,
        "deleted": deleted,
        "total_changes": len(new) + len(modified) + len(deleted),
    }, indent=2)


@mcp.tool()
def read_file(
    path: str,
    offset: Optional[int] = None,
    limit: Optional[int] = None,
) -> str:
    """Read a file's contents.

    Args:
        path: File path (absolute or relative to LOCAL_FILES_ROOT)
        offset: Start from this line number (1-based). Omit for beginning.
        limit: Max lines to read. Omit for entire file (capped at 10000).
    """
    resolved = safe_resolve(path)
    if not resolved.exists():
        return json.dumps({"error": f"File not found: {path}"})
    if not resolved.is_file():
        return json.dumps({"error": f"Not a file: {path}"})
    if is_binary(resolved):
        return json.dumps({"error": f"Binary file, cannot read as text: {path}"})

    try:
        text = resolved.read_text(errors="replace")
    except OSError as e:
        return json.dumps({"error": str(e)})

    lines = text.splitlines(keepends=True)
    total_lines = len(lines)

    start = (offset - 1) if offset and offset > 0 else 0
    end = start + (limit if limit else 10000)
    selected = lines[start:end]
    content = "".join(selected)
    was_truncated = end < total_lines

    result: dict = {
        "path": str(resolved.relative_to(ROOT)),
        "content": content,
        "line_count": total_lines,
        "size_bytes": resolved.stat().st_size,
    }

    if was_truncated:
        result["truncated"] = True
        result["showing_lines"] = f"{start + 1}-{min(end, total_lines)}"

    if resolved.suffix == ".md":
        fm, _ = extract_frontmatter(text)
        if fm:
            result["frontmatter_raw"] = fm

    return json.dumps(result, indent=2)


@mcp.tool()
def write_file(path: str, content: str) -> str:
    """Write content to a file, creating parent directories if needed.

    Args:
        path: File path (absolute or relative to LOCAL_FILES_ROOT)
        content: Full file content to write
    """
    resolved = safe_resolve(path)
    atomic_write(resolved, content)
    return json.dumps({
        "written": str(resolved.relative_to(ROOT)),
        "size_bytes": len(content.encode()),
    })


@mcp.tool()
def patch_file(path: str, old_string: str, new_string: str) -> str:
    """Apply a targeted edit: replace old_string with new_string.

    old_string must appear exactly once in the file.

    Args:
        path: File path (absolute or relative to LOCAL_FILES_ROOT)
        old_string: Exact text to find (must be unique in the file)
        new_string: Replacement text
    """
    resolved = safe_resolve(path)
    if not resolved.exists():
        return json.dumps({"error": f"File not found: {path}"})

    try:
        text = resolved.read_text(errors="replace")
    except OSError as e:
        return json.dumps({"error": str(e)})

    count = text.count(old_string)
    if count == 0:
        return json.dumps({"error": "old_string not found in file"})
    if count > 1:
        return json.dumps({"error": f"old_string appears {count} times; must be unique. Add more context."})

    new_text = text.replace(old_string, new_string, 1)
    atomic_write(resolved, new_text)
    return json.dumps({
        "patched": str(resolved.relative_to(ROOT)),
        "size_bytes": len(new_text.encode()),
    })


@mcp.tool()
def list_files(
    pattern: str = "**/*",
    file_extensions: Optional[list[str]] = None,
    max_results: int = 200,
) -> str:
    """List files matching a glob pattern.

    Args:
        pattern: Glob pattern relative to repo root (default '**/*')
        file_extensions: Filter by extensions e.g. [".md", ".py"]
        max_results: Max files to return (default 200)
    """
    files = []
    truncated = False
    exts = set(file_extensions) if file_extensions else None

    for p in sorted(ROOT.glob(pattern)):
        if not p.is_file():
            continue
        rel = p.relative_to(ROOT)
        if should_ignore(rel):
            continue
        if exts and p.suffix not in exts:
            continue

        try:
            st = p.stat()
            files.append({
                "path": str(rel),
                "size": st.st_size,
                "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
            })
        except OSError:
            continue

        if len(files) >= max_results:
            truncated = True
            break

    return json.dumps({"files": files, "total": len(files), "truncated": truncated}, indent=2)


@mcp.tool()
def file_stats() -> str:
    """Get summary statistics about the repo: file counts by extension, total size, git info."""
    by_ext: dict[str, int] = {}
    total_size = 0
    total_files = 0
    latest_file = ""
    latest_mtime = 0.0

    for p in walk_files(ROOT):
        total_files += 1
        try:
            st = p.stat()
        except OSError:
            continue
        total_size += st.st_size
        ext = p.suffix or "(no ext)"
        by_ext[ext] = by_ext.get(ext, 0) + 1
        if st.st_mtime > latest_mtime:
            latest_mtime = st.st_mtime
            latest_file = str(p.relative_to(ROOT))

    result: dict = {
        "total_files": total_files,
        "total_size_bytes": total_size,
        "total_size_human": human_size(total_size),
        "by_extension": dict(sorted(by_ext.items(), key=lambda x: -x[1])),
        "last_modified": {
            "file": latest_file,
            "time": datetime.fromtimestamp(latest_mtime, tz=timezone.utc).isoformat() if latest_mtime else None,
        },
    }

    if is_git_repo():
        branch = (git_run("branch", "--show-current") or "").strip()
        log = (git_run("log", "-1", "--format=%s") or "").strip()
        count_str = (git_run("rev-list", "--count", "HEAD") or "0").strip()
        result["git"] = {
            "branch": branch,
            "last_commit": log,
            "total_commits": int(count_str) if count_str.isdigit() else 0,
        }

    return json.dumps(result, indent=2)


# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------

if __name__ == "__main__":
    logger.info("local-files MCP server starting (root: %s)", ROOT)
    mcp.run()

That's it. One file, one dependency, 7 tools.


Step 3: Test It

Make sure the server starts without errors:

timeout 3 python3 server.py 2>&1

You should see:

INFO:local-files:local-files MCP server starting (root: /your/current/directory)

It'll hang after that; that's normal. It's waiting for JSON-RPC input over stdin, which is what Claude Code or Cursor will send it.
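
If you want to see it actually respond, you can hand-craft the first message of the MCP handshake yourself. This is a sketch of the JSON-RPC initialize request a client sends over stdin; the clientInfo values are placeholders I made up for a manual test:

```python
import json

# Minimal JSON-RPC "initialize" request — the first message an MCP client
# sends the server over stdin. clientInfo values are placeholders.
init_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "manual-test", "version": "0.0.1"},
    },
}

print(json.dumps(init_request))
```

Piping that one line into python3 server.py should get back an initialize response identifying the server as local-files.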


Step 4: Register in Claude Code

Option A: Global (recommended, works in every project)

Run this from anywhere:

claude mcp add local-files -- python3 /path/to/your/server.py

Replace /path/to/your/server.py with the actual path. For example:

claude mcp add local-files -- python3 ~/.local/share/local-files-server/server.py

This adds the server to your global ~/.claude.json. Every project you open in Claude Code will have access to the 7 tools automatically.

Option B: Per-project only

If you only want it in a specific repo, create or edit .mcp.json in that repo's root:

{
  "mcpServers": {
    "local-files": {
      "type": "stdio",
      "command": "python3",
      "args": ["/path/to/your/server.py"],
      "env": {}
    }
  }
}

Verify it's connected

Restart Claude Code, then run /mcp. You should see:

local-files · connected · 7 tools

What each config file does

| File | Scope | Purpose |
| --- | --- | --- |
| ~/.claude.json | Global | MCP servers available in every project |
| .mcp.json (in repo root) | Per-project | MCP servers for this repo only |
| .claude/settings.local.json | Per-project | Permission allowlists for tools |

Step 5: Register in Cursor

Open Cursor, go to Settings (gear icon) > Tools & MCP > click "New MCP Server".

Enter:

  • Name: local-files
  • Type: command (stdio)
  • Command: python3 /path/to/your/server.py

Or add it manually to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "local-files": {
      "command": "python3",
      "args": ["/path/to/your/server.py"]
    }
  }
}

After adding, you'll see local-files in the MCP servers list with 7 tools connected. Now when you ask Cursor's AI to search your codebase, it can use your server instead of guessing.


How It Works Under the Hood

The Root Directory

The server uses the LOCAL_FILES_ROOT environment variable to decide which directory to operate on. If it's not set, it defaults to the current working directory, so wherever you open Claude Code or Cursor automatically becomes the root. No config changes needed when switching projects.

Search Strategy

The search tool uses a three-tier fallback:

  1. ripgrep (rg): fastest, respects .gitignore
  2. grep: available on every Linux/macOS system
  3. Pure Python: last resort, works everywhere
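
The selection logic boils down to a couple of shutil.which probes. A standalone sketch of the same fallback order:

```python
import shutil

def pick_search_backend() -> str:
    """Return which search backend would be used, in fallback order."""
    if shutil.which("rg"):
        return "rg"      # ripgrep installed: fastest option
    if shutil.which("grep"):
        return "grep"    # standard on Linux/macOS
    return "python"      # pure-Python scan as the last resort

print(pick_search_backend())
```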

One gotcha I hit: ripgrep's -I flag means --no-filename (hides file paths in output), which is the opposite of grep's -I (skip binary files). Cost me an hour of debugging. The server uses --with-filename explicitly to avoid this.

Atomic Writes

The write_file and patch_file tools write to a temporary file first, then use os.replace() to atomically swap it into place. This prevents half-written files if something crashes mid-write.
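
You can watch the swap happen in isolation. This sketch (self-contained, using a throwaway temp directory) writes the new content to a sibling file first, then renames it over the target; on POSIX systems os.replace() is a single atomic rename, so a reader only ever sees the old content or the complete new content:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "notes.txt"
    target.write_text("old content")

    # Write the full new content to a sibling temp file first...
    fd, tmp = tempfile.mkstemp(dir=d, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write("new content")

    # ...then swap it into place. A crash before this line leaves the
    # original untouched; a crash after leaves the new file complete.
    os.replace(tmp, target)
    result = target.read_text()

print(result)
```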

Path Safety

Every tool that accepts a file path runs it through safe_resolve(), which ensures the resolved path stays within the root directory. This prevents directory traversal: you can't pass ../../etc/passwd and read outside the repo.
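
Here's a standalone sketch of the same containment check, using Path.is_relative_to (available since Python 3.9); the root path is hypothetical:

```python
from pathlib import Path

ROOT = Path("/srv/my-repo")  # hypothetical root for illustration

def is_inside_root(path_str: str) -> bool:
    """True if path_str resolves to somewhere under ROOT."""
    p = Path(path_str)
    resolved = (p if p.is_absolute() else ROOT / p).resolve()
    return resolved.is_relative_to(ROOT)

print(is_inside_root("src/main.py"))       # a normal relative path
print(is_inside_root("../../etc/passwd"))  # a traversal attempt
```

Note that resolve() collapses the .. segments before the check, so the traversal attempt ends up at /etc/passwd and is rejected.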

Change Detection

The scan tool checks if you're in a git repo. If so, it uses git status --porcelain to detect new, modified, and deleted files. If not (maybe you're in a plain directory), it falls back to tracking file modification times in a .scan-state.json file.
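
The porcelain parsing is easy to check in isolation. This sketch applies the same classification to a fabricated git status --porcelain snippet:

```python
def classify_porcelain(output: str) -> dict[str, list[str]]:
    """Sort `git status --porcelain` lines into new / modified / deleted."""
    changes: dict[str, list[str]] = {"new": [], "modified": [], "deleted": []}
    for line in output.splitlines():
        if len(line) < 4:
            continue
        status = line[:2].strip()
        filepath = line[3:].strip().strip('"')
        if status == "??":
            changes["new"].append(filepath)      # untracked file
        elif "D" in status:
            changes["deleted"].append(filepath)
        else:
            changes["modified"].append(filepath)
    return changes

# Fabricated sample output for illustration
sample = "?? notes/new-idea.md\n M server.py\nD  old-draft.md"
print(classify_porcelain(sample))
```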


Usage Examples

Once connected, just ask naturally:

"search for authentication across all Python files"
"what files changed since last scan"
"show me file stats for this repo"
"list all markdown files in docs/"

Your AI assistant will route these to the MCP server automatically. You'll see Called local-files in the output, confirming it used your server instead of its built-in tools.


Customization

Ignore additional directories

Set the LOCAL_FILES_IGNORE environment variable:

{
  "env": {
    "LOCAL_FILES_IGNORE": "dist,build,.cache"
  }
}

Point at a specific directory

Set LOCAL_FILES_ROOT in the env config:

{
  "env": {
    "LOCAL_FILES_ROOT": "/home/you/specific-project"
  }
}

Wrapping Up

The whole server is ~450 lines of Python, one dependency (mcp), and it works in both Claude Code and Cursor without modification. It defaults to your current directory so you never have to update config when switching repos.

The key insight: AI tools are great at reasoning, but they need accurate data to reason about. Giving them a proper search tool instead of letting them guess makes the results dramatically better. Your AI assistant just got a lot more useful.

Follow me on LinkedIn
