I often find that the results from AI tools are opinionated. You ask Claude or Cursor to find something in your codebase and it gives you a best guess, or it uses its own heuristics to decide what's relevant. Sometimes it misses files entirely. You could just grep, but let's be honest, piping grep through awk and sorting results across 500 files isn't exactly a great developer experience.
What I wanted was simple: give my AI assistant the ability to actually search my files properly. Not guess. Not summarize. Just find things, accurately, across every file in whatever repo I'm working in.
So I built a local MCP server. It's repo-agnostic; point it at any directory, and it gives Claude Code, Cursor, or any MCP-compatible client full search, scan, read, and write capabilities over that directory. No config changes when you switch repos. It just works.
In this article, I'll walk you through building it yourself.
What You'll Build
A Python MCP server with 7 tools:
| Tool | What It Does |
|---|---|
| `search` | Full-text grep across all files (uses ripgrep for speed) |
| `scan` | Detect file changes since last scan via git status |
| `read_file` | Read any file with optional line ranges |
| `write_file` | Create or overwrite files with atomic writes |
| `patch_file` | Find-and-replace edits (like Claude's Edit tool) |
| `list_files` | Glob-based directory listing with filters |
| `file_stats` | Repo summary: file counts, sizes, git info |
Once it's running, you can ask your AI assistant things like "search for authentication across all Python files" and it'll use your MCP server to return exact matches with file paths and line numbers. No guessing.
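For a feel of what comes back, here's the shape of a search result. The file name and contents below are invented for illustration; the structure matches the JSON the search tool returns:

```python
# Illustrative only: these values are made up, but the structure matches
# the JSON payload the search tool emits.
example = {
    "matches": [
        {
            "file": "auth/login.py",
            "line": 42,
            "content": "def authenticate(user, password):",
            "context": ["    # check credentials against the db"],
        }
    ],
    "total_matches": 1,
    "truncated": False,
}
```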
Prerequisites
- Python 3.10+ (I'm using 3.14, any modern version works)
- Claude Code and/or Cursor installed
- ripgrep (`rg`) installed for fast search (optional but recommended; falls back to grep)
Install the MCP Python SDK:
python3 -m pip install mcp
Check ripgrep is available (optional):
rg --version
If you don't have ripgrep, the server falls back to system grep, then pure Python. It'll still work, just slower on large repos.
Step 1: Create the Server Directory
Pick a location for your server. I keep mine alongside my other tools, but it can live anywhere:
mkdir -p ~/.local/share/local-files-server
cd ~/.local/share/local-files-server
Create the dependencies file:
echo "mcp>=1.2.0" > requirements.txt
python3 -m pip install -r requirements.txt
Step 2: Write the Server
Create `server.py`; this is the entire server in one file:
#!/usr/bin/env python3
"""
local-files MCP Server
Repo-agnostic file search, scan, and update server.
Point LOCAL_FILES_ROOT at any directory and go.
"""
import json
import logging
import os
import re
import shutil
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

from mcp.server.fastmcp import FastMCP

logging.basicConfig(level=logging.INFO, stream=sys.stderr)
logger = logging.getLogger("local-files")

ROOT = Path(os.environ.get("LOCAL_FILES_ROOT", ".")).resolve()
STATE_FILE = Path(__file__).parent / ".scan-state.json"

IGNORED_DIRS = {".git", "node_modules", "__pycache__", ".venv", "venv", "env", ".obsidian"}
EXTRA_IGNORE = os.environ.get("LOCAL_FILES_IGNORE", "")
if EXTRA_IGNORE:
    IGNORED_DIRS.update(d.strip() for d in EXTRA_IGNORE.split(",") if d.strip())

mcp = FastMCP("local-files")


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

def safe_resolve(path_str: str) -> Path:
    """Resolve a path ensuring it stays within ROOT."""
    if os.path.isabs(path_str):
        resolved = Path(path_str).resolve()
    else:
        resolved = (ROOT / path_str).resolve()
    # is_relative_to avoids the string-prefix pitfall where a sibling
    # directory like /repo-backup would pass a startswith("/repo") check
    if not resolved.is_relative_to(ROOT):
        raise ValueError(f"Path escapes root directory: {path_str}")
    return resolved


def is_binary(filepath: Path, sample: int = 8192) -> bool:
    """Heuristic: file is binary if the first sample bytes contain a null."""
    try:
        with open(filepath, "rb") as f:
            chunk = f.read(sample)
        return b"\x00" in chunk
    except OSError:
        return True


def human_size(nbytes: int) -> str:
    for unit in ("B", "KB", "MB", "GB"):
        if abs(nbytes) < 1024:
            return f"{nbytes:.1f} {unit}"
        nbytes /= 1024
    return f"{nbytes:.1f} TB"


def should_ignore(path: Path) -> bool:
    return any(part in IGNORED_DIRS for part in path.parts)


def walk_files(root: Path, extensions: Optional[list[str]] = None):
    """Yield Path objects for non-ignored, non-binary files under root."""
    for p in root.rglob("*"):
        if not p.is_file():
            continue
        rel = p.relative_to(root)
        if should_ignore(rel):
            continue
        if extensions and p.suffix not in extensions:
            continue
        yield p


def extract_frontmatter(text: str) -> tuple[Optional[str], str]:
    """Split YAML frontmatter from body. Returns (frontmatter_raw, body)."""
    if not text.startswith("---"):
        return None, text
    end = text.find("\n---", 3)
    if end == -1:
        return None, text
    fm = text[3:end].strip()
    body = text[end + 4:].lstrip("\n")
    return fm, body


def git_run(*args: str) -> Optional[str]:
    """Run a git command in ROOT, return stdout or None on failure."""
    try:
        result = subprocess.run(
            ["git", *args],
            cwd=ROOT,
            capture_output=True,
            text=True,
            timeout=10,
        )
        if result.returncode == 0:
            return result.stdout
        return None
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None


def is_git_repo() -> bool:
    return git_run("rev-parse", "--is-inside-work-tree") is not None


def atomic_write(filepath: Path, content: str) -> None:
    """Write content atomically via temp file + rename."""
    filepath.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=filepath.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        os.replace(tmp, filepath)
    except BaseException:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise


# ---------------------------------------------------------------------------
# Tools
# ---------------------------------------------------------------------------

@mcp.tool()
def search(
    query: str,
    file_extensions: Optional[list[str]] = None,
    max_results: int = 50,
    use_regex: bool = False,
) -> str:
    """Full-text search across all files in the repo.

    Args:
        query: Search string (plain text or regex if use_regex=True)
        file_extensions: Filter by extensions e.g. [".md", ".py"]. None = all files.
        max_results: Maximum matching lines to return (default 50)
        use_regex: Treat query as a regex pattern
    """
    matches = []
    truncated = False
    rg = shutil.which("rg")
    grep = shutil.which("grep")

    if rg or grep:
        cmd: list[str] = []
        if rg:
            cmd = ["rg", "--no-heading", "--line-number", "--color=never",
                   "--with-filename", "-C1", "--max-count=5"]
            if not use_regex:
                cmd.append("--fixed-strings")
            if file_extensions:
                for ext in file_extensions:
                    cmd.extend(["-g", f"*{ext}"])
            for d in IGNORED_DIRS:
                cmd.extend(["-g", f"!{d}/"])
            cmd.extend([query, str(ROOT)])
        else:
            cmd = ["grep", "-rn", "--color=never", "-C1",
                   "-m5", "-I"]
            if not use_regex:
                cmd.append("-F")
            if file_extensions:
                for ext in file_extensions:
                    cmd.extend(["--include", f"*{ext}"])
            for d in IGNORED_DIRS:
                cmd.extend(["--exclude-dir", d])
            cmd.extend([query, str(ROOT)])
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=30,
            )
            raw = result.stdout
        except (subprocess.TimeoutExpired, OSError):
            raw = ""

        current_match: Optional[dict] = None
        for line in raw.splitlines():
            if line == "--":
                if current_match:
                    matches.append(current_match)
                current_match = None
                if len(matches) >= max_results:
                    truncated = True
                    break
                continue
            m = re.match(r"^(.+?):(\d+):(.*)$", line)
            if not m:
                m = re.match(r"^(.+?)-(\d+)-(.*)$", line)
                if not m:
                    continue
                is_context = True
            else:
                is_context = False
            filepath, lineno, content = m.group(1), int(m.group(2)), m.group(3)
            try:
                rel = str(Path(filepath).relative_to(ROOT))
            except ValueError:
                rel = filepath
            if not is_context:
                if current_match:
                    matches.append(current_match)
                    if len(matches) >= max_results:
                        truncated = True
                        break
                current_match = {"file": rel, "line": lineno, "content": content}
            elif current_match:
                current_match.setdefault("context", []).append(content)
        if current_match and len(matches) < max_results:
            matches.append(current_match)
    else:
        # pure-Python fallback (note: case-insensitive, unlike the rg/grep paths)
        pattern = re.compile(query if use_regex else re.escape(query), re.IGNORECASE)
        for fp in walk_files(ROOT, file_extensions):
            if is_binary(fp):
                continue
            try:
                lines = fp.read_text(errors="replace").splitlines()
            except OSError:
                continue
            for i, line in enumerate(lines):
                if pattern.search(line):
                    rel = str(fp.relative_to(ROOT))
                    ctx = []
                    if i > 0:
                        ctx.append(lines[i - 1])
                    if i < len(lines) - 1:
                        ctx.append(lines[i + 1])
                    matches.append({"file": rel, "line": i + 1, "content": line, "context": ctx})
                    if len(matches) >= max_results:
                        truncated = True
                        break
            if truncated:
                break

    return json.dumps({"matches": matches, "total_matches": len(matches), "truncated": truncated}, indent=2)


@mcp.tool()
def scan() -> str:
    """Detect file changes since last scan.

    Uses git status for git repos, mtime tracking otherwise.
    Returns new, modified, and deleted files.
    """
    if is_git_repo():
        out = git_run("status", "--porcelain") or ""
        new, modified, deleted = [], [], []
        for line in out.splitlines():
            if len(line) < 4:
                continue
            status = line[:2].strip()
            filepath = line[3:].strip().strip('"')
            if status == "??":
                new.append(filepath)
            elif "D" in status:
                deleted.append(filepath)
            else:
                modified.append(filepath)
        return json.dumps({
            "method": "git",
            "new": new,
            "modified": modified,
            "deleted": deleted,
            "total_changes": len(new) + len(modified) + len(deleted),
        }, indent=2)

    # mtime fallback
    prev_state: dict[str, float] = {}
    if STATE_FILE.exists():
        try:
            prev_state = json.loads(STATE_FILE.read_text())
        except (json.JSONDecodeError, OSError):
            pass
    current_state: dict[str, float] = {}
    new, modified = [], []
    for fp in walk_files(ROOT):
        rel = str(fp.relative_to(ROOT))
        mtime = fp.stat().st_mtime
        current_state[rel] = mtime
        if rel not in prev_state:
            new.append(rel)
        elif prev_state[rel] != mtime:
            modified.append(rel)
    deleted = [p for p in prev_state if p not in current_state]
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(json.dumps(current_state))
    return json.dumps({
        "method": "mtime",
        "new": new,
        "modified": modified,
        "deleted": deleted,
        "total_changes": len(new) + len(modified) + len(deleted),
    }, indent=2)


@mcp.tool()
def read_file(
    path: str,
    offset: Optional[int] = None,
    limit: Optional[int] = None,
) -> str:
    """Read a file's contents.

    Args:
        path: File path (absolute or relative to LOCAL_FILES_ROOT)
        offset: Start from this line number (1-based). Omit for beginning.
        limit: Max lines to read. Omit for entire file (capped at 10000).
    """
    resolved = safe_resolve(path)
    if not resolved.exists():
        return json.dumps({"error": f"File not found: {path}"})
    if not resolved.is_file():
        return json.dumps({"error": f"Not a file: {path}"})
    if is_binary(resolved):
        return json.dumps({"error": f"Binary file, cannot read as text: {path}"})
    try:
        text = resolved.read_text(errors="replace")
    except OSError as e:
        return json.dumps({"error": str(e)})
    lines = text.splitlines(keepends=True)
    total_lines = len(lines)
    start = (offset - 1) if offset and offset > 0 else 0
    end = start + (limit if limit else 10000)
    selected = lines[start:end]
    content = "".join(selected)
    was_truncated = end < total_lines
    result: dict = {
        "path": str(resolved.relative_to(ROOT)),
        "content": content,
        "line_count": total_lines,
        "size_bytes": resolved.stat().st_size,
    }
    if was_truncated:
        result["truncated"] = True
        result["showing_lines"] = f"{start + 1}-{min(end, total_lines)}"
    if resolved.suffix == ".md":
        fm, _ = extract_frontmatter(text)
        if fm:
            result["frontmatter_raw"] = fm
    return json.dumps(result, indent=2)


@mcp.tool()
def write_file(path: str, content: str) -> str:
    """Write content to a file, creating parent directories if needed.

    Args:
        path: File path (absolute or relative to LOCAL_FILES_ROOT)
        content: Full file content to write
    """
    resolved = safe_resolve(path)
    atomic_write(resolved, content)
    return json.dumps({
        "written": str(resolved.relative_to(ROOT)),
        "size_bytes": len(content.encode()),
    })


@mcp.tool()
def patch_file(path: str, old_string: str, new_string: str) -> str:
    """Apply a targeted edit: replace old_string with new_string.

    old_string must appear exactly once in the file.

    Args:
        path: File path (absolute or relative to LOCAL_FILES_ROOT)
        old_string: Exact text to find (must be unique in the file)
        new_string: Replacement text
    """
    resolved = safe_resolve(path)
    if not resolved.exists():
        return json.dumps({"error": f"File not found: {path}"})
    try:
        text = resolved.read_text(errors="replace")
    except OSError as e:
        return json.dumps({"error": str(e)})
    count = text.count(old_string)
    if count == 0:
        return json.dumps({"error": "old_string not found in file"})
    if count > 1:
        return json.dumps({"error": f"old_string appears {count} times; must be unique. Add more context."})
    new_text = text.replace(old_string, new_string, 1)
    atomic_write(resolved, new_text)
    return json.dumps({
        "patched": str(resolved.relative_to(ROOT)),
        "size_bytes": len(new_text.encode()),
    })


@mcp.tool()
def list_files(
    pattern: str = "**/*",
    file_extensions: Optional[list[str]] = None,
    max_results: int = 200,
) -> str:
    """List files matching a glob pattern.

    Args:
        pattern: Glob pattern relative to repo root (default '**/*')
        file_extensions: Filter by extensions e.g. [".md", ".py"]
        max_results: Max files to return (default 200)
    """
    files = []
    truncated = False
    exts = set(file_extensions) if file_extensions else None
    for p in sorted(ROOT.glob(pattern)):
        if not p.is_file():
            continue
        rel = p.relative_to(ROOT)
        if should_ignore(rel):
            continue
        if exts and p.suffix not in exts:
            continue
        try:
            st = p.stat()
            files.append({
                "path": str(rel),
                "size": st.st_size,
                "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
            })
        except OSError:
            continue
        if len(files) >= max_results:
            truncated = True
            break
    return json.dumps({"files": files, "total": len(files), "truncated": truncated}, indent=2)


@mcp.tool()
def file_stats() -> str:
    """Get summary statistics about the repo: file counts by extension, total size, git info."""
    by_ext: dict[str, int] = {}
    total_size = 0
    total_files = 0
    latest_file = ""
    latest_mtime = 0.0
    for p in walk_files(ROOT):
        total_files += 1
        try:
            st = p.stat()
        except OSError:
            continue
        total_size += st.st_size
        ext = p.suffix or "(no ext)"
        by_ext[ext] = by_ext.get(ext, 0) + 1
        if st.st_mtime > latest_mtime:
            latest_mtime = st.st_mtime
            latest_file = str(p.relative_to(ROOT))
    result: dict = {
        "total_files": total_files,
        "total_size_bytes": total_size,
        "total_size_human": human_size(total_size),
        "by_extension": dict(sorted(by_ext.items(), key=lambda x: -x[1])),
        "last_modified": {
            "file": latest_file,
            "time": datetime.fromtimestamp(latest_mtime, tz=timezone.utc).isoformat() if latest_mtime else None,
        },
    }
    if is_git_repo():
        branch = (git_run("branch", "--show-current") or "").strip()
        log = (git_run("log", "-1", "--format=%s") or "").strip()
        count_str = (git_run("rev-list", "--count", "HEAD") or "0").strip()
        result["git"] = {
            "branch": branch,
            "last_commit": log,
            "total_commits": int(count_str) if count_str.isdigit() else 0,
        }
    return json.dumps(result, indent=2)


# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------
if __name__ == "__main__":
    logger.info("local-files MCP server starting (root: %s)", ROOT)
    mcp.run()
That's it. One file, one dependency, 7 tools.
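Several of the helpers are trivially testable in isolation before you wire anything up. Here's `extract_frontmatter`, copied out of the server as a standalone demo:

```python
from typing import Optional

def extract_frontmatter(text: str) -> tuple[Optional[str], str]:
    """Split YAML frontmatter from body (same logic as in server.py)."""
    if not text.startswith("---"):
        return None, text
    end = text.find("\n---", 3)
    if end == -1:
        return None, text
    return text[3:end].strip(), text[end + 4:].lstrip("\n")

fm, body = extract_frontmatter("---\ntitle: Demo\n---\nBody text")
# fm == "title: Demo", body == "Body text"
```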
Step 3: Test It
Make sure the server starts without errors:
timeout 3 python3 server.py 2>&1
You should see:
INFO:local-files:local-files MCP server starting (root: /your/current/directory)
It'll hang after that; that's normal. It's waiting for JSON-RPC input over stdin, which is what Claude Code or Cursor will send it.
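If you want to poke the server by hand before registering a client, you can paste a JSON-RPC `initialize` request into its stdin. This is only a sketch of the message shape: the protocol version string is one published MCP revision, and the `clientInfo` values here are made up.

```python
import json

# A JSON-RPC 2.0 initialize request, the first message an MCP client sends.
# "2024-11-05" is a published MCP protocol revision; clientInfo is arbitrary.
init_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "manual-test", "version": "0.0.1"},
    },
}
line = json.dumps(init_request)  # send this, newline-terminated, to stdin
```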
Step 4: Register in Claude Code
Option A: Global (recommended, works in every project)
Run this from anywhere:
claude mcp add local-files -- python3 /path/to/your/server.py
Replace /path/to/your/server.py with the actual path. For example:
claude mcp add local-files -- python3 ~/.local/share/local-files-server/server.py
This adds the server to your global ~/.claude.json. Every project you open in Claude Code will have access to the 7 tools automatically.
Option B: Per-project only
If you only want it in a specific repo, create or edit .mcp.json in that repo's root:
{
"mcpServers": {
"local-files": {
"type": "stdio",
"command": "python3",
"args": ["/path/to/your/server.py"],
"env": {}
}
}
}
Verify it's connected
Restart Claude Code, then run /mcp. You should see:
local-files · connected · 7 tools
What each config file does
| File | Scope | Purpose |
|---|---|---|
| `~/.claude.json` | Global | MCP servers available in every project |
| `.mcp.json` (in repo root) | Per-project | MCP servers for this repo only |
| `.claude/settings.local.json` | Per-project | Permission allowlists for tools |
Step 5: Register in Cursor
Open Cursor, go to Settings (gear icon) > Tools & MCP > click "New MCP Server".
Enter:
- Name: `local-files`
- Type: `command` (stdio)
- Command: `python3 /path/to/your/server.py`
Or add it manually to ~/.cursor/mcp.json:
{
"mcpServers": {
"local-files": {
"command": "python3",
"args": ["/path/to/your/server.py"]
}
}
}
After adding, you'll see local-files in the MCP servers list with 7 tools connected. Now when you ask Cursor's AI to search your codebase, it can use your server instead of guessing.
How It Works Under the Hood
The Root Directory
The server reads the LOCAL_FILES_ROOT environment variable to decide which directory to operate on. If it's not set, it defaults to the current working directory, so wherever you open Claude Code or Cursor becomes the root automatically. No config changes needed when switching projects.
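That resolution is a single line at the top of server.py:

```python
import os
from pathlib import Path

# Env var wins; otherwise whatever directory the client launches the
# server from becomes the root.
ROOT = Path(os.environ.get("LOCAL_FILES_ROOT", ".")).resolve()
```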
Search Strategy
The search tool uses a three-tier fallback:
1. ripgrep (`rg`): fastest, respects `.gitignore`
2. grep: available on every Linux/macOS system
3. Pure Python: last resort, works everywhere
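The tier selection is nothing fancy, just `shutil.which` probes in order. Here's a standalone sketch of the same checks the search tool makes:

```python
import shutil

def pick_searcher() -> str:
    """Report which search backend the server would use on this machine."""
    if shutil.which("rg"):
        return "ripgrep"
    if shutil.which("grep"):
        return "grep"
    return "python"  # pure-Python scan: slowest, but zero dependencies

backend = pick_searcher()
```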
One gotcha I hit: ripgrep's `-I` flag means `--no-filename` (hide file paths in output), which is the opposite of grep's `-I` (skip binary files). That cost me an hour of debugging. The server passes `--with-filename` explicitly to avoid any ambiguity.
Atomic Writes
The write_file and patch_file tools write to a temporary file first, then use os.replace() to atomically swap it into place. This prevents half-written files if something crashes mid-write.
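A trimmed-down version of that helper (the real one in server.py also creates parent directories and cleans up the temp file on failure):

```python
import os
import tempfile
from pathlib import Path

def atomic_write(filepath: Path, content: str) -> None:
    # Write to a temp file in the target's own directory, then swap it in.
    # os.replace() is atomic on POSIX, so readers never see a partial file.
    fd, tmp = tempfile.mkstemp(dir=filepath.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(content)
    os.replace(tmp, filepath)

target = Path(tempfile.mkdtemp()) / "note.txt"
atomic_write(target, "hello")
```

Writing the temp file in the same directory matters: `os.replace()` is only atomic within a single filesystem.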
Path Safety
Every tool that accepts a file path runs it through `safe_resolve()`, which ensures the resolved path stays within the root directory. This prevents directory traversal: you can't pass `../../etc/passwd` and read outside the repo.
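Here's the check in isolation. ROOT is a hypothetical path for the demo, the sketch handles only relative inputs, and it uses pathlib's `is_relative_to` (Python 3.9+) rather than a raw string-prefix comparison, which a sibling directory like `/tmp/demo-root-2` could slip past:

```python
from pathlib import Path

ROOT = Path("/tmp/demo-root").resolve()  # hypothetical root for illustration

def safe_resolve(path_str: str) -> Path:
    """Resolve a relative path, refusing anything that escapes ROOT."""
    resolved = (ROOT / path_str).resolve()
    if not resolved.is_relative_to(ROOT):
        raise ValueError(f"Path escapes root directory: {path_str}")
    return resolved

safe_resolve("notes/today.md")      # fine
# safe_resolve("../../etc/passwd")  # raises ValueError
```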
Change Detection
The scan tool checks if you're in a git repo. If so, it uses git status --porcelain to detect new, modified, and deleted files. If not (maybe you're in a plain directory), it falls back to tracking file modification times in a .scan-state.json file.
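The porcelain parsing is a simple status-code bucket, sketched here standalone (it mirrors the scan tool's git branch):

```python
def classify(porcelain_line: str) -> str:
    """Bucket one line of `git status --porcelain` output."""
    status = porcelain_line[:2].strip()  # two status characters, then a space
    if status == "??":
        return "new"       # untracked file
    if "D" in status:
        return "deleted"   # deleted in index or worktree
    return "modified"      # everything else counts as modified

buckets = [classify(l) for l in ["?? draft.md", " D old.md", " M server.py"]]
```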
Usage Examples
Once connected, just ask naturally:
"search for authentication across all Python files"
"what files changed since last scan"
"show me file stats for this repo"
"list all markdown files in docs/"
Your AI assistant will route these to the MCP server automatically. You'll see Called local-files in the output, confirming it used your server instead of its built-in tools.
Customization
Ignore additional directories
Set the LOCAL_FILES_IGNORE environment variable:
{
"env": {
"LOCAL_FILES_IGNORE": "dist,build,.cache"
}
}
Point at a specific directory
Set LOCAL_FILES_ROOT in the env config:
{
"env": {
"LOCAL_FILES_ROOT": "/home/you/specific-project"
}
}
Wrapping Up
The whole server is ~450 lines of Python, one dependency (mcp), and it works in both Claude Code and Cursor without modification. It defaults to your current directory so you never have to update config when switching repos.
The key insight: AI tools are great at reasoning, but they need accurate data to reason about. Giving them a proper search tool instead of letting them guess makes the results dramatically better. Your AI assistant just got a lot more useful.
Follow me on LinkedIn