sagar

Posted on Mar 9

How I Built a Claude Code Plugin That Intercepts Grep and Replaces It With Semantic Search

#claudecode #ai #opensource #tutorial

Claude Code plugins shipped a few months ago and the ecosystem is exploding. But most tutorials stop at "here's a slash command that prints hello world." This post walks through building a real, production plugin from scratch: one that indexes your codebase with embeddings, does hybrid semantic search, and automatically intercepts grep calls to give Claude better results.

By the end you'll understand every piece of the Claude Code plugin system: hooks, commands, skills, agents, and how they wire together. The plugin we're building is Beacon, which I open sourced at github.com/sagarmk/beacon-plugin.

The Problem

Claude Code uses grep and ripgrep to find code. This works fine for small projects, but falls apart on larger codebases:

Searching "authentication flow" won't find login_handler, verify_token, or session_middleware
Claude burns context window reading dozens of irrelevant files trying to find the right code
New sessions on a 100k line codebase can eat 50k+ tokens just on exploration

The fix: give Claude a semantic search tool that understands what code does, not just what it's named.

Plugin Anatomy

A Claude Code plugin lives in a directory with this structure:

my-plugin/
├── .claude-plugin/
│   └── plugin.json          # manifest: declares everything
├── hooks/
│   └── hooks.json            # lifecycle event handlers
├── commands/
│   └── my-command.md         # slash commands
├── skills/
│   └── my-skill/
│       └── SKILL.md          # always-loaded context
├── agents/
│   └── my-agent.md           # delegatable sub-agents
├── scripts/
│   └── ...                   # your actual code
└── package.json

The plugin.json manifest ties everything together:

{
  "name": "beacon",
  "version": "1.0.0",
  "description": "Beacon — semantic code search for Claude Code",
  "skills": ["./skills/semantic-search"],
  "commands": [
    "./commands/search-code.md",
    "./commands/index.md",
    "./commands/reindex.md"
  ],
  "agents": ["./agents/code-explorer.md"],
  "hooks": "./hooks/hooks.json"
}

Let's build each piece.

Part 1: Hooks — The Nervous System

Hooks are the most powerful part of the plugin system. They fire on lifecycle events and can run shell commands, modify behavior, or even deny tool calls.

Beacon uses four hook types:

{
  "hooks": {
    "SessionStart": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "node ${CLAUDE_PLUGIN_ROOT}/scripts/sync.js",
        "timeout": 300,
        "async": true,
        "description": "Beacon: sync code embeddings index"
      }]
    }],

    "PreToolUse": [{
      "matcher": "Grep",
      "hooks": [{
        "type": "command",
        "command": "node ${CLAUDE_PLUGIN_ROOT}/scripts/grep-intercept.js",
        "timeout": 5,
        "description": "Beacon: intercept grep and redirect to hybrid search"
      }]
    }],

    "PostToolUse": [{
      "matcher": "Write|Edit|MultiEdit",
      "hooks": [{
        "type": "command",
        "command": "node ${CLAUDE_PLUGIN_ROOT}/scripts/embed-file.js \"$TOOL_INPUT_file_path\"",
        "timeout": 15,
        "async": true,
        "description": "Beacon: re-embed changed file"
      }]
    }],

    "PreCompact": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "node ${CLAUDE_PLUGIN_ROOT}/scripts/status.js --compact-warning",
        "timeout": 5,
        "description": "Beacon: inject index status before compaction"
      }]
    }]
  }
}

Here's what each does:

SessionStart fires when Claude Code launches. Beacon uses this to sync the embeddings index. On first run it indexes the entire repo. On subsequent sessions it does a diff-based catch-up using git to find changed files. The async: true flag means it runs in the background so it doesn't block the session from starting.

PreToolUse with matcher: "Grep" fires before every call to Claude's built-in Grep tool (the matcher is compared against the tool name, so a grep or rg command run through the Bash tool would not trigger it). This is where the magic happens: the hook script reads the grep arguments from stdin, decides if Beacon can handle the query better, and if so returns a permissionDecision: "deny" response that redirects Claude to use Beacon search instead. More on this below.

PostToolUse with matcher: "Write|Edit|MultiEdit" fires after any file edit. Beacon re-embeds just that one file so the index stays current without a full resync. The $TOOL_INPUT_file_path variable gives us the exact file that changed.
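As an illustration, here is a minimal sketch of how a script like embed-file.js could pick up the edited path from the hook's stdin payload rather than from a shell variable. It assumes PostToolUse delivers the same JSON shape as PreToolUse, with the edited path at tool_input.file_path. The function name editedFilePath is hypothetical and not taken from Beacon's source.

```javascript
// Hypothetical helper: extract the edited file's path from a PostToolUse payload.
// Assumes the payload is JSON with the path at tool_input.file_path.
function editedFilePath(rawStdin) {
  try {
    const payload = JSON.parse(rawStdin);
    const filePath = payload?.tool_input?.file_path;
    return typeof filePath === 'string' && filePath.length > 0 ? filePath : null;
  } catch {
    // Malformed payload: report "nothing to re-embed" instead of crashing the hook.
    return null;
  }
}

// Example payload, shaped like the one an Edit call might produce.
const sample = JSON.stringify({
  tool_name: 'Edit',
  tool_input: { file_path: 'src/auth/login_handler.js' },
});
console.log(editedFilePath(sample));
```

Returning null on bad input keeps the hook harmless: the script can simply exit without re-embedding anything.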

PreCompact fires before context compaction. Beacon injects its index status into the context so Claude remembers that semantic search is available even after the conversation gets compressed.

The Grep Intercept (PreToolUse in Action)

This is the most interesting hook. When Claude tries to grep, the hook script receives the tool input as JSON on stdin:

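const { readFileSync } = require('node:fs');

// The hook payload arrives as JSON on stdin (file descriptor 0).
const input = JSON.parse(readFileSync(0, 'utf8'));
const pattern = input.tool_input.pattern;        // the search string Claude passed to Grep
const outputMode = input.tool_input.output_mode; // e.g. "count" or "content"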

Then it decides whether to intercept or let grep through. Grep passes through for legitimate use cases:

Regex patterns (function\s+\w+)
Specific file targets (path: "src/lib/db.js")
Count mode (output_mode: "count")
Very short patterns (under 4 chars)
Dotted identifiers (fs.readFileSync)
Content mode (user wants matching lines)
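The pass-through rules above can be sketched as a single function. This is an illustrative version, not Beacon's actual code. The name passThroughReason, the regexes, and the "path has a file extension" heuristic for spotting a specific file target are all assumptions.

```javascript
const path = require('node:path');

// Characters that suggest the pattern is a real regex rather than a phrase.
const REGEX_META = /[\\^$.*+?()[\]{}|]/;
// Something like fs.readFileSync: identifiers joined by dots.
const DOTTED_IDENTIFIER = /^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)+$/;

// Returns a short reason string when grep should run untouched,
// or null when the query is a candidate for semantic search.
function passThroughReason(toolInput) {
  const pattern = String(toolInput.pattern ?? '');
  if (pattern.length < 4) return 'short-pattern';
  if (toolInput.output_mode === 'count') return 'count-mode';
  if (toolInput.output_mode === 'content') return 'content-mode';
  // Heuristic (assumption): a path with a file extension targets one file.
  if (toolInput.path && path.extname(toolInput.path) !== '') return 'specific-file';
  if (DOTTED_IDENTIFIER.test(pattern)) return 'dotted-identifier';
  if (REGEX_META.test(pattern)) return 'regex';
  return null;
}

console.log(passThroughReason({ pattern: 'function\\s+\\w+' }));   // regex
console.log(passThroughReason({ pattern: 'authentication flow' })); // null
```

The dotted-identifier check runs before the regex check because "." is also a regex metacharacter. Both cases pass through, so the order only affects the reported reason.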

For everything else, the hook checks if the Beacon index is healthy, and if so, denies grep with a redirect message:

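// Assumes CLAUDE_PLUGIN_ROOT is exported to the hook process.
const pluginRoot = process.env.CLAUDE_PLUGIN_ROOT;
const output = {
  hookSpecificOutput: {
    hookEventName: "PreToolUse",
    permissionDecision: "deny",
    // JSON.stringify quotes the pattern and escapes any embedded double quotes.
    additionalContext: `This repo uses Beacon hybrid search. Run:
      node ${pluginRoot}/scripts/search.js ${JSON.stringify(pattern)}
    `
  }
};
console.log(JSON.stringify(output));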

Claude sees the denial and the suggested alternative, then uses Beacon search instead. The user doesn't have to do anything differently: they prompt as usual and the swap happens behind the scenes.

Part 2: The Search Engine

The search engine has three components: an indexer, a search script, and a SQLite database.

Indexing

The indexer (sync.js) runs on SessionStart and does the following:

Lists repo files via git (git ls-files)
Reads each file and chunks it (512 tokens with 50 token overlap)
Extracts identifiers from each chunk (function names, class names, imports)
Sends chunk text to an embedding model (Ollama by default, running locally)
Stores the embedding vector, chunk text, file path, line range, and identifiers in SQLite
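Steps 2 and 3 above can be sketched as follows. This is a simplified illustration rather than Beacon's implementation: whitespace splitting stands in for a real tokenizer, and the identifier regex only covers a few common declaration keywords. The names chunkWithOverlap and extractIdentifiers are hypothetical.

```javascript
// Whitespace splitting is a stand-in for a real tokenizer (assumption).
const tokenize = (text) => text.split(/\s+/).filter(Boolean);

// Slide a window of `size` tokens forward by (size - overlap) each step,
// so consecutive chunks share `overlap` tokens.
function chunkWithOverlap(tokens, size = 512, overlap = 50) {
  if (overlap >= size) throw new RangeError('overlap must be smaller than size');
  const step = size - overlap;
  const chunks = [];
  for (let start = 0; start < tokens.length; start += step) {
    const end = Math.min(start + size, tokens.length);
    chunks.push({ start, end, tokens: tokens.slice(start, end) });
    if (end === tokens.length) break; // the last window reached the end of the file
  }
  return chunks;
}

// Rough identifier extraction: names that follow a declaration keyword.
const IDENTIFIER_RE = /\b(?:function|class|def|const|let|var)\s+([A-Za-z_$][\w$]*)/g;

function extractIdentifiers(text) {
  return [...new Set([...text.matchAll(IDENTIFIER_RE)].map((m) => m[1]))];
}

const tokens = tokenize('function loginHandler() {}\nclass SessionStore {}');
console.log(chunkWithOverlap(tokens, 4, 1).length);
console.log(extractIdentifiers('function loginHandler() {}\nclass SessionStore {}'));
```

The overlap means a function that straddles a chunk boundary still appears mostly intact in at least one chunk.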

For incremental syncs, it uses git diff --name-only to find changed files since the last sync and only re-indexes those.
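A minimal sketch of that decision, assuming the indexer remembers the commit hash of its last sync. The function name filesToIndex and the injectable git parameter are hypothetical, and where Beacon actually stores the last-synced commit is not specified here.

```javascript
const { execFileSync } = require('node:child_process');

// Default runner: shells out to git and returns its stdout as a string.
const runGit = (args) => execFileSync('git', args, { encoding: 'utf8' });

// First run (no previous commit): index everything git tracks.
// Later runs: only files that differ between the last synced commit and HEAD.
function filesToIndex(lastSyncedCommit, git = runGit) {
  const args = lastSyncedCommit
    ? ['diff', '--name-only', lastSyncedCommit, 'HEAD']
    : ['ls-files'];
  return git(args).split('\n').filter(Boolean);
}

// Demo with a fake git runner so the sketch works outside a repository.
const fakeGit = (args) => (args[0] === 'ls-files' ? 'a.js\nb.js\n' : 'b.js\n');
console.log(filesToIndex(null, fakeGit));     // [ 'a.js', 'b.js' ]
console.log(filesToIndex('abc123', fakeGit)); // [ 'b.js' ]
```

Note that git diff --name-only also lists deleted files, so a real indexer would need to drop their chunks from the index rather than try to read them.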

The database uses two SQLite extensions:

sqlite-vec for vector similarity search (cosine distance)
FTS5 for full-text keyword search (BM25 scoring)

Hybrid Search

When you search, three things happen in parallel:

Vector search: your query gets embedded, then sqlite-vec finds the closest chunk vectors by cosine similarity
BM25 search: FTS5 runs a keyword search on chunk text
Identifier boost: exact matches on function/class names get a score bonus
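To make the vector step concrete, here is what a cosine-similarity ranking looks like in plain JavaScript. In Beacon this work happens inside sqlite-vec, so this brute-force version is only an illustration. Applying the threshold to the cosine similarity itself is an assumption, and the names cosineSimilarity and nearestChunks are hypothetical.

```javascript
// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new RangeError('vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat zero vectors as "no similarity"
}

// Brute-force nearest neighbours: score every chunk, filter, sort, truncate.
function nearestChunks(queryVector, chunks, topK = 10, threshold = 0.35) {
  return chunks
    .map((chunk) => ({ ...chunk, similarity: cosineSimilarity(queryVector, chunk.vector) }))
    .filter((chunk) => chunk.similarity >= threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, topK);
}

const demo = [
  { id: 'a', vector: [1, 0] },
  { id: 'b', vector: [0, 1] },
  { id: 'c', vector: [1, 1] },
];
console.log(nearestChunks([1, 0], demo).map((c) => c.id)); // [ 'a', 'c' ]
```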

The scores are blended with configurable weights:

{
  "hybrid": {
    "weight_vector": 0.4,
    "weight_bm25": 0.3,
    "weight_rrf": 0.3,
    "identifier_boost": 1.5
  }
}

This hybrid approach is what makes it work well for code. Pure vector search misses exact function names. Pure keyword search misses conceptual matches. The combination catches both.
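One plausible way to combine these signals is sketched below. The exact formula Beacon uses is not spelled out in this post, so treat every detail as an assumption: vector and BM25 scores are taken as already normalized to [0, 1], weight_rrf is read as the weight on a reciprocal rank fusion term with the conventional constant k = 60, and identifier_boost is applied as a multiplier. The function names are hypothetical.

```javascript
const WEIGHTS = { weight_vector: 0.4, weight_bm25: 0.3, weight_rrf: 0.3, identifier_boost: 1.5 };
const RRF_K = 60; // conventional RRF constant (assumption)

// Reciprocal rank fusion: sum of 1 / (k + rank) over the lists that returned
// the candidate. Ranks are 1-based; null means "not in that list".
function rrf(ranks, k = RRF_K) {
  return ranks.reduce((sum, rank) => (rank == null ? sum : sum + 1 / (k + rank)), 0);
}

function blend(candidates, w = WEIGHTS) {
  const maxRrf = 2 / (RRF_K + 1); // rank 1 in both lists; used to scale RRF to [0, 1]
  return candidates
    .map((c) => {
      const base =
        w.weight_vector * c.vectorScore +
        w.weight_bm25 * c.bm25Score +
        w.weight_rrf * (rrf([c.vectorRank, c.bm25Rank]) / maxRrf);
      // Exact identifier matches get multiplied up (assumed interpretation).
      return { ...c, score: c.identifierMatch ? base * w.identifier_boost : base };
    })
    .sort((x, y) => y.score - x.score);
}

const ranked = blend([
  { id: 'A', vectorScore: 0.9, bm25Score: 0.1, vectorRank: 1, bm25Rank: 3, identifierMatch: false },
  { id: 'B', vectorScore: 0.6, bm25Score: 0.8, vectorRank: 2, bm25Rank: 1, identifierMatch: true },
]);
console.log(ranked.map((c) => c.id)); // [ 'B', 'A' ]
```

In this example B wins even though A has the better vector score, because B ranks well in both lists and also matches an identifier exactly.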

Fallback Behavior

If the embedding server (Ollama) is down, search falls back to FTS-only mode automatically. If the database is missing or corrupted, the grep intercept lets grep through. The plugin never blocks your workflow.
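The "let grep through" half of this can be sketched as a fail-open wrapper around the hook's decision: any problem (unparseable input, unhealthy index, an exception in the health check) results in no output, which leaves the original Grep call alone. The function grepHookResponse and its parameters are hypothetical, and the health check is passed in as a callback because the actual check is not shown in this post.

```javascript
// Returns the deny payload, or null when grep should run untouched.
function grepHookResponse(rawStdin, isIndexHealthy, pluginRoot) {
  try {
    const input = JSON.parse(rawStdin);
    const pattern = input.tool_input.pattern;
    // Missing pattern or unhealthy index: step aside.
    if (typeof pattern !== 'string' || !isIndexHealthy()) return null;
    return {
      hookSpecificOutput: {
        hookEventName: 'PreToolUse',
        permissionDecision: 'deny',
        additionalContext:
          'This repo uses Beacon hybrid search. Run:\n' +
          `  node ${pluginRoot}/scripts/search.js ${JSON.stringify(pattern)}`,
      },
    };
  } catch {
    // Fail open: a broken hook must never block the user's grep.
    return null;
  }
}

const payload = JSON.stringify({ tool_name: 'Grep', tool_input: { pattern: 'authentication flow' } });
console.log(grepHookResponse(payload, () => false, '/plugins/beacon')); // null
```

A caller would print JSON.stringify(response) only when the result is non-null and exit with status 0 in both cases.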

Part 3: Slash Commands

Commands are markdown files that tell Claude how to use a tool. Here's the /search-code command:

---
description: "Hybrid code search — semantic + keyword + BM25"
argument-hint: <query> [--top-k N] [--threshold F]
allowed-tools: [Bash, Read, Glob]
---

# /search-code

Search using Beacon hybrid search.

## Single query
1. Run `node ${CLAUDE_PLUGIN_ROOT}/scripts/search.js "$ARGUMENTS"`
2. Parse the JSON results
3. Review the `preview` field first
4. Only read source files if preview is insufficient
5. Summarize with file:line citations

The allowed-tools field restricts which Claude Code tools the command can use. The ${CLAUDE_PLUGIN_ROOT} variable resolves to the plugin's install directory.

Commands are the user-facing API. Users type /search-code authentication flow and Claude executes the workflow described in the markdown.
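On the script side, the argument-hint above implies a small CLI parser. Here is one way search.js might handle it using Node's built-in util.parseArgs. This is a sketch under assumptions, not the plugin's real parser: the defaults mirror the top_k and similarity_threshold values from the config, and because the command passes "$ARGUMENTS" as one quoted string, a single argument is split on whitespace first.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical parser for: search.js <query> [--top-k N] [--threshold F]
function parseSearchArgs(argv) {
  // "$ARGUMENTS" arrives as one string, so split it when there is only one arg.
  const args = argv.length === 1 ? argv[0].split(/\s+/).filter(Boolean) : argv;
  const { values, positionals } = parseArgs({
    args,
    options: {
      'top-k': { type: 'string' },
      threshold: { type: 'string' },
    },
    allowPositionals: true,
  });

  const query = positionals.join(' ').trim();
  if (!query) throw new Error('usage: search.js <query> [--top-k N] [--threshold F]');

  const topK = values['top-k'] === undefined ? 10 : Number.parseInt(values['top-k'], 10);
  const threshold = values.threshold === undefined ? 0.35 : Number.parseFloat(values.threshold);
  if (!Number.isInteger(topK) || topK < 1) throw new Error('--top-k must be a positive integer');
  if (Number.isNaN(threshold)) throw new Error('--threshold must be a number');

  return { query, topK, threshold };
}

console.log(parseSearchArgs(['authentication flow --top-k 5']));
```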

Part 4: Skills — Always-On Context

Skills are markdown files whose name and description stay in Claude's context, with the full body pulled in when a task matches. Unlike commands (which are triggered explicitly), skills are always available.

Beacon's skill teaches Claude about the search workflow and the grep intercept behavior:

---
name: semantic-search
description: "Primary code search — hybrid semantic + keyword + BM25"
allowed-tools: [Bash, Read]
---

# Hybrid Code Search (Beacon)

This repo has a Beacon hybrid search index. Beacon is enforced
as the default search — grep is automatically intercepted and
redirected to Beacon for queries it handles better.

## How to search
node ${CLAUDE_PLUGIN_ROOT}/scripts/search.js "<query>"

## Grep intercept behavior
Grep passes through when:
- Pattern has regex metacharacters
- Targets a specific file
- Count mode
- Short patterns (<4 chars)
...

This context means Claude knows about Beacon from the moment a session starts, even if the user never types a slash command.

Part 5: Agents — Delegatable Specialists

Agents are sub-agents that Claude can delegate to. Beacon ships a code-explorer agent:

---
name: code-explorer
description: "Deep codebase exploration using hybrid search"
model: sonnet
tools: [Bash, Read, Glob, Grep]
---

# Code Explorer Agent

You explore codebases using Beacon hybrid search as your
primary tool, supplemented by grep for tracing connections.

## Process
1. Start with hybrid search
2. Read top 3-5 matched files
3. Use grep within those files to trace connections
4. Return explanation with file:line citations

The model: sonnet field runs this agent on a faster model, and because a sub-agent works in its own context window, the exploration doesn't eat into the main conversation's context, which stays free for the user's actual task.

Part 6: Configuration

Beacon uses a JSON config file that users can override:

{
  "embedding": {
    "api_base": "http://localhost:11434/v1",
    "model": "nomic-embed-text",
    "dimensions": 768,
    "batch_size": 10
  },
  "search": {
    "top_k": 10,
    "similarity_threshold": 0.35,
    "hybrid": {
      "weight_vector": 0.4,
      "weight_bm25": 0.3,
      "weight_rrf": 0.3,
      "identifier_boost": 1.5
    }
  },
  "storage": {
    "path": ".claude/.beacon"
  }
}

The defaults use Ollama locally (no API keys), but users can switch to OpenAI, Voyage AI, or any OpenAI-compatible endpoint by changing the config.
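A sketch of how user overrides could be layered on the defaults, and how the resulting config maps to a request against an OpenAI-compatible /embeddings endpoint. The merge strategy, the function names, and the api_key field are assumptions for illustration. Only api_base, model, dimensions, and batch_size come from the config shown above.

```javascript
const DEFAULTS = {
  embedding: { api_base: 'http://localhost:11434/v1', model: 'nomic-embed-text', dimensions: 768, batch_size: 10 },
  search: { top_k: 10, similarity_threshold: 0.35 },
  storage: { path: '.claude/.beacon' },
};

const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

// Recursive merge: nested objects are merged key by key, everything else is replaced.
function mergeConfig(base, overrides) {
  const out = { ...base };
  for (const [key, value] of Object.entries(overrides ?? {})) {
    out[key] = isPlainObject(value) && isPlainObject(base[key]) ? mergeConfig(base[key], value) : value;
  }
  return out;
}

// Build (but don't send) a request for an OpenAI-compatible embeddings endpoint.
function embeddingRequest(config, texts) {
  const { api_base, model, api_key } = config.embedding; // api_key is a hypothetical field
  const headers = { 'Content-Type': 'application/json' };
  if (api_key) headers.Authorization = `Bearer ${api_key}`;
  return {
    url: `${api_base.replace(/\/+$/, '')}/embeddings`,
    init: { method: 'POST', headers, body: JSON.stringify({ model, input: texts }) },
  };
}

const config = mergeConfig(DEFAULTS, { search: { top_k: 20 } });
console.log(config.search); // { top_k: 20, similarity_threshold: 0.35 }
console.log(embeddingRequest(config, ['function login() {}']).url);
```

The returned url and init can be passed straight to fetch(url, init). Keeping request construction separate from the network call makes the provider switch easy to test.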

Installing and Testing

Install the plugin:

claude plugin marketplace add sagarmk/Claude-Code-Beacon-Plugin

Make sure Ollama is running with an embedding model:

ollama pull nomic-embed-text

Start a Claude Code session. You'll see Beacon indexing in the background. Once done, try:

/search-code "authentication flow"

Or just use Claude normally. The grep intercept means Claude will automatically use semantic search when it makes sense.

Results

In my benchmarks across mixed-language codebases:

98.3% retrieval accuracy (vs ~75% for grep alone)
5x faster at finding semantically relevant code
60% reduction in context window usage on codebases over 50k lines

The biggest win is on conceptual queries. Searching "rate limiting" finds throttle_requests, backoff_handler, and 429_response_builder even though none of those strings contain "rate" or "limiting."

Key Takeaways for Plugin Builders

Hooks are the real power. Commands and skills are nice, but hooks let you fundamentally change how Claude Code works. The grep intercept pattern can be adapted for any tool.
PreToolUse can deny and redirect. Return permissionDecision: "deny" with additionalContext to transparently swap tools.
Use async hooks for heavy work. Indexing takes time. The async: true flag keeps the session responsive.
Skills > Commands for always-on behavior. If Claude should always know about your plugin, use a skill. Commands are for explicit user actions.
Fallbacks matter. If your plugin's backend is down, fail gracefully. Never block the user's workflow.