Owada Tomohiro

Posted on Oct 25

Introducing Free RAG for Claude Code — Save Tokens & Time

#tooling #llm #opensource #productivity

TL;DR

Tired of feeding docs to Claude Code every single time?

With a locally running, free RAG tool (DevRag), Claude Code can find the right documents for you via vector search. You no longer need to remember hundreds of filenames or locations.

Completely free: no API, entirely local
Simple setup: ~5 minutes
Fast: in the comparison below, token usage drops to 1/40 and responses are 15× faster
Repository: https://github.com/tomohiro-owada/devrag

Problems When Letting Claude Code Read Documents Directly

1. Wasting context

Claude Code’s context window is limited.

Every time you have it read an entire document, you burn through a huge number of tokens.

Example:

You: “Check the project’s API authentication scheme.”
Claude reads docs/auth.md (3,000 tokens)
Claude: “We use JWT-based authentication.”

Those 3,000 tokens are now gone from your prompt budget.

Ask something else later → it reads the whole thing again.

2. It’s hard to know which file to look at

As docs accumulate, you don’t know where things are — and neither does Claude.

You: “Tell me about our Redis caching strategy.”
Claude tries:

docs/architecture.md (4,000 tokens)
docs/caching.md (2,000 tokens)
docs/redis.md (doesn’t exist)

But maybe you only needed 200 tokens of docs/caching.md.

In a project with 10–100 documents:

You don’t know where others documented things
You can’t predict filenames
Asking “Where did we write that again?” becomes a daily routine

3. Repeated documentation reading

You often refer to the same docs:

Session 1 → docs/auth.md (3,000 tokens)

Session 2 → again (3,000 tokens)

Session 3 → again (3,000 tokens)

Same file, three times, 9,000 tokens.

That happens because the file is always read from the beginning, even if you only need a tiny piece.

RAG Solves All of These at Once

How RAG Works

Once at the beginning: vectorize documents and index them
At query time: retrieve only relevant chunks
Claude reads only the necessary parts

Traditional:

Question → Read whole document (3,000 tokens) → Answer

With RAG:

Question → Vector search for the relevant part (200 tokens) → Answer

This cuts token usage significantly and increases the signal-to-noise ratio.
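To make the index-once, retrieve-at-query-time flow concrete, here is a minimal sketch in Python. It is not DevRag's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the paragraph-based chunking is an assumption, and the sample document contents are made up for illustration.

```python
import math
import re
from collections import Counter


def split_into_chunks(text):
    """Split a document into paragraph-sized chunks on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs):
    """Index once: embed every chunk of every document."""
    index = []
    for path, text in docs.items():
        for chunk in split_into_chunks(text):
            index.append((path, chunk, embed(chunk)))
    return index


def search(index, query, top_k=1):
    """Query time: return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[2]), reverse=True)
    return [(path, chunk) for path, chunk, _ in ranked[:top_k]]


# Hypothetical sample documents, invented for this sketch.
docs = {
    "docs/auth.md": "We use JWT-based authentication.\n\n"
                    "Tokens expire after one hour.",
    "docs/caching.md": "Redis caching strategy: cache API responses for five minutes.\n\n"
                       "Invalidate keys on write.",
}
index = build_index(docs)
results = search(index, "What is our Redis caching strategy?")
print(results)
```

Only the matching chunk from docs/caching.md is returned, so the model would read one paragraph instead of every file. A real tool would swap `embed` for a neural embedding model, which also matches paraphrases that share no words with the query.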

The biggest benefit:

Claude Code can find what you need even if you don’t know filenames.

DevRag — A Simplified RAG for Claude Code

I built DevRag to make context retrieval simpler and faster for Claude Code.

Features

Single binary: no external DB, no Python
Auto model download on first run
MCP integration as a search tool
Fast: startup ~2 s, search <100 ms
Multilingual support (JP/EN)
No vendor lock-in

Setup (~5 minutes)

1. Download binary

# macOS (Apple Silicon)
# curl ships with macOS (wget does not); -L follows the release redirect, -O keeps the filename
curl -LO https://github.com/tomohiro-owada/devrag/releases/latest/download/devrag-macos-apple-silicon.tar.gz
tar -xzf devrag-macos-apple-silicon.tar.gz
chmod +x devrag-macos-apple-silicon
sudo mv devrag-macos-apple-silicon /usr/local/bin/devrag

2. Configure Claude Code

Add to ~/.claude.json:

{
  "mcpServers": {
    "devrag": {
      "type": "stdio",
      "command": "/usr/local/bin/devrag"
    }
  }
}
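If your ~/.claude.json already contains other settings, editing it by hand risks clobbering them. The sketch below shows one way to merge the entry above programmatically. The helper name `add_mcp_server` is hypothetical, and the demo runs against a temporary file with a made-up `theme` key rather than your real config.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path, name, command):
    """Add or update one stdio MCP server entry, keeping all other settings."""
    path = Path(config_path)
    # Start from the existing config if there is one, otherwise from an empty object.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"type": "stdio", "command": command}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo on a throwaway file; "theme" is an invented pre-existing setting.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "claude.json"
    demo_path.write_text('{"theme": "dark"}', encoding="utf-8")
    updated = add_mcp_server(demo_path, "devrag", "/usr/local/bin/devrag")
    written = json.loads(demo_path.read_text(encoding="utf-8"))

print(json.dumps(updated, indent=2))
```

Back up the real file before pointing the helper at it. The sketch does not handle an existing file that is empty or contains invalid JSON.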

3. Add some documents

mkdir documents
cp your-notes.md documents/

DevRag indexes these documents automatically when it launches.

Actual Usage Comparison

Before (No RAG)

You: “What’s our DB migration method?”

Claude reads:

README.md (5,000 tokens)
docs/database.md (4,000 tokens)
docs/setup.md (3,000 tokens)

→ 12,000 tokens, ~30 seconds

Most of that cost comes from guessing filenames.

After (With DevRag)

You: “What’s our DB migration method?”

Claude:

Runs vector search
Finds relevant 300-token snippet

Claude:

“Run npm run migrate. For details see docs/database.md:42.”

→ 300 tokens, ~2 seconds
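The headline ratios follow directly from the numbers in this comparison. A quick check in Python, using only the figures quoted above (they are the article's illustrative numbers, not a benchmark):

```python
# Figures from the before/after comparison above.
before_tokens = 5_000 + 4_000 + 3_000  # README.md + docs/database.md + docs/setup.md
after_tokens = 300                     # single retrieved snippet
before_seconds = 30
after_seconds = 2

token_ratio = before_tokens / after_tokens
speedup = before_seconds / after_seconds

print(f"tokens: {before_tokens} -> {after_tokens} (1/{token_ratio:.0f})")
print(f"time: {before_seconds}s -> {after_seconds}s ({speedup:.0f}x faster)")
# tokens: 12000 -> 300 (1/40)
# time: 30s -> 2s (15x faster)
```

Your own savings will depend on document sizes and how many files would otherwise be read.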

Summary

Directly reading documents means:

❌ Token waste
❌ Hard to find the right file
❌ Repeated full reads every session

RAG means:

✅ Token usage cut to 1/40 (12,000 → 300 tokens in the example above)
✅ Responses 15× faster (~30 s → ~2 s in the example above)
✅ Filename knowledge not required
✅ Setup in ~5 minutes
✅ Entirely local and free

Let Claude Code retrieve what you need automatically using vector search.

Repository

https://github.com/tomohiro-owada/devrag

License: MIT

Feedback: via Issues

Try it out! 🚀
