TL;DR
Tired of feeding docs to Claude Code every single time?
With a locally running, free RAG tool (DevRag), Claude Code can find the right documents for you via vector search. You no longer need to remember hundreds of filenames or locations.
- Completely free: no external API, runs entirely locally
- Simple setup: ~5 minutes
- Efficient: in the example below, token usage drops to 1/40 and responses come back 15× faster
- Repository: https://github.com/tomohiro-owada/devrag
Problems When Letting Claude Code Read Documents Directly
1. Wasting context
Claude Code’s context window is limited.
Every time you have it read an entire document, you burn through a huge amount of tokens.
Example:
- You: “Check the project’s API authentication scheme.”
- Claude reads docs/auth.md (3,000 tokens)
- Claude: “We use JWT-based authentication.”
Those 3,000 tokens are now gone from your prompt budget.
Ask something else later → it reads the whole thing again.
2. It’s hard to know which file to look at
As docs accumulate, you don’t know where things are — and neither does Claude.
You: “Tell me about our Redis caching strategy.”
Claude tries:
- docs/architecture.md (4,000 tokens)
- docs/caching.md (2,000 tokens)
- docs/redis.md (doesn’t exist)
But maybe you only needed 200 tokens of docs/caching.md.
In a project with 10–100 documents:
- You don’t know where others documented things
- You can’t predict filenames
- Asking “Where did we write that again?” becomes a daily routine
3. Repeated documentation reading
You often refer to the same docs:
Session 1 → docs/auth.md (3,000 tokens)
Session 2 → again (3,000 tokens)
Session 3 → again (3,000 tokens)
Same file, three times, 9,000 tokens.
That happens because Claude always reads the file from the beginning, even when you only need a tiny piece of it.
RAG Solves All of These at Once
How RAG Works
- Once at the beginning: vectorize documents and index them
- At query time: retrieve only relevant chunks
- Claude reads only the necessary parts
Traditional:
Question → Read whole document (3,000 tokens) → Answer
With RAG:
Question → Vector search relevant part (200 tokens) → Answer
In this example, token usage drops from 3,000 to 200 (roughly 1/15), and almost everything Claude reads is relevant to the question.
The biggest benefit:
Claude Code can find what you need even if you don’t know filenames.
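To make the index-then-retrieve flow above concrete, here is a minimal sketch in Python. It is not DevRag's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the chunk ids and texts are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: dict[str, str]) -> dict[str, Counter]:
    """Step 1 (once): vectorize every chunk and store the vectors."""
    return {chunk_id: embed(text) for chunk_id, text in chunks.items()}


def search(index: dict[str, Counter], query: str, top_k: int = 1) -> list[str]:
    """Step 2 (per query): return the ids of the most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda cid: cosine(index[cid], q), reverse=True)
    return ranked[:top_k]


# Hypothetical chunks; ids and text are invented for this example.
chunks = {
    "docs/auth.md#1": "API authentication uses JWT tokens signed by the auth service",
    "docs/caching.md#3": "Redis caching strategy: cache API responses with a short TTL",
    "docs/database.md#2": "Run database migrations before deploying a new release",
}
index = build_index(chunks)
print(search(index, "What is our Redis caching strategy?"))  # ['docs/caching.md#3']
```

The query never mentions a filename, yet the caching chunk ranks first. A real embedding model would also match paraphrases that share no words with the document, which this word-count toy cannot do.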
DevRag — A Simplified RAG for Claude Code
I built DevRag to make context retrieval simpler and faster for Claude Code.
Features
- Single binary: no external DB, no Python
- Auto model download on first run
- MCP integration as a search tool
- Fast: startup ~2 s, search <100 ms
- Multilingual support (JP/EN)
- No vendor lock-in
Setup (~5 minutes)
1. Download binary
# macOS (Apple Silicon)
# curl ships with macOS; wget does not unless you installed it separately
curl -LO https://github.com/tomohiro-owada/devrag/releases/latest/download/devrag-macos-apple-silicon.tar.gz
tar -xzf devrag-macos-apple-silicon.tar.gz
chmod +x devrag-macos-apple-silicon
sudo mv devrag-macos-apple-silicon /usr/local/bin/devrag
2. Configure Claude Code
Add to ~/.claude.json:
{
  "mcpServers": {
    "devrag": {
      "type": "stdio",
      "command": "/usr/local/bin/devrag"
    }
  }
}
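If ~/.claude.json already exists, pasting the snippet over it would drop whatever else is in the file. Here is a small sketch of merging the entry instead. The helper names `add_mcp_server` and `update_config_file` are my own for this example, and the `theme` and `other` entries in the sample config are invented.

```python
import json
from pathlib import Path


def add_mcp_server(config: dict, name: str, command: str) -> dict:
    """Return a copy of config with a stdio MCP server entry added under mcpServers."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {"type": "stdio", "command": command}
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path, name: str, command: str) -> None:
    """Read the JSON file if it exists, merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    merged = add_mcp_server(config, name, command)
    path.write_text(json.dumps(merged, indent=2) + "\n")


# Sample config with made-up existing settings, merged in memory only.
existing = {
    "theme": "dark",
    "mcpServers": {"other": {"type": "stdio", "command": "/opt/other"}},
}
print(json.dumps(add_mcp_server(existing, "devrag", "/usr/local/bin/devrag"), indent=2))
```

To apply it to the real file, you could call `update_config_file(Path.home() / ".claude.json", "devrag", "/usr/local/bin/devrag")` after backing up the original.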
3. Add some documents
mkdir documents
cp your-notes.md documents/
DevRag indexes automatically when launched.
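Indexing generally means splitting each file into chunks before vectorizing them. As a rough illustration only (DevRag's actual chunking strategy is not described in this post, so treat this as an assumption), here is one simple way to split Markdown at headings while remembering each chunk's starting line, which is what makes a reference like docs/database.md:42 possible. The sample document text is made up.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str        # file the chunk came from
    start_line: int  # 1-based line number of the chunk's first line
    text: str


def chunk_markdown(path: str, content: str) -> list[Chunk]:
    """Split Markdown into chunks, starting a new chunk at each line that begins with '#'."""
    chunks: list[Chunk] = []
    current: list[str] = []
    start = 1
    for lineno, line in enumerate(content.splitlines(), start=1):
        if line.startswith("#") and current:
            # A heading closes the previous chunk and opens a new one.
            chunks.append(Chunk(path, start, "\n".join(current)))
            current, start = [], lineno
        current.append(line)
    if current:
        chunks.append(Chunk(path, start, "\n".join(current)))
    return chunks


# Made-up document for illustration.
doc = "# Database\nWe use PostgreSQL.\n## Migrations\nRun npm run migrate.\n"
for c in chunk_markdown("docs/database.md", doc):
    print(f"{c.path}:{c.start_line} -> {c.text.splitlines()[0]}")
```

This naive splitter also breaks on `#` lines inside code blocks, so a real indexer would need more careful parsing.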
Actual Usage Comparison
Before (No RAG)
You: “What’s our DB migration method?”
Claude reads:
- README.md (5,000 tokens)
- docs/database.md (4,000 tokens)
- docs/setup.md (3,000 tokens)
→ 12,000 tokens, ~30 seconds
All of that because Claude has to guess filenames.
After (With DevRag)
You: “What’s our DB migration method?”
Claude:
- Runs vector search
- Finds relevant 300-token snippet
Claude:
“Run npm run migrate. For details see docs/database.md:42.”
→ 300 tokens, ~2 seconds
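The 1/40 and 15× figures come straight from this before/after pair. The arithmetic, using only the approximate numbers quoted in the example:

```python
before_tokens = 5_000 + 4_000 + 3_000  # README.md + docs/database.md + docs/setup.md
after_tokens = 300                     # the single retrieved snippet
before_seconds, after_seconds = 30, 2  # approximate timings from the example

token_ratio = before_tokens / after_tokens
speedup = before_seconds / after_seconds

print(f"tokens: {before_tokens} -> {after_tokens} (1/{token_ratio:.0f})")
print(f"time: {before_seconds}s -> {after_seconds}s ({speedup:.0f}x faster)")
```

These ratios describe this one scenario. Your own numbers will depend on how large your documents are and how many files Claude would otherwise open.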
Summary
Directly reading documents means:
- ❌ Token waste
- ❌ Hard to find the right file
- ❌ Repeated full reads every session
RAG means:
- ✅ Token usage cut to 1/40
- ✅ Responses 15× faster
- ✅ Filename knowledge not required
- ✅ Setup in ~5 minutes
- ✅ Entirely local and free
Let Claude Code retrieve what you need automatically using vector search.
Repository
https://github.com/tomohiro-owada/devrag
License: MIT
Feedback: via Issues
Try it out! 🚀