Owada Tomohiro

Introducing Free RAG for Claude Code — Save Tokens & Time

TL;DR

Tired of feeding docs to Claude Code every single time?

With a locally running, free RAG tool (DevRag), Claude Code can find the right documents for you via vector search. You no longer need to remember hundreds of filenames or locations.


Problems When Letting Claude Code Read Documents Directly

1. Wasting context

Claude Code’s context window is limited.

Every time you have it read an entire document, you burn through a huge amount of tokens.

Example:

  • You: “Check the project’s API authentication scheme.”
  • Claude reads docs/auth.md (3,000 tokens)
  • Claude: “We use JWT-based authentication.”

Those 3,000 tokens are now gone from your context budget.

Ask something else later → it reads the whole thing again.


2. It’s hard to know which file to look at

As docs accumulate, you don’t know where things are — and neither does Claude.

You: “Tell me about our Redis caching strategy.”
Claude tries:

  • docs/architecture.md (4,000 tokens)
  • docs/caching.md (2,000 tokens)
  • docs/redis.md (doesn’t exist)

But maybe you only needed 200 tokens of docs/caching.md.

In a project with 10–100 documents:

  • You don’t know where others documented things
  • You can’t predict filenames
  • Asking “Where did we write that again?” becomes a daily routine

3. Repeated documentation reading

You often refer to the same docs:

Session 1 → docs/auth.md (3,000 tokens)

Session 2 → again (3,000 tokens)

Session 3 → again (3,000 tokens)

Same file, three times, 9,000 tokens.

That happens because the file is always read from the beginning, even if you only need a tiny piece of it.
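The cost grows linearly with the number of sessions. Here is a small sketch of that arithmetic, using the 3,000-token file from above and assuming (as in the Redis example) that only about 200 tokens actually answer the question. The helper name `cumulative_tokens` is made up for illustration.

```python
def cumulative_tokens(tokens_per_read: int, sessions: int) -> int:
    """Total tokens spent when the same content is re-read in every session."""
    return tokens_per_read * sessions


full_file = 3_000  # docs/auth.md, read in full each time
needed = 200       # assumed size of the part that actually answers the question

for sessions in (1, 3, 10):
    spent = cumulative_tokens(full_file, sessions)
    useful = cumulative_tokens(needed, sessions)
    print(f"{sessions:>2} sessions: {spent:>6} tokens read, {spent - useful:>6} of them unnecessary")
```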


RAG Solves All of These at Once

How RAG Works

  1. Once at the beginning: vectorize documents and index them
  2. At query time: retrieve only relevant chunks
  3. Claude reads only the necessary parts
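The three steps above can be sketched in a few lines of Python. This is a toy illustration, not DevRag's implementation: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the chunk ids and texts are invented for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real tool would use a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: dict[str, str]) -> dict[str, Counter]:
    """Step 1 (once): vectorize every chunk and keep the vectors."""
    return {chunk_id: embed(text) for chunk_id, text in chunks.items()}


def search(index: dict[str, Counter], query: str, top_k: int = 1) -> list[str]:
    """Step 2 (per query): return the ids of the most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda cid: cosine(q, index[cid]), reverse=True)
    return ranked[:top_k]


# Hypothetical chunks; the ids mimic "file#section".
chunks = {
    "docs/auth.md#jwt": "API authentication uses JWT tokens signed by the auth service",
    "docs/caching.md#redis": "Redis caching strategy: cache user sessions with a TTL",
    "docs/database.md#migrations": "Database migration is run with npm run migrate",
}
index = build_index(chunks)

# Step 3: only the winning chunk would be handed to the model.
best = search(index, "Tell me about our Redis caching strategy")
print(best[0], "->", chunks[best[0]])
```

Word counts only match exact words. A real embedding model also matches paraphrases, which is what makes vector search useful when you cannot guess the wording or the filename.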

Traditional:

Question → Read whole document (3,000 tokens) → Answer

With RAG:

Question → Vector search relevant part (200 tokens) → Answer

In this example that is 200 tokens instead of 3,000, roughly a 15× reduction, and what Claude does read is almost entirely relevant.

The biggest benefit:

Claude Code can find what you need even if you don’t know filenames.


DevRag — A Simplified RAG for Claude Code

I built DevRag to make context retrieval simpler and faster for Claude Code.

Features

  • Single binary: no external DB, no Python
  • Auto model download on first run
  • MCP integration as a search tool
  • Fast: startup ~2 s, search <100 ms
  • Multilingual support (JP/EN)
  • No vendor lock-in

Setup (~5 minutes)

1. Download binary

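# macOS (Apple Silicon); curl ships with macOS, wget does not
# -L follows the GitHub release redirect, -O keeps the remote filename
curl -LO https://github.com/tomohiro-owada/devrag/releases/latest/download/devrag-macos-apple-silicon.tar.gz
tar -xzf devrag-macos-apple-silicon.tar.gz
chmod +x devrag-macos-apple-silicon
# Put the binary on PATH under the name used in the Claude Code config
sudo mv devrag-macos-apple-silicon /usr/local/bin/devrag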

2. Configure Claude Code

Add to ~/.claude.json:

{
  "mcpServers": {
    "devrag": {
      "type": "stdio",
      "command": "/usr/local/bin/devrag"
    }
  }
}
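
If ~/.claude.json already holds other settings, pasting the snippet by hand risks overwriting them. The sketch below merges the entry programmatically instead. `add_mcp_server` is a hypothetical helper written for this post, not part of DevRag or Claude Code, and the `"theme"` key is only a placeholder for whatever the file already contains. It runs against a throwaway file so nothing real is touched.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str) -> dict:
    """Merge one stdio MCP server entry into a JSON config, keeping other keys."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"type": "stdio", "command": command}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demonstrate on a temporary file; pass Path.home() / ".claude.json" to apply it for real.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "claude.json"
    demo_path.write_text(json.dumps({"theme": "dark"}))  # placeholder pre-existing setting
    result = add_mcp_server(demo_path, "devrag", "/usr/local/bin/devrag")
    print(json.dumps(result, indent=2))
```

Back up the real file before pointing the helper at it.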

3. Add some documents

mkdir documents
cp your-notes.md documents/

DevRag indexes the files in documents/ automatically when it is launched.
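
Indexing generally means splitting each document into chunks before vectorizing them. This post does not describe how DevRag chunks files, so the following is only a sketch of one common strategy, splitting Markdown on headings. `chunk_markdown` and the sample text are hypothetical. Each chunk records the line where its section starts, which is the kind of `file:line` pointer that appears in the usage example further down.

```python
import re


def chunk_markdown(path: str, text: str) -> list[dict]:
    """Split Markdown text into one chunk per heading section (assumed strategy)."""
    chunks: list[dict] = []
    heading, body, start = "(intro)", [], 1

    def flush() -> None:
        # Emit the section collected so far, skipping empty ones.
        content = "\n".join(body).strip()
        if content:
            chunks.append({"source": f"{path}:{start}", "heading": heading, "text": content})

    for lineno, line in enumerate(text.splitlines(), start=1):
        if re.match(r"#{1,6}\s", line):  # a Markdown heading starts a new chunk
            flush()
            heading, body, start = line.lstrip("#").strip(), [], lineno
        else:
            body.append(line)
    flush()
    return chunks


# Made-up sample document.
sample = "# Database\n\nWe use PostgreSQL.\n\n## Migrations\n\nRun `npm run migrate`.\n"
for chunk in chunk_markdown("documents/your-notes.md", sample):
    print(chunk["source"], "|", chunk["heading"], "|", chunk["text"])
```

This simple version would also split on `#` comment lines inside code fences, so a production chunker needs to track fences as well.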


Actual Usage Comparison

Before (No RAG)

You: “What’s our DB migration method?”

Claude reads:

  • README.md (5,000 tokens)
  • docs/database.md (4,000 tokens)
  • docs/setup.md (3,000 tokens)

12,000 tokens, ~30 seconds

Most of that is wasted, because Claude is guessing filenames.


After (With DevRag)

You: “What’s our DB migration method?”

Claude:

  • Runs vector search
  • Finds a relevant 300-token snippet

Claude:

“Run npm run migrate. For details see docs/database.md:42.”

300 tokens, ~2 seconds
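
The reduction between the two runs can be worked out directly from the numbers above. These are the illustrative figures from this example, not benchmark results, and `ratio` is just a helper name chosen here.

```python
def ratio(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


tokens_before = 5_000 + 4_000 + 3_000  # README.md + docs/database.md + docs/setup.md
tokens_after = 300                     # the single retrieved snippet
seconds_before, seconds_after = 30, 2  # approximate timings from the example

print(f"tokens: {tokens_before} -> {tokens_after} ({ratio(tokens_before, tokens_after):.0f}x fewer)")
print(f"time:   {seconds_before}s -> {seconds_after}s ({ratio(seconds_before, seconds_after):.0f}x faster)")
```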


Summary

Directly reading documents means:

  • ❌ Token waste
  • ❌ Hard to find the right file
  • ❌ Repeated full reads every session

RAG means:

  • ✅ Token usage cut to 1/40 in the example above (12,000 → 300 tokens)
  • ✅ Responses 15× faster in the same example (~30 s → ~2 s)
  • ✅ Filename knowledge not required
  • ✅ Setup in ~5 minutes
  • ✅ Entirely local and free

Let Claude Code retrieve what you need automatically using vector search.


Repository

https://github.com/tomohiro-owada/devrag

License: MIT

Feedback: via Issues

Try it out! 🚀
