DEV Community

soy
soy

Posted on • Originally published at media.patentllm.org

Reducing Token Consumption in Claude Code — FTS5 Knowledge DB + Tiered Index Design

Problem

If all coding conventions, test commands, and documentation for the entire project are written in CLAUDE.md, a large number of tokens will be consumed every turn. This can put pressure on the LLM's context window and lead to a decrease in quality.

Solution: Two-Tier Structure

This design significantly reduces token consumption with a two-tier structure: "Tier 1 Index" and "Tier 2 FTS5 DB". Tier 1 is a lightweight index of less than 600 tokens, carefully selecting basic project rules and frequently used commands. Tier 2 provides an FTS5-based full-text search DB for deep-dive searches only when necessary.

Extraction Pipeline

We have built a pipeline to automatically extract information from Claude Code execution logs. Code templates and prompt patterns are classified and accumulated using the following procedure.

python3 extract_templates.py --input session_log.jsonl --output templates.json
Enter fullscreen mode Exit fullscreen mode

This script parses session logs and automatically extracts conventions, commands, and test scripts. It performs Japanese full-text search with FTS5 and removes duplicates.

Classification by Local LLM

Nemotron is run locally to perform fast classification with thinking OFF and max_tokens 64 settings. Extracted templates are quickly classified into "Required", "Optional", or "Not Applicable".

response = nemotron_classify(
    input_text=template,
    max_tokens=64,
    thinking=False
)
Enter fullscreen mode Exit fullscreen mode

Tier 1 Selection with Gemini's Large Context

The classified templates are filtered by priority using Gemini's large context. Selection criteria are evaluated on two axes: "Execution Frequency (past 7 days)" and "Project Importance".

from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=f"以下のテンプレート一覧から重要度順にTier1を選別してください:\n{tier_list}"
)
Enter fullscreen mode Exit fullscreen mode

Top templates are saved to tier_1_index.json and preferentially referenced during prompt generation for each turn.

Cross-Session Memory with memory.db

Knowledge from multiple sessions is centrally managed using a memory DB (memory.db) leveraging FTS5.

SELECT * FROM rules WHERE rules MATCH 'テストコマンド';
Enter fullscreen mode Exit fullscreen mode

Benefits

  • Significantly reduced token consumption per turn
  • Efficiently search and provide only necessary information
  • Selection pipeline combining Nemotron's local inference and Gemini's large context

Summary

An architecture that efficiently manages basic rules with a Tier 1 index and performs deep-dive searches only when necessary using FTS5 is key to overcoming LLM context window constraints. The "selection → search" pipeline, combining Nemotron's local inference and Gemini's large context, can be immediately implemented in a personal development environment.

Top comments (0)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.