Reducing Token Consumption in Claude Code — FTS5 Knowledge DB + Tiered Index Design

#devtools #python #productivity

Problem

Putting every coding convention, test command, and piece of documentation for a project into CLAUDE.md means all of it is sent as context on every turn. That consumes a large number of tokens, crowds the LLM's context window, and can degrade output quality.

Solution: Two-Tier Structure

This design reduces token consumption with a two-tier structure: a "Tier 1 Index" and a "Tier 2 FTS5 DB". Tier 1 is a lightweight index of fewer than 600 tokens that holds only the basic project rules and frequently used commands. Tier 2 is an FTS5-based full-text search DB that is queried only when a deeper lookup is needed.
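As a minimal sketch of how the Tier 1 budget might be enforced, the snippet below keeps entries in priority order until the 600-token limit would be reached. It assumes a rough heuristic of about four characters per token, which is not a real tokenizer (Japanese text usually costs more tokens per character). The names `estimate_tokens` and `build_tier1` are hypothetical and do not come from the original pipeline.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; an assumption, not a real tokenizer."""
    return int(len(text) / chars_per_token) + 1


def build_tier1(entries: list[str], budget: int = 600) -> list[str]:
    """Keep entries (already sorted by priority) while staying under the budget."""
    selected: list[str] = []
    used = 0
    for entry in entries:
        cost = estimate_tokens(entry)
        if used + cost >= budget:
            # Stop at the first entry that does not fit, preserving priority order.
            break
        selected.append(entry)
        used += cost
    return selected
```

Stopping at the first entry that does not fit, rather than skipping it and trying smaller ones, keeps the index strictly in priority order. Whether that trade-off is right depends on the project.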

Extraction Pipeline

We have built a pipeline to automatically extract information from Claude Code execution logs. Code templates and prompt patterns are classified and accumulated using the following procedure.

python3 extract_templates.py --input session_log.jsonl --output templates.json

The script parses session logs, automatically extracts conventions, commands, and test scripts, and removes duplicates. The extracted entries are indexed with FTS5 so that they can be searched in full text, including Japanese.
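The article does not show how duplicates are detected, so the following is only one possible approach: normalize each entry (Unicode NFKC plus whitespace collapsing) and keep the first occurrence of each normalized form. The `text` field name and the function names are assumptions for illustration. The real session log schema may differ.

```python
import hashlib
import json
import unicodedata


def normalize(text: str) -> str:
    """NFKC-normalize and collapse whitespace so near-identical entries collide."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def dedupe_jsonl(lines):
    """Yield the first record for each distinct normalized `text` value."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the log
        record = json.loads(line)
        text = record.get("text", "")  # assumed field name
        if not text:
            continue
        key = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            yield record
```

NFKC folds full-width and half-width variants together, which matters for logs that mix Japanese and ASCII input. Case is deliberately left alone because shell flags are case-sensitive.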

Classification by Local LLM

Nemotron runs locally and classifies quickly, with thinking turned off and max_tokens set to 64. Each extracted template is labeled "Required", "Optional", or "Not Applicable".

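# nemotron_classify wraps the call to the local Nemotron model; its definition is not shown here.
response = nemotron_classify(
    input_text=template,
    max_tokens=64,   # a single label needs only a few tokens
    thinking=False,  # skip reasoning output for speed
)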
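With a 64-token cap the model may still wrap the label in extra words or vary its casing, so the raw output probably needs normalizing before it is stored. The helper below is a hypothetical sketch, not part of the original code. It returns the first known label that appears in the output and falls back to a default when none is found. It assumes the model output has already been extracted as a plain string.

```python
LABELS = ("Required", "Optional", "Not Applicable")


def parse_label(raw: str, default: str = "Not Applicable") -> str:
    """Return the known label that appears earliest in `raw`, else `default`."""
    lowered = raw.lower()
    hits = []
    for label in LABELS:
        pos = lowered.find(label.lower())
        if pos >= 0:
            hits.append((pos, label))
    # min() picks the hit with the smallest position, i.e. the first label mentioned.
    return min(hits)[1] if hits else default
```

Falling back to "Not Applicable" keeps unparseable output out of Tier 1. A stricter pipeline might raise an error or retry instead.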

Tier 1 Selection with Gemini's Large Context

The classified templates are then passed to Gemini in a single request, taking advantage of its large context window, and ranked by priority. Selection uses two criteria: "Execution Frequency (past 7 days)" and "Project Importance".

from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=f"Select the Tier 1 entries from the following template list, in order of importance:\n{tier_list}"
)

The top-ranked templates are saved to tier_1_index.json, which is consulted first when the prompt for each turn is built.
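To make the two selection axes concrete, here is a small local sketch that blends them into one score and writes the top entries to a JSON file. This is not the Gemini-based selection itself. It is a deterministic stand-in that shows what the ranking and the saved file could look like. The field names `frequency_7d` and `importance`, the 0 to 1 scaling, the equal weighting, and `top_n` are all assumptions.

```python
import json
from pathlib import Path


def score(template: dict, freq_weight: float = 0.5) -> float:
    """Blend the two axes; both values are assumed to be scaled to 0..1."""
    return (
        freq_weight * template["frequency_7d"]
        + (1 - freq_weight) * template["importance"]
    )


def write_tier1(templates: list[dict], path: str, top_n: int = 10) -> list[dict]:
    """Sort by descending score and save the top entries as JSON."""
    ranked = sorted(templates, key=score, reverse=True)[:top_n]
    Path(path).write_text(
        json.dumps(ranked, ensure_ascii=False, indent=2),  # keep Japanese readable
        encoding="utf-8",
    )
    return ranked
```

A call such as `write_tier1(templates, "tier_1_index.json")` would produce the file described above. In the pipeline from the article, the ranking would come from the Gemini response rather than from `score`.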

Cross-Session Memory with memory.db

Knowledge from multiple sessions is stored in a single FTS5-backed database, memory.db, so that later sessions can search it.

SELECT * FROM rules WHERE rules MATCH 'テストコマンド';
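The query above can be driven from Python with the standard `sqlite3` module. One caveat is that FTS5's default `unicode61` tokenizer treats an unspaced run of Japanese characters as a single token, so a term like 'テストコマンド' ("test command") only matches when it appears as a standalone word. The sketch below therefore prefers the `trigram` tokenizer (available from SQLite 3.34), which matches substrings, and falls back to the default on older builds. The column name `body` and both function names are assumptions. The article does not show the actual schema of memory.db.

```python
import sqlite3


def open_memory_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the `rules` FTS5 table, preferring the trigram tokenizer."""
    conn = sqlite3.connect(path)
    try:
        # trigram (SQLite 3.34+) matches substrings, which suits unspaced Japanese.
        conn.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS rules "
            "USING fts5(body, tokenize='trigram')"
        )
    except sqlite3.OperationalError:
        # Older SQLite: fall back to the default tokenizer (whole-token matches).
        conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS rules USING fts5(body)")
    return conn


def search(conn: sqlite3.Connection, term: str, limit: int = 5) -> list[str]:
    """Return matching rule bodies, best match first."""
    # Quote the term as an FTS5 phrase so punctuation is not parsed as syntax.
    phrase = '"' + term.replace('"', '""') + '"'
    rows = conn.execute(
        "SELECT body FROM rules WHERE rules MATCH ? ORDER BY rank LIMIT ?",
        (phrase, limit),
    )
    return [body for (body,) in rows]
```

Passing `"memory.db"` as the path persists the index across sessions. Limiting the number of rows returned keeps the text injected into the prompt small, which is the point of the second tier.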

Benefits

Significantly lower token consumption per turn
Only the necessary information is searched for and supplied
A selection pipeline that combines Nemotron's local inference with Gemini's large context

Summary

The key to working within the LLM's context window is to keep the basic rules in a small Tier 1 index and fall back to FTS5 search only when a deeper lookup is needed. The "selection → search" pipeline, which combines Nemotron's local inference with Gemini's large context, is small enough to set up in a personal development environment.
