Introduction: The Limitations of LIKE Search
When searching for "battery" in PatentLLM's patent database (4 million records), results were slow to appear. A full table scan using LIKE '%battery%' was not practical in terms of speed.
There were three main problems with LIKE search:
- Performance degradation due to full table scans
- Inability to handle variations in word endings (battery / batteries)
- Difficulty with complex searches like "battery OR lithium NOT sodium"
What is FTS5?
FTS5 is a full-text search engine built into SQLite. It pre-builds an inverted index, allowing it to instantly retrieve relevant rows from keywords.
Key benefits:
- Fast searching with an inverted index
- BM25 ranking: Automatic scoring based on relevance to search terms
- Native support for complex searches:
battery OR lithium NOT sodiumcan be written in a single line
Implementation Steps
Building the FTS5 Table
-- FTS5仮想テーブルを作成
CREATE VIRTUAL TABLE cases_fts USING fts5(
raw_case_name,
summary,
analysis_json
);
-- 既存データを投入
INSERT INTO cases_fts SELECT raw_case_name, summary, analysis_json FROM merged_cases;
Comparing Search Queries
-- 旧:LIKE検索(全件スキャン・ランキングなし)
SELECT * FROM merged_cases
WHERE raw_case_name LIKE '%battery%'
OR summary LIKE '%battery%';
-- 新:FTS5検索(インデックス利用・BM25ランキング付き)
SELECT *, rank FROM cases_fts
WHERE cases_fts MATCH 'battery'
ORDER BY rank;
FTS5 eliminates the need for full table scans, allowing searches on 1.73 million records to return instantly.
Query Tuning for MCP Integration
When connecting to PatentLLM's DB via the MCP server from Claude Code to perform patent analysis, we encountered an issue where "0 DB hits for all hypotheses."
Cause: Keyword Design Mismatch
Keywords generated by the Gemini API often became long phrases connected by AND, leading to cases where they didn't match the actual data.
-- 問題例:広すぎるクエリ(汎用語で大量ヒット)
SELECT COUNT(*) FROM cases_fts WHERE cases_fts MATCH 'patent';
-- 問題例:厳しすぎるクエリ(AND連結で0件)
SELECT COUNT(*) FROM cases_fts
WHERE cases_fts MATCH 'retrieval AND augmented AND generation AND pipeline';
Solution: Manual Keyword Prioritization Logic
# Geminiに全て委任する代わりに、手動キーワードを優先
if 'fts_keywords' in hypotheses[hypothesis]:
query = hypotheses[hypothesis]['fts_keywords']
else:
query = analyze_query(hypothesis)
Three-Stage Keyword Design Process
- Measure the number of hits with general keywords (check with
COUNT(*)). - Refine using FTS5's complex search syntax (combining
OR / AND / NOT). - Fine-tune to achieve the target number of results (10-30 items).
Summary
- FTS5 can be immediately introduced with existing SQLite data.
- Achieves millisecond-level search even with 1.73 million records.
- When integrating with MCP, pay attention to keyword granularity. Always validate automatically generated queries against actual data.
Top comments (0)