Michael

Originally published at gbase.cn

GBase 8a Full-Text Index: Features, Queries, and Configuration

The full-text index built into GBase 8a supports indexing and searching text-type columns, with boolean expressions, proximity searches, and online index updates. This guide covers the feature set, query examples, and the configuration parameters that control text processing and index behavior.

Core Features

  • Index all text-type columns in a table.
  • Queries can run while the index is being built — no downtime required.
  • Incrementally add new data to an existing index with UPDATE INDEX, avoiding full rebuilds:
  UPDATE INDEX index_name ON table_name;
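
As a rough sketch of how these features might fit together, the following assumes a hypothetical table t1 with a text column memo and an index named idx_memo. Only UPDATE INDEX and contains() come from this article; the CREATE FULLTEXT INDEX form is an assumption based on common MySQL-style syntax and should be checked against the GBase 8a manual.

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE t1 (id INT, memo VARCHAR(2000));

-- Assumed syntax: build a full-text index on the memo column.
CREATE FULLTEXT INDEX idx_memo ON t1 (memo);

-- Load new rows after the index already exists.
INSERT INTO t1 VALUES (1, 'TianJin GBase ltd');

-- Fold the newly added rows into the existing index without a full rebuild.
UPDATE INDEX idx_memo ON t1;

-- Search the indexed column.
SELECT * FROM t1 WHERE contains(memo, '"TianJin" & "ltd"');
```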

Query Examples

Combine logical operators and the NEAR function to express precise search conditions.

Boolean and Phrase Queries

-- Must contain both "TianJin" AND "ltd"
SELECT * FROM t1 WHERE contains(memo, '"TianJin" & "ltd"');
-- A space between terms is treated as AND
SELECT * FROM t1 WHERE contains(memo, 'TianJin ltd');
-- Contains "张三" OR "TianJin"
SELECT * FROM t1 WHERE contains(memo, '"张三" | "TianJin"');
-- Contains "张三" OR "TianJin" but NOT "人"
SELECT * FROM t1 WHERE contains(memo, '"张三" | "TianJin" - "人"');

NEAR Function: Word Distance and Order

NEAR((term1, term2), num [, order])
  • term: search words separated by commas, treated as AND; each must match exactly.
  • num: maximum word distance (integer), inclusive of the matched terms.
  • order (optional): 0 for any sequence (default), 1 to enforce the specified word order.
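
To make the three arguments concrete, here is a sketch of how NEAR might be used, reusing the t1 table and memo column from the boolean queries. Passing the NEAR expression as the contains() search string is an assumption inferred from those queries, and the distance value 5 is arbitrary.

```sql
-- "TianJin" and "ltd" within a word distance of 5, in either order
-- (order is omitted, so it takes the default of 0).
SELECT * FROM t1 WHERE contains(memo, 'NEAR((TianJin, ltd), 5)');

-- Same terms and distance, but "TianJin" must appear before "ltd".
SELECT * FROM t1 WHERE contains(memo, 'NEAR((TianJin, ltd), 5, 1)');
```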

Configuring the Index and Tokenizer

The behavior of the full‑text engine is controlled through a configuration file in the following directory (the IP address in the path is specific to the node):

/opt/gbase/192.168.163.3/gcluster/server/lib/gbase/plugin/gbfti/cfg/

Key Parameters

| Parameter | Description |
| --- | --- |
| multisegmask | Tokenization mode: 0 natural (default), 1 numeric n‑gram, 2 English n‑gram |
| mixedcase | Case sensitivity: 0 insensitive (default), 1 sensitive |
| step | N‑gram step: 0 uses the default (trigram), >0 sets the actual step, max 127 |
| dict | Enable dictionary‑based tokenization (requires path) |
| hitflush | Maximum data volume processed per tokenization run |
| dictSlotPerUnit | Dictionary hash bucket count; larger values speed up word lookup at the cost of memory |
| quickUpdate | 0 off (default); 1 enables parallel file writes, suitable for large documents and vocabularies |
| segThreads | Number of tokenizer threads |
| sortThreads | Number of sorting threads |
| outThreads | Number of output threads |
| maxDocPerUnit | Maximum rows per index segment |
| maxLineSize | Maximum text length per row |
| reduceMemMode | 0 keeps the index resident in memory (default); 1 flushes to disk to save memory, with slightly higher latency |
| dictDynamicLoad | Toggle dynamic dictionary loading |
| maxMatch | Maximum concurrent search operations |
| maxThreadPerTask | Maximum per‑search‑task parallelism |
| dsoPath | Path to the tokenizer shared library |
| outCharset | Character set emitted by the tokenizer |

Tuning these parameters lets you balance search performance against resource consumption for the GBase 8a full‑text engine.
