The full-text index built into GBase 8a enables indexing and searching across all text-type columns, with support for Boolean expressions, proximity searches, and online index updates. This guide covers the feature set, practical query examples, and the configuration file that controls text processing.
Core Features
- Index all text-type columns in a table.
- Queries can run while the index is being built — no downtime required.
- Incrementally add new data to an existing index with UPDATE INDEX, avoiding full rebuilds (see the sketch after this list):
UPDATE INDEX index_name ON table_name;
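For instance, after loading new rows you refresh the index incrementally rather than rebuilding it. A minimal sketch, reusing the t1/memo table from the query examples below; the index name ft_memo is hypothetical:
-- Append new rows to the indexed table
INSERT INTO t1 (memo) VALUES ('TianJin trading ltd');
-- ft_memo is a hypothetical index name; only the new rows are processed
UPDATE INDEX ft_memo ON t1;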
Query Examples
Combine logical operators and the NEAR function to express precise search conditions.
Boolean and Phrase Queries
-- Must contain both "TianJin" AND "ltd"
SELECT * FROM t1 WHERE contains(memo, '"TianJin" & "ltd"');
-- Space defaults to AND
SELECT * FROM t1 WHERE contains(memo, 'TianJin ltd');
-- Contains "张三" OR "TianJin"
SELECT * FROM t1 WHERE contains(memo, '"张三" | "TianJin"');
-- Contains "张三" OR "TianJin" but NOT "人"
SELECT * FROM t1 WHERE contains(memo, '"张三" | "TianJin" - "人"');
NEAR Function: Word Distance and Order
NEAR((term1, term2), num [, order])
- term: search words separated by commas, treated as AND; each must match exactly.
- num: maximum word distance (integer), inclusive of the matched terms.
- order (optional): 0 for any sequence (default), 1 to enforce the specified word order.
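As an illustration, and assuming (as the introduction to this section suggests) that NEAR is embedded in the contains() search string just like the Boolean operators, the following hypothetical query matches rows where "TianJin" is followed by "ltd" within a distance of 5 words:
-- Hypothetical: "TianJin" then "ltd", at most 5 words apart, in that order
SELECT * FROM t1 WHERE contains(memo, 'NEAR(("TianJin", "ltd"), 5, 1)');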
Configuring the Index and Tokenizer
The behavior of the full‑text engine is controlled through a configuration file located under the following directory (the IP segment reflects this node's installation path and will differ per deployment):
/opt/gbase/192.168.163.3/gcluster/server/lib/gbase/plugin/gbfti/cfg/
Key Parameters
| Parameter | Description |
|---|---|
| multisegmask | Tokenization mode: 0 natural (default), 1 numeric n‑gram, 2 English n‑gram |
| mixedcase | Case sensitivity: 0 insensitive (default), 1 sensitive |
| step | N‑gram step: 0 uses default (trigram), >0 sets actual step, max 127 |
| dict | Enable dictionary‑based tokenization (requires path) |
| hitflush | Maximum data volume processed per tokenization run |
| dictSlotPerUnit | Dictionary hash bucket count — larger values speed up word lookup at the cost of memory |
| quickUpdate | 0 off (default); 1 enables parallel file writes, suitable for large documents and vocabularies |
| segThreads | Number of tokenizer threads |
| sortThreads | Number of sorting threads |
| outThreads | Number of output threads |
| maxDocPerUnit | Maximum rows per index segment |
| maxLineSize | Maximum text length per row |
| reduceMemMode | 0 keeps index resident in memory (default); 1 flushes to disk to save memory with slightly higher latency |
| dictDynamicLoad | Toggle dynamic dictionary loading |
| maxMatch | Maximum concurrent search operations |
| maxThreadPerTask | Maximum per‑search‑task parallelism |
| dsoPath | Path to the tokenizer shared library |
| outCharset | Character set emitted by the tokenizer |
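The file format itself is not shown in the source material. Purely as a rough sketch, assuming a simple key = value layout with #-style comments (verify both against the actual file shipped in the cfg directory), a profile tuned for large documents might look like:
# Assumed key = value syntax; parameter names are from the table above
multisegmask = 0      # natural tokenization
mixedcase = 0         # case-insensitive matching
quickUpdate = 1       # parallel file writes for large documents
segThreads = 4        # tokenizer threads
sortThreads = 2       # sorting threads
outThreads = 2        # output threads
reduceMemMode = 1     # flush to disk to cap memory use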
Tuning these parameters lets you strike the right balance between search performance and resource consumption, keeping the GBase 8a full‑text engine running efficiently.
