DEV Community

Michael
Michael

Posted on • Originally published at gbase.cn

Why TEXT and BLOB Don't Support Hash Indexes in GBase 8a

In GBase 8a, TEXT and BLOB columns cannot have Hash indexes. The reason is a fundamental mismatch between the storage characteristics of large objects and the operating principles of Hash indexes.

How Hash Indexes Work

A Hash index computes a hash value (e.g., CRC32) from the indexed column, producing a fixed‑length integer key that points to the actual row. This requires stable, comparable input values that can be hashed quickly — typically columns with controllable length.

Four Conflicts Between TEXT/BLOB and Hash Indexes

  1. Massive, variable‑length data: TEXT can hold up to 10,922 characters; BLOB ranges from 32 KB to 64 MB. Hashing tens of KB or even MB of data consumes enormous I/O and CPU, defeating the purpose of an index.

  2. Fragmented or external storage: Large objects may be split into chunks or stored externally. A Hash index needs contiguous, complete data to guarantee the same hash across different reads. Fragmented storage breaks this stability, causing index misses.

  3. Unpredictable cardinality: Long texts may have high cardinality but costly hashing; short or highly repetitive texts can cause excessive hash collisions, reducing filtering efficiency.

  4. Rarely used with equality queries: TEXT columns are typically used for full‑text search (LIKE '%...%'), and BLOB columns store binary streams. Hash indexes support only equality lookups — not range, prefix, or pattern matching — so they would be created and never used.

What GBase 8a Offers Instead

For TEXT, GBase 8a provides a full‑text search index — a purpose‑built solution that tokenizes text and supports LIKE and MATCH ... AGAINST. BLOB columns are usually left unindexed since they rarely appear in query conditions.

Trying to put a Hash index on a large‑object column is like measuring a football field with a caliper — the tool is completely wrong for the job. GBase 8a's restriction here is both reasonable and necessary, steering users toward the right indexing strategy for each data type in a gbase database.

Top comments (0)