DEV Community

hinlocaesar
hinlocaesar

Posted on

Full-Text Indexing in MySQL: A Developer's Guide

Introduction

Full-text indexing is a powerful MySQL feature that enables sophisticated text searching capabilities beyond simple pattern matching with LIKE operators. While traditional indexes work well for exact matches and range queries, full-text indexes are specifically designed for searching natural language text, making them essential for applications that need to search through articles, product descriptions, comments, or any substantial text content.

What is Full-Text Indexing?

A full-text index is a special type of index that parses text content into individual words (tokens) and creates an inverted index structure. This allows MySQL to quickly locate documents containing specific words or phrases without scanning every row in the table.

Key Differences from Regular Indexes

  • Regular B-tree indexes: Optimized for exact matches and range queries on structured data
  • Full-text indexes: Optimized for natural language searches, relevance ranking, and word-based queries

Supported Storage Engines

Full-text indexing support varies by storage engine:

  • InnoDB: Supported from MySQL 5.6+, recommended for most use cases
  • MyISAM: Original full-text implementation, legacy support

This article focuses on InnoDB full-text indexes, which are now the standard.

Creating Full-Text Indexes

Syntax

-- During table creation
CREATE TABLE articles (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200),
    body TEXT,
    author VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FULLTEXT INDEX ft_title_body (title, body)
) ENGINE=InnoDB;

-- Adding to existing table
ALTER TABLE articles ADD FULLTEXT INDEX ft_content (title, body);

-- Using CREATE INDEX
CREATE FULLTEXT INDEX ft_author ON articles(author);
Enter fullscreen mode Exit fullscreen mode

Supported Column Types

Full-text indexes can only be created on columns with these data types:

  • CHAR
  • VARCHAR
  • TEXT

Performing Full-Text Searches

MySQL provides three full-text search modes, each with different characteristics.

1. Natural Language Mode (Default)

Natural language mode ranks results by relevance, with more relevant matches appearing first.

-- Basic natural language search
SELECT id, title, 
       MATCH(title, body) AGAINST('database performance' IN NATURAL LANGUAGE MODE) AS relevance
FROM articles
WHERE MATCH(title, body) AGAINST('database performance' IN NATURAL LANGUAGE MODE)
ORDER BY relevance DESC;

-- Simplified syntax (IN NATURAL LANGUAGE MODE is default)
SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('database performance')
ORDER BY MATCH(title, body) AGAINST('database performance') DESC;
Enter fullscreen mode Exit fullscreen mode

Key Features:

  • Returns results sorted by relevance score
  • Ignores common words (stopwords)
  • No special operators
  • 50% threshold: words appearing in more than 50% of rows are ignored

2. Boolean Mode

Boolean mode allows complex searches with operators for precise control.

-- Find articles with "mysql" but not "oracle"
SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('+mysql -oracle' IN BOOLEAN MODE);

-- Find articles with "database" and either "performance" or "optimization"
SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('+database +(performance optimization)' IN BOOLEAN MODE);

-- Find exact phrase
SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('"full-text search"' IN BOOLEAN MODE);
Enter fullscreen mode Exit fullscreen mode

Boolean Operators:

Operator Description Example
+ Word must be present +mysql
- Word must not be present -oracle
> Increases word's relevance >performance
< Decreases word's relevance <legacy
() Groups words into subexpressions +(mysql mariadb)
~ Negates word's relevance ~deprecated
* Wildcard (end of word only) optim*
" Exact phrase "database index"

3. Query Expansion Mode

Query expansion performs a search twice: first in natural language mode, then again using the most relevant words from the first search.

SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('database' WITH QUERY EXPANSION);
Enter fullscreen mode Exit fullscreen mode

This is useful when you want to find related documents even if they don't contain the exact search terms.

Understanding Relevance Scoring

The MATCH() AGAINST() function returns a relevance score for each row. Higher scores indicate better matches.

SELECT id, title,
       MATCH(title, body) AGAINST('mysql indexing') AS relevance_score
FROM articles
WHERE MATCH(title, body) AGAINST('mysql indexing')
ORDER BY relevance_score DESC
LIMIT 10;
Enter fullscreen mode Exit fullscreen mode

Factors affecting relevance:

  1. Term frequency: How often the word appears in the document
  2. Inverse document frequency: How rare the word is across all documents
  3. Document length: Shorter documents with matches rank higher
  4. Word proximity: Words closer together increase relevance

Stopwords

MySQL filters out common words called "stopwords" (e.g., "the", "is", "at"). These words appear so frequently that they're not useful for searching.

Viewing Default Stopwords

-- For InnoDB
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
Enter fullscreen mode Exit fullscreen mode

Customizing Stopwords

-- Create custom stopword table
CREATE TABLE custom_stopwords (
    value VARCHAR(30)
) ENGINE=InnoDB;

-- Add custom stopwords
INSERT INTO custom_stopwords VALUES ('custom'), ('words'), ('here');

-- Configure MySQL to use custom stopwords
SET GLOBAL innodb_ft_server_stopword_table = 'mydb/custom_stopwords';

-- Rebuild full-text index for changes to take effect
ALTER TABLE articles DROP INDEX ft_title_body;
ALTER TABLE articles ADD FULLTEXT INDEX ft_title_body (title, body);
Enter fullscreen mode Exit fullscreen mode

Configuration Parameters

Minimum Word Length

By default, MySQL ignores words shorter than 4 characters (InnoDB) or 4 characters (MyISAM).

-- Check current setting
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';

-- To change (requires restart and index rebuild):
-- In my.cnf or my.ini:
-- innodb_ft_min_token_size = 3
Enter fullscreen mode Exit fullscreen mode

After changing this parameter:

  1. Restart MySQL server
  2. Rebuild all full-text indexes
ALTER TABLE articles DROP INDEX ft_title_body;
ALTER TABLE articles ADD FULLTEXT INDEX ft_title_body (title, body);
Enter fullscreen mode Exit fullscreen mode

Other Important Parameters

-- Maximum word length
SHOW VARIABLES LIKE 'innodb_ft_max_token_size';  -- Default: 84

-- Number of words in INFORMATION_SCHEMA
SHOW VARIABLES LIKE 'innodb_ft_result_cache_limit';  -- Default: 2000000000

-- Enable/disable InnoDB full-text search
SHOW VARIABLES LIKE 'innodb_ft_enable_stopword';  -- Default: ON
Enter fullscreen mode Exit fullscreen mode

Practical Examples

E-commerce Product Search

CREATE TABLE products (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    description TEXT,
    category VARCHAR(100),
    price DECIMAL(10, 2),
    FULLTEXT INDEX ft_product (name, description)
) ENGINE=InnoDB;

-- Search for wireless headphones
SELECT id, name, price,
       MATCH(name, description) AGAINST('wireless headphones' IN BOOLEAN MODE) AS score
FROM products
WHERE MATCH(name, description) AGAINST('wireless headphones' IN BOOLEAN MODE)
  AND price BETWEEN 50 AND 200
ORDER BY score DESC
LIMIT 20;

-- Advanced search: must have "laptop", prefer "gaming", exclude "refurbished"
SELECT id, name, price
FROM products
WHERE MATCH(name, description) AGAINST('+laptop +gaming -refurbished' IN BOOLEAN MODE)
ORDER BY price DESC;
Enter fullscreen mode Exit fullscreen mode

Blog/Article Search

-- Multi-column search with relevance weighting
SELECT 
    id,
    title,
    author,
    created_at,
    -- Title matches weighted more heavily
    (MATCH(title) AGAINST('mysql performance') * 2 +
     MATCH(body) AGAINST('mysql performance')) AS total_relevance
FROM articles
WHERE MATCH(title, body) AGAINST('mysql performance')
ORDER BY total_relevance DESC
LIMIT 10;
Enter fullscreen mode Exit fullscreen mode

Support Ticket Search

CREATE TABLE tickets (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    subject VARCHAR(255),
    description TEXT,
    status ENUM('open', 'in_progress', 'resolved', 'closed'),
    priority ENUM('low', 'medium', 'high', 'urgent'),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FULLTEXT INDEX ft_ticket (subject, description)
) ENGINE=InnoDB;

-- Find open tickets about login issues
SELECT id, subject, priority
FROM tickets
WHERE status = 'open'
  AND MATCH(subject, description) AGAINST('+login +(error problem issue)' IN BOOLEAN MODE)
ORDER BY priority DESC, created_at DESC;
Enter fullscreen mode Exit fullscreen mode

Performance Optimization

Index Multiple Columns Strategically

-- Single multi-column index (recommended for combined searches)
CREATE FULLTEXT INDEX ft_combined ON articles(title, body);

-- Separate indexes (only if you search columns independently)
CREATE FULLTEXT INDEX ft_title ON articles(title);
CREATE FULLTEXT INDEX ft_body ON articles(body);
Enter fullscreen mode Exit fullscreen mode

Use Covering Indexes When Possible

Include frequently accessed columns to avoid additional lookups:

-- Good: Primary key already included
SELECT id, title FROM articles WHERE MATCH(title) AGAINST('mysql');

-- Less optimal: Requires additional column fetch
SELECT id, title, created_at, author FROM articles WHERE MATCH(title) AGAINST('mysql');
Enter fullscreen mode Exit fullscreen mode

Monitor Full-Text Index Size

-- Check index sizes
SELECT 
    TABLE_NAME,
    INDEX_NAME,
    STAT_VALUE * @@innodb_page_size / 1024 / 1024 AS size_mb
FROM mysql.innodb_index_stats
WHERE TABLE_NAME = 'articles'
  AND INDEX_NAME LIKE 'ft_%';
Enter fullscreen mode Exit fullscreen mode

Optimize Table Regularly

-- Optimize table to rebuild full-text index
OPTIMIZE TABLE articles;
Enter fullscreen mode Exit fullscreen mode

Common Pitfalls and Solutions

1. 50% Threshold Issue

Problem: Your search returns no results even though the word exists in the table.

Cause: If a word appears in more than 50% of rows, MySQL ignores it in natural language mode.

Solution: Use Boolean mode instead.

-- Won't work if "mysql" is in >50% of rows
SELECT * FROM articles WHERE MATCH(title) AGAINST('mysql');

-- Will work
SELECT * FROM articles WHERE MATCH(title) AGAINST('mysql' IN BOOLEAN MODE);
Enter fullscreen mode Exit fullscreen mode

2. Minimum Word Length

Problem: Short words are not found.

Solution: Adjust innodb_ft_min_token_size or search for longer terms.

3. Special Characters

Problem: Searches with special characters don't work as expected.

Solution: MySQL tokenizes on word boundaries. Use phrase search:

-- Use exact phrase matching
SELECT * FROM articles WHERE MATCH(body) AGAINST('"C++"' IN BOOLEAN MODE);
Enter fullscreen mode Exit fullscreen mode

4. Case Sensitivity

Full-text searches are case-insensitive by default. This is usually what you want, but be aware:

-- These are equivalent
SELECT * FROM articles WHERE MATCH(title) AGAINST('MySQL');
SELECT * FROM articles WHERE MATCH(title) AGAINST('mysql');
Enter fullscreen mode Exit fullscreen mode

Monitoring and Debugging

View Full-Text Index Information

-- InnoDB full-text index information
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE;

-- Full-text index cache
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE;

-- Configuration
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_CONFIG;
Enter fullscreen mode Exit fullscreen mode

Explain Query Plans

EXPLAIN SELECT * FROM articles 
WHERE MATCH(title, body) AGAINST('mysql performance' IN BOOLEAN MODE)\G
Enter fullscreen mode Exit fullscreen mode

Best Practices

  1. Use Boolean mode for precise control: Natural language mode is great for general searches, but Boolean mode gives you more control.

  2. Create multi-column indexes wisely: If you always search title and body together, create one index on both columns rather than separate indexes.

  3. Consider your minimum word length: If you need to search short terms, adjust innodb_ft_min_token_size before creating indexes.

  4. Combine with traditional WHERE clauses: Full-text search works well with other filtering conditions.

SELECT * FROM articles
WHERE category = 'tutorial'
  AND created_at >= '2024-01-01'
  AND MATCH(title, body) AGAINST('+mysql +performance' IN BOOLEAN MODE);
Enter fullscreen mode Exit fullscreen mode
  1. Monitor index size: Full-text indexes can be large. Monitor their size and optimize tables periodically.

  2. Test with realistic data: Full-text search behavior changes based on your data distribution. Test with production-like datasets.

  3. Use LIMIT for pagination: Full-text searches can return many results. Always use LIMIT for better performance.

SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('mysql')
ORDER BY MATCH(title, body) AGAINST('mysql') DESC
LIMIT 20 OFFSET 0;
Enter fullscreen mode Exit fullscreen mode

Alternatives and Complementary Solutions

While MySQL full-text indexing is powerful, consider these alternatives for specific use cases:

  • Elasticsearch/OpenSearch: Better for complex search requirements, faceting, and analytics
  • Apache Solr: Enterprise search platform with advanced features
  • Meilisearch: Fast, typo-tolerant search engine
  • Typesense: Modern search engine with good performance
  • PostgreSQL Full-Text Search: If you're using PostgreSQL instead

For simple applications, MySQL full-text indexing is often sufficient and keeps your architecture simpler.

Conclusion

Full-text indexing in MySQL provides robust text search capabilities suitable for many applications. By understanding the different search modes, configuration options, and best practices, you can implement efficient and relevant text search functionality without adding external search engines to your stack.

Key takeaways:

  • Use natural language mode for general relevance-based searches
  • Use Boolean mode when you need precise control with operators
  • Configure minimum word length and stopwords appropriately for your use case
  • Combine full-text search with traditional SQL clauses for powerful filtering
  • Monitor performance and optimize indexes regularly

Start simple with natural language searches, then adopt Boolean mode and advanced features as your requirements grow.

Top comments (0)