DEV Community

loading...

Types of queries that can & can't use a B-Tree index

mervynlee94 profile image Mervyn Lee ・4 min read

Storage engines use B-Tree indexes in various ways, which can affect performance. The general idea of a B-Tree is that all the values are stored in order, and each leaf page is the same distance from the root. The figure below shows an abstract representation of a B-Tree(technically a B+ Tree) index, which corresponds roughly to how InnoDB’s indexes work. MyISAM uses a different structure, but the principles are similar.

Alt Text

A B-Tree index speeds up data access because the storage engine doesn’t have to scan the whole table to find the desired data. The idea of B-Tree is an implementation from Multiway Trees (M-way Tree) with some rules and restrictions on how the tree is built. Binary Search Tree is actually a Multiway Tree with M=2 (2 children). Therefore, you can relate B-Tree with a Binary Search Tree concept. It starts at the root node (not shown in this figure). The slots in the root node hold pointers to child nodes, and the storage engine follows these pointers. It finds the right pointer by looking at the values in the node pages, which define the upper and lower bounds of the values in the child nodes. Eventually, the storage engine either determines that the desired value doesn’t exist or successfully reaches a leaf page.

You can learn more about the concept of B-Tree and how it applies to index in Database here.

Suppose you have the following table:

CREATE TABLE People (
  last_name   varchar(50)     not null,
  first_name  varchar(50)     not null,
  dob         date            not null,
  gender      enum('m', 'f')  not null,
  key(last_name, first_name, dob) );
Enter fullscreen mode Exit fullscreen mode

The index will contain the values from the last_name, first_name, and dob columns for every row in the table. Figure below illustrates how the index arranges the data it stores.

Alt Text

Note that there are two people with the same name but different birth dates, and they’re sorted by birth date.

B-Tree indexes work well for lookups by the full key value, a key range, or a key prefix. They are useful only if the lookup uses a leftmost prefix of the index. The index we showed above will be useful for the following kinds of queries:

  • Match the full value
    A match on the full key value specifies values for all columns in the index. For example, this index can help you find a person named Cuba Allen who was born on 1960-01-01.

  • Match a leftmost prefix
    This index can help you find all people with the last name Allen. This uses only the first column in the index.

  • Match a column prefix
    You can match on the first part of a column’s value. This index can help you find all people whose last names begin with J. This uses only the first column in the index.

  • Match a range of values
    This index can help you find people whose last names are between Allen and Barrymore. This also uses only the first column.

  • Match one part exactly and match a range on another part
    This index can help you find everyone whose last name is Allen and whose first name starts with the letter K (Kim, Karl, etc.). This is an exact match on last_ name and a range query on first_name.

As the tree’s nodes are sorted, they are helpful for both lookups (finding values) and ORDER BY queries (finding values in sorted order). In general, if a B-Tree can help you find a row in a particular way, it can help you sort rows by the same criteria. So, our index will be helpful for ORDER BY clauses that match all the types of lookups we just listed.

Here are some limitations of B-Tree indexes:

  • They are not useful if the lookup does not start from the leftmost side of the indexed columns. For example, this index won’t help you find all people named Bill or all people born on a certain date, because those columns are not leftmost in the index. Likewise, you can’t use the index to find people whose last name ends with a particular letter.

  • You can’t skip columns in the index. That is, you won’t be able to find all people whose last name is Smith and who were born on a particular date. If you don’t specify a value for the first_name column, MySQL can use only the first column of the index.

  • The storage engine can’t optimize accesses with any columns to the right of the first range condition. For example, if your query is WHERE last_name="Smith" AND first_name LIKE 'J%' AND dob='1976-12-23', the index access will use only the first two columns in the index, because the LIKE is a range condition (the server can use the rest of the columns for other purposes, though).

Discussion (0)

pic
Editor guide