Pacharapol Withayasakpunt

Posted on Jul 27, 2020

How does database indexing work?

#explainlikeimfive #database #computerscience

Especially for non-equality comparisons; and sorting.

If it is equality; maybe it is like a sorted hashtable? Not that I know much about how hashtable work...

But what about more than, less than, string startsWith, string endsWith, string contains, string case-insensitive operations? What are possible ways to optimize this (probably on write)?

Top comments (6)

Brandin Chiu • Jul 27 '20 • Edited

In non-technical terms, database indices are structured more or less like associative arrays.

Assuming you have a table called users, where each user is a row. That table would be, by default, saved in a way like this:

[ID] => row

Now let's assume you create an index /we'll ignore index types for now) on the "email" column of your row because you do a lot of searching for users by email. Your database engine will now ALSO store this table like this:

[ID] => row,
[email] => row

When you run your query, the engine will identify possible indexes to match based on its content, and then use that version of the saved table to find results.

If your query has WHERE email="brandinchiu@gmail.com", then it will know to use the email index you created beforehand.

Now this is reasonable for simple indices, but gets more complicated when you start looking at compound keys, and when you start having indices conflict with each other.

This is why properly identifying which indices matter is so important. Every time you create an index, your database gets significantly bigger, relative to the size of the table where the index is. Optimally, you want your database to run in memory, so having ten different indices on a table with a hundred million rows is going to a headache.

Different index types help you tell the engine how to optimize the queries that use them. Integer-based indexes will always be faster, and performing arithmetic against them is going to be relatively trivial.

String indices are possible, but should be limited in scope whenever possible. Indexing something like a "description" column for purposes of searching is not something you should be doing. Instead, you would create a separate table of individual "keywords" and use the foreign key to search, not the actual paragraph text itself.

Pacharapol Withayasakpunt • Jul 27 '20 • Edited

What about pre-segmentation of strings for string contains operators, probably just like how full-text-search in SQLite Virtual Table works?

Currently not even understanding why hashtable is near O(1), but I believe string startsWith operator is just akin to integer greater than?

Different index types

String indices

Now that you mentioned it, how do they work with best performances in relative large datasets?

Also, about String indices, if I don't plan to make unique, it might be possible to optimize, by limiting index size, and looking up multiple times?

Brandin Chiu • Jul 27 '20

A lot of this is going to be engine-dependant, but in almost all cases I can think of, string indices should be avoided for the purpose of searching and/or sorting.

In MYSQL, string operations are just integer operations, but there's a step of converting the strings to integers which degrades performance at scale.

MYSQL supports the FULLTEXT index type which indexes the entire string column. However, the larger the string is, the less performant the search will be unless you're searching for the exact, entire string (which rarely happens and defeats the purpose of a string index in the first place)

Yufan Lou • Jul 27 '20

To my knowledge mainstream relational databases (MySQL, PostgreSQL, SQLite) all use a form of B-tree, because hashtable is not good for comparison. Abstractly you can view B-tree as a sorted set while a hashtable an unordered set. For string operations I suspect they are just implemented as regular expressions, which if you limit them to no look back can be implemented efficiently as finite state machines.

Akash Kava • Jul 29 '20

Same as Table of Index of a book, You look up chapter name and go straight to page where that chapter starts, you don't roll page one by one till the chapter starts.

Pacharapol Withayasakpunt • Jul 29 '20 • Edited

Why is it O(1), yet comparable? How do I implement it in JavaScript / Python (i.e. not using existing libraries)?

The point is, how can I have faster substring indices? How can I have prebuilt indices?

DEV Community

How does database indexing work?

Top comments (6)

Read next

How to Install PostgreSQL on Ubuntu 22.04 LTS

Python Django Financial Trading Dashboard — Integrating AnyChart JS Charts

SQL: ROW_NUMBER, RANK and DENSE_RANK

Linking Data in PostgreSQL Without Explicit Keys: A Practical Guide