Discussion on: How does database indexing work?

View post

In non-technical terms, database indices are structured more or less like associative arrays.

Assuming you have a table called users, where each user is a row. That table would be, by default, saved in a way like this:

[ID] => row

Now let's assume you create an index /we'll ignore index types for now) on the "email" column of your row because you do a lot of searching for users by email. Your database engine will now ALSO store this table like this:

[ID] => row,
[email] => row

When you run your query, the engine will identify possible indexes to match based on its content, and then use that version of the saved table to find results.

If your query has WHERE email="brandinchiu@gmail.com", then it will know to use the email index you created beforehand.

Now this is reasonable for simple indices, but gets more complicated when you start looking at compound keys, and when you start having indices conflict with each other.

This is why properly identifying which indices matter is so important. Every time you create an index, your database gets significantly bigger, relative to the size of the table where the index is. Optimally, you want your database to run in memory, so having ten different indices on a table with a hundred million rows is going to a headache.

Different index types help you tell the engine how to optimize the queries that use them. Integer-based indexes will always be faster, and performing arithmetic against them is going to be relatively trivial.

String indices are possible, but should be limited in scope whenever possible. Indexing something like a "description" column for purposes of searching is not something you should be doing. Instead, you would create a separate table of individual "keywords" and use the foreign key to search, not the actual paragraph text itself.

Pacharapol Withayasakpunt • Jul 27 '20 • Edited

What about pre-segmentation of strings for string contains operators, probably just like how full-text-search in SQLite Virtual Table works?

Currently not even understanding why hashtable is near O(1), but I believe string startsWith operator is just akin to integer greater than?

Different index types

String indices

Now that you mentioned it, how do they work with best performances in relative large datasets?

Also, about String indices, if I don't plan to make unique, it might be possible to optimize, by limiting index size, and looking up multiple times?

Brandin Chiu • Jul 27 '20

A lot of this is going to be engine-dependant, but in almost all cases I can think of, string indices should be avoided for the purpose of searching and/or sorting.

In MYSQL, string operations are just integer operations, but there's a step of converting the strings to integers which degrades performance at scale.

MYSQL supports the FULLTEXT index type which indexes the entire string column. However, the larger the string is, the less performant the search will be unless you're searching for the exact, entire string (which rarely happens and defeats the purpose of a string index in the first place)