Damodar Lohani for Appwrite

Posted on Jan 12, 2022

Working with database indexes ( the right way )

#programming #discuss #database #performance

👀 Overview

The database is an integral part of any modern application. We use a database to provide pre-filled information to our users or sync information supplied by users across their devices or their shared community and a failure can cause massive damage to our application and its users. That means, all modern applications must have well structured, scalable, and maintained database to provide optimum service to our users. A database index is an essential feature of any database system, which helps make the database more performant to access data according to the needs of our application. Using database indexes effectively can make your queries much faster, hence making your application much more performant. Who doesn't love extra seconds of responsiveness in their application?

🎬 Introduction to Database Indexes

Indexes are a data structure technique used to quickly locate and access data in the database. It improves the performance of the database by minimizing the number of disk reads required to find the data.

An index is a small table with only two columns, a key, and a reference. The key is the candidate key of the table that we want to index, and the reference column contains pointers to hold the disk block address where the value for that specific key is stored. An index takes a search key as the input and returns a collection of matching records. Imagine you have to find data in a persons table with the person's name, and suppose you don't have an index. Then the database will have to look at each record to figure out whether or not it is the matching record. Going through every record in a table is called a full table scan, and it takes a long time, based on the number of records in the table. However, if you have indexed the table's name column when you search for a particular name, the database will first query the index and get the pointer to the data, then get the matching data. This way database will have to go through only the index instead of scanning the whole table. An index table might be shorter than the data table itself and is sorted to be able to search efficiently.

If it seems too complex to understand, let's take an example from something simpler. Let's assume that we have a box that contains small pieces of paper with English words written on them, one for every word in the English language, and on the back side has the definition of the word. How would we find the definition of 'simplicity'? The slowest way would be to read each piece of paper, one by one, until we find the definition. Though, we could find it much faster if we had all the words sorted ahead of time.

In other words, we need a dictionary.

Finding definitions of words is much faster with a dictionary in hand since the words are listed in alphabetical order. How would we use a dictionary to speed up our search? We would open to a page, look at the first word, and check if we need to flip ahead or back in the book. We repeat this until we find the right page, and eventually, the definition of 'simplicity'. So how would we instruct a computer to do this sort of search? We'll need ask everyone's favorite Computer Science subject for the answer: data structures and algorithms, but only in a simplified way 😉

Our dictionary search is faster because we can quickly eliminate the need to check large chunks of pages - if we flip to the middle-most page in the dictionary and check the first word on the page, we eliminate the need to check half of the pages of the entire dictionary. The data structure representation of this concept is the binary search tree, or B-tree. This data structure optimizes our search from scaling linearly (O(n)) to scaling logarithmically (O(log(n))), which results in massive performance gains for big data sets.

The downside to indexing is that indexes require additional space on the disk. Also, these indexes are kept in sorted order to access the corresponding data quickly, and one can perform a binary search on the index itself. It's important to keep the index balanced along with the constantly updating data. This means that with every write operation (insert, update and delete), the index will also have to be updated, decreasing the write performance.

Creating an index requires additional disk space and also decreases the write performance. So too many indexes can cause issues arising from file systems size limits, degraded write performances, and more. We should be careful and plan what fields we want to index. A poorly indexed database is as bad or worse than a database that is not indexed.

🧑‍💻 Types of Database Indexes

Indexing in Database is defined based on the attributes it's indexing. There are four main types of indexing methods that are:

Primary Indexes

A database table must have some way to uniquely identify each record, so it can read and write data from the correct block of storage. A primary index serves this purpose: by requiring a unique ID for each record, the database can prepare a pre-sorted list (a B-tree, usually) of IDs that point to their locations on disk, vastly improving the speed to find individual records.

Unique Indexes

Primary indexes require a unique identifier that it maps to the record's disk location (for good reason), but we can use B-trees to enforce any attribute to be unique.

Secondary Indexes

Databases can use B-trees to prepare similar pre-sorted lists for any attribute of data that's commonly queried. If we commonly check our Inventory for low-stock items (count < 5), we can increase query performance by creating a secondary index for the count field. We can do the same across multiple attributes (count < 5 && price > 100), but the order and sorted direction increasingly matters as you add more attributes.

Fulltext Indexes

If we want to search for specific text, databases support fulltext indexes to create data structures to support this. The implementation of fulltext indexes can vary between different database systems, but the principle allows us to check our Inventory for items with name LIKE apple.

It's common that where clause use indexes. However, B-tree indexes can also help the order by, group by, select and other clauses.

🔖 Introducing Indexes in the Appwrite Database

Appwrite is an open-source back-end server for all your apps, be it Flutter, Web, Android, or iOS. It helps developers innovate by taking care of modern application development's complex and repetitive parts. Appwrite provides authentication, database, storage, cloud functions, realtime APIs, webhooks, and more. With version 0.12 of Appwrite, we are pleased to announce the newest refactored database service with support for powerful queries and indexes.

Appwrite 0.12 comes with a refactored database service. We bring a database service that is much on par with the DBMS service we implement internally with this version. We get features like indexes, advanced queries, and more to our database. The basic attributes of our database are on par with the underlying database. On top of that, we have added advanced attributes like URL, email, and more that allows you to design complex data structure for your application with proper validation without much work.

Another significant feature introduced with the database in this version are indexes. We already saw above how important indexes are for performance, so we bring you the power to manage your indexes with the Appwrite database to make your queries even more performant. We also made a design decision with Appwrite's new database. With this version, we only allow querying on the fields that have indexes. If you have previously used databases like MariaDB or MySQL, they will enable you to perform queries even when you have not created any index which can lead to full table scans. This is a huge performance hit and can cause database bottlenecks. That is why with Appwrite, we prevent such queries.

🗂 Indexes Supported by Appwrite

Appwrite as of writing of this article supports unique, key and fulltext indexes. Let's look at each of these indexes, what they are and when to create these.

Unique Index

A unique index must use unique values. However, values can be null. Appwrite uses MariaDB's unique index internally, so you get all the benefits of MariaDB's unique index. Each key in the unique index identifies only one record. However, each record in the table may not be represented because of the possibility of the null value. That means you can create a unique key index for unique attributes. For example, suppose you have a user profile collection where you store users' email. In that case, you can create a unique index for email attribute both to prevent duplication and easily find records using email. You can also create a unique index for a combination of attributes. For example, if you support multiple user profiles with different names, you can create a unique index for a combination of email and name attribute.

Key Index

The key index doesn't have to be unique. You might want to add an index to attributes that you want to filter and order your records during accessing. Appwrite uses MariaDB's plain indexes internally, so you get all the benefits of MariaDB's key index. For example, if you are ordering the list of users in the decreasing order of their joining date, you can create a key index for dateJoined in descending order. This will improve performance, as we already discussed above.

Fulltext Index

Fulltext indexes allow searching a portion of text in an attribute. Appwrite uses MariaDB's Fulltext index internally, so you get all the benefits of a full-text index from MariaDB. For example, in your users table, if you have a short biography attribute where users store their short bio, and your application allows searching for users based on words mentioned in their bio, then it's a good idea to create a full-text index for the biography attribute. If you want to pre-populate large data sets in your collection, it's best to first populate the collection with data and then create the required indexes.

Now that we know what indexes are supported by Appwrite let's look at how we can manage indexes in Appwrite. There are two ways you can achieve this, first is using Appwrite's console, and the second is using Appwrite's server-side APIs and SDKs. We will look at both ways. Before moving on to managing indexes, if you have not already installed Appwrite, visit our installation docs.

🕵️‍♂️ Managing Indexes in the Appwrite Database

We have already learned the importance of the indexes above. And same implies to Appwrite's database service as well. Having proper indexes affects a lot in performance. You can use this benchmarking to reference how performance is impacted by having and not having indexes.

Create Collection and Attributes

Before we can create an index, we first need to create a collection and few attributes. So we will create a courses collection that will have the following attributes.

+--------------+--------+------+----------+-------+---------+
| Attribute Id | Type   | Size | Required | Array | Default |
+--------------+--------+------+----------+-------+---------+
| title        | string | 255  | true     | false | null    |
+--------------+--------+------+----------+-------+---------+
| slug         | string | 255  | true     | false | null    |
+--------------+--------+------+----------+-------+---------+
| description  | string | 1000 | false    | false | null    |
+--------------+--------+------+----------+-------+---------+
| authors      | string | 60   | false    | true  | null    |
+--------------+--------+------+----------+-------+---------+
| category     | string | 60   | false    | false | null    |
+--------------+--------+------+----------+-------+---------+
| credit       | float  | -    | true     | false | null    |
+--------------+--------+------+----------+-------+---------+

To create a new collection, log in to the Appwrite console, create a project, or access an existing one. Then access the database by clicking on the Database from left side menu. Once there, you can click on the Add Collection button. You will see a dialog to choose a custom id if you wish to by clicking on the edit icon next to the auto-generated field. Now enter the collection id as courses and name as Courses. Finally, click on the Create button, and you should be directed to a new page with collection details. Access the Attributes tab, and to create attributes, carry out the following steps.

Click on the Add Attribute button and select New String Attribute

In the dialog box you receive, set title as the attribute id.
Size, make it 255
Make it required and
Click on the Create button.

This should add the title attribute. Repeat steps 1 to 6 for each attribute choosing the right option to create attributes defined above.

Create Indexes

Now that we have a collection and attributes, we can plan our index. First, we would like to perform a text search on the title, so we should create a full-text index for the title. Then we want the slug to be unique, so we should create a unique index for the slug attribute. Finally, we might want to filter by author, category, and credit, so we create the key index for each attribute. Now that we have a plan for indexing, let's create those indexes. First, access the indexes tab. Once there, follow the given steps to add an index.

Click on the Add Index button to access the dialog

Now, type title_fulltext in the Index Key field
Select Fulltext as the type and
Under attributes, just select title and leave the ordering as ASC

This should add a Fulltext index for the title attribute. The newly added index should be listed in the indexes tab. Now repeat steps 1 to 4 for every index we planned above, choosing the appropriate options to create all the required indexes.

You can also use Appwrite's server-side SDKs to create these indexes. If you have never used server SDK, please go through the getting started with server docs. You can use the SDK of your choice and use createIndex function as described in docs from Appwrite's database service to add a new index. There are also methods to create collection, create attributes, indexes and update, delete them. This way, you can write and save a script that can add collection, create attributes and create indexes that you can run on any Appwrite instance to set up your collections.

Deleting Indexes

If you want to delete or remove an index that you are no longer using in your applications, you can do it easily from the indexes tab. While there, you can see the list of available indexes in a table with a delete button for each row. Click the delete button on the row of the index you want to delete, and in the confirmation dialog that appears, click ok. If you accidentally delete an index, you can easily recreate it. However, it might take much time if the database already has tons of data. You can also use deleteIndex function from the database service using the server-side SDKs.

Now that we know how to manage indexes with Appwrite database, let's look at how to choose indexes correctly.

🧭 Choosing Indexes

You should carefully plan what indexes to add based on the queries you make from your applications. Any extra indexes will waste resources as well as hamper performance. The index may not make much difference in a small collection, but as soon as your collection grows, indexes will start to speed queries dramatically. If you want to perform a search query, you should create a full-text index. With Appwrite, we also notify you when you try to query an un-indexed field, so you know what to index. If you created an index but have not used it, remove those to increase create and update queries performance.

👋 Conclusion

Database indexes are a critical aspect of database design for your application, and with Appwrite, we make it easy to manage your indexes. Properly planned indexes will improve performance by a lot and prevent database failure. We hope this article was a helpful guide to database indexing while designing a data structure for your next application and will help you build robust and performant apps. If you have any queries ( pun intended! ) or confusion, please reach us on our Discord server.