Indexing in a database
Indexing, maybe that is a term you already heard when building or using a database.
Put indexes, else your request will take too much time.
Alright, that works, my primary key is set. My requests are faster when I join my primary key with another table column. But why?
Publisher
id name age
=====================
1 John 38
2 Elizabeth 24
3 Mike 17
We have publishers, and they post some articles on a web app.
Article
id title createdAt publisherId
===================================================================
1 Unit tests for your next library 2019-01-26 2
2 WebP: when performance matter 2019-02-25 2
3 Indexes: why would I need it 2019-03-09 3
So you will join them to get the posts for each publishers:
SELECT
publisher.name,
article.title,
article.createdAt
FROM
publisher
JOIN
article on article.publisherId = publisher.id
With the expected result:
name title createdAt
Elizabeth Unit tests for your next library 2019-01-26
ELizabeth WebP: when performance matter 2019-02-25
Mike Indexes: why would I need it 2019-03-09
Up to this, nothing wrong. Now, imagine we have 50 publishers and 1000 articles.
In order for the database to figure out to which publisher belongs to which articles, it takes your article publisherId
, and then, for each of these, will loop for each publisher until it founds the publisher that matches the id, and pulls out the desired data.
article.publisherId publisher.id
===================================
59 1 ? No
59 2 ? No
59 3 ? No
...
59 59 ? Yes -> pulls out related data
32 1 ? No
32 2 ? No
32 3 ? No
...
32 32 ? Yes -> pulls out related data
11 1 ? No
11 2 ? No
11 3 ? No
...
11 11 ? Yes -> pulls out related data
...
In the worst case, let us imagine only the publisher #50 posts the 1000 articles. This means for each posts, n posts, we will loop for each publishers, n publishers, to find the relation. Which is a complexity of n x n. We imagine the request takes 0.5 milliseconds (500 µs) for each round.
(0.5 ms * 1000 articles) * (0.5 ms * 50 publishers)
500 ms * 25 ms
12 500 ms
12.5 sec
Current query complexity: O(n²)
To fix this, database engines supports primary keys indexing to help the engine figure out quickly to which records belongs which identifier. So instead of storing your records in the form of an array, the database engine will index your rows, in this manner (schematized using JSON notation, not how the database truely stores the data):
Publisher
{
1: {
"name:": "John",
"age": 38
},
2: {
"name": "Elizabeth",
"age": 24
},
3: {
"name": "Mike",
"age": 17
}
}
Which make the database able to "attack" directly the indexed id instead of looping through each records until he finds the correct key!
So if you want the publisher name with id 50, you can do it in 1 round:
const name = Publisher[50].name;
Back with our 1000 articles made by 50 publishers, we can now directly get the publisher (no need to loop for each of them), so only 1000 rounds. Our database still takes 0.5 milliseconds per round (500 µs).
(0.5 ms * 1000 articles) * (1 ms * 1 publisher)
500 ms * 1 ms
500 ms
0.5 sec
Current query complexity: O(n)
We have gone from 12 sec to nearly a half second. Not that bad!
Indexing an array
Indexing arrays is the exact same process. Imagine we are pulling routes (urls) and their associated content, like in a CMS.
use App\Article;
$articles = Article::all();
print_r($articles);
Resulting in the console:
[
[
id" => 1,
"url" => "/advantages-of-indexing-arrays",
"content" => "..."
],
[
id" => 2,
"url" => "/webp-is-the-new-fast",
"content" =>
"..."
],
[
id" => 3,
"url" => "/vue-js-async-components",
"content" => "..."
],
...
]
Alright, now the user is requesting the route "/7-tips-to-be-more-productive-on-laravel-5", and your database contains 1000 articles.
use App\Article;
use App\Request;
use App\Response;
class ArticleController {
// Shows the desired article content to the client
public function show(Request $request): Response {
$articles = Article::all();
$url = $request->url();
foreach ($articles as $article) {
if ($article->url === $url) {
// found it, pull content and display the view
break;
}
}
}
}
In that configuration, your controller will loop for each articles, and when he finds the correct one, displays the content. In the worst scenario, 1000 articles, the user request the last article, 1000 loop rounds.
With 1 ms per round, this gives us:
1000 articles * 1 ms
1000 ms
1 sec
Current controller complexity: O(n)
O(n), which means n articles (we do not know how many article we have in advance).
1 sec to send the result to the client browser, potentially more time for the browser to render the page. We can do better by indexing this array.
use App\Article;
use App\Request;
use App\Response;
class ArticleController {
// Shows the desired article content to the client
public function show(Request $request): Response {
$articles = $this->articles();
$url = $request->url();
if (isset($articles[$url])) {
// found it in one try!
}
}
// The indexing happens here
protected function articles(): array {
$articles = Article::all();
$indexedArticles = [];
foreach ($articles as $article) {
$indexArticles[$article->url] = $article;
}
return $indexArticles;
}
}
Now we can directly "attack" the array, and get the article data. isset()
returns true if the key is in the array, else returns false. It also directly target the key, so it does not loop the entire array.
1 route * 1 ms
1 ms
Current controller complexity: O(1)
O(1), because no matter how many articles we have in database, it will be either found or not instantly.
Cautions
This example is partially correct, because I do not take into account the complexity of fetching articles (maybe my Article class performs additional process, for each rows, so potentially more seconds spent by the back-end).
Also, you definitively not want to index your array at each server requests. You probably will cache it using Redis or in a file (in a form of a JSON for example). Because in the end, the indexing is a process that requires to loop through your array anyway.
Conclusion
I hope this article gave you more insights around indexes and when using them will make our requests quicker.
This technique is very useful when your first loop needs to loop again and find an element (like a database would need to make a join).
If you need to join on several criteria (e.g. column value), this is the same process.
{
"GET": {
"/": { "controller": "HomeController@index" },
"/newsletter": { "controller": "NewsletterController@index" }
},
"POST" => {
"/newsletter": { "controller": "NewsletterController@store" }
}
}
You can attack this object by many criteria (in this case, 2 criterias, the url and the protocol):
const routes = []; // filled by the array above
const protocol = "POST";
const url = "/newsletter";
const controller = routes[protocol][url].controller;
Happy optimizations!
Photo by rawpixel.com from Pexels
Top comments (0)