DEV Community

Mustafa Balila
Mustafa Balila

Posted on

Database indexing and selectivity

As you may already know, indexing is a great way to improve the performance of your queries. But is that really it?

There are no silver bullets, only trade-offs. What's the catch? There's always a catch!

When you create an index, the RDBMS creates a copy of your table containing one or more columns depending on how many columns you have indexed.

This copy will always be sorted, and it has to be maintained while inserting, updating, and deleting rows. That's why indexes slow down writes and lock tables during index creation.

Enough about the RDBMS job, let's talk about yours. Which columns should you index?
It depends!. Your queries should drive your indexes, but you can use column selectivity to decide if a column will make a good index or not.

Selectivity describes the uniqueness of values in a column.

A highly selective index is good because it filters out more rows in a search.

Let's imagine a table with columns for email, country, and gender.

  • The gender column is the least selective, since it can only - hold two values.
  • The country column is more selective since it can hold more values. The email column is the most selective because it's unique.

Selectivity affects search performance because it divides the table into slices and uses these slices for searching. Selective columns form smaller slices, which makes searching them faster.

How to find the selectivity of a column?
simply run this query:

select count(distinct column)/count(*) from table;
Enter fullscreen mode Exit fullscreen mode

In postgres, you'll need to cast values to numeric for this to work

Top comments (0)