DB 101-Vocabulary

Before an SQL statement is executed, RDBMS query optimizer decide
the data access path
Which index should be used based on index slice

Random Read
Read won’t happen if an index or table page is found in the disk cache/DB buffer pool.
Page size: 4K, disk drive speed: 40MB/s
Random read time:

Skip Sequential Read
The DBMS would then first collect the pointers from all qualifying index rows and then sort the pointers by table page number. Now the DBMS has a sorted list of all the table pages that contain at least one qualifying row; only now will the table be accessed.

Index Slice
Index Slice defines the number of matching columns used during query processing. The cost of the access path is clearly going to depend on the thickness of the slice which in turn depends on the RANGE predicate. A thicker slice will require more index pages to be read sequentially by increasing the number of synchronous reads (10ms per page).

Filter Factor

The scanned slice of the data by the optimizer depends on the filter factor of the data of the predicates.

Here is an example of strong correlation. If the largest filter factors for CITY = :CITY and LNAME = :LNAME are 10% and 1%, respectively, the largest filter factor for CITY = :CITY AND LNAME = :LNAME is indeed 0.1 × 0.01 = 0.001 = 0.1% because it relates to the LNAME distribution of a specific CITY. Though both the columns have no correlation, when designing an index, filter factor for a compound predicate should be considered as a whole.

The filter factors for predicates MAKE = :MAKE and MODEL = :MODEL may be 1/100 and 1/1000, respectively. The filter factor for MAKE = :MAKE AND MODEL = :MODEL is much
less than 1/100,000, perhaps as low as 1/1000.
For compound predicate, DB scanned slice i.e. total number of index rows should be multiplication of each field filter factor even though they are not correlated. To measure worst case scanned rows, developers need to measure each filter factor.

Index columns should be ordered according to descending cardinality (the number of distinct values).

Three star Index

Consists of all columns given for a predicate needs one random disk read with sequential page read enabling index only access path. DBMS needs to access only the index tree.

If there are 1000 result rows, the filter factor for the compound predicate LNAME = :LNAME AND CITY = :CITY is 0.1%. The scanned index slice now consists of only 1000 rows because there are two matching predicates: columns LNAME and CITY define the index slice. Fat index eliminates table access (e.g. by ROWID). In this example, three star index consists should be:
( LNAME , CITY , FNAME , CNO ) or ( CITY , LNAME , FNAME , CNO ).
Order of the columns has no impact on the SELECT performance but the cost of UPDATE statements should be reduced by placing less filter factored columns at the end of the index.

Read time: 1 * 10 ms (random read) + 1000* 0.1 (sequential read) = 0.1s

Fat index is useful when predicates do not contain RANGE predicates.

Two Star Index
During read, If SORT operation is not required then the index is two-star / semi fat. For the above query, if the index is (CNO, FNAME, LNAME) then a SORT operation is required. But, if the index is (FNAME, LNAME, CNO) then no SORT is required.

One Star Index
During reading, for an index to be one star, EQUAL(and RANGE) predicate columns are indexed, then sliced becomes thinner, only those rows are touched but not in any specific order.

QUICK UPPER BOUND ESTIMATE(QUBE)

Local Response Time (LRT) : Elapsed time within the DBMS server.
Service Time : sum of CPU time and random read time. If there is no disk queueing time, then LRT should be equal to service time.
Queuing time: Locking time for the other tasks.

Access Types
Matching columns
Screening columns
Sort avoidance columns

Best Index for EQUAL Predicate

Time taken:

Comparison with different kinds of index:

I/O Estimate

A full table scan with a singleton SELECT that rejects all rows (1 × 10 ms + 1,000,000 × 0.01 ms = 10 s)
An index scan with 1000 FETCH calls, using an index such as (LNAME, FNAME), that causes an index slice scan of 1000 rows and 1000 synchronous reads to the table (1000 × 0.01 ms + 1000 × 10 ms = 10 ms + 10 s = 10 s)

DEV Community

DB 101-Vocabulary

Top comments (0)

Read next

Partitioning Data with a Dictionary in C#

Behind the Scenes at Linkspreed: A Global Team Shaping the Future of Web4

My thoughts on the Url::defaults method

We have built a Tailwind CSS grid generator.