<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Antônio Inocêncio</title>
    <description>The latest articles on DEV Community by Antônio Inocêncio (@antoninocencio).</description>
    <link>https://dev.to/antoninocencio</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F431720%2Fff70243f-7df4-4011-a549-83e387589426.png</url>
      <title>DEV Community: Antônio Inocêncio</title>
      <link>https://dev.to/antoninocencio</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/antoninocencio"/>
    <language>en</language>
    <item>
      <title>Start monitoring the cause, not the consequences.</title>
      <dc:creator>Antônio Inocêncio</dc:creator>
      <pubDate>Thu, 16 Nov 2023 13:01:44 +0000</pubDate>
      <link>https://dev.to/antoninocencio/start-monitoring-the-cause-not-the-consequences-3eb9</link>
      <guid>https://dev.to/antoninocencio/start-monitoring-the-cause-not-the-consequences-3eb9</guid>
      <description>&lt;p&gt;How a few queries were compromising the whole operation of a business.&lt;/p&gt;

&lt;p&gt;This situation happened to one of our customers, and it shows how important it is to know exactly what is running on your database servers and to keep track of it.&lt;/p&gt;

&lt;p&gt;“MeusPedidos is a SaaS application that handles the processing of thousands of orders per second. These orders, placed by industries, sales representatives, and distributors, come in through the web app, through the mobile apps, through third-party integrations with a public API, and, finally, through some huge XLS sheets imported directly into the web app.”&lt;/p&gt;

&lt;h2&gt;Overview&lt;/h2&gt;

&lt;p&gt;Initially, the RDS instance was an “m3.2xlarge”. After running some tests, MeusPedidos’ team found some limitations in the “m3.2xlarge” instance with Multi-AZ (you can see more info about that here). They got in touch with AWS Support and were advised to upgrade to an “m4.2xlarge”.&lt;/p&gt;

&lt;p&gt;After the migration, there was a reduction in the frequency and intensity of the instabilities in the system, but it was still possible to identify Write Throughput and high DiskQueueDepth spikes in CloudWatch’s monitoring, as you can see in the picture below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ni4lXXll--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ebrzibh9mckynzp81lm6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ni4lXXll--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ebrzibh9mckynzp81lm6.png" alt="Write throughput and high DiskQueueDepth spikes (from CloudWatch)" width="720" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Monitoring with Nazar&lt;/h2&gt;

&lt;p&gt;We then started monitoring their application with Nazar to identify the offending queries responsible for the spikes, and we found some infrequent queries with a high execution time.&lt;/p&gt;

&lt;p&gt;By cross-referencing the times these queries ran with the peak times shown in CloudWatch, we identified the relationship between them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--k54eI3dZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/66una4bt57r5zf4ge9em.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k54eI3dZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/66una4bt57r5zf4ge9em.png" alt="Write throughput and DiskQueueDepth spikes (from CloudWatch)" width="720" height="272"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Cause&lt;/h2&gt;

&lt;p&gt;When these queries were executed, a relatively large amount of data was manipulated, which generated the need for disk write operations, especially in the “Creating sort index” and “Sending data” stages of the execution plan.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--teohFRRr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pf8ymo1qmx7im9bdrr8n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--teohFRRr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pf8ymo1qmx7im9bdrr8n.png" alt="Image description" width="720" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Duration: 3.58268150&lt;br&gt;
Creating sort index: 2.843240&lt;br&gt;
Sending data: 0.721862&lt;/p&gt;

&lt;h2&gt;Solution&lt;/h2&gt;

&lt;p&gt;With Nazar it was very easy to identify the three queries presenting these characteristics, and they were optimized by taking the following actions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating indexes;&lt;/li&gt;
&lt;li&gt;Rewriting the SQL;&lt;/li&gt;
&lt;li&gt;Reducing the number of columns in the SELECT list.&lt;/li&gt;
&lt;/ul&gt;
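&lt;p&gt;The effect of these actions can be sketched outside MySQL as well. The snippet below is an illustrative sketch using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module with a hypothetical &lt;code&gt;orders&lt;/code&gt; table (the names are made up, not the customer's real schema): it shows how creating an index and selecting only the needed columns turn a full table scan into a covering-index search.&lt;/p&gt;

```python
import sqlite3

# Hypothetical schema for illustration only; the original system ran on MySQL/RDS.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        status TEXT NOT NULL,
        payload TEXT
    )
""")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); keep the detail text.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Before: no supporting index, so the filter forces a full table scan.
before = plan("SELECT id, status FROM orders WHERE customer_id = 42")
print(before)

# Action 1: create an index on the filtered column (plus the selected column).
# Action 3: select only the needed columns, so the index can cover the query.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id, status)")
after = plan("SELECT id, status FROM orders WHERE customer_id = 42")
print(after)
```

After the index exists, the plan detail reports a search using the covering index instead of a scan of the table.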

&lt;h2&gt;Results&lt;/h2&gt;

&lt;p&gt;After the optimizations to the three identified queries were implemented (04/18/2018), there were no more Write Throughput and high DiskQueueDepth spikes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XZa4rW_f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9d0kc009d1u8jcr5xlm0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XZa4rW_f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9d0kc009d1u8jcr5xlm0.png" alt="No spikes after the optimizations" width="720" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The secret to anticipating daily performance problems before they scale into critical issues is to observe them continuously and to monitor the cause, not the consequences. Even after the upgrade to a larger instance, the queries continued to run and to compromise the application’s performance.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The partnership with Nazar was really helpful to tackle down write throughput problems that were haunting us for a very long time." - Israel Fonseca — Senior Software Developer&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>databasetuning</category>
      <category>database</category>
      <category>databaseperformance</category>
      <category>aws</category>
    </item>
    <item>
      <title>Partial index</title>
      <dc:creator>Antônio Inocêncio</dc:creator>
      <pubDate>Tue, 14 Nov 2023 16:22:02 +0000</pubDate>
      <link>https://dev.to/antoninocencio/partial-index-49i</link>
      <guid>https://dev.to/antoninocencio/partial-index-49i</guid>
      <description>&lt;p&gt;A partial index is a valuable feature supported by various relational databases but is not commonly used. In this post, we will explore some use cases where you can benefit from using partial indexes.&lt;/p&gt;

&lt;p&gt;Partial indexes can be useful in various scenarios where you want to improve query performance by reducing the size of an index or filtering which rows are included in the index. Here are some examples of when to use partial indexes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Selective Indexing: When you have a large table with many rows but want to index only a subset of those rows that meet specific criteria. For example, you may have a table of customer orders, and you want to create an index only for orders that are not canceled (status != ‘canceled’).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Range Filtering: When you have a table with a date or timestamp column and want to create an index for a specific time range. This can be helpful for time-based data, like logs or sensor readings.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Boolean or Enum Columns: When you have boolean or enum columns and want to create an index on specific values. For instance, you might have a table of products and want to index only the active products.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sparse Data: When you have a column with a low cardinality and want to create an index for a specific value that is not present in most rows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Query Optimization: When you have queries that frequently filter by a specific condition, a partial index can significantly speed up those queries. For example, if you often search for unshipped orders.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage Optimization: Partial indexes can reduce the storage requirements of your database because they are smaller due to indexing a subset of rows. This can be especially valuable in scenarios with limited storage capacity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Concurrency: In databases with high levels of concurrent write operations, partial indexes can reduce contention and locking conflicts by only indexing a subset of rows.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s consider a scenario where you have a table for user accounts, and it includes a boolean column called active to track whether a user account is active (true) or inactive (false). In this case, most of the user accounts are inactive. We want to compare the use of a partial index and a non-partial index for this scenario.&lt;/p&gt;

&lt;p&gt;Non-Partial Index Example:&lt;/p&gt;

&lt;p&gt;Suppose you have a &lt;code&gt;tb_user_accounts&lt;/code&gt; table with 8.4 million rows and the following schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE tb_user_accounts (
    user_id serial PRIMARY KEY,
    username varchar(255) NOT NULL,
    active boolean NOT NULL
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To create a non-partial index on the &lt;code&gt;active&lt;/code&gt; column, you would typically create an index that covers all rows, including both active and inactive users:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE INDEX idx_all_users_active ON tb_user_accounts (active);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This index will include all user accounts, regardless of their active status.&lt;/p&gt;

&lt;p&gt;Partial Index Example:&lt;/p&gt;

&lt;p&gt;Now, let’s consider the scenario where you want to optimize queries that involve active user accounts (where &lt;code&gt;active = true&lt;/code&gt;). To do this, you can create a partial index as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE INDEX idx_active_users_active ON tb_user_accounts (user_id) 
WHERE active = true;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, the &lt;code&gt;WHERE&lt;/code&gt; clause specifies that only rows where &lt;code&gt;active&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt; will be included in the index. This results in a smaller, more focused index that includes only active user accounts, leading to better query performance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select pg_size_pretty(pg_table_size('idx_all_users_active'));

pg_size_pretty|
--------------+
56 MB         |

select pg_size_pretty(pg_table_size('idx_active_users_active'));

pg_size_pretty|
--------------+
240 kB        |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above syntax is compatible with PostgreSQL. In Oracle Database and MySQL 8, you can create a function-based index using a &lt;code&gt;CASE&lt;/code&gt; expression to effectively achieve partial-index-like behavior (note that MySQL 8 requires an extra pair of parentheses around the expression). Here's an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE INDEX idx_active_users_active ON user_accounts 
(CASE WHEN active = 1 THEN user_id ELSE NULL END);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
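&lt;p&gt;SQLite also supports partial indexes, which makes the idea easy to try locally. Below is a minimal sketch with Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module, mirroring the table above; the 1-in-100 active ratio and the row count are made up for the demo.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tb_user_accounts (
        user_id INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        active INTEGER NOT NULL
    )
""")
# Mostly inactive accounts: only 1 in 100 is active (illustrative ratio).
conn.executemany(
    "INSERT INTO tb_user_accounts (username, active) VALUES (?, ?)",
    [("user%d" % i, 1 if i % 100 == 0 else 0) for i in range(10_000)],
)
# Partial index: only the rows satisfying the WHERE predicate are indexed.
conn.execute("""
    CREATE INDEX idx_active_users_active
    ON tb_user_accounts (user_id) WHERE active = 1
""")
conn.commit()

# The planner can use the partial index because the query's WHERE clause
# implies the index predicate.
plan = [
    row[3] for row in conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT user_id FROM tb_user_accounts WHERE active = 1"
    )
]
print(plan)
```

The plan output names &lt;code&gt;idx_active_users_active&lt;/code&gt;, confirming that the small partial index, not a full table scan, serves the query.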



&lt;p&gt;In summary, a partial index is advantageous when you have specific criteria for indexing a subset of rows within a table, such as optimizing queries for active user accounts in a scenario where most rows have the value &lt;code&gt;active = false&lt;/code&gt;. It can lead to improved query performance, reduced storage requirements, and better concurrency handling compared to non-partial indexes, which index all rows indiscriminately.&lt;/p&gt;

&lt;p&gt;Written by Matheus Mendonça, DBA &amp;amp; co-founder at &lt;a href="https://nazar.ai"&gt;Nazar&lt;/a&gt;&lt;/p&gt;

</description>
      <category>databaseperformance</category>
      <category>sqltuning</category>
      <category>sql</category>
      <category>database</category>
    </item>
    <item>
      <title>[SQL Performance Killers] Individual inserts vs. Bulk inserts</title>
      <dc:creator>Antônio Inocêncio</dc:creator>
      <pubDate>Thu, 09 Nov 2023 18:52:06 +0000</pubDate>
      <link>https://dev.to/antoninocencio/sql-performance-killers-individual-inserts-vs-bulk-inserts-51h4</link>
      <guid>https://dev.to/antoninocencio/sql-performance-killers-individual-inserts-vs-bulk-inserts-51h4</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was written by my co-founder &lt;a href="https://twitter.com/matheusmo"&gt;Matheus&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;An important part of my daily work is helping developers, database administrators, DevOps engineers, sysadmins, and project managers to identify (using &lt;a href="https://nazar.ai"&gt;Nazar.ai&lt;/a&gt;) and fix bad SQL code. The idea behind the [SQL Performance Killers] series is to share some practical SQL tuning examples.&lt;/p&gt;

&lt;p&gt;Continuing (after a very long time) the [SQL Performance Killers] series, in this post I’ll explain why bulk insert operations are generally faster than many individual insert operations.&lt;/p&gt;

&lt;p&gt;Bulk inserts are significantly faster than individual inserts when working with a database for several reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reduced network traffic:&lt;/strong&gt; Bulk inserts reduce the amount of data transferred between the application and the database server. In many cases, network latency can be a bottleneck for database performance. By sending a single batch of data, you can reduce the impact of network latency and improve efficiency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reduced transaction overhead:&lt;/strong&gt; Each individual insert operation is typically wrapped in a transaction, which can lead to increased overhead due to transaction management. Bulk inserts can be enclosed in a single transaction, reducing the overhead associated with transaction management and ensuring data consistency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Locking and concurrency:&lt;/strong&gt; When you perform many individual inserts, each insert may require locks on the affected rows, leading to potential contention and concurrency issues in a multi-user environment. Bulk inserts often lock the entire table or a specific set of rows, reducing contention and improving concurrency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logging and indexing:&lt;/strong&gt; Databases often maintain transaction logs and indexes to ensure data consistency and query performance. Bulk inserts are more efficient in terms of logging and indexing because they involve fewer transactions and updates to indexes.&lt;/p&gt;

&lt;p&gt;In the example below, I inserted 40,000 rows into a sample table, first using individual inserts and then using a single bulk insert.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO TB_INSERT(NAME) VALUES ('AF9CB08DF4F7B71F033CC857ECF30C21');
INSERT INTO TB_INSERT(NAME) VALUES ('B16D3C99C04F223E362BC0E1B4FFE7CD');
...
INSERT INTO TB_INSERT(NAME) VALUES ('BEDB35BC448FD5F32F37B86BECFDF225');
INSERT INTO TB_INSERT(NAME) VALUES ('4436A24C954EA17AEE9E92D4F16FAD20');

UPDATED ROWS 40000
START TIME MON NOV 06 14:52:02 PST 2023
FINISH TIME MON NOV 06 14:52:22 PST 2023
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO TB_INSERT(NAME) VALUES
('AF9CB08DF4F7B71F033CC857ECF30C21'),
('B16D3C99C04F223E362BC0E1B4FFE7CD'),
...
('BEDB35BC448FD5F32F37B86BECFDF225'),
('4436A24C954EA17AEE9E92D4F16FAD20');


UPDATED ROWS 40000
START TIME MON NOV 06 14:56:32 PST 2023
FINISH TIME MON NOV 06 14:56:34 PST 2023
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As we can see, inserting 40,000 rows into a sample table using a bulk insert was 10 times faster than using individual inserts: the individual inserts took 20 seconds to complete, while the bulk insert finished in only 2 seconds.&lt;/p&gt;

&lt;p&gt;However, it's essential to note that the suitability of bulk inserts depends on the database management system (DBMS) you are using and the specific use case. Always consider the characteristics of your database and the requirements of your application when deciding whether to use bulk inserts or individual inserts.&lt;/p&gt;
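&lt;p&gt;The same pattern is easy to reproduce with Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. This is only a sketch: the row count is reduced from 40,000 to keep the demo quick, and absolute timings will differ from the RDS numbers above.&lt;/p&gt;

```python
import sqlite3
import time
import uuid

N = 10_000
rows = [(uuid.uuid4().hex,) for _ in range(N)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb_insert (name TEXT NOT NULL)")

# Individual inserts: one statement and one commit (transaction) per row.
start = time.perf_counter()
for row in rows:
    conn.execute("INSERT INTO tb_insert (name) VALUES (?)", row)
    conn.commit()
t_individual = time.perf_counter() - start

# Bulk path: one prepared statement run for all rows in a single transaction.
start = time.perf_counter()
conn.executemany("INSERT INTO tb_insert (name) VALUES (?)", rows)
conn.commit()
t_bulk = time.perf_counter() - start

total = conn.execute("SELECT count(*) FROM tb_insert").fetchone()[0]
print("individual: %.3fs  bulk: %.3fs  rows: %d" % (t_individual, t_bulk, total))
```

On a local run the per-row commits dominate the individual-insert loop, which is exactly the transaction overhead described above; batching amortizes it away.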

&lt;blockquote&gt;
&lt;p&gt;"It is not lack of hardware,&lt;br&gt;
It is not network traffic,&lt;br&gt;
It is not slow front ends,&lt;br&gt;
the main performance problem in the huge majority of database applications is bad SQL code."&lt;br&gt;
Joe Celko&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>database</category>
      <category>tuning</category>
      <category>sql</category>
      <category>databasetuning</category>
    </item>
  </channel>
</rss>
