DEV Community

Guilherme Rodrigues
Guilherme Rodrigues

Posted on

Supercharge Read Performance: 397x Faster Database Queries Through Denormalization

When working with relational databases, we often encounter situations where we need to count related records in a table.

First, let's consider the following tables:

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY, 
    order_date DATE,
    customer VARCHAR(100)
);

CREATE TABLE order_items (
    item_id SERIAL PRIMARY KEY,
    order_id INT,
    product VARCHAR(100),
    quantity INT,
    FOREIGN KEY (order_id) REFERENCES orders(order_id)
);
Enter fullscreen mode Exit fullscreen mode

We have an orders table and want to count how many items each order contains. A common approach would be to use a COUNT subquery:

SELECT 
    o.order_id,(SELECT 
          COUNT(*) FROM order_items oi 
     WHERE oi.order_id = o.order_id
) AS total_items
FROM 
    orders o;
Enter fullscreen mode Exit fullscreen mode

This query works, but can become inefficient in scenarios with large data volumes as the subquery executes for each row in the orders table.

Enter Denormalization!

Denormalization in databases is the process of introducing redundancy to a previously normalized database, aiming to improve performance for read operations (queries). It involves adding duplicated data or combining tables to reduce the number of required operations (joins, counts...), which can speed up information retrieval in systems where reads are more frequent than writes.

Now that we understand denormalization, how can we simplify our query?

Instead of counting items every time we execute the query, we can add a total_items field directly to the orders table. Whenever an item is added or removed, we increment or decrement this field.

  1. Add the total_items column to the orders table:
ALTER TABLE orders ADD COLUMN total_items INT DEFAULT 0;
Enter fullscreen mode Exit fullscreen mode
  1. Update the total_items field whenever items are added/removed
  2. Your query becomes simpler and more efficient:
SELECT 
    order_id,
    total_items
FROM 
    orders;
Enter fullscreen mode Exit fullscreen mode

Let's insert sample data to validate our proposal using this function:

CREATE OR REPLACE FUNCTION generate_sample_data()
RETURNS VOID AS $$
DECLARE
    order_count INT := 200; 
    items_per_order INT := 10;
    current_order_id INT;
    random_customer VARCHAR(100);
    random_product VARCHAR(100);
    random_quantity INT;
BEGIN
    FOR i IN 1..order_count LOOP
        random_customer := 'Customer ' || (floor(random() * 1000)::INT);
        INSERT INTO orders (order_date, customer, total_items)
        VALUES (current_date - (floor(random() * 365)::INT), random_customer, 10)
        RETURNING order_id INTO current_order_id;

        FOR j IN 1..items_per_order LOOP
            random_product := 'Product ' || (floor(random() * 100)::INT);

            random_quantity := (floor(random() * 10)::INT + 1;

            INSERT INTO order_items (order_id, product, quantity)
            VALUES (current_order_id, random_product, random_quantity);
        END LOOP;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT generate_sample_data();
Enter fullscreen mode Exit fullscreen mode

This script creates 200 records in orders and 10 corresponding records in order_items for each order.

To compare performance, we'll execute both queries with EXPLAIN ANALYZE:

Using subquery:

Query whit count
Execution time: 28.983ms

Using denormalized field:

query with denormalized field

Execution time: 0.073ms (397x faster). Note this test used only 200 order records - performance gains become more significant as data volume increases.

Advantages:

Performance: Avoids repetitive subquery execution
Simplicity: Cleaner main query structure
Control: Direct management of counter values

Considerations:

Consistency: Must ensure total_items always reflects accurate counts
Concurrency: Requires atomic updates in high-concurrency systems

Hope you found this content useful!

AWS Q Developer image

Your AI Code Assistant

Automate your code reviews. Catch bugs before your coworkers. Fix security issues in your code. Built to handle large projects, Amazon Q Developer works alongside you from idea to production code.

Get started free in your IDE

Top comments (0)

Retry later
👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay