<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Divyansh Gupta</title>
    <description>The latest articles on DEV Community by Divyansh Gupta (@divyansh_gupta).</description>
    <link>https://dev.to/divyansh_gupta</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1877097%2Fbb6a85b1-e31b-4cc8-b7e5-4ef21464d214.jpg</url>
      <title>DEV Community: Divyansh Gupta</title>
      <link>https://dev.to/divyansh_gupta</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/divyansh_gupta"/>
    <language>en</language>
    <item>
      <title>Optimizing Materialized View Refresh to Minimize Locks in PostgreSQL</title>
      <dc:creator>Divyansh Gupta</dc:creator>
      <pubDate>Mon, 30 Jun 2025 06:58:28 +0000</pubDate>
      <link>https://dev.to/divyansh_gupta/optimizing-materialized-view-refresh-to-minimize-locks-in-postgresql-4f76</link>
      <guid>https://dev.to/divyansh_gupta/optimizing-materialized-view-refresh-to-minimize-locks-in-postgresql-4f76</guid>
      <description>&lt;p&gt;&lt;strong&gt;Optimizing Materialized View Refresh to Minimize Locks in PostgreSQL&lt;/strong&gt;&lt;br&gt;
This article explores an enhancement to a dual‑DB PostgreSQL setup that dramatically reduces lock contention by selectively using concurrent refresh on high‑dependency materialized views. We’ve also added architecture diagrams to clarify data flows and lock behavior.&lt;/p&gt;


&lt;h3&gt;
  
  
  1. Architectural Overview
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrypnlkl188wcvsbv192.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrypnlkl188wcvsbv192.png" alt="Image description" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Trans DB&lt;/strong&gt;: Runs non‑concurrent refreshes on most MVs, leveraging local raw data where compute is fastest.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cache DB&lt;/strong&gt;: Defines FDW-based foreign tables on Trans DB’s MVs and refreshes its own MVs concurrently, ensuring UI queries never block.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  2. Problem: Lock Explosion on High‑Dependency MV
&lt;/h3&gt;

&lt;p&gt;When &lt;code&gt;mvw_test&lt;/code&gt; (which depends on ~90% of the other MVs) is refreshed non‑concurrently, PostgreSQL takes an &lt;code&gt;ACCESS EXCLUSIVE&lt;/code&gt; lock on the view and holds locks on every base relation for the duration of the refresh. This causes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bztio5av3tb19vz0agl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bztio5av3tb19vz0agl.png" alt="Image description" width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consequences&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Widespread Contention&lt;/strong&gt;: All other MVs and queries stall.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Connection Saturation&lt;/strong&gt;: Waiting sessions accumulate, exhausting slots.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory Spikes&lt;/strong&gt;: Queued locks consume RAM → OOM errors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Service Restarts&lt;/strong&gt;: Cache DB OOM crashes interrupt the UI.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;h3&gt;
  
  
  3. Solution: Selective Concurrent Refresh
&lt;/h3&gt;

&lt;p&gt;Only the high‑impact MV uses &lt;code&gt;CONCURRENTLY&lt;/code&gt;, dramatically reducing locking scope:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;    &lt;span class="c1"&gt;-- Step 1: Ensure unique index&lt;/span&gt;
    &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;CONCURRENTLY&lt;/span&gt; &lt;span class="n"&gt;idx_test_pk&lt;/span&gt;
      &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;mvw_test&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key_column&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;-- Step 2: Refresh concurrently&lt;/span&gt;
    &lt;span class="n"&gt;REFRESH&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;CONCURRENTLY&lt;/span&gt; &lt;span class="n"&gt;mvw_test&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why Concurrent?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lighter Locks&lt;/strong&gt;: The MV is locked in &lt;code&gt;EXCLUSIVE&lt;/code&gt; rather than &lt;code&gt;ACCESS EXCLUSIVE&lt;/code&gt; mode, so reads against it continue, and its dependencies are not exclusively locked.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No Blocking&lt;/strong&gt;: Other MVs and queries continue normally.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
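&lt;p&gt;The difference falls out of PostgreSQL's documented lock conflict rules. The sketch below transcribes just the subset of the conflict matrix involved here (a plain &lt;code&gt;REFRESH&lt;/code&gt; takes &lt;code&gt;ACCESS EXCLUSIVE&lt;/code&gt; on the MV, &lt;code&gt;REFRESH ... CONCURRENTLY&lt;/code&gt; takes &lt;code&gt;EXCLUSIVE&lt;/code&gt;, and a plain &lt;code&gt;SELECT&lt;/code&gt; takes &lt;code&gt;ACCESS SHARE&lt;/code&gt;):&lt;/p&gt;

```python
# Subset of PostgreSQL's table-level lock conflict matrix, limited to the
# modes relevant here (see the "Explicit Locking" chapter of the PG docs).
CONFLICTS = {
    ("ACCESS EXCLUSIVE", "ACCESS SHARE"): True,   # plain refresh blocks readers
    ("EXCLUSIVE", "ACCESS SHARE"): False,         # concurrent refresh does not
    ("EXCLUSIVE", "EXCLUSIVE"): True,             # two concurrent refreshes queue
}

def blocks(held: str, requested: str) -> bool:
    """True if a lock held in `held` mode blocks a request for `requested` mode."""
    return CONFLICTS.get((held, requested), CONFLICTS.get((requested, held), False))

# A reader arriving during a plain refresh waits; during a concurrent one it doesn't.
print(blocks("ACCESS EXCLUSIVE", "ACCESS SHARE"))  # True
print(blocks("EXCLUSIVE", "ACCESS SHARE"))         # False
```

&lt;p&gt;This is why switching the refresh mode on one high‑dependency MV is enough to unblock the whole dependency graph of readers.&lt;/p&gt;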

&lt;p&gt;&lt;strong&gt;Scheduling&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Run at off‑peak, e.g., 3 AM daily via cron:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;        &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;psql&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="nv"&gt;"REFRESH MATERIALIZED VIEW CONCURRENTLY mvw_test;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  Monitor with:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;        &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;granted&lt;/span&gt;
          &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_locks&lt;/span&gt;
          &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'mvw_test'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  4. Before vs. After: Lock Footprint
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbahc1858xo71w6kwqe0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbahc1858xo71w6kwqe0.png" alt="Image description" width="800" height="484"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  5. Benefits
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reduced Contention&lt;/strong&gt;: Dependencies remain unlocked.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stable Connections&lt;/strong&gt;: Fewer waiters preserve slots.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Better Memory Profile&lt;/strong&gt;: No queue‑induced OOMs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Higher Availability&lt;/strong&gt;: Cache DB stays up, UI never blocks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Focused Overhead&lt;/strong&gt;: Only one MV pays the cost of concurrency.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  6. Recommendations &amp;amp; Extensions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Index Health&lt;/strong&gt;: Periodically &lt;code&gt;REINDEX CONCURRENTLY&lt;/code&gt; the unique index.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Expand to Others&lt;/strong&gt;: Identify other MVs with &amp;gt;50% dependencies for similar treatment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Advanced&lt;/strong&gt;: Consider incremental view maintenance (not built into PostgreSQL; extensions such as &lt;code&gt;pg_ivm&lt;/code&gt; provide it) or logical replicas for ultra‑low‑lock reporting.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Optimizing SQL Queries with Partitioning: The Secret Weapon for Managing Massive Databases</title>
      <dc:creator>Divyansh Gupta</dc:creator>
      <pubDate>Thu, 26 Jun 2025 08:05:51 +0000</pubDate>
      <link>https://dev.to/divyansh_gupta/optimizing-sql-queries-with-partitioning-the-secret-weapon-for-managing-massive-databases-2fln</link>
      <guid>https://dev.to/divyansh_gupta/optimizing-sql-queries-with-partitioning-the-secret-weapon-for-managing-massive-databases-2fln</guid>
      <description>&lt;p&gt;In the world of data-driven applications, few things can slow down a system more than &lt;strong&gt;inefficient database queries&lt;/strong&gt;. When tables grow too large, even the most well-designed queries can become sluggish, leading to poor performance and frustrated users.&lt;br&gt;
Enter &lt;strong&gt;partitioning&lt;/strong&gt;—one of the most powerful techniques for optimizing large tables in SQL databases. In this post, we’ll dive deep into &lt;strong&gt;partitioning strategies&lt;/strong&gt;, explore &lt;strong&gt;best practices&lt;/strong&gt; for &lt;strong&gt;SQL query optimization&lt;/strong&gt;, and look at a real-world case study of a growing &lt;strong&gt;Google Docs metadata table&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;The Challenge: Performance Bottleneck in a Growing Database&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Imagine managing a table that stores metadata for over &lt;strong&gt;80 million&lt;/strong&gt; Google Drive files. Each record contains metadata such as file references, author details, creation dates, and more. As the number of records keeps climbing, you notice performance degradation in query executions. Common queries, especially those filtering by &lt;code&gt;userid&lt;/code&gt;, now take over &lt;strong&gt;30 seconds&lt;/strong&gt; to execute. The growing database volume is overwhelming your queries, and traditional optimization methods no longer cut it.&lt;/p&gt;


&lt;h3&gt;
  
  
  &lt;strong&gt;Why Partitioning is the Key to Query Optimization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;At the heart of the performance issue lies the fact that a single, massive table is being queried for large amounts of data. &lt;strong&gt;Partitioning&lt;/strong&gt;—the process of dividing a large table into smaller, more manageable pieces—can dramatically improve query performance. It allows the database to operate on these smaller subsets, reducing the time needed to scan and retrieve relevant records.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;How Partitioning Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When we partition a table, we divide it into smaller, logically separate &lt;strong&gt;partitions&lt;/strong&gt; based on a chosen &lt;strong&gt;partition key&lt;/strong&gt;. In our case, partitioning by &lt;code&gt;userid&lt;/code&gt; makes sense because it’s a &lt;strong&gt;mandatory field&lt;/strong&gt; in almost all queries, and it directly maps to how users interact with their data. This leads to &lt;strong&gt;partition pruning&lt;/strong&gt;, where only the relevant partitions are scanned for the data that the query needs.&lt;/p&gt;


&lt;h3&gt;
  
  
  &lt;strong&gt;Partitioning Strategy for Google Docs Metadata Table&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let’s go step by step through the partitioning strategy we used to optimize &lt;code&gt;dbo.docs_tbl&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Partition Key Selection&lt;/strong&gt;: We selected &lt;strong&gt;&lt;code&gt;userid&lt;/code&gt;&lt;/strong&gt; as the partition key because it is commonly used in queries and is essential for filtering data specific to individual users.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hash Partitioning&lt;/strong&gt;: To ensure uniform data distribution across partitions, we opted for &lt;strong&gt;hash partitioning&lt;/strong&gt;. This technique spreads data across partitions evenly, minimizing the risk of data skew where some partitions might hold more data than others.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Number of Partitions&lt;/strong&gt;: Based on our analysis and volume of data, we created &lt;strong&gt;74 partitions&lt;/strong&gt;. This ensures even distribution of user data while providing ample room for future growth.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Targeted Indexing&lt;/strong&gt;: We designed &lt;strong&gt;partition-specific indexes&lt;/strong&gt; to ensure that search operations for individual partitions remain fast and efficient. For instance, indexes on columns like &lt;code&gt;docfileref&lt;/code&gt;, &lt;code&gt;authoremail&lt;/code&gt;, and &lt;code&gt;createddate&lt;/code&gt; are optimized for the specific partition data.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
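&lt;p&gt;The routing effect of hash partitioning can be illustrated with a toy sketch. The md5-based hash and the partition naming below are illustrative only (PostgreSQL uses its own internal hash functions), but the property they demonstrate is the real one: every row for a given &lt;code&gt;userid&lt;/code&gt; lands in exactly one of the 74 partitions.&lt;/p&gt;

```python
import hashlib

NUM_PARTITIONS = 74  # as chosen for this table

def partition_for(userid: str) -> int:
    """Map a userid to one of the 74 hash partitions.
    Illustrative md5-based hash, not PostgreSQL's internal hash function."""
    digest = hashlib.md5(userid.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# Deterministic routing: a query filtered on userid only ever needs to
# touch the single partition that userid hashes to (partition pruning).
p = partition_for("user123")
print(f"user123 -> docs_tbl_ptn_part_{p}")
```

&lt;p&gt;Uniformity of the hash is what keeps the 74 partitions roughly equal in size, which is why hash partitioning was preferred over range partitioning on &lt;code&gt;userid&lt;/code&gt;.&lt;/p&gt;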


&lt;h3&gt;
  
  
  &lt;strong&gt;Before and After: A Tale of Query Performance&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let’s visualize the &lt;strong&gt;before and after&lt;/strong&gt; performance when partitioning is applied to our queries. We'll use a real-world example to see how partitioning improves query execution.&lt;br&gt;
&lt;strong&gt;Before Partitioning:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Imagine you are running a query that searches for all files created by &lt;code&gt;user123&lt;/code&gt; between &lt;code&gt;2024-01-01&lt;/code&gt; and &lt;code&gt;2024-12-31&lt;/code&gt;. The query has to scan millions of rows, filtering based on multiple columns like &lt;code&gt;docfileref&lt;/code&gt;, &lt;code&gt;createddate&lt;/code&gt;, and &lt;code&gt;userid&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; 
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;docs_tbl&lt;/span&gt; 
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;userid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'user123'&lt;/span&gt; 
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;createddate&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2024-12-31'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the query has to scan &lt;strong&gt;all&lt;/strong&gt; the rows in the table (even those that don’t match the &lt;code&gt;userid&lt;/code&gt; filter), leading to &lt;strong&gt;slow performance&lt;/strong&gt; and a longer wait time for results.&lt;br&gt;
&lt;strong&gt;After Partitioning:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
With partitioning, the query now targets only the &lt;strong&gt;relevant partition&lt;/strong&gt; for &lt;code&gt;user123&lt;/code&gt;. The query becomes far more efficient, scanning a much smaller dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; 
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;docs_tbl_ptn_part_1&lt;/span&gt; 
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;userid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'user123'&lt;/span&gt; 
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;createddate&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2024-12-31'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The table has been partitioned by &lt;code&gt;userid&lt;/code&gt;, and now the query only needs to scan the partition containing &lt;code&gt;user123&lt;/code&gt;'s data, resulting in a &lt;strong&gt;dramatic reduction in execution time&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Partitioning in Action: Real-World Examples of Query Optimization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let's explore some &lt;strong&gt;real-world scenarios&lt;/strong&gt; where partitioning significantly improves query performance.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1. Accessing User Archives&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Before partitioning, fetching a user’s archived documents would require scanning the entire table, even though we are only interested in one user’s data. With partitioning, queries can skip irrelevant data and directly access the data for the specific &lt;code&gt;userid&lt;/code&gt;.&lt;br&gt;
&lt;strong&gt;Query Before Partitioning&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; 
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;docs_tbl&lt;/span&gt; 
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;userid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'user123'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;retentionstatus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Query After Partitioning&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; 
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;docs_tbl_ptn_part_1&lt;/span&gt; 
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;userid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'user123'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;retentionstatus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;2. Optimizing Aggregation Queries&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Aggregation queries like calculating the count of documents or average file sizes can be slow without partitioning, as they scan the entire table. Partitioning allows us to perform aggregation on individual partitions, making these queries much faster.&lt;br&gt;
&lt;strong&gt;Query Before Partitioning&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;docs_tbl&lt;/span&gt; 
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;createddate&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2024-12-31'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Query After Partitioning&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;docs_tbl_ptn_part_1&lt;/span&gt; 
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;createddate&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2024-12-31'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;Best Practices for Indexing Partitioned Tables&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;While partitioning helps to reduce the data scanned by queries, &lt;strong&gt;indexing&lt;/strong&gt; plays a crucial role in improving query performance within each partition.&lt;br&gt;
Here’s a set of &lt;strong&gt;best practices&lt;/strong&gt; for creating indexes on partitioned tables:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use Local Indexes&lt;/strong&gt;: Local indexes are specific to each partition, making them more efficient than global indexes, which span the entire table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Index Columns Frequently Filtered&lt;/strong&gt;: Focus on creating indexes for columns that are frequently used in filters, such as &lt;code&gt;userid&lt;/code&gt;, &lt;code&gt;docfileref&lt;/code&gt;, and &lt;code&gt;createddate&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Optimize Full-Text Search&lt;/strong&gt;: For text-heavy queries, consider using a &lt;strong&gt;GIN index&lt;/strong&gt; for columns that are often searched via full-text searches, such as &lt;code&gt;title&lt;/code&gt; or &lt;code&gt;description&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;    &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;ix_docs_tbl_title&lt;/span&gt; 
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;docs_tbl&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;gin_trgm_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;Step-by-Step Migration Plan: From Single Table to Partitioned Table&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Migrating to a partitioned table requires careful planning. Below is a streamlined migration plan:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create Partitioned Parent Table&lt;/strong&gt;: Create the new partitioned table that mirrors the existing table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create Partitions&lt;/strong&gt;: Create 74 child partitions using a hash function based on &lt;code&gt;userid&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create Indexes&lt;/strong&gt;: Set up indexes to optimize search and retrieval on the partitioned table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Migration&lt;/strong&gt;: Migrate data in batches to avoid locking the entire table. Monitor progress using a &lt;strong&gt;migration tracking table&lt;/strong&gt; (&lt;code&gt;partition_migration_tbl&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Switch Over&lt;/strong&gt;: After data migration is complete, rename the partitioned table to take over the production role.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
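&lt;p&gt;Step 4 (batched data migration) can be sketched as a simple driver loop. The &lt;code&gt;copy_batch&lt;/code&gt; and &lt;code&gt;record_progress&lt;/code&gt; callbacks here are hypothetical hooks, standing in for an &lt;code&gt;INSERT ... SELECT&lt;/code&gt; over an id range and an update to &lt;code&gt;partition_migration_tbl&lt;/code&gt;:&lt;/p&gt;

```python
def batch_ranges(min_id: int, max_id: int, batch_size: int):
    """Yield inclusive (start, end) id ranges covering [min_id, max_id]."""
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield (start, end)
        start = end + 1

def migrate(min_id, max_id, batch_size, copy_batch, record_progress):
    # copy_batch / record_progress are hypothetical hooks: one would run the
    # INSERT ... SELECT for the range, the other update partition_migration_tbl.
    for start, end in batch_ranges(min_id, max_id, batch_size):
        copy_batch(start, end)
        record_progress(start, end)

# Dry run with stub callbacks:
done = []
migrate(1, 10, 4, lambda s, e: done.append((s, e)), lambda s, e: None)
print(done)  # [(1, 4), (5, 8), (9, 10)]
```

&lt;p&gt;Keeping each batch small bounds how long any single statement holds locks on the source table, which is the point of migrating in batches rather than with one giant &lt;code&gt;INSERT ... SELECT&lt;/code&gt;.&lt;/p&gt;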




&lt;h3&gt;
  
  
  &lt;strong&gt;Monitoring Migration: Visualizing Progress with Grafana&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Real-time monitoring during migration is critical for success. &lt;strong&gt;Grafana&lt;/strong&gt; provides an excellent way to monitor progress, ensuring the migration is on track. By querying the &lt;strong&gt;migration tracking table&lt;/strong&gt; (&lt;code&gt;partition_migration_tbl&lt;/code&gt;), Grafana can visualize key metrics like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data migration status&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Index creation progress&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Overall migration completion&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Query latency before and after partitioning&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Example Grafana Dashboard for Migration&lt;/strong&gt;:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bar chart&lt;/strong&gt;: Visualizes migration progress across partitions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Line graph&lt;/strong&gt;: Tracks query performance improvements over time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Alerting&lt;/strong&gt;: Set up notifications if migration slows down or encounters errors.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;PoC Results: How Partitioning Improves Query Performance&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In our &lt;strong&gt;Proof of Concept&lt;/strong&gt; (PoC), we tested partitioning with two datasets: &lt;strong&gt;16 million records&lt;/strong&gt; and &lt;strong&gt;1.6 billion records&lt;/strong&gt;. The results were striking:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Data Volume&lt;/th&gt;&lt;th&gt;Query Time (Without Partitioning)&lt;/th&gt;&lt;th&gt;Query Time (With Partitioning)&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;16 Million&lt;/td&gt;&lt;td&gt;17 seconds, 30,000 disk reads&lt;/td&gt;&lt;td&gt;0.1 seconds, 207 disk reads&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;1.6 Billion&lt;/td&gt;&lt;td&gt;400 seconds, 550,000 disk reads&lt;/td&gt;&lt;td&gt;2 seconds, 7,500 disk reads&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;As the table shows, partitioning not only &lt;strong&gt;dramatically reduces query time&lt;/strong&gt; but also cuts &lt;strong&gt;disk I/O&lt;/strong&gt; significantly.&lt;/p&gt;
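&lt;p&gt;The speedup factors implied by those PoC numbers work out as follows (simple arithmetic on the figures reported above):&lt;/p&gt;

```python
# Speedup factors implied by the PoC numbers.
results = {
    "16M rows":  {"before_s": 17,  "after_s": 0.1, "before_reads": 30_000,  "after_reads": 207},
    "1.6B rows": {"before_s": 400, "after_s": 2,   "before_reads": 550_000, "after_reads": 7_500},
}
for label, r in results.items():
    time_speedup = r["before_s"] / r["after_s"]
    io_reduction = r["before_reads"] / r["after_reads"]
    print(f"{label}: {time_speedup:.0f}x faster, {io_reduction:.0f}x fewer disk reads")
# e.g. "16M rows: 170x faster, 145x fewer disk reads"
```

&lt;p&gt;Notably, the larger dataset shows the bigger time speedup (200x vs. 170x): pruning pays off more as the unpartitioned table grows.&lt;/p&gt;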

&lt;h3&gt;
  
  
  &lt;strong&gt;Mermaid Diagram: Migration Process Flow&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here’s a &lt;strong&gt;Mermaid diagram&lt;/strong&gt; to visualize the migration process of partitioning:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fya15u1q9rrk5hbyjyiex.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fya15u1q9rrk5hbyjyiex.png" alt="Image description" width="800" height="3157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This &lt;strong&gt;Mermaid diagram&lt;/strong&gt; illustrates the &lt;strong&gt;step-by-step migration process&lt;/strong&gt;, ensuring that the entire migration flow is smooth and that the database remains consistent throughout.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion: Embrace Partitioning for Long-Term Scalability&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Partitioning is a &lt;strong&gt;game-changing strategy&lt;/strong&gt; for optimizing SQL queries, especially for large datasets. By splitting large tables into smaller, more manageable partitions, you significantly improve query performance, reduce disk I/O, and ensure your database remains &lt;strong&gt;scalable&lt;/strong&gt; as data grows.&lt;br&gt;
For &lt;strong&gt;Database Administrators&lt;/strong&gt;, partitioning offers the opportunity to &lt;strong&gt;future-proof&lt;/strong&gt; their systems while providing a seamless experience for users. Combine partitioning with &lt;strong&gt;strategic indexing&lt;/strong&gt;, &lt;strong&gt;real-time monitoring&lt;/strong&gt; with &lt;strong&gt;Grafana&lt;/strong&gt;, and a careful migration plan, and you’ll have a &lt;strong&gt;well-optimized database&lt;/strong&gt; that can handle even the largest data volumes without breaking a sweat.&lt;br&gt;
If you're dealing with massive data tables, partitioning isn't just a &lt;strong&gt;best practice&lt;/strong&gt;—it’s essential for keeping your database &lt;strong&gt;fast, scalable&lt;/strong&gt;, and &lt;strong&gt;user-friendly&lt;/strong&gt;. Embrace partitioning and watch your query performance soar!&lt;/p&gt;





</description>
      <category>partition</category>
      <category>database</category>
      <category>postgressql</category>
      <category>dba</category>
    </item>
    <item>
      <title>Database optimization best practices</title>
      <dc:creator>Divyansh Gupta</dc:creator>
      <pubDate>Wed, 25 Jun 2025 10:48:52 +0000</pubDate>
      <link>https://dev.to/divyansh_gupta/database-optimization-best-practices-3p2d</link>
      <guid>https://dev.to/divyansh_gupta/database-optimization-best-practices-3p2d</guid>
<description>&lt;p&gt;Imagine your database as a wild animal sanctuary: some queries lumber like tortoises, while others sprint like cheetahs. Your job as a DBA is to coax every query into channeling its inner cheetah—fast, efficient, and resource-savvy. In this KB article, you’ll discover practical techniques, vibrant code examples, ASCII-art execution plans, and &lt;strong&gt;Mermaid&lt;/strong&gt; flowcharts that transform sluggish SQL into scalpels of performance.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Measure Twice, Cut Once: EXPLAIN &amp;amp; ANALYZE
&lt;/h2&gt;

&lt;p&gt;Before refactoring, know your enemy. Use:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;to expose hidden bottlenecks: row estimates, buffer hits vs. reads, and CPU vs. I/O costs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                                     QUERY PLAN
-----------------------------------------------------------------------------------
 Hash Join  (cost=150.00..500.00 rows=1000 width=64) (actual time=12.345..45.678 rows=950 loops=1)
   Hash Cond: (t1.id = t2.foreign_id)
   Buffers: shared hit=2000 read=1500
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This tells you whether your query is CPU-bound, I/O-bound, or suffering from bad cardinality estimates.&lt;/p&gt;
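
&lt;p&gt;When estimated and actual row counts diverge wildly on correlated columns, extended statistics (PostgreSQL 10+) can repair the estimate; a sketch with illustrative table and column names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Teach the planner that city and zip are functionally dependent
CREATE STATISTICS city_zip_stats (dependencies)
  ON city, zip FROM addresses;
ANALYZE addresses;  -- refreshes the statistics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;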

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs6q3obe6pamu0tdhdt51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs6q3obe6pamu0tdhdt51.png" alt="Image description" width="800" height="1020"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Indexing Mastery: More Than Just B‑Trees
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 Partial &amp;amp; Expression Indexes
&lt;/h3&gt;

&lt;p&gt;Target hot filter patterns without bloating:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE INDEX idx_active_users ON users((lower(email)))
 WHERE status = 'active';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
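
&lt;p&gt;Only queries that match both the indexed expression and the index’s &lt;code&gt;WHERE&lt;/code&gt; predicate can use it; for example (the email value is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT id FROM users
WHERE lower(email) = 'jane@example.com'
  AND status = 'active';  -- both conditions match the partial index
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;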
&lt;h3&gt;
  
  
  2.2 BRIN for Time-Series
&lt;/h3&gt;

&lt;p&gt;Massive append-only tables? Try BRIN:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE logs (
  ts TIMESTAMPTZ,
  event JSONB
) PARTITION BY RANGE (ts);
CREATE INDEX ON logs USING BRIN (ts);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because BRIN stores only a small summary per block range rather than an entry per row, the index stays tiny even on billion-row tables.&lt;/p&gt;
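
&lt;p&gt;To see the payoff, compare the on-disk footprint of a BRIN and a B-tree on the same column (the index names here are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT pg_size_pretty(pg_relation_size('logs_ts_brin'))  AS brin_size,
       pg_size_pretty(pg_relation_size('logs_ts_btree')) AS btree_size;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;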




&lt;h2&gt;
  
  
  3. Encapsulate Complexity: Stored Functions &amp;amp; Views
&lt;/h2&gt;

&lt;p&gt;Rather than embedding 10 JOINs in every API call, wrap logic in a function or view:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE OR REPLACE FUNCTION daily_sales_summary(day DATE)
RETURNS TABLE(user_id UUID, total DECIMAL) AS $$
BEGIN
  RETURN QUERY
  SELECT s.user_id, SUM(s.amount)
  FROM sales s
  WHERE s.ts &gt;= day
    AND s.ts &lt; day + 1  -- sargable range; date_trunc(ts) = day would defeat an index on ts
  GROUP BY s.user_id;
END;
$$ LANGUAGE plpgsql;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Encapsulation keeps the logic in one tunable place. Note that PL/pgSQL bodies are opaque to the planner; if you want the query inlined into the calling statement, write it as a &lt;code&gt;LANGUAGE sql&lt;/code&gt; function marked &lt;code&gt;STABLE&lt;/code&gt;.&lt;/p&gt;
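
&lt;p&gt;Callers then hit one tuned entry point instead of duplicating the query logic:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT user_id, total
FROM daily_sales_summary(CURRENT_DATE - 1);  -- yesterday's summary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;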




&lt;h2&gt;
  
  
  4. Aggregations &amp;amp; Windows: Tricks of the Trade
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4.1 Materialized Aggregates
&lt;/h3&gt;

&lt;p&gt;For metrics dashboards, precompute:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE MATERIALIZED VIEW mv_user_errors AS
SELECT user_id, COUNT(*) AS error_count
FROM events
WHERE error_flag
GROUP BY user_id;
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_user_errors;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  4.2 Window Functions vs. GROUP BY
&lt;/h3&gt;

&lt;p&gt;When you need both raw rows and aggregates:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT
  order_id,
  amount,
  SUM(amount) OVER (PARTITION BY customer_id) AS total_per_customer
FROM orders;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An index on &lt;code&gt;(customer_id, amount)&lt;/code&gt; supplies pre-sorted input, letting the window computation skip a full sort.&lt;/p&gt;
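
&lt;p&gt;A matching index might look like this (the name is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE INDEX idx_orders_customer_amount
  ON orders (customer_id, amount);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;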




&lt;h2&gt;
  
  
  5. Partitioning &amp;amp; Parallelism: Scale Out Safely
&lt;/h2&gt;

&lt;h3&gt;
  
  
  5.1 Declarative Partitioning
&lt;/h3&gt;

&lt;p&gt;Split by time or key:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE metrics (
  ts DATE,
  value DOUBLE PRECISION
) PARTITION BY RANGE (ts);
CREATE TABLE metrics_2025_q1 PARTITION OF metrics
 FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
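
&lt;p&gt;To confirm the planner is actually pruning, run &lt;code&gt;EXPLAIN&lt;/code&gt; with a bounded predicate; only the matching partition should appear in the plan:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPLAIN SELECT avg(value) FROM metrics
WHERE ts &gt;= DATE '2025-01-01'
  AND ts &lt; DATE '2025-04-01';
-- Expect a scan of metrics_2025_q1 only
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;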
&lt;h3&gt;
  
  
  5.2 Harness Parallel Queries
&lt;/h3&gt;

&lt;p&gt;Enable in &lt;code&gt;postgresql.conf&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;max_parallel_workers_per_gather = 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then large scans auto-split across CPUs.&lt;/p&gt;
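
&lt;p&gt;You can verify parallelism is kicking in by inspecting the plan of a large scan (the &lt;code&gt;metrics&lt;/code&gt; table from earlier is used for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPLAIN SELECT count(*) FROM metrics;
-- Look for a Gather node with "Workers Planned: 4"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;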




&lt;h2&gt;
  
  
  6. Housekeeping: VACUUM, ANALYZE &amp;amp; Maintenance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  6.1 Autovacuum Tuning
&lt;/h3&gt;

&lt;p&gt;Ensure &lt;code&gt;autovacuum&lt;/code&gt; thresholds fit your workload. For high-churn tables:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ALTER TABLE big_table
SET ( autovacuum_vacuum_scale_factor = 0.05,
      autovacuum_analyze_scale_factor = 0.02 );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
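
&lt;p&gt;To judge whether the defaults are keeping up, watch dead-tuple counts and the time of the last autovacuum run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;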
&lt;h3&gt;
  
  
  6.2 Fillfactor for Write-Heavy Tables
&lt;/h3&gt;

&lt;p&gt;Reserve free space on each heap page so updated rows can stay on the same page (HOT updates), cutting index churn:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ALTER TABLE logs SET (fillfactor = 70);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  7. Real‑World Case Study: 80% Speedup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Scenario:&lt;/strong&gt; A nightly report took 10 minutes. By applying:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Partial index on &lt;code&gt;status&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Function-based view&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Partition pruning on date&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Autovacuum tuning&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;we tracked its execution plan changes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Before:
Seq Scan on orders  (time: 600s)
-- After:
Index Only Scan using idx_status_date  (time: 120s)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;From 10 min → 2 min: a success story to inspire your own triumphs.&lt;/p&gt;
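
&lt;p&gt;The plan references &lt;code&gt;idx_status_date&lt;/code&gt;; an index of that shape would look roughly like the following (the column names beyond &lt;code&gt;status&lt;/code&gt; are assumptions for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- hypothetical columns; INCLUDE lets the scan stay index-only
CREATE INDEX idx_status_date
  ON orders (status, order_date)
  INCLUDE (amount);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;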




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Measure first&lt;/strong&gt; with &lt;code&gt;EXPLAIN (ANALYZE, BUFFERS)&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Index smartly&lt;/strong&gt;: partial, expression, BRIN.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Encapsulate&lt;/strong&gt; complex logic in functions/views.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Precompute&lt;/strong&gt; heavy aggregates with materialized views.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Partition &amp;amp; parallelize&lt;/strong&gt; for scale.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Maintain&lt;/strong&gt;: VACUUM, ANALYZE, and fillfactor.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  8. Beyond the Basics: Advanced Techniques
&lt;/h2&gt;

&lt;h3&gt;
  
  
  8.1 Adaptive Query Plans with pg_stat_statements
&lt;/h3&gt;

&lt;p&gt;Track your most expensive statements:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE EXTENSION pg_stat_statements;
SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Use this insight to prioritize optimizations.&lt;/p&gt;

&lt;h3&gt;
  
  
  8.2 Plan Stability with Prepared Statements
&lt;/h3&gt;

&lt;p&gt;For frequently repeated queries, prepared statements skip repeated parsing and planning; after several executions PostgreSQL may switch to a cached generic plan:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PREPARE fast_search(text) AS
SELECT * FROM products WHERE description ILIKE $1;
EXECUTE fast_search('%widget%');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
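
&lt;p&gt;On PostgreSQL 12 and later you can steer the generic-versus-custom choice explicitly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SET plan_cache_mode = force_generic_plan;  -- always reuse the cached plan
-- or force_custom_plan to re-plan on every execution
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;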
&lt;h3&gt;
  
  
  8.3 In-Memory Speed with UNLOGGED Tables
&lt;/h3&gt;

&lt;p&gt;Unlogged tables skip write-ahead logging entirely, trading crash safety for much faster bulk writes (they are truncated after a crash and are not replicated):&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE UNLOGGED TABLE temp_hits AS
SELECT ...;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
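
&lt;p&gt;If the data later needs durability, an unlogged table can be converted in place:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ALTER TABLE temp_hits SET LOGGED;  -- rewrites the table with WAL enabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;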
&lt;h3&gt;
  
  
  8.4 Smart Caching Layers
&lt;/h3&gt;

&lt;p&gt;Pair an external cache such as Redis with PostgreSQL’s own tooling: the &lt;code&gt;pg_prewarm&lt;/code&gt; extension loads a hot table into shared buffers ahead of demand:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DO $$
BEGIN
  PERFORM pg_prewarm('hot_table');
END;
$$;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  Creative Corner: Visualizing Data Flow
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbhmohri4onhtzqdvm1c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbhmohri4onhtzqdvm1c.png" alt="Image description" width="800" height="486"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bring your diagrams to life—they guide both your brain and your team.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts: The Art of Performance
&lt;/h2&gt;

&lt;p&gt;Optimizing SQL is equal parts science and art. It’s a continuous journey: measure, tweak, observe, and repeat. With these techniques—from core index strategies to creative caching and plan management—you’re equipped to turn any tortoise into a cheetah.&lt;br&gt;
Remember: the fastest query is the one you never run. Cache wisely, precompute where it counts, and let your database shine.&lt;/p&gt;

</description>
      <category>postgresql</category>
      <category>sql</category>
      <category>database</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
