Wait Event Heatmaps: See What PostgreSQL Is Actually Waiting On

Philip McClarence
If you have ever stared at a Grafana dashboard full of green CPU and memory panels while your users complain about slow queries, you have experienced the fundamental blind spot of resource utilization monitoring. PostgreSQL has a built-in profiler that most teams completely ignore — wait events.

The Problem

Your database feels slow but the usual metrics look fine. CPU is at 30%. Memory is stable. Disk I/O graphs show nothing dramatic. The problem is invisible because you are looking at resource utilization metrics, which tell you how much of each resource is consumed but not what PostgreSQL is spending its time doing.

PostgreSQL tracks exactly what every backend is waiting on at any given moment. These are wait events, and they fall into categories such as: IO (reading or writing data files), Lock (heavyweight row or table locks), LWLock (lightweight internal locks like buffer mapping or WAL insertion), BufferPin (waiting to pin a shared buffer that another backend has pinned), Client (waiting for the application to send data or read results), and Activity (background process idle waits), plus a few others like IPC and Timeout. When a backend is actively computing — executing an operator, sorting rows, evaluating expressions — there is no wait event at all.

Wait events are the closest thing PostgreSQL offers to a profiler. They tell you whether your database is slow because of I/O, lock contention, internal bottlenecks, or client-side delays. A database that shows 50% of sessions in IO:DataFileRead has a fundamentally different problem than one showing 50% in Lock:transactionid — the first needs better caching or faster storage, the second needs shorter transactions. Without wait event data, you are guessing at the root cause.

How to Detect It

Query pg_stat_activity to see what each active backend is currently waiting on:

-- Current wait events for all active sessions
SELECT
    pid,
    wait_event_type,
    wait_event,
    state,
    left(query, 80) AS query_snippet,
    now() - query_start AS query_duration
FROM pg_stat_activity
WHERE state = 'active'
    AND pid != pg_backend_pid()
ORDER BY wait_event_type, wait_event;
-- Aggregate current wait events by type
SELECT
    wait_event_type,
    wait_event,
    count(*) AS session_count
FROM pg_stat_activity
WHERE state = 'active'
    AND wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY session_count DESC;

The problem with these queries is that wait events are transient. A session's wait event changes every millisecond — it reads a data page (IO:DataFileRead), processes it (no wait event), reads another page, acquires a buffer pin (BufferPin:BufferPin), and so on. A single-point query captures whatever is happening at that exact instant, which may or may not represent the actual bottleneck.

To build a meaningful picture, you need to sample repeatedly and aggregate. Some teams write cron jobs that query pg_stat_activity every second and log the results. This works, but building a sampling infrastructure, storing the time-series data, and visualizing the results is a significant investment.
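A minimal version of that sampling loop can live entirely inside PostgreSQL. The sketch below is illustrative (the table name wait_samples and its columns are my own, not a standard catalog object); run the INSERT once per second from cron, pg_cron, or any scheduler:

-- One-time setup: a table to accumulate samples
CREATE TABLE IF NOT EXISTS wait_samples (
    sampled_at      timestamptz NOT NULL DEFAULT now(),
    pid             int,
    wait_event_type text,
    wait_event      text,
    query_snippet   text
);

-- Run this every second from cron, pg_cron, or a loop in a script
INSERT INTO wait_samples (pid, wait_event_type, wait_event, query_snippet)
SELECT pid, wait_event_type, wait_event, left(query, 80)
FROM pg_stat_activity
WHERE state = 'active'
    AND pid != pg_backend_pid();

Each sample is cheap (pg_stat_activity is an in-memory view), but remember to prune or partition the table so the history does not grow unbounded.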

Building a Wait Event Picture

The most effective way to use wait event data is as a heatmap over time. Each row represents a wait event category, and color intensity shows how many sessions were in that state at each point. This format reveals patterns that point-in-time queries cannot:

  • I/O spikes aligned with checkpoint intervals — If IO:DataFileRead spikes every 5 minutes, your checkpoint is too aggressive or shared_buffers is too small.
  • Lock contention correlated with batch jobs — Lock:transactionid spikes at the same time every day point to a batch process with long transactions.
  • LWLock pressure during peak writes — LWLock:WALInsert during high write throughput indicates WAL is a bottleneck.
  • Client waits indicating application issues — Client:ClientRead means PostgreSQL is waiting for your application, not the other way around.
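
Assuming samples of pg_stat_activity are logged to a table over time (here a hypothetical wait_samples with a sampled_at timestamp and a wait_event_type column), the heatmap reduces to a GROUP BY over time buckets:

-- Sessions per wait event type in 1-minute buckets, newest first
-- (wait_samples is a hypothetical sampling table)
SELECT
    date_trunc('minute', sampled_at) AS bucket,
    wait_event_type,
    count(*) AS waiting_sessions
FROM wait_samples
WHERE sampled_at > now() - interval '1 hour'
GROUP BY bucket, wait_event_type
ORDER BY bucket DESC, waiting_sessions DESC;

Feed the result to any tool that can render a pivot table or heatmap — Grafana's heatmap panel works directly off a query like this.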

Drilling into a specific wait event type to see which queries contribute the most wait time turns a vague "the database is slow" into a specific "this query on this table is causing 40% of all I/O waits."
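Assuming the same kind of hypothetical sampling table (wait_samples with wait_event_type and query_snippet columns), ranking queries by how often they appear in a given wait state approximates per-query wait time, since each sample represents one sampling interval spent waiting:

-- Which queries account for the most IO waits in the sampled window?
SELECT
    query_snippet,
    count(*) AS samples,
    round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct_of_io_waits
FROM wait_samples
WHERE wait_event_type = 'IO'
GROUP BY query_snippet
ORDER BY samples DESC
LIMIT 10;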

How to Fix It

Each wait event category maps to a different class of fix:

IO:DataFileRead / IO:DataFileExtend — The database is reading data from disk instead of shared buffers, or extending table files during inserts.

-- Check buffer cache hit ratio
SELECT
    sum(blks_hit) * 100.0 / nullif(sum(blks_hit) + sum(blks_read), 0)
        AS cache_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();

If the hit ratio is below 99%, increase shared_buffers. If a specific query causes most of the reads, add an index to reduce the number of pages scanned. If the hit ratio is already high but I/O waits persist, the storage subsystem is the bottleneck — consider faster disks or moving to NVMe.
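To find which tables drive the disk reads, the standard pg_statio_user_tables view breaks heap block reads down per table:

-- Tables with the most heap blocks read from disk (cache misses)
SELECT
    relname,
    heap_blks_read,
    heap_blks_hit,
    round(100.0 * heap_blks_hit /
          nullif(heap_blks_hit + heap_blks_read, 0), 1) AS table_hit_ratio
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 10;

A table with a low hit ratio and high read count is the first candidate for an index or a shared_buffers increase.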

Lock:transactionid / Lock:tuple — Sessions are waiting for other transactions to commit or for row-level locks.

-- Find long-running transactions holding locks
SELECT pid, state, now() - xact_start AS transaction_duration,
       left(query, 100) AS query_snippet
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start LIMIT 10;

Reduce transaction duration. Set idle_in_transaction_session_timeout to automatically kill forgotten sessions. Break large batch operations into smaller transactions.
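Setting the timeout is a one-line configuration change. As a sketch — the 60-second value and the app_user role name are illustrative, tune both for your workload:

-- Kill sessions idle inside a transaction for more than 60 seconds
ALTER SYSTEM SET idle_in_transaction_session_timeout = '60s';
SELECT pg_reload_conf();

-- Or scope it to a specific application role
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '60s';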

LWLock:WALWrite / LWLock:WALInsert — WAL writing is a bottleneck, typically during heavy write workloads.

Move the WAL directory to faster storage. Increase wal_buffers. Tune checkpoint_completion_target toward 0.9 (the default since PostgreSQL 14) to spread checkpoint I/O more evenly.
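As a sketch, the corresponding settings look like this — the values are illustrative starting points, not recommendations for every workload:

-- Illustrative starting points; benchmark before adopting
ALTER SYSTEM SET wal_buffers = '64MB';                 -- requires a restart
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET checkpoint_timeout = '15min';
SELECT pg_reload_conf();  -- applies the reloadable settings; wal_buffers
                          -- only takes effect after a full server restart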

Client:ClientRead / Client:ClientWrite — PostgreSQL is waiting for the application to send the next command or read the result. This is not a database problem — investigate network latency, connection pooler configuration, or application-side processing delays.
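To see which applications are on the other end of those waits, group the current client waits by application_name and client address:

-- Which clients is PostgreSQL waiting on right now?
SELECT
    application_name,
    client_addr,
    count(*) AS waiting_sessions
FROM pg_stat_activity
WHERE wait_event_type = 'Client'
GROUP BY application_name, client_addr
ORDER BY waiting_sessions DESC;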

How to Prevent It

Establish wait event baselines for your workload. A healthy OLTP database typically shows minimal wait events during normal operation — most time is spent actively processing, not waiting. When a new wait event pattern appears or an existing one intensifies, it signals a change in workload or a developing bottleneck.

Sample wait events at regular intervals and retain the history. The value of wait event monitoring increases over time as you build context for what "normal" looks like. A 10% increase in IO:DataFileRead after a deployment points directly at a query change. A gradual increase in LWLock:BufferMapping over weeks suggests growing memory pressure as data volume increases.
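With retained samples (again assuming a hypothetical wait_samples table with a sampled_at timestamp), comparing this week's wait profile against last week's makes that drift visible:

-- Wait event mix: last 24 hours vs. the same window one week earlier
SELECT
    wait_event_type,
    count(*) FILTER (WHERE sampled_at > now() - interval '1 day') AS last_24h,
    count(*) FILTER (WHERE sampled_at BETWEEN now() - interval '8 days'
                                          AND now() - interval '7 days') AS week_ago
FROM wait_samples
WHERE sampled_at > now() - interval '8 days'
GROUP BY wait_event_type
ORDER BY last_24h DESC;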

Pair wait event monitoring with query-level analysis to close the loop — when a wait event spikes, drill through to the specific queries causing it, fix them, and verify the wait event subsides.
