PostgreSQL Parallel Query: Configuration & Performance Tuning
Your analytical query scans a 50 GB table, aggregates 200 million rows, and takes 25 seconds. Your server has 16 CPU cores. PostgreSQL uses 3 of them: the leader plus the default 2 parallel workers. The other 13 sit idle. The max_parallel_workers_per_gather default of 2 is leaving most of the machine -- and most of your potential speedup -- on the table. Let's fix that, and understand when you should not.
How Parallel Query Works
PostgreSQL divides large operations across multiple CPU cores. Worker processes each scan a portion of the data, feed results through a Gather node to the leader process, which combines them. Sequential scans, hash joins, aggregates, and B-tree index scans all support parallel execution.
The key defaults:
- `max_parallel_workers_per_gather = 2` -- max workers per parallel operation
- `max_parallel_workers = 8` -- total parallel workers across all sessions
- `max_worker_processes = 8` -- total background workers (shared with other subsystems)
- `min_parallel_table_scan_size = 8MB` -- minimum table size for a parallel scan
- `parallel_setup_cost = 1000` -- planner's estimate for starting a worker
- `parallel_tuple_cost = 0.1` -- per-tuple transfer cost estimate
These are tuned for general-purpose workloads. For analytical queries on large tables, they're far too conservative.
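The planner also scales the worker count with table size: a table at `min_parallel_table_scan_size` gets one worker, and each tripling of size adds another (8 MB → 1, 24 MB → 2, 72 MB → 3, ...), capped by `max_parallel_workers_per_gather`. To see the size the planner is working from (the `orders` table here is illustrative):

```sql
-- Heap size of the table as the planner sees it
SELECT pg_size_pretty(pg_relation_size('orders')) AS heap_size;
```

With the default cap of 2, even a multi-gigabyte table never gets more than two workers.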
Detecting the Problem
Check your current settings:
```sql
SELECT name, setting, unit, short_desc
FROM pg_settings
WHERE name LIKE '%parallel%'
   OR name = 'max_worker_processes'
ORDER BY name;
```
Check whether queries actually use parallel workers:
```sql
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT customer_region, count(*), sum(order_total)
FROM orders
WHERE created_at >= '2026-01-01'
GROUP BY customer_region;
```
Look for:
- `Gather` or `Gather Merge` -- parallel execution is happening
- `Workers Planned: 2` and `Workers Launched: 2` -- how many workers
- `Workers Launched` < `Workers Planned` -- the system ran out of workers
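The shape of a healthy parallel plan for the aggregate above looks roughly like this (abridged and illustrative, not output from a real system):

```
Finalize HashAggregate
  Group Key: customer_region
  ->  Gather
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial HashAggregate
              Group Key: customer_region
              ->  Parallel Seq Scan on orders
                    Filter: (created_at >= '2026-01-01')
```

Each worker scans its share of the table and pre-aggregates; the leader combines the partial results above the Gather.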
Check if workers are being exhausted:
```sql
-- Currently active parallel workers
SELECT count(*) AS active_parallel_workers
FROM pg_stat_activity
WHERE backend_type = 'parallel worker';

-- The limit
SELECT setting AS max_parallel_workers
FROM pg_settings
WHERE name = 'max_parallel_workers';
```
If active workers frequently approach the limit, queries are competing for workers and some run with fewer than planned.
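To see which queries are holding those workers, group them by their leader backend (`leader_pid` is available in `pg_stat_activity` since PostgreSQL 13):

```sql
-- Parallel workers currently attached to each leader backend
SELECT leader_pid, count(*) AS workers
FROM pg_stat_activity
WHERE backend_type = 'parallel worker'
GROUP BY leader_pid
ORDER BY workers DESC;
```

A single reporting query hogging most of the pool is a common reason other queries launch fewer workers than planned.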
Tuning for Analytical Workloads
If your database runs analytical queries on large tables:
```sql
-- More workers per query
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;

-- More total workers
ALTER SYSTEM SET max_parallel_workers = 16;

-- Enough background worker slots
ALTER SYSTEM SET max_worker_processes = 20;

-- Lower table size threshold
ALTER SYSTEM SET min_parallel_table_scan_size = '1MB';

-- Apply (max_worker_processes requires a restart)
SELECT pg_reload_conf();
```
Rule of thumb: `max_parallel_workers_per_gather` = half your CPU cores, `max_parallel_workers` = total cores. On a 16-core server: 8 and 16 respectively.
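After reloading (or restarting, for `max_worker_processes`), it's worth confirming the values actually took effect in your session:

```sql
SHOW max_parallel_workers_per_gather;
SHOW max_parallel_workers;
SHOW max_worker_processes;
```

If a value hasn't changed, check for a session- or role-level override shadowing the system setting.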
Lower Cost Thresholds
If medium-sized tables aren't getting parallelized despite adequate configuration:
```sql
ALTER SYSTEM SET parallel_setup_cost = 100;   -- default: 1000
ALTER SYSTEM SET parallel_tuple_cost = 0.01;  -- default: 0.1
SELECT pg_reload_conf();
```
Lower `parallel_setup_cost` makes the planner consider parallelism for smaller operations. Lower `parallel_tuple_cost` makes parallel plans look cheaper.
Per-Session Overrides
For mixed workloads, set parallelism based on the connection type:
```sql
-- Reporting query: maximum parallelism
-- (SET LOCAL only takes effect inside a transaction block)
BEGIN;
SET LOCAL max_parallel_workers_per_gather = 8;
SET LOCAL parallel_setup_cost = 0;
SET LOCAL parallel_tuple_cost = 0;
SELECT region, count(*), avg(order_total)
FROM orders
GROUP BY region;
COMMIT;

-- OLTP session: disable parallelism for the whole session
SET max_parallel_workers_per_gather = 0;
```
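If the reporting and application workloads connect as different database roles, the same split can be made permanent with per-role defaults instead of per-session `SET`s (the role names here are hypothetical):

```sql
-- Reporting role gets aggressive parallelism on every connection
ALTER ROLE reporting_user SET max_parallel_workers_per_gather = 8;

-- Application role gets none
ALTER ROLE app_user SET max_parallel_workers_per_gather = 0;
```

New sessions pick up the role's default automatically, so application code doesn't need to issue any `SET` statements.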
Per-Table Settings
For critical large tables:
```sql
-- Guarantee up to 8 workers for scans on this table
ALTER TABLE orders SET (parallel_workers = 8);
```
This overrides the planner's automatic worker count calculation for the table, though the actual number launched is still capped by max_parallel_workers_per_gather and available worker slots.
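The override is stored in the table's reloptions, so you can confirm it's in place:

```sql
-- Check the stored per-table parallel_workers setting
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'orders';
```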
What Parallelizes (and What Doesn't)
| Operation | Parallel? |
|---|---|
| Sequential Scan | Yes |
| B-tree Index Scan | Yes |
| Bitmap Heap Scan | Yes |
| Hash Join | Yes |
| Merge Join | Yes |
| Nested Loop | Yes (outer side) |
| Aggregate (count, sum, avg) | Yes |
| CREATE INDEX (B-tree) | Yes |
| Append (UNION ALL, partitions) | Yes |
| UPDATE, DELETE | No |
| Materialized CTEs (WITH queries) | No (inlined CTEs can be) |
| Cursors | No |
| FOR UPDATE/SHARE | No |
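One caveat on the CTE row: since PostgreSQL 12, a CTE referenced only once is inlined into the outer query by default and can participate in a parallel plan; a materialized CTE cannot. You can control this explicitly:

```sql
-- Inlined: the CTE body can be part of a parallel plan
WITH recent AS NOT MATERIALIZED (
    SELECT * FROM orders WHERE created_at >= '2026-01-01'
)
SELECT count(*) FROM recent;

-- Materialized: computed once up front, the CTE scan runs serially
WITH recent AS MATERIALIZED (
    SELECT * FROM orders WHERE created_at >= '2026-01-01'
)
SELECT count(*) FROM recent;
```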
The Memory Multiplication Trap
work_mem applies per process and per sort or hash operation, not per query. A query with work_mem = 256MB and 4 parallel workers can consume 5 × 256 MB = 1.28 GB for a single sort or hash (the leader plus four workers each get their own allocation). Budget for the worst case:

```
max_connections * (max_parallel_workers_per_gather + 1) * work_mem < available RAM
```

This catches people who increase parallelism without accounting for memory.
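A sketch of that budget check against the live settings (`work_mem` in pg_settings is reported in kB; this is a theoretical per-operation worst case, not a prediction of actual usage):

```sql
-- Worst-case memory if every connection ran one maximally parallel sort
SELECT pg_size_pretty(
    (SELECT setting::bigint FROM pg_settings WHERE name = 'max_connections')
  * ((SELECT setting::bigint FROM pg_settings
      WHERE name = 'max_parallel_workers_per_gather') + 1)
  * (SELECT setting::bigint FROM pg_settings WHERE name = 'work_mem') * 1024
) AS worst_case_sort_memory;
```

If the result dwarfs your RAM, that's the signal to lower work_mem before raising parallelism.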
Verify the Impact
Compare sequential vs parallel execution:
```sql
-- Disable parallelism for this session
SET max_parallel_workers_per_gather = 0;

EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT customer_region, count(*), sum(order_total)
FROM orders
GROUP BY customer_region;

-- Re-enable parallelism
SET max_parallel_workers_per_gather = 4;

EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT customer_region, count(*), sum(order_total)
FROM orders
GROUP BY customer_region;
```
The parallel plan should run in roughly sequential_time / (1 + num_workers), since the leader scans alongside its workers; 60-80% of that theoretical speedup is typical due to Gather overhead. For the 25-second query from the intro, 4 workers put the ideal at ~5 seconds and a realistic result at 6-8 seconds.
Prevention Strategy
OLAP databases: aggressive parallelism. `max_parallel_workers_per_gather` = CPU cores / 2, `max_parallel_workers` = CPU cores, lower cost thresholds.
OLTP databases: keep defaults or disable. Many short concurrent queries don't benefit -- worker overhead exceeds speedup on small queries.
Mixed workloads: per-connection settings. Reporting connections get high parallelism. App connections get zero.
Monitor `Workers Launched` vs `Workers Planned`. A consistent shortfall means you need more `max_parallel_workers`. If CPU hits 100% during parallel queries and other sessions slow down, reduce `max_parallel_workers_per_gather`.
Originally published at mydba.dev/blog/postgres-parallel-query