From $1 Placeholders to Executable Queries: Debugging the Parameters That Matter
Every PostgreSQL DBA has been in this situation: a query shows up as slow in pg_stat_statements, but the template uses $1 placeholders and you have no idea which parameter values actually caused the slow execution. The query text is WHERE customer_id = $1 — but which customer? The one with 15 orders or the one with 500,000? Without the actual values, you cannot reproduce the problem.
The Problem
pg_stat_statements is the foundation of PostgreSQL query performance monitoring. It aggregates execution statistics per query template, grouping all executions of SELECT * FROM orders WHERE customer_id = $1 regardless of the parameter value. This is exactly what you want for identifying expensive query patterns — but it hides a critical detail.
A query might average 5ms across 100,000 calls but take 12 seconds for one specific customer. The customer with ID 847291 has 500,000 orders while the median customer has 15. The index scan that works perfectly for most customers degrades to near-sequential-scan performance for this outlier. The average hides the catastrophe.
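The arithmetic makes the point. Folding one 12-second execution into 100,000 fast calls barely moves the mean (illustrative numbers matching the example above):

```python
# 99,999 executions at 5 ms plus one 12,847 ms outlier:
# the aggregated mean barely registers the slow call.
timings_ms = [5.0] * 99_999 + [12_847.0]

mean_ms = sum(timings_ms) / len(timings_ms)
worst_ms = max(timings_ms)

print(f"mean:  {mean_ms:.2f} ms")   # ~5.13 ms -- looks healthy
print(f"worst: {worst_ms:.0f} ms")  # 12847 ms -- the hidden catastrophe
```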
You cannot reproduce this problem from pg_stat_statements alone. You see WHERE customer_id = $1 and the mean time, but not the parameter value that caused the outlier. Without the actual parameter values, you cannot run EXPLAIN ANALYZE to see what the planner does for that specific input.
The standard workaround is to search PostgreSQL logs. If log_min_duration_statement is configured, slow queries are logged with their parameters in a DETAIL line. But parsing log files to extract parameters, correlate them with query templates, and reconstruct executable queries is manual, tedious, and error-prone — exactly the kind of work that gets skipped during a production incident.
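To make the parsing step concrete, here is a minimal sketch of extracting parameter values from a DETAIL line (assuming the log format shown later in this article; a real parser must also handle NULL parameters and multi-line values):

```python
import re

# A DETAIL line as PostgreSQL emits it for parameterized statements.
detail = "DETAIL:  parameters: $1 = '847291', $2 = '2020-01-01'"

# Each parameter appears as "$N = 'value'"; single quotes inside
# the value are doubled ('').
PARAM_RE = re.compile(r"\$(\d+) = '((?:[^']|'')*)'")

params = {int(n): v.replace("''", "'") for n, v in PARAM_RE.findall(detail)}
print(params)  # {1: '847291', 2: '2020-01-01'}
```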
How to Detect It
PostgreSQL can log slow queries with their parameter values, but you need the right configuration:
-- Check if slow query logging is enabled
SHOW log_min_duration_statement;
-- -1 means disabled, 0 logs everything, positive value is milliseconds
-- Enable logging of queries slower than 500ms
ALTER SYSTEM SET log_min_duration_statement = 500;
SELECT pg_reload_conf();
When enabled, slow queries appear in the PostgreSQL log with a DETAIL line containing the parameter values:
LOG:  duration: 12847.234 ms  execute <unnamed>: SELECT * FROM orders WHERE customer_id = $1 AND order_date > $2
DETAIL:  parameters: $1 = '847291', $2 = '2020-01-01'
To reconstruct an executable query from this, you must manually substitute each parameter:
-- Manually reconstructed from log parsing
SELECT * FROM orders
WHERE customer_id = '847291'
AND order_date > '2020-01-01';
This works for one query. For systematic analysis, you would need to parse the log files programmatically, match DETAIL lines to their query statements, and build executable queries for each slow execution. Most teams lack this tooling, so parameter values are effectively lost.
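The substitution step itself is simple. A minimal sketch, assuming parameters have already been extracted from the DETAIL line (illustrative only; a real tool must also handle NULLs, type casts, and `$N` sequences inside string literals):

```python
import re

template = "SELECT * FROM orders WHERE customer_id = $1 AND order_date > $2"
params = {1: "847291", 2: "2020-01-01"}  # as extracted from the DETAIL line

def substitute(template: str, params: dict) -> str:
    """Replace $N placeholders with quoted literals, escaping single quotes."""
    def repl(m):
        value = params[int(m.group(1))]
        return "'" + value.replace("'", "''") + "'"
    return re.sub(r"\$(\d+)", repl, template)

print(substitute(template, params))
# SELECT * FROM orders WHERE customer_id = '847291' AND order_date > '2020-01-01'
```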
Another approach is auto_explain, which logs the EXPLAIN plan for slow queries. This gives you the plan but still not the executable query you can paste into psql to reproduce:
-- Enable auto_explain for queries over 1 second
-- (session-level LOAD requires superuser; for cluster-wide use,
-- add auto_explain to shared_preload_libraries instead)
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 1000;
-- log_analyze times every query, which adds measurable overhead
SET auto_explain.log_analyze = true;
Automating Parameter Extraction
The ideal solution is a monitoring tool that parses PostgreSQL logs automatically, extracts DETAIL parameters, and substitutes them into query templates to produce executable queries. This turns a tedious manual process into an automated pipeline.
With automated extraction, each query template accumulates a collection of real-world parameter samples. Templates with zero executable samples use $N placeholders that cannot be run directly. Templates with samples have actual parameter values from the logs. The higher the sample count, the more diverse the parameter combinations available for testing.
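One way to picture the sample store such a pipeline builds (a hypothetical in-memory sketch; real tools persist this per template):

```python
from collections import defaultdict

# Map each normalized template to the parameter tuples seen in slow executions.
samples = defaultdict(list)

def record_sample(template: str, params: tuple, duration_ms: float):
    samples[template].append((duration_ms, params))

record_sample("SELECT * FROM orders WHERE customer_id = $1", ("847291",), 12847.2)
record_sample("SELECT * FROM orders WHERE customer_id = $1", ("12345",), 6.1)

# The more samples a template accumulates, the more real-world
# parameter combinations are available for testing.
for template, seen in samples.items():
    print(template, "->", len(seen), "samples")
```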
This is particularly valuable for queries where performance varies dramatically based on input. You can test with the actual values that caused slow executions — not guessed values or synthetic test data, but the real parameters from production.
The workflow becomes: find a slow query template, click into its samples, copy the executable query with one click, and paste it into psql with EXPLAIN (ANALYZE, BUFFERS) to see exactly what the planner does for that specific parameter combination.
How to Fix It
Once you have an executable query with the problematic parameters, diagnose the plan:
-- Run EXPLAIN ANALYZE with the exact parameters from the sample
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders
WHERE customer_id = '847291'
AND order_date > '2020-01-01';
Compare this plan to the same query with a typical parameter value:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders
WHERE customer_id = '12345'
AND order_date > '2020-01-01';
Common findings:
Skewed data distribution: Customer 847291 has 500,000 rows, but the planner estimates 15 (the table-wide average). Fix with extended statistics or a partial index:
-- Extended statistics for correlated columns
CREATE STATISTICS orders_customer_date_stats
ON customer_id, order_date FROM orders;
ANALYZE orders;
-- Partial index for a known high-volume customer.
-- Note: a partial index predicate must be a constant expression --
-- subqueries are not allowed -- so high-volume IDs must be listed as literals.
CREATE INDEX CONCURRENTLY idx_orders_high_volume
ON orders (customer_id, order_date)
WHERE customer_id = 847291;
Date range too wide: $2 = '2020-01-01' pulls 5 years of data. The application may need pagination or the query needs a tighter default range.
Generic plan vs. custom plan: After five executions of a prepared statement, the planner may switch to a generic plan whose cost estimate ignores the actual parameter values, which can be badly suboptimal for outliers. On PostgreSQL 12+, check with SET plan_cache_mode = force_custom_plan as a diagnostic.
How to Prevent It
Ensure log_min_duration_statement is set to a reasonable threshold; 500ms to 1000ms is a good starting point. Too low floods the logs, too high misses important slow queries. This single setting is the prerequisite for parameter extraction; without it, no tool can recover parameter values from normalized templates.
Check log_parameter_max_length (PostgreSQL 13+): its default of -1 logs parameter values in full, but if it has been lowered, long values are truncated in the log DETAIL line, and truncated parameters produce incomplete executable queries.
Review captured samples periodically to identify parameter patterns that consistently cause slow execution. If the same customer ID or date range appears across multiple slow samples, the problem is data skew rather than a missing index.
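That review step can itself be automated with a simple frequency count over captured samples (hypothetical sample data; the heuristic, not any particular tool, is the point):

```python
from collections import Counter

# (customer_id, duration_ms) pairs from captured slow samples -- hypothetical.
slow_samples = [
    ("847291", 12847.2),
    ("847291", 11203.5),
    ("847291", 13990.1),
    ("500123", 9875.0),
]

counts = Counter(cust for cust, _ in slow_samples)
for cust, n in counts.most_common():
    if n > 1:
        # The same value recurring across slow samples points at
        # data skew rather than a missing index.
        print(f"customer_id {cust} appears in {n} slow samples -> likely skew")
```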