DEV Community

Cover image for Real-Time Session Monitoring: See Every Active Query Right Now
Philip McClarence
Philip McClarence

Posted on

Real-Time Session Monitoring: See Every Active Query Right Now

Three minutes into an incident is not the time to be crafting SQL queries. But that is exactly what happens when your application starts returning 504 timeouts and you need to understand what the database is doing right now. You SSH in, run SELECT * FROM pg_stat_activity, and spend the next two minutes parsing 60 rows of varying states, truncated query text, and cryptic wait events while the incident clock keeps ticking.

There has to be a better way to get real-time visibility into active database sessions. Let us build the mental model for what to look for and how to act on it.

The Diagnostic Starting Point

The standard query for active sessions:

SELECT
    pid,
    usename AS username,
    datname AS database_name,
    client_addr,
    state,
    wait_event_type,
    wait_event,
    now() - xact_start AS transaction_duration,
    now() - query_start AS query_duration,
    left(query, 100) AS query_preview
FROM pg_stat_activity
WHERE state != 'idle'
  AND pid != pg_backend_pid()
ORDER BY query_start ASC NULLS LAST;
Enter fullscreen mode Exit fullscreen mode

This filters out idle connections and sorts oldest-first, putting the longest-running sessions at the top. The wait_event_type and wait_event columns are your most important diagnostic tool:

  • Lock / relation -- waiting for a table lock
  • LWLock / BufferContent -- waiting for a buffer page
  • IO / DataFileRead -- doing physical I/O

Finding Lock Chains

When sessions are blocked, you need to trace the chain back to the blocker:

SELECT
    blocked.pid AS blocked_pid,
    blocked.query AS blocked_query,
    blocker.pid AS blocker_pid,
    blocker.query AS blocker_query,
    now() - blocked.query_start AS blocked_duration
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid AND NOT bl.granted
JOIN pg_locks gl ON gl.locktype = bl.locktype
    AND gl.database IS NOT DISTINCT FROM bl.database
    AND gl.relation IS NOT DISTINCT FROM bl.relation
    AND gl.page IS NOT DISTINCT FROM bl.page
    AND gl.tuple IS NOT DISTINCT FROM bl.tuple
    AND gl.pid != bl.pid
    AND gl.granted
JOIN pg_stat_activity blocker ON blocker.pid = gl.pid;
Enter fullscreen mode Exit fullscreen mode

This reveals the full blocking chain -- which session is holding the lock, what query it is running, and how long the blocked sessions have been waiting.

The Three Most Common Session Problems

1. Idle in Transaction

A connection opened a transaction and never committed or rolled back. This holds locks and prevents autovacuum from cleaning up dead tuples:

-- Systemic fix: auto-terminate idle-in-transaction sessions
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();

-- Immediate fix: terminate a specific session
SELECT pg_terminate_backend(12345);
Enter fullscreen mode Exit fullscreen mode

Five minutes is a reasonable default for OLTP workloads. If your application legitimately needs long-open transactions, increase it -- but have a limit.

2. Long-Running Queries

A report query or poorly optimized application query consuming resources for minutes:

-- Cancel the query (session stays alive)
SELECT pg_cancel_backend(12345);

-- If cancel doesn't work, terminate the session
SELECT pg_terminate_backend(12345);
Enter fullscreen mode Exit fullscreen mode

Always try pg_cancel_backend first. It cancels the current query but keeps the connection alive. pg_terminate_backend closes the entire connection, which may cause reconnection overhead.

3. Connection Saturation

All available connections are in use:

SELECT count(*) AS active_connections,
       current_setting('max_connections')::int AS max_connections,
       round(100.0 * count(*) / current_setting('max_connections')::int) AS usage_pct
FROM pg_stat_activity;
Enter fullscreen mode Exit fullscreen mode

If usage exceeds 80%, either your connection pool is misconfigured (too many connections per application instance) or connections are leaking (opened but never returned to the pool). Look for idle connections open for hours -- these are usually leaks.

Setting Up Guardrails

Do not wait for incidents to set timeouts. Configure these proactively:

-- Prevent runaway queries
ALTER SYSTEM SET statement_timeout = '30s';
-- Prevent forgotten transactions
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
-- Prevent idle connections from consuming slots
ALTER SYSTEM SET idle_session_timeout = '30min';
SELECT pg_reload_conf();
Enter fullscreen mode Exit fullscreen mode

These are safety nets, not solutions. A query hitting statement_timeout should be investigated and optimized, not just retried. But having guardrails prevents individual sessions from escalating into full-database incidents.

Monitoring Session Trends

Real-time monitoring is reactive by nature. The proactive layer is tracking session states over time:

  • A steady increase in average active sessions signals growing load or degrading query performance
  • Rising "idle in transaction" sessions indicate application-side transaction management issues
  • Connection count approaching max_connections means your capacity planning needs revision

Build connection budgets into your planning. If max_connections = 100 and four application instances each run pools of 25 connections, you have zero headroom. Reserve 10-20% for administrative access and monitoring.

Full article with real-time monitoring examples: Real-Time Session Monitoring: See Every Active Query Right Now

Top comments (0)