Rohith Gilla

Posted on Apr 11 • Originally published at datapeek.dev

I Put pg_stat_activity in My SQL Client — And Added a Kill Button

#database #postgres #devops #opensource

Includes live table and cache-hit stats

It was 11:47 PM when the Datadog alert came in. API latency on /checkout had
tripled. I SSH'd into the bastion, opened psql, and started typing the query
I have typed a thousand times:

SELECT pid, state, EXTRACT(EPOCH FROM (now() - query_start)) AS seconds, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;

Then I ran it. Then I ran it again six seconds later. Then again. Then I
opened a second tab and started typing SELECT * FROM pg_locks because I was
pretty sure one of the long-running queries was blocking something. Three
terminals, two SSH sessions, one increasingly tired human.

Every Postgres operator has been here. The data you need lives in system
views — pg_stat_activity, pg_locks, pg_statio_user_tables,
pg_stat_user_tables. Reaching for them through a shell at midnight is a
productivity tax I got tired of paying.

So I built the Connection Health Monitor into data-peek.

What it does

The Health Monitor is a dedicated tab — not a modal, not a sidebar — that
refreshes every 2, 5, 10, or 30 seconds (configurable), and shows four panels:

Active Queries — everything in pg_stat_activity that isn't idle, with duration, wait events, and a kill button next to each row.
Table Sizes — the top 50 tables by total size, including heap and index bytes, with a row-count estimate from pg_stat_user_tables.n_live_tup.
Cache Hit Ratios — buffer and index cache hit percentages, plus a per-table breakdown of seq-scan vs index-scan counts.
Locks & Blocking — the classic blocked/blocking join from pg_locks, with both the blocked query and the query holding the lock visible side by side.

The kill button calls pg_cancel_backend(pid). No confirmation dialog. If
you hit it by accident, the worst that happens is a query fails and you run
it again. That is the right tradeoff at midnight.

Every panel has a "Share" button that generates a clean screenshot suitable
for pasting into an incident Slack channel. That last bit came from a real
incident where I wanted to show the on-call DBA what I was seeing and ended
up cropping and masking a terminal screenshot for ten minutes. Now it is a
single click.

How it's wired

The Health Monitor is an honest, boring pipeline, and I think that is the
point.

┌────────────────────────────┐
│  HealthMonitor.tsx (React) │  ← refresh interval, kill buttons, share UI
└──────────────┬─────────────┘
               │ window.api.db.*
┌──────────────┴─────────────┐
│  health-handlers.ts (IPC)  │  ← ipcMain.handle('db:active-queries', …)
└──────────────┬─────────────┘
               │ getAdapter(config)
┌──────────────┴─────────────┐
│  postgres-adapter.ts       │  ← the actual SQL against pg_stat_*
└────────────────────────────┘

The IPC handlers (src/main/ipc/health-handlers.ts) are wafer-thin — each
one is about ten lines, dispatches to the adapter, and wraps the result in
the project's standard IpcResponse<T> shape:

ipcMain.handle('db:active-queries', async (_, config: ConnectionConfig) => {
  try {
    const adapter = getAdapter(config)
    const queries = await adapter.getActiveQueries(config)
    return { success: true, data: queries } as IpcResponse<typeof queries>
  } catch (error) {
    log.error('Failed to get active queries:', error)
    return { success: false, error: String(error) } as IpcResponse<never>
  }
})

That pattern ensures every health panel can fail independently. If pg_locks
is slow because something dramatic is happening, the Active Queries panel
still refreshes on its own schedule.

The SQL, unvarnished

The thing dev.to tutorials usually hide — the actual SQL — is the part I find
most useful. Here is the query powering the Active Queries panel, straight
from src/main/adapters/postgres-adapter.ts:

SELECT
  pid,
  usename AS user,
  datname AS database,
  state,
  COALESCE(
    EXTRACT(EPOCH FROM (now() - query_start))::text || 's',
    '0s'
  ) AS duration,
  COALESCE(EXTRACT(EPOCH FROM (now() - query_start)) * 1000, 0)::bigint AS duration_ms,
  query,
  wait_event_type || ':' || wait_event AS wait_event,
  application_name
FROM pg_stat_activity
WHERE state != 'idle'
  AND pid != pg_backend_pid()
  AND query NOT LIKE '%pg_stat_activity%'
ORDER BY query_start ASC NULLS LAST

Two details worth calling out:

pid != pg_backend_pid() filters out the monitoring query itself. Without this, you spend ten minutes wondering why there is always a query running.
query NOT LIKE '%pg_stat_activity%' is a belt-and-braces filter for the case where another client is ALSO polling this view. When I skipped it, the dashboard kept showing "the monitoring client is monitoring itself."

The cache-hit ratio query is where people usually copy-paste wrong:

SELECT
  CASE WHEN SUM(heap_blks_hit) + SUM(heap_blks_read) = 0 THEN 0
    ELSE ROUND(SUM(heap_blks_hit)::numeric / (SUM(heap_blks_hit) + SUM(heap_blks_read)) * 100, 2)
  END AS buffer_cache_hit_ratio,
  CASE WHEN SUM(idx_blks_hit) + SUM(idx_blks_read) = 0 THEN 0
    ELSE ROUND(SUM(idx_blks_hit)::numeric / (SUM(idx_blks_hit) + SUM(idx_blks_read)) * 100, 2)
  END AS index_hit_ratio
FROM pg_statio_user_tables

The CASE guards against division by zero on freshly restarted databases.
Without them, the panel shows NaN% for the first few minutes after a
restart, which is exactly when you are most likely to be looking at it.

The lock detection join

The Locks panel is the only query I am genuinely proud of. The canonical
"blocked by whom" join in Postgres is notoriously ugly because pg_locks
does not give you a single "blocker pid" column — you have to self-join on
every distinguishing column:

FROM pg_locks blocked
JOIN pg_stat_activity blocked_activity  ON blocked.pid = blocked_activity.pid
JOIN pg_locks blocking ON (
  blocked.locktype       = blocking.locktype
  AND blocked.database   IS NOT DISTINCT FROM blocking.database
  AND blocked.relation   IS NOT DISTINCT FROM blocking.relation
  AND blocked.page       IS NOT DISTINCT FROM blocking.page
  AND blocked.tuple      IS NOT DISTINCT FROM blocking.tuple
  AND blocked.virtualxid IS NOT DISTINCT FROM blocking.virtualxid
  AND blocked.transactionid IS NOT DISTINCT FROM blocking.transactionid
  AND blocked.classid    IS NOT DISTINCT FROM blocking.classid
  AND blocked.objid      IS NOT DISTINCT FROM blocking.objid
  AND blocked.objsubid   IS NOT DISTINCT FROM blocking.objsubid
  AND blocked.pid != blocking.pid
)
JOIN pg_stat_activity blocking_activity ON blocking.pid = blocking_activity.pid
WHERE NOT blocked.granted AND blocking.granted

IS NOT DISTINCT FROM is the operator that treats NULL = NULL as true,
which matters because most of these columns are NULL for any given lock type.
If you use plain = here, the join silently returns zero rows and you
conclude that nothing is blocking anything. Ask me how I know.

The kill button

async killQuery(config, pid) {
  const client = new Client(buildClientConfig(config, tunnelOverrides))
  await client.connect()
  const result = await client.query('SELECT pg_cancel_backend($1) AS cancelled', [pid])
  const cancelled = result.rows[0]?.cancelled === true
  return cancelled
    ? { success: true }
    : { success: false, error: 'Failed to cancel query - process may have already completed' }
}

pg_cancel_backend sends SIGINT to the backend process. It is the polite
version — pg_terminate_backend is the hammer, and I deliberately do not
expose it from the UI because killing a backend mid-transaction is a foot-gun
that no amount of confirm dialogs can save you from. If pg_cancel_backend
fails to stop the query, the assumption is that you want to keep looking
before escalating.

What I'd do differently

Two things I regret.

First, I hard-coded LIMIT 50 in the Table Sizes query. It was fine for my
own databases. Then someone with 12,000 tables opened the panel and their
disk groaned for nine seconds before anything appeared. A parameterized limit
with a default would have saved that. I will get to it.

Second, the refresh interval is per-panel but the polling is not staggered.
When all four panels refresh at 2s, they all fire at the same tick, and a
slow database sees four simultaneous connections every two seconds. Staggering
them by 500ms each would be gentler on the server. Classic premature
optimization trap — I built the simple version, shipped it, and only noticed
when a coworker complained.

Try it

If you want to see the Health Monitor without downloading anything, the
screenshots are in the data-peek README.
The code for this feature lives in src/renderer/src/components/health-monitor.tsx
and src/main/adapters/postgres-adapter.ts.

The app itself is at datapeek.dev — MIT source,
free for personal use, one-time license for commercial. I would rather you
read the SQL than click the download button, honestly. If it is the kind of
thing you'd want in your toolbox, the download is one click from there.

Top comments (2)

mote • Apr 12

This is a clever approach to database monitoring! For anyone dealing with runaway queries in embedded or edge scenarios (like IoT devices), the options are more limited since you can't always install pg_stat_activity.

Have you considered how this would work in a microservices environment where each service might need its own monitoring view? Or have you explored any lightweight alternatives for environments where running a full Postgres instance isn't feasible?

Rohith Gilla • Apr 13

Hey yes!!
You can build your custom dashboards too 🙌