
SEN LLC

I Got Tired of Pasting EXPLAIN Plans Into Web Tools, So I Wrote pg-explain-viz

A small Rust CLI that reads EXPLAIN (FORMAT JSON) from a file or stdin and renders the same kind of indented tree you'd get from explain.depesz.com — but locally, with misestimation highlighting and a slow-node summary, in about 900 lines and three dependencies.

📦 GitHub: https://github.com/sen-ltd/pg-explain-viz

The problem

PostgreSQL's EXPLAIN ANALYZE is the single best query-tuning tool in any database, but the raw text output is genuinely painful to read once a plan gets non-trivial. The community response has produced three excellent visualizers and one underused observation:

  • pgAdmin has a beautiful visual EXPLAIN tab. But it's a desktop app, you have to be connected to your DB through it, and on a remote production box you usually aren't.
  • explain.dalibo.com and explain.depesz.com are wonderful web tools and I've used them for years. But they require uploading your query plan, which routinely embeds table names, column names, partial-index predicates, view definitions, sometimes literal user IDs in the Filter strings. Pasting that into a third-party site is a thing I always feel slightly wrong doing — and on regulated workloads it's a thing you straight-up cannot do.
  • Visual EXPLAIN inside the IDE (DataGrip, DBeaver). Same desktop-app problem; not what I want when I'm SSH'd into a box at 2am.

The underused observation is that since Postgres 9.0, EXPLAIN (FORMAT JSON) has produced a machine-readable tree with consistent key names. You don't need to scrape the human-readable EXPLAIN text format with regex (please don't); the server hands you well-formed JSON you can feed straight into any JSON library. So a "local explain visualizer" is mostly a JSON walker — not a real parser — plus a tree renderer.

That's pg-explain-viz. Pipe psql into it, get the tree:

psql -At -c "EXPLAIN (ANALYZE, FORMAT JSON) SELECT ..." | pg-explain-viz -

About 900 lines of Rust across parser.rs + analyzer.rs + render.rs + color.rs + cli.rs + main.rs. Three runtime dependencies: clap = "4", serde = "1", serde_json = "1". Multi-stage rust:1.90-alpine Dockerfile produces a 9.5 MB image.

The shape of EXPLAIN JSON

Before any code, here's what EXPLAIN (ANALYZE, FORMAT JSON) actually returns. This is the bit that surprised me — the format is much friendlier than I'd expected.

[
  {
    "Plan": {
      "Node Type": "Hash Join",
      "Total Cost": 95.20,
      "Plan Rows": 1000,
      "Plan Width": 36,
      "Hash Cond": "(p.user_id = u.id)",
      "Actual Total Time": 14.901,
      "Actual Rows": 1000,
      "Actual Loops": 1,
      "Plans": [
        {
          "Node Type": "Seq Scan",
          "Relation Name": "posts",
          "Total Cost": 64.00,
          "Plan Rows": 2400,
          "Actual Total Time": 6.245,
          "Actual Rows": 2400,
          "Actual Loops": 1
        },
        { "Node Type": "Hash", "Plans": [ ... ] }
      ]
    },
    "Planning Time": 0.832,
    "Execution Time": 18.901
  }
]

A few things to notice:

  1. The top level is an array. It contains one element. Postgres reserves the array shape because a single EXPLAIN can in principle yield multiple plans (for instance when a rule rewrites the statement), but in practice you'll see exactly one. Strip it.
  2. Each node has a "Plans" key holding its children. Recursion is the entire data structure. There's no separate "edges" table, no graph encoding, no IDs. The tree is the data.
  3. Generic keys are stable. "Node Type", "Total Cost", "Plan Rows", "Plan Width", "Actual Total Time", "Actual Rows", "Actual Loops" are present on every node where they make sense, with these exact spellings, since at least Postgres 11.
  4. Node-type-specific keys live in the same object. A Seq Scan carries "Relation Name". A Hash Join carries "Hash Cond". A Sort carries "Sort Key" (an array of strings). An Index Scan carries "Index Name" and "Index Cond". There's no nesting — they're sibling keys to the generic ones.

This last point is the design fork. You could write an exhaustive serde::Deserialize enum with a variant per node type and #[serde(tag = "Node Type")]. It would be beautiful. It would also be a maintenance trap: Postgres adds keys between minor releases and your code would silently lose information whenever it sees a new one.

So I went the other direction. Parse into serde_json::Value once, then pull out the keys I actually care about into a flat struct, and ignore everything else. Forward-compatible by construction, and the whole parser fits in 200 lines including tests.

The parser

The struct first:

#[derive(Debug, Clone, PartialEq)]
pub struct Plan {
    pub node_type: String,
    pub label: Option<String>,
    pub startup_cost: f64,
    pub total_cost: f64,
    pub plan_rows: u64,
    pub plan_width: u64,
    pub actual_time_ms: Option<f64>,
    pub actual_rows: Option<u64>,
    pub actual_loops: Option<u64>,
    pub children: Vec<Plan>,
}

label is the one cute field. Postgres scatters the "what is this node operating on" information across many keys depending on node type. A Seq Scan wants Relation Name, a Hash Join wants Hash Cond, a Sort wants Sort Key, an Index Scan wants Index Name. Rather than make the renderer know about all of them, I let the parser pick the most useful one for each type:

fn pick_label(node_type: &str, obj: &serde_json::Map<String, Value>) -> Option<String> {
    let s = |k: &str| obj.get(k).and_then(Value::as_str).map(str::to_string);

    match node_type {
        "Seq Scan" | "Sample Scan" | "Tid Scan" => s("Relation Name"),
        "Index Scan" | "Index Only Scan" | "Bitmap Index Scan" => {
            s("Index Name").or_else(|| s("Relation Name"))
        }
        "Hash Join" | "Merge Join" | "Nested Loop" => s("Hash Cond")
            .or_else(|| s("Merge Cond"))
            .or_else(|| s("Join Filter")),
        "Sort" => obj.get("Sort Key").and_then(|v| v.as_array()).map(|xs| {
            xs.iter().filter_map(Value::as_str).collect::<Vec<_>>().join(", ")
        }),
        _ => None,
    }
}

For Index Scan I prefer the index name over the relation name, because the index is the thing the planner actually picked — that's the interesting information, not which table. For joins I prefer the join condition over the join type, same reason.

Recursion then writes itself. parse_node walks one node, pulls the generic fields with unwrap_or(0), picks the type-specific label, recurses into Plans if present:

fn parse_node(v: &Value) -> Result<Plan, ParseError> {
    let obj = v.as_object().ok_or(ParseError::Shape("..."))?;
    let node_type = obj.get("Node Type")
        .and_then(Value::as_str)
        .ok_or(ParseError::Shape("..."))?
        .to_string();
    let label = pick_label(&node_type, obj);

    let children = match obj.get("Plans") {
        Some(Value::Array(xs)) => xs.iter().map(parse_node).collect::<Result<Vec<_>, _>>()?,
        _ => Vec::new(),
    };

    Ok(Plan { node_type, label, /* ... */ children })
}

Real EXPLAIN trees are at most a few dozen levels deep even on hideous queries, so plain recursion is fine — no stack-overflow risk worth defending against.

Misestimation: the actual interesting analysis

Here is the single most useful piece of analysis you can do over an EXPLAIN ANALYZE plan:

Where did the planner's row-count estimate disagree with reality?

That's it. Almost every "why is my query slow" question has the answer "the planner thought a node would return a few rows so it picked a strategy that's quadratic in the input, but the node actually returns many thousands of rows." Bad estimates are the root cause; the visible symptom — Nested Loop, Bitmap Heap Scan, hash spill — is downstream.

The math is one division. The interesting part is that you have to remember Actual Rows is per-loop in Postgres, not total. So is Actual Total Time. If a node ran with Actual Loops = 5217, its true contribution is actual_rows × actual_loops. Forget that, and an inner-loop nested join looks fine when it's actually returning hundreds of thousands of rows.

pub fn misestimate(plan: &Plan) -> (Misestimate, f64) {
    let actual = match plan.actual_rows {
        Some(r) => r,
        None => return (Misestimate::Unknown, 1.0),
    };

    let loops = plan.actual_loops.unwrap_or(1).max(1);
    let effective_actual = (actual as f64) * (loops as f64);

    if plan.plan_rows == 0 {
        if effective_actual == 0.0 {
            return (Misestimate::Ok, 1.0);
        }
        return (Misestimate::Under, f64::INFINITY);
    }

    let ratio = effective_actual / (plan.plan_rows as f64);

    if ratio >= 2.0 {
        (Misestimate::Under, ratio)
    } else if ratio <= 0.5 {
        (Misestimate::Over, ratio)
    } else {
        (Misestimate::Ok, ratio)
    }
}

The 2x threshold isn't a constant I made up — it matches what pgMustard and similar tools use as their "raise an eyebrow" point. The Postgres community's own folklore is "10x is when you should be alarmed", but at 2x you should at least look. In the renderer I show under-estimates in red because they're the dangerous direction (they cause the nested loop disasters) and over-estimates in yellow because they usually only waste memory rather than wreck latency.

The committed tests/fixtures/misestimate.json is a real-shaped example: a Nested Loop where the planner expected 10 output rows and got 184,523. The under-estimate ratio displayed is 18452.3x, and the inner Index Scan node — the one that's actually doing the damage, executed 5217 times — flags 91297.5x under because of the loop multiplier. That number alone tells you what to fix (ANALYZE orders, or a partial index on status='pending').
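The loop correction is worth seeing with numbers. A freestanding sketch, with illustrative values rather than the fixture's:

```rust
// Per-loop actuals must be multiplied out before comparing to the estimate.
fn effective_rows(actual_rows_per_loop: u64, loops: u64) -> u64 {
    actual_rows_per_loop * loops.max(1)
}

fn main() {
    // An inner index scan returning 20 rows per loop, executed 5000 times,
    // contributes 100,000 rows to the join, not 20.
    let total = effective_rows(20, 5000);
    assert_eq!(total, 100_000);
    // Suppose the planner estimated 15 rows: per loop that's a harmless
    // 1.33x, but the true under-estimate ratio is 100_000 / 15 ≈ 6667x.
    let ratio = total as f64 / 15.0;
    assert!(ratio > 6_000.0);
}
```

Judged per loop the node would never be flagged; multiplied out, it's a four-orders-of-magnitude miss.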

Slow-node ranking and the tree renderer

Once you've got the tree and the misestimation function, the rest is mechanical. Slow-node ranking is walk + sort:

pub fn slow_nodes<'a>(root: &'a Plan, n: usize) -> Vec<SlowNode<'a>> {
    let mut all = Vec::new();
    walk(root, &mut Vec::new(), &mut all);
    all.sort_by(|a, b| b.total_ms.total_cmp(&a.total_ms));
    all.truncate(n);
    all
}

fn walk<'a>(node: &'a Plan, path: &mut Vec<String>, out: &mut Vec<SlowNode<'a>>) {
    path.push(node.node_type.clone());

    if let Some(t) = node.actual_time_ms {
        let loops = node.actual_loops.unwrap_or(1).max(1);
        out.push(SlowNode {
            plan: node,
            total_ms: t * (loops as f64),
            path: path.clone(),
        });
    }

    for child in &node.children {
        walk(child, path, out);
    }
    path.pop();
}

total_cmp because f64 doesn't implement Ord (NaN breaks the total order). The path vec gets recorded with each entry so the summary at the bottom can disambiguate when there are several Seq Scans in one plan — you see Sort → Hash Join → Seq Scan instead of a bare Seq Scan you'd have to guess at.
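If you haven't hit that wall before: sort_by with partial_cmp(..).unwrap() panics the moment a NaN shows up, while total_cmp imposes the IEEE 754 total order and can't fail. A standalone sketch:

```rust
fn main() {
    let mut times = vec![3.2_f64, 14.9, 0.8, 6.2];
    // b.total_cmp(a) gives a descending sort with no unwrap and no panic path.
    times.sort_by(|a, b| b.total_cmp(a));
    assert_eq!(times, vec![14.9, 6.2, 3.2, 0.8]);
}
```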

The tree renderer is a depth-first walk that maintains a prefix string and an is_last flag for the current node. The tree glyphs are the standard ├── / └── / │ set, and the trick to make it look right is that the child's prefix depends on whether the parent was a last child or not:

let child_prefix = {
    let mut s = String::from(prefix);
    s.push_str(if is_last { "    " } else { "│   " });
    s
};
let n = plan.children.len();
for (i, child) in plan.children.iter().enumerate() {
    render_node(child, &child_prefix, i + 1 == n, false, pal, out);
}

If the parent was the last child of its parent, we extend the prefix with four spaces (no continuing vertical bar). Otherwise we extend it with │ plus three spaces, so the next sibling's branch glyph lands at the right column.
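The whole rule fits in a sketch you can run. Node here is a hypothetical stand-in for the real Plan struct, not the repo's type:

```rust
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

// prefix is everything to the left of this node's branch glyph.
fn render(node: &Node, prefix: &str, is_last: bool, is_root: bool, out: &mut String) {
    if is_root {
        out.push_str(node.name);
    } else {
        out.push_str(prefix);
        out.push_str(if is_last { "└── " } else { "├── " });
        out.push_str(node.name);
    }
    out.push('\n');
    // A last child stops the vertical bar; a middle child continues it.
    let child_prefix = if is_root {
        String::new()
    } else {
        format!("{prefix}{}", if is_last { "    " } else { "│   " })
    };
    let n = node.children.len();
    for (i, child) in node.children.iter().enumerate() {
        render(child, &child_prefix, i + 1 == n, false, out);
    }
}

fn main() {
    let tree = Node {
        name: "Hash Join",
        children: vec![
            Node { name: "Seq Scan", children: vec![] },
            Node {
                name: "Hash",
                children: vec![Node { name: "Seq Scan", children: vec![] }],
            },
        ],
    };
    let mut out = String::new();
    render(&tree, "", true, true, &mut out);
    // Prints:
    // Hash Join
    // ├── Seq Scan
    // └── Hash
    //     └── Seq Scan
    print!("{out}");
}
```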

The colors come from a Palette struct that branches once at construction time on enabled and then every getter returns either an ANSI escape code or "":

pub fn bad(&self) -> &'static str {
    if self.enabled { "\x1b[31m" } else { "" }
}

The render code calls pal.bad() unconditionally and the no-color path costs nothing. Same pattern as hexview (entry #156 in the same series); I've stopped reaching for colored and owo-colors for small CLIs.
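For reference, the whole pattern in one place — the getters beyond bad() are my guesses at names, not the repo's actual API:

```rust
pub struct Palette {
    enabled: bool,
}

impl Palette {
    pub fn new(enabled: bool) -> Self {
        Palette { enabled }
    }
    pub fn bad(&self) -> &'static str {
        if self.enabled { "\x1b[31m" } else { "" }
    }
    // `reset` is a hypothetical sibling getter following the same pattern.
    pub fn reset(&self) -> &'static str {
        if self.enabled { "\x1b[0m" } else { "" }
    }
}

fn main() {
    let color = Palette::new(true);
    let plain = Palette::new(false);
    // Callers format unconditionally; the disabled palette contributes nothing.
    assert_eq!(format!("{}5x under{}", plain.bad(), plain.reset()), "5x under");
    assert!(format!("{}5x under{}", color.bad(), color.reset()).starts_with("\x1b[31m"));
}
```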

--format dot: the escape hatch

Some people genuinely want a picture, not text. Adding --format dot was easy because Plan is already a tree:

fn emit_dot(plan: &Plan, next_id: &mut u32, parent: Option<u32>, out: &mut String) {
    let id = *next_id;
    *next_id += 1;

    let (verdict, _) = misestimate(plan);
    let color = match verdict {
        Misestimate::Under => "tomato",
        Misestimate::Over => "khaki",
        _ => "lightgray",
    };

    // Build the label from fields the parser already extracted;
    // escape embedded quotes so the dot output stays well-formed.
    let label = match &plan.label {
        Some(l) => format!("{}\\n{}", plan.node_type, l).replace('"', "\\\""),
        None => plan.node_type.replace('"', "\\\""),
    };
    out.push_str(&format!(
        "  n{} [label=\"{}\", style=filled, fillcolor={}];\n",
        id, label, color
    ));
    if let Some(p) = parent {
        out.push_str(&format!("  n{} -> n{};\n", p, id));
    }
    for child in &plan.children {
        emit_dot(child, next_id, Some(id), out);
    }
}

Pipe it to dot -Tpng or dot -Tsvg and you get a real graph image with the misestimated nodes already highlighted.
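The snippet above omits the digraph wrapper. A runnable round trip looks roughly like this, with a stripped-down hypothetical Node standing in for the real Plan:

```rust
struct Node {
    label: &'static str,
    fill: &'static str, // precomputed from the misestimate verdict
    children: Vec<Node>,
}

fn emit_dot(node: &Node, next_id: &mut u32, parent: Option<u32>, out: &mut String) {
    let id = *next_id;
    *next_id += 1;
    out.push_str(&format!(
        "  n{id} [label=\"{}\", style=filled, fillcolor={}];\n",
        node.label, node.fill
    ));
    if let Some(p) = parent {
        out.push_str(&format!("  n{p} -> n{id};\n"));
    }
    for child in &node.children {
        emit_dot(child, next_id, Some(id), out);
    }
}

fn to_dot(root: &Node) -> String {
    let mut out = String::from("digraph plan {\n");
    let mut id = 0;
    emit_dot(root, &mut id, None, &mut out);
    out.push_str("}\n");
    out
}

fn main() {
    let tree = Node {
        label: "Nested Loop",
        fill: "tomato",
        children: vec![Node { label: "Index Scan", fill: "lightgray", children: vec![] }],
    };
    // Pipe this string to `dot -Tpng -o plan.png`.
    print!("{}", to_dot(&tree));
}
```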

Tradeoffs and what's deliberately missing

  • JSON only. Not TEXT, XML, or YAML. Of Postgres's machine-readable EXPLAIN formats, JSON is the one with ubiquitous parser support; the TEXT format changes between versions, and the regex parsers people have written for it are infamously fragile. If you care about correctness, ask Postgres for JSON.
  • No (BUFFERS) visualization. Buffer hit/read counts are present in the JSON when you ask for EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON), but I haven't surfaced them yet. They're useful for "is this query I/O-bound vs CPU-bound" questions and are on the roadmap.
  • No graphical tree beyond the --format dot escape hatch. A native pixel-perfect tree would mean either pulling in an SVG-rendering crate (huge dep) or shipping a TUI, and dot already exists and is good. So --format dot | dot -Tpng is the answer.
  • No live psql connection. pg-explain-viz reads JSON; it doesn't talk to your database. That's deliberate — it means it has zero dependencies on tokio-postgres, sqlx, or any of the other heavyweight Postgres clients, and it works equally well against EXPLAIN output you got from a saved file, a Slack paste, or a CI artifact.

Testing

45 tests total across unit and integration:

  • Parser (7) — minimal Seq Scan, nested Hash Join, ANALYZE actuals, malformed JSON, missing Plan key, the object-form (no top-level array), Sort Key array → joined string.
  • Analyzer (10) — Unknown when no actuals, Ok within 2x, Under for 50x, Over for 5x, loop multiplication on inner Index Scan (this is the one that catches the loop bug if you forget it), zero-planned-with-actual edge case, slow_nodes empty without analyze data, slow_nodes ranked descending including loop-multiplied total, path includes ancestors, count_nodes / depth.
  • Renderer (9) — tree glyphs (├── / └──), no-color stripping, color path emits ANSI, ANALYZE adds [actual=...] annotations and misestimate hints, slow-node summary appears, JSON round-trips through serde, JSON includes the misestimate field, dot is well-formed (correct node count, correct edge count, balanced braces), dot escapes embedded quotes.
  • CLI (3) — defaults to text format, parses --format dot, parses --no-color.
  • Color (3) — disabled palette emits empty strings, enabled palette emits escapes, --no-color overrides tty detection.
  • Integration (12) — three real-shaped fixtures (explain.json, explain-analyze.json, misestimate.json) parse cleanly; misestimate fixture flags the Nested Loop's inner Index Scan as Under with ratio > 100; text render with no-color emits zero escape bytes; text render with color emits escape bytes; JSON render round-trips via serde_json::from_str; dot render is valid Graphviz; and the actual binary is spawned via CARGO_BIN_EXE_pg-explain-viz to verify exit code 0 against good input, exit code 1 against malformed JSON, exit code 2 against a missing file, and tomato fillcolor in the dot output for the misestimate fixture.

The integration test that spawns the real binary is the one I trust most. Unit tests can disagree with reality in subtle ways (#[cfg(test)] features, mocked file handles); spawning your own binary against a committed fixture cannot.

Try it in 30 seconds

git clone https://github.com/sen-ltd/pg-explain-viz
cd pg-explain-viz
docker build -t pg-explain-viz .
docker run --rm -v $(pwd)/tests/fixtures:/work pg-explain-viz /work/misestimate.json

Or against your own database:

psql -At -c "EXPLAIN (ANALYZE, FORMAT JSON) SELECT ..." \
  | docker run --rm -i pg-explain-viz -

Closing

Entry #174 in a 100+ portfolio series by SEN LLC. The repo is at github.com/sen-ltd/pg-explain-viz — feedback welcome, especially from anybody whose EXPLAIN ANALYZE reading habits I haven't accounted for.
