SEN LLC

Read composer.lock Directly: A 1000-Line CLI That Beats composer show --tree

You inherit a PHP project with 80 packages in composer.lock and no idea why any of them are there. composer show --tree helps, but its output is hard to paste, hard to filter, and hard to put in a PR. I wanted something I could pipe — tree for humans, DOT for SVGs, Mermaid for GitHub Markdown. Turns out composer.lock is a documented JSON file, and a small CLI that reads it directly is about 1000 lines of PHP stdlib.

📦 GitHub: https://github.com/sen-ltd/composer-graph

The problem

Every PHP project past a certain age has a composer.lock with dozens of transitive dependencies that nobody on the current team explicitly added. You see psr/log and symfony/polyfill-mbstring and ralouphie/getallheaders and wonder: who actually requires these? Can I upgrade them? Can I drop them?

The built-in answer is composer show --tree:

laravel/framework v10.48.29
├── brick/math ^0.9.3|^0.10.2|^0.11
│   └── php ^7.4 || ^8.0
├── doctrine/inflector ^2.0
...

This is fine for reading on a terminal. It's not fine for:

  • PR comments, because GitHub doesn't render ASCII trees; they come out as a wall of dashes.
  • Architecture docs, because you can't drop a pretty graph into a Confluence page.
  • Selective audits, like "show me every package that depends on psr/log."
  • CI pipelines, because the output format is assumed interactive.

The other answer is installing a big IDE-integrated tool. But the whole point of lockfile analysis is that it's deterministic — nothing should need to boot a service container, install plugins, or even run composer install on the target machine.

composer.lock is a documented JSON format. The data we want is already there. So let's write the small thing.

Design

The design has four parts, all pure:

  1. LockfileParser: string → ParsedLockfile. Pure JSON decode, no Composer SDK dependency.
  2. Graph: wraps the parsed lockfile and exposes edgesFrom(), roots(), walk(), and dependents(). Handles replace, cycle detection, and platform requirements (php, ext-*, lib-*) as leaf nodes.
  3. Formatters: TreeFormatter, DotFormatter, MermaidFormatter. Each takes the graph + a list of roots and emits a string. No shared state.
  4. Cli: argv → exit code. Calls parser → graph → formatter and writes to a callable stream (easy to test).

The whole thing runs on PHP 8.2 stdlib. The only dev dependency is PHPUnit. The bin script ships with a 30-line manual autoloader so it works on targets that don't have vendor/ at all.
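
A manual autoloader of that kind can be very small. Here's a minimal sketch of the idea, not the project's actual bin script: the `ComposerGraph\` namespace prefix, the `src/` directory, and the `classToPath()` helper are all assumptions made for illustration.

```php
<?php

/**
 * Map a fully qualified class name to a file path under $baseDir.
 * Returns null when the class is outside the given namespace prefix.
 * Hypothetical helper: the prefix and directory layout are assumptions.
 */
function classToPath(string $class, string $prefix, string $baseDir): ?string
{
    if (!str_starts_with($class, $prefix)) {
        return null;
    }
    $relative = substr($class, strlen($prefix));

    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}

// Register the loader; classes outside the prefix are left to other loaders.
spl_autoload_register(static function (string $class): void {
    $path = classToPath($class, 'ComposerGraph\\', __DIR__ . '/src');
    if ($path !== null && is_file($path)) {
        require $path;
    }
});
```

Keeping the name-to-path mapping in a pure function means it can be unit-tested without touching the filesystem.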

Part 1: Reading the lockfile

Here's the heart of the parser. The schema is small: packages, packages-dev, platform, platform-dev. Each package has name, version, require, require-dev, replace, and provide.

public static function parseString(string $json): ParsedLockfile
{
    try {
        $data = json_decode($json, true, flags: JSON_THROW_ON_ERROR);
    } catch (\JsonException $e) {
        throw new \RuntimeException('invalid composer.lock JSON: ' . $e->getMessage(), 0, $e);
    }

    $packages = [];
    foreach (self::extractPackages($data, 'packages') as $pkg) {
        $pkg->isDev = false;
        $packages[$pkg->name] = $pkg;
    }
    foreach (self::extractPackages($data, 'packages-dev') as $pkg) {
        if (!isset($packages[$pkg->name])) {
            $pkg->isDev = true;
            $packages[$pkg->name] = $pkg;
        }
    }

    return new ParsedLockfile(
        $packages,
        self::stringMap($data['platform'] ?? []),
        self::stringMap($data['platform-dev'] ?? []),
    );
}

The interesting subtlety is the dev flag. composer.lock keeps packages and packages-dev as two separate lists, but in practice a package can only appear in one. Composer guarantees disjointness when resolving. But a defensive parser shouldn't trust that — if the same name shows up in both lists (e.g. a corrupted lockfile), the runtime-scoped one should win, because that's what actually gets autoloaded in production.

That's the entire parse. No recursion, no dep resolution, nothing Composer-specific. We're just lifting JSON into typed PHP objects.
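
To make the "lifting JSON into typed objects" step concrete, here's a rough sketch of what a helper like extractPackages() might do. `PackageSketch`, `stringMapSketch()` and `extractPackagesSketch()` are hypothetical names, and the real classes in the repository may look different.

```php
<?php

/** Hypothetical value object; the real Package class may differ. */
final class PackageSketch
{
    public bool $isDev = false;

    public function __construct(
        public readonly string $name,
        public readonly string $version,
        public readonly array $require = [],
        public readonly array $replace = [],
    ) {}
}

/** Keep only scalar values and cast them to strings; anything else becomes []. */
function stringMapSketch(mixed $value): array
{
    if (!is_array($value)) {
        return [];
    }

    return array_map('strval', array_filter($value, 'is_scalar'));
}

/** Lift one list ("packages" or "packages-dev"), skipping malformed entries. */
function extractPackagesSketch(array $data, string $key): array
{
    $out = [];
    $list = is_array($data[$key] ?? null) ? $data[$key] : [];
    foreach ($list as $entry) {
        if (!is_array($entry) || !is_string($entry['name'] ?? null)) {
            continue;
        }
        $out[] = new PackageSketch(
            $entry['name'],
            (string) ($entry['version'] ?? ''),
            stringMapSketch($entry['require'] ?? null),
            stringMapSketch($entry['replace'] ?? null),
        );
    }

    return $out;
}

$lock = json_decode(<<<'JSON'
{
  "packages": [
    {"name": "acme/http", "version": "1.0.5", "require": {"psr/log": "^1.0"}},
    {"name": "psr/log", "version": "1.1.4"}
  ],
  "packages-dev": []
}
JSON, true, flags: JSON_THROW_ON_ERROR);

$packages = extractPackagesSketch($lock, 'packages');
```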

Part 2: The graph and the replace problem

Here's where Composer's data model bites.

A package can declare "replace": { "illuminate/support": "self.version" }. This means: if some other package requires illuminate/support, that requirement is satisfied by this package instead. It's how laravel/framework tells Composer "I already ship the illuminate/* components, no need to install them separately."

From a graph perspective, replace is a form of indirect edge. If package A requires Y and package X replaces Y, then A's "real" edge points at X, not Y. And because cycles through replace can occur (though rarely), your walk needs cycle detection.

Here's the resolver:

/**
 * Resolve a requested dependency name to the actual package node.
 * Returns null if the dep is not present in the graph.
 */
public function resolve(string $name): ?string
{
    if (isset($this->byName[$name])) {
        return $name;
    }
    if (isset($this->replacesIndex[$name])) {
        return $this->replacesIndex[$name];
    }
    if ($this->includePlatform && $this->isPlatformName($name)) {
        return $name;
    }
    return null;
}

The rule of precedence: a concrete package always wins over a replaces-entry. If psr/log exists as its own package in the lockfile and something declares "replace": { "psr/log": "^1.0" }, we render the edge to the concrete package. This matches how Composer itself resolves at install time.
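
The resolver depends on an index from replaced names to the packages that replace them. Here's one way such an index could be built, as a sketch over plain arrays rather than the project's Graph class. `buildReplacesIndex()` and `resolveName()` are hypothetical names, and the "first replacer seen wins" tie-break is my assumption, not something the tool is documented to do.

```php
<?php

/**
 * Build a "replaced name => replacing package" index.
 * $packages maps package name => ['replace' => [name => constraint]].
 * Assumption: when two packages replace the same name, the first one wins.
 */
function buildReplacesIndex(array $packages): array
{
    $index = [];
    foreach ($packages as $name => $pkg) {
        foreach (array_keys($pkg['replace'] ?? []) as $replaced) {
            $index[$replaced] ??= $name;
        }
    }

    return $index;
}

/** Concrete packages win over replace entries; unknown names resolve to null. */
function resolveName(string $name, array $packages, array $replacesIndex): ?string
{
    if (isset($packages[$name])) {
        return $name;
    }

    return $replacesIndex[$name] ?? null;
}

// Made-up fixture: acme/framework replaces acme/support and acme/log,
// but acme/log also exists as a concrete package.
$packages = [
    'acme/framework' => ['replace' => ['acme/support' => 'self.version', 'acme/log' => '^1.0']],
    'acme/log' => [],
];
$index = buildReplacesIndex($packages);

var_dump(resolveName('acme/support', $packages, $index)); // string "acme/framework"
var_dump(resolveName('acme/log', $packages, $index));     // string "acme/log"
var_dump(resolveName('acme/missing', $packages, $index)); // NULL
```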

And here's the cycle-safe walk:

private function dfs(
    string $name,
    int $depth,
    array $path,
    array &$visited,
    callable $visit,
    int $maxDepth,
): void {
    if (in_array($name, $path, true)) {
        // Cycle detected — emit once more as a cycle marker then stop.
        $visit($name, $depth, $path);
        return;
    }
    $visit($name, $depth, $path);
    $visited[$name] = true;
    if ($depth >= $maxDepth) {
        return;
    }
    $path[] = $name;
    foreach ($this->edgesFrom($name) as $child) {
        $this->dfs($child, $depth + 1, $path, $visited, $visit, $maxDepth);
    }
}

Two things to notice. First, the cycle check uses $path (the current DFS stack), not $visited. If we used $visited, we'd skip revisiting nodes that legitimately appear in multiple subtrees — which is the common case, not the edge case, because diamond dependencies (A → B → psr/log and A → psr/log) are normal.

Second, when a cycle is detected we still call $visit() one more time so the formatter can render a (cycle) marker in the tree output. The visitor callback receives $path, which lets it see which ancestors caused the cycle.
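
To show how a visitor can use $path, here's a stripped-down sketch over a plain adjacency array. `walkSketch()` is a hypothetical stand-in for the real walk (no $visited bookkeeping, no max depth), and the fixture mirrors the acme/a → b → c → a cycle.

```php
<?php

/**
 * Depth-first walk with a path-based cycle check.
 * $edges maps a node name to the list of names it requires.
 */
function walkSketch(array $edges, string $name, callable $visit, array $path = []): void
{
    $visit($name, count($path), $path);
    if (in_array($name, $path, true)) {
        return; // already on the current stack: reported once, do not descend
    }
    $path[] = $name;
    foreach ($edges[$name] ?? [] as $child) {
        walkSketch($edges, $child, $visit, $path);
    }
}

$edges = [
    'acme/a' => ['acme/b'],
    'acme/b' => ['acme/c'],
    'acme/c' => ['acme/a'],
];

$lines = [];
walkSketch($edges, 'acme/a', function (string $name, int $depth, array $path) use (&$lines): void {
    // The node is a cycle marker if it is already one of its own ancestors.
    $marker = in_array($name, $path, true) ? ' (cycle)' : '';
    $lines[] = str_repeat('  ', $depth) . $name . $marker;
});

echo implode("\n", $lines), "\n";
// acme/a
//   acme/b
//     acme/c
//       acme/a (cycle)
```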

Part 3: The Mermaid formatter

The format I care about most is Mermaid, because it lets you paste a dep graph into a PR description and have GitHub render it natively. GitHub has rendered Mermaid blocks in Markdown since 2022.

public function format(array $roots): string
{
    $visited = [];
    $edges = [];
    foreach ($roots as $root) {
        $this->collect($root, 0, [], $visited, $edges);
    }

    // Stable node id assignment — same input, same output.
    ksort($visited);
    $ids = [];
    $i = 0;
    foreach (array_keys($visited) as $name) {
        $ids[$name] = 'n' . $i++;
    }

    $out = "graph LR\n";
    foreach ($ids as $name => $id) {
        $version = $this->graph->version($name);
        $label = $version !== '' ? $name . '<br/>' . $version : $name;
        $out .= sprintf("  %s[\"%s\"]\n", $id, $this->esc($label));
    }
    sort($edges);
    foreach (array_unique($edges) as $edge) {
        [$from, $to] = explode("\0", $edge, 2);
        $out .= sprintf("  %s --> %s\n", $ids[$from], $ids[$to]);
    }

    // classDef / class blocks for dev + platform coloring
    // ... (elided)
    return $out;
}

There are two subtleties that took me a pass to get right.

Stable IDs. Mermaid nodes need stable opaque IDs (n0, n1, …) because package names contain / and -, which Mermaid's parser doesn't like in node IDs. You can only put the real name in the label. That means you need a deterministic mapping from name → ID, so running the tool twice on the same lockfile produces byte-identical output. I ksort() the visited set before assigning IDs.

Label escaping. Mermaid labels are quote-delimited and parse a small subset of HTML. The <br/> for line breaks is fine, but # inside a label ends the label (Mermaid thinks it's a class ref), and " obviously breaks the quotes. So esc() turns # into &#35; and " into &quot;. It took a label containing a # for me to notice that case.
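
An escaping helper along those lines fits in a few lines. This is a sketch, not the project's esc(): the name `escLabel()` is hypothetical, and only the two replacements named above are applied.

```php
<?php

/**
 * Escape a label for use inside a double-quoted Mermaid node label.
 * strtr() with an array works in a single pass, so the "#" inside an
 * already-emitted "&#35;" is never escaped a second time.
 */
function escLabel(string $label): string
{
    return strtr($label, [
        '#' => '&#35;',
        '"' => '&quot;',
    ]);
}

echo escLabel('acme/app "core" #1'), "\n";
// acme/app &quot;core&quot; &#35;1
```

Using one strtr() call instead of chained replacements keeps the result independent of the order the rules are listed in.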

Here's what it renders to for the simple fixture:

graph LR
  n0["acme/app-core<br/>1.2.0"]
  n1["acme/http<br/>1.0.5"]
  n2["psr/log<br/>1.1.4"]
  n0 --> n1
  n0 --> n2
  n1 --> n2

That's a real Mermaid block. On dev.to and GitHub this renders as an actual graph, not ASCII. Paste that into a PR and you've explained why upgrading psr/log is load-bearing in two lines.

Tradeoffs

A few things I chose not to do, each of which is a real limitation:

  • Version constraints aren't enforced. The parser reads the require constraint ("psr/log": "^1.0") and the resolved version (1.1.4), but it doesn't compare them. That's Composer's job at install time. The lockfile we're reading already reflects a successful resolution.
  • provide is ignored. The provide key is used by packages that expose a virtual capability (e.g. implementing an interface from psr/*). It's almost always redundant with what we already render through replace, and showing both makes the graph messier. If you need it, the data is on Package::$provide — wire up your own edges.
  • php and ext-* are platform leaves. They don't have their own dep subtrees, so by default we exclude them from the graph. --platform re-enables them as rendered leaf nodes. This matches what you want 90% of the time (you're exploring your userland deps, not the language runtime).
  • No cross-check with composer.json. We don't know which packages are "direct" per the user's intent because that info lives in composer.json, not composer.lock. Instead, we infer roots as "packages with no inbound edges." In practice this usually works: your project's top-level dependencies tend to be the ones nothing else requires. The known gap is a direct dependency that another package also requires, which shows up under that package instead of as a root.

The cross-check omission is the one I flip-flopped on. Reading composer.json too would give more accurate root detection — but it would also mean two file I/O paths, two parse errors, and two schemas to validate. The lockfile-only version is tighter and, for every real project I tried it on, produced the same answer anyway.
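
The root inference comes down to inverting the edge list. Here's a minimal sketch over a plain adjacency array. `inboundEdges()` and `inferRoots()` are hypothetical names, and the fixture reuses the package names from the Mermaid example.

```php
<?php

/**
 * Invert an adjacency list: for each package, who requires it?
 * $edges maps package => list of names it requires.
 */
function inboundEdges(array $edges): array
{
    $inbound = array_fill_keys(array_keys($edges), []);
    foreach ($edges as $from => $children) {
        foreach ($children as $to) {
            $inbound[$to][] = $from;
        }
    }

    return $inbound;
}

/** Roots are the known packages that nothing else in the lockfile requires. */
function inferRoots(array $edges): array
{
    $roots = [];
    foreach (inboundEdges($edges) as $name => $dependents) {
        if ($dependents === [] && isset($edges[$name])) {
            $roots[] = $name;
        }
    }
    sort($roots); // deterministic order across runs

    return $roots;
}

$edges = [
    'acme/app-core' => ['acme/http', 'psr/log'],
    'acme/http' => ['psr/log'],
    'psr/log' => [],
];

print_r(inferRoots($edges));              // acme/app-core
print_r(inboundEdges($edges)['psr/log']); // acme/app-core, acme/http
```

The same inverted map answers the "who depends on psr/log?" question. Note that a lockfile whose packages only form a cycle would yield no roots at all under this rule.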

Try it in 30 seconds

git clone https://github.com/sen-ltd/composer-graph.git
cd composer-graph
docker build -t composer-graph .

# Tree view of your project
docker run --rm -v "$PWD:/project" composer-graph /project

# Mermaid for a PR
docker run --rm -v "$PWD:/project" composer-graph /project --format mermaid

# DOT piped to GraphViz
docker run --rm -v "$PWD:/project" composer-graph /project --format dot | dot -Tsvg > deps.svg

# Who depends on psr/log?
docker run --rm -v "$PWD:/project" composer-graph /project --reverse psr/log

# Dev deps + stats
docker run --rm -v "$PWD:/project" composer-graph /project --dev --stats

The Docker image is 51 MB (alpine + slimmed PHP 8.2 runtime). No Composer needs to be installed on your machine — the tool parses the lockfile directly.

Tests

47 PHPUnit cases. The ones that saved me:

  • testReplaceMechanism — caught the resolver-precedence bug where a replace entry was overriding a concrete package.
  • testWalkHandlesCycles — the acme/a → b → c → a fixture caused an infinite loop on the first implementation because I was using $visited instead of $path for cycle detection.
  • testStableNodeIds — a regression test that the Mermaid formatter produces byte-identical output across runs. Without this, I'd have shipped non-deterministic output and broken CI diffs somewhere.
  • testOnlySubtree — verifies that --only acme/http doesn't accidentally leak the parent (acme/app-core) into the rendered output.

When to reach for this

  • You want a dep graph in a PR or an RFC. Mermaid is the answer.
  • You're auditing what requires a specific package and why. --reverse is the answer.
  • You're trying to drop a heavy transitive dep and need to see what's blocking. --only + --reverse together.
  • You want a one-liner for "how deep is this project's dep tree" in a CI badge. --stats gives you maxDepth.
  • You're writing a migration doc and need an SVG for a diagram. --format dot | dot -Tsvg.

When not to reach for it: if you need runtime Composer information (actual installed version on disk, platform-requirement validation, advisory checking). For those, stick with composer show or composer audit. This tool is strictly static analysis of the lockfile.

Closing

Entry #132 in a 100+ portfolio series by SEN LLC. Previous PHP tool in the series:

  • laravel-audit — 17-check static auditor for Laravel projects, same spirit: small, no runtime dependencies, CI-ready exit codes.

composer.lock is one of the most valuable underused files in a PHP project. Once you treat it as data instead of Composer's internal business, a lot of architecture questions become one jq away. Or, in this case, one composer-graph away. Feedback welcome.