nishikawaakira for AWS Community Builders

Posted on May 21

What did our VPC look like at 22:14 (Wed, 2026-05-20): building clew, a CLI for navigating AWS Config snapshots

#aws

It's 2am. Production is on fire. Someone in the war room asks, "what did the VPC actually look like at 22:14, when this started?", and the only honest answer is, "give me twenty minutes with jq."

That moment, repeated enough times, is why I started writing the tool I want to talk about today: clew, a Go CLI that ingests AWS Config Configuration Snapshots into a local DuckDB and renders an interactive topology you can actually read.

https://github.com/nishikawaakira/clew

This post is the story of why it exists and how to use it during an incident.

The problem: "the moment-in-time configuration" is a real artifact you can't easily see

A scenario you might recognize:

It's 22:14. Targets behind an ALB start going UnHealthy, but only some of them. CloudWatch shows the symptom but not the cause. People in the war room start floating theories:

"Maybe someone tightened a security group rule."
"Maybe a route table was modified and Private Subnet traffic is now going out the wrong gateway."
"Maybe a subnet's route table association silently changed."

All plausible. But each of them needs the same piece of evidence: what the topology actually looked like at 22:14, ideally next to what it looked like at 18:00 when things were still fine. The AWS Console only shows you "right now," which is precisely the moment you don't care about anymore.

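The good news: if your delivery channel has a snapshot delivery frequency set, AWS Config has been quietly dropping a full Configuration Snapshot of every recorded resource into S3 on that schedule (anywhere from every hour to every 24 hours). The artifact exists. The bad news is that the artifact ships as: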

A <account>_Config_<region>_ConfigSnapshot_<timestamp>_<snapshot-id>.json.gz file
Containing a flat configurationItems array, mixed across resource types
Where "this EC2 uses this security group" lives under configuration.networkInterfaces[].groups[].groupId
And "this VPC contains these subnets, which contain these instances" is purely implicit — you reconstruct it in your head from the relationships[] field

jq at 2am is a poor incident-response interface. And drawing the VPC topology on paper while pagers are going off is worse.
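To make the shape problem concrete, here is a minimal Go sketch that pulls "instance uses security group" pairs out of a snapshot by walking configuration.networkInterfaces[].groups[].groupId. It is an illustration, not clew's actual code: the struct names, the function name, and the sample document (including the ID i-0example) are made up for this post, and the structs model only the handful of fields the sketch needs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// configItem models only the fields this sketch reads; real items carry many more.
type configItem struct {
	ResourceType  string          `json:"resourceType"`
	ResourceID    string          `json:"resourceId"`
	Configuration json.RawMessage `json:"configuration"`
}

type snapshot struct {
	ConfigurationItems []configItem `json:"configurationItems"`
}

// instanceConfig picks out the nested path that names security groups.
type instanceConfig struct {
	NetworkInterfaces []struct {
		Groups []struct {
			GroupID string `json:"groupId"`
		} `json:"groups"`
	} `json:"networkInterfaces"`
}

// securityGroupEdges maps each EC2 instance ID to the security group IDs
// found on its network interfaces, de-duplicated in first-seen order.
func securityGroupEdges(raw []byte) (map[string][]string, error) {
	var snap snapshot
	if err := json.Unmarshal(raw, &snap); err != nil {
		return nil, err
	}
	out := map[string][]string{}
	for _, it := range snap.ConfigurationItems {
		if it.ResourceType != "AWS::EC2::Instance" || len(it.Configuration) == 0 {
			continue
		}
		var cfg instanceConfig
		if err := json.Unmarshal(it.Configuration, &cfg); err != nil {
			return nil, err
		}
		seen := map[string]bool{}
		for _, ni := range cfg.NetworkInterfaces {
			for _, g := range ni.Groups {
				if !seen[g.GroupID] {
					seen[g.GroupID] = true
					out[it.ResourceID] = append(out[it.ResourceID], g.GroupID)
				}
			}
		}
	}
	return out, nil
}

// sample is a hand-written, heavily trimmed stand-in for a real snapshot.
const sample = `{"configurationItems":[
 {"resourceType":"AWS::EC2::Instance","resourceId":"i-0example","configuration":
  {"networkInterfaces":[
    {"groups":[{"groupId":"sg-12345"}]},
    {"groups":[{"groupId":"sg-12345"},{"groupId":"sg-67890"}]}]}},
 {"resourceType":"AWS::EC2::Subnet","resourceId":"subnet-pub","configuration":{}}
]}`

func main() {
	edges, err := securityGroupEdges([]byte(sample))
	if err != nil {
		panic(err)
	}
	for instance, groups := range edges {
		fmt.Println(instance, "uses", groups)
	}
}
```

Even this small case needs two levels of decoding and a de-duplication pass, and it covers exactly one resource type and one kind of edge.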

Existing tooling, and where it fell short for me

I'm not pretending nothing else exists. The usual suspects:

AWS Config Aggregator + Advanced Queries (SQL-ish querying from the console)
CloudFormation StackSets with drift detection
Commercial CMDB / IaC visualizers (Lucidchart's AWS importer, Hava, etc.)
Rolling your own jq library and committing it to the team's snippet repo

What I couldn't find was something that hit all of:

Works against an arbitrary historical snapshot (any S3 object, not just "now")
Local-first, no SaaS round trip — incident artifacts shouldn't leave the responder's laptop
Produces a single self-contained HTML file I can paste into a Slack thread or a postmortem doc
Lets me stack multiple snapshots in the same store and compare across time

So I started writing what I wished existed.

What clew does

Three commands (import, render, and query), one data model:

The DuckDB file holds three tables:

`config_items`

Stores one row per imported configurationItem, including the original configuration, relationships, tags, and capture_time.

`graph_nodes`

Stores one row per resource, including placeholder rows for resources referenced by relationships but not yet imported.

`graph_edges`

Stores resource-to-resource edges extracted from both relationships[] and the configuration body.
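One way to picture the placeholder rows: any edge endpoint that has no imported item of its own still needs a node, or the edge would dangle. The Go sketch below shows that idea with in-memory types. The type and function names are hypothetical, and clew's real schema and logic may differ.

```go
package main

import "fmt"

// edge is a directed, labelled link between two resource IDs.
type edge struct{ From, To, Label string }

// node is one graph vertex; Placeholder marks resources that were only
// referenced by an edge and never imported themselves.
type node struct {
	ID          string
	Placeholder bool
}

// buildNodes returns one node per imported resource, plus a placeholder
// for every edge endpoint that is not in the imported set.
func buildNodes(imported []string, edges []edge) map[string]node {
	nodes := make(map[string]node, len(imported))
	for _, id := range imported {
		nodes[id] = node{ID: id}
	}
	for _, e := range edges {
		for _, id := range []string{e.From, e.To} {
			if _, ok := nodes[id]; !ok {
				nodes[id] = node{ID: id, Placeholder: true}
			}
		}
	}
	return nodes
}

// Hypothetical sample data: one imported instance that references a
// security group that was not part of the import.
var (
	sampleImported = []string{"i-0example"}
	sampleEdges    = []edge{{From: "i-0example", To: "sg-12345", Label: "Uses security group"}}
)

func main() {
	for id, n := range buildNodes(sampleImported, sampleEdges) {
		fmt.Printf("%s placeholder=%v\n", id, n.Placeholder)
	}
}
```

Calling buildNodes again with sg-12345 added to the imported list yields a regular node for it, which is the in-memory analogue of a later import filling in a placeholder.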

The default render --format html output is the part most worth showing off. It produces an interactive topology with compound nodes — VPCs are drawn as outer boxes containing subnet boxes containing EC2 / ENI nodes — the same nesting you'd draw on a whiteboard during incident response, except generated from the snapshot you just imported.

The generated HTML is interactive: you can pan and zoom, click nodes to inspect their type, resource ID, ARN, and placeholder status, toggle edge labels, switch layout direction, and choose between orthogonal and bezier edges.

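The output is a single HTML file: copy it anywhere and open it in any browser. The one caveat is that it fetches its JavaScript libraries from a CDN when the page loads, as noted in the limitations below.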

Walkthrough: using clew in incident-response mode

Step 0: Install

# Requires Go 1.24+ and a C compiler — go-duckdb is CGO-bound.
go install github.com/nishikawaakira/clew@v0.1.0

This assumes AWS Config is already enabled and delivering configuration snapshots to S3.

Step 1: Pull the snapshot(s) closest to the incident from S3

AWS Config writes keys with unpadded month/day (.../2026/5/20/...), so dancing around BSD-vs-GNU date is a waste of energy. Listing recursively and filtering by name is the safer path:

ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
REGION=<incident-region>
BUCKET=<your-config-bucket>

aws s3 ls "s3://${BUCKET}/AWSLogs/${ACCOUNT_ID}/Config/${REGION}/" --recursive | grep ConfigSnapshot | sort

# Grab the two snapshots straddling the incident window and `aws s3 cp` them down.
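If you would rather script the selection than eyeball the listing, the following Go sketch builds the unpadded per-day prefix and picks the snapshots on either side of an incident time. It assumes the key layout and file name pattern described earlier, a timestamp of the form 20260520T220009Z, and that all times are UTC. The account ID, snapshot IDs, and function names are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// dayPrefix builds the per-day key prefix. The layout "2006/1/2" is Go's
// way of asking for an unpadded month and day, matching .../2026/5/20/...
func dayPrefix(account, region string, day time.Time) string {
	return fmt.Sprintf("AWSLogs/%s/Config/%s/%s/ConfigSnapshot/",
		account, region, day.UTC().Format("2006/1/2"))
}

// snapshotTime parses the timestamp out of a file name shaped like
// <account>_Config_<region>_ConfigSnapshot_<timestamp>_<snapshot-id>.json.gz
func snapshotTime(key string) (time.Time, error) {
	name := key[strings.LastIndex(key, "/")+1:]
	parts := strings.Split(name, "_")
	if len(parts) < 6 || parts[3] != "ConfigSnapshot" {
		return time.Time{}, fmt.Errorf("not a snapshot key: %q", key)
	}
	return time.Parse("20060102T150405Z", parts[4])
}

// straddle returns the latest snapshot at or before the incident and the
// earliest one after it. Either result is "" if no such snapshot exists.
func straddle(keys []string, incident time.Time) (before, after string) {
	type stamped struct {
		key string
		at  time.Time
	}
	var snaps []stamped
	for _, k := range keys {
		if t, err := snapshotTime(k); err == nil {
			snaps = append(snaps, stamped{k, t})
		}
	}
	sort.Slice(snaps, func(i, j int) bool { return snaps[i].at.Before(snaps[j].at) })
	for _, s := range snaps {
		if !s.at.After(incident) {
			before = s.key
		} else if after == "" {
			after = s.key
		}
	}
	return before, after
}

// Hypothetical inputs: the incident time from this post and made-up keys.
var (
	incident   = time.Date(2026, 5, 20, 22, 14, 0, 0, time.UTC)
	base       = "AWSLogs/123456789012/Config/ap-northeast-1/2026/5/20/"
	sampleKeys = []string{
		base + "ConfigSnapshot/123456789012_Config_ap-northeast-1_ConfigSnapshot_20260520T180012Z_aaaa-1111.json.gz",
		base + "ConfigSnapshot/123456789012_Config_ap-northeast-1_ConfigSnapshot_20260520T220009Z_bbbb-2222.json.gz",
		base + "ConfigSnapshot/123456789012_Config_ap-northeast-1_ConfigSnapshot_20260520T230011Z_cccc-3333.json.gz",
		base + "ConfigHistory/123456789012_Config_ap-northeast-1_ConfigHistory_example.json.gz",
	}
)

func main() {
	fmt.Println(dayPrefix("123456789012", "ap-northeast-1", incident))
	before, after := straddle(sampleKeys, incident)
	fmt.Println("before:", before)
	fmt.Println("after: ", after)
}
```

Feed it the key column from the aws s3 ls output above. Keys that do not look like snapshots are skipped rather than treated as errors.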

If you need one right now and don't want to wait for the next scheduled delivery:

aws configservice deliver-config-snapshot --delivery-channel-name default

In my experience, the S3 object usually shows up within 10–60 seconds.

Step 2: Import everything into a single DuckDB

# Oldest first, so the snapshot closest to the incident is the last one imported.
clew import --input snapshot-2026-05-20T18-00.json.gz --db incident.duckdb
clew import --input snapshot-2026-05-20T22-00.json.gz --db incident.duckdb

You can keep stacking snapshots into the same incident.duckdb. config_items.item_id is the PRIMARY KEY, built as account:region:type:resource_id:captureTime, and every insert uses ON CONFLICT (item_id) DO NOTHING.

The practical guarantees:

Re-importing the same snapshot is a true no-op (the inserts collide on item_id).
Importing a snapshot with a different capture time keeps both rows for the same resource: history is preserved, not overwritten.

graph_nodes / graph_edges always reflect the latest state, so the rendered topology is always "the most recent snapshot you imported," but the historical detail still lives in config_items.
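Both guarantees fall out of the key design. The Go sketch below uses an in-memory map as a stand-in for the config_items table to show why: the key includes the capture time, and a colliding insert is simply skipped. The types, names, and sample values are hypothetical; the real tool does this in DuckDB with ON CONFLICT (item_id) DO NOTHING.

```go
package main

import (
	"fmt"
	"strings"
)

// item carries just the fields that make up item_id.
type item struct {
	Account, Region, Type, ResourceID, CaptureTime string
}

// itemID joins the five key fields in the order
// account:region:type:resource_id:captureTime.
func itemID(it item) string {
	return strings.Join([]string{it.Account, it.Region, it.Type, it.ResourceID, it.CaptureTime}, ":")
}

// store is an in-memory stand-in for the config_items table.
type store struct{ rows map[string]item }

func newStore() *store { return &store{rows: map[string]item{}} }

// insert mimics INSERT ... ON CONFLICT (item_id) DO NOTHING.
// It reports whether a new row was actually added.
func (s *store) insert(it item) bool {
	id := itemID(it)
	if _, exists := s.rows[id]; exists {
		return false // same resource at the same capture time: no-op
	}
	s.rows[id] = it
	return true
}

// Hypothetical sample rows: one security group captured at two times.
var (
	sgAt18 = item{"123456789012", "ap-northeast-1", "AWS::EC2::SecurityGroup", "sg-12345", "2026-05-20T18:00:12Z"}
	sgAt22 = item{"123456789012", "ap-northeast-1", "AWS::EC2::SecurityGroup", "sg-12345", "2026-05-20T22:00:09Z"}
)

func main() {
	s := newStore()
	fmt.Println(s.insert(sgAt18)) // first import adds the row
	fmt.Println(s.insert(sgAt18)) // re-import is skipped
	fmt.Println(s.insert(sgAt22)) // later capture time adds a second row
	fmt.Println(len(s.rows), "rows for sg-12345")
}
```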

Step 3: Render the topology

clew render --db incident.duckdb --view vpc --format html --output vpc.html
open vpc.html

If you want it for a Slack thread:

clew render --db incident.duckdb --view vpc --format png --output vpc.png

The raster output is fine for a quick share, but it doesn't scale well to large topologies — once you have more than a couple of dozen resources, the nodes get cramped and labels become hard to read. For anything larger, the interactive HTML is the better artifact to share with the team, because reviewers can open it locally and pan/zoom.

If you want to embed it in a postmortem Markdown:

clew render --db incident.duckdb --view vpc --format mermaid --output vpc.md

Step 4: Zoom in on a suspect

When you have a specific theory — "the ALB stopped reaching this instance" — the global view is too much. query runs a bidirectional BFS (breadth-first search) from one resource:

# Two hops out from a specific instance, as interactive HTML.
clew query --db incident.duckdb --resource-id i-0abc... --depth 2 \
 --format html --output i-0abc.html

# Or a text summary in the terminal when you just want to confirm a hunch.
clew query --db incident.duckdb --resource-id sg-12345 --depth 1 --format text

The text format is intentionally boring and grep-friendly:

Nodes:
  - AWS::EC2::Instance/i-0abc...
  - AWS::EC2::SecurityGroup/sg-12345
  - AWS::EC2::Subnet/subnet-pub
  ...
Edges:
  - <node-id> ==[Uses security group]==> <node-id>
  ...
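For readers curious what such a query amounts to, here is a small Go sketch of a depth-limited breadth-first search that follows edges in both directions. Reading "bidirectional" as "ignore edge direction while traversing" is my assumption for this sketch, as are the function name, the bare resource IDs, and every edge label other than "Uses security group".

```go
package main

import (
	"fmt"
	"sort"
)

// edge is a directed, labelled link between two resource IDs.
type edge struct{ From, To, Label string }

// neighborhood returns every node within depth hops of start, walking
// edges forwards and backwards. The result is sorted for stable output.
func neighborhood(edges []edge, start string, depth int) []string {
	// Build an undirected adjacency list from the directed edges.
	adj := map[string][]string{}
	for _, e := range edges {
		adj[e.From] = append(adj[e.From], e.To)
		adj[e.To] = append(adj[e.To], e.From)
	}
	dist := map[string]int{start: 0}
	queue := []string{start}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if dist[cur] == depth {
			continue // do not expand past the requested depth
		}
		for _, next := range adj[cur] {
			if _, seen := dist[next]; !seen {
				dist[next] = dist[cur] + 1
				queue = append(queue, next)
			}
		}
	}
	out := make([]string, 0, len(dist))
	for id := range dist {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}

// Hypothetical topology: two instances share one security group.
var sampleEdges = []edge{
	{"i-0example", "sg-12345", "Uses security group"},
	{"i-0other", "sg-12345", "Uses security group"},
	{"i-0example", "subnet-pub", "Is in subnet"},
	{"subnet-pub", "vpc-1", "Is in VPC"},
}

func main() {
	fmt.Println(neighborhood(sampleEdges, "sg-12345", 1))
	fmt.Println(neighborhood(sampleEdges, "sg-12345", 2))
}
```

Starting from the security group, depth 1 reaches both instances even though the edges point at the group, not away from it. That is the reason to traverse in both directions when the question is "who is affected by this resource".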

Step 5: Diff across time, with plain SQL

Because every snapshot lands as its own row in config_items, you can ask DuckDB directly:

duckdb incident.duckdb

SELECT capture_time, configuration_json
FROM config_items
WHERE resource_id = 'sg-12345'
ORDER BY capture_time;

Comparing the 18:00 and 22:00 configuration_json of a single security group is now a diff between two rows. The whole point of putting the data in a real database was to make the ad-hoc questions stop requiring custom code.
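Once the two rows are exported, narrowing down what changed can be as simple as comparing top-level keys. The Go sketch below does that for two JSON objects. The function name is hypothetical and the two sample documents are simplified, hand-written stand-ins for a security group's configuration_json, not real AWS Config output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

// changedKeys returns the top-level keys whose values differ between two
// JSON objects, including keys present on only one side.
func changedKeys(before, after string) ([]string, error) {
	var a, b map[string]any
	if err := json.Unmarshal([]byte(before), &a); err != nil {
		return nil, err
	}
	if err := json.Unmarshal([]byte(after), &b); err != nil {
		return nil, err
	}
	var out []string
	for k, av := range a {
		if bv, ok := b[k]; !ok || !reflect.DeepEqual(av, bv) {
			out = append(out, k) // removed or modified
		}
	}
	for k := range b {
		if _, ok := a[k]; !ok {
			out = append(out, k) // added
		}
	}
	sort.Strings(out)
	return out, nil
}

// Simplified sample documents: the ingress CIDR was narrowed between captures.
const (
	sgAt1800 = `{"groupId":"sg-12345","ipPermissions":[
	  {"ipProtocol":"tcp","fromPort":443,"toPort":443,"ipRanges":["10.0.0.0/16"]}]}`
	sgAt2200 = `{"groupId":"sg-12345","ipPermissions":[
	  {"ipProtocol":"tcp","fromPort":443,"toPort":443,"ipRanges":["10.0.1.0/24"]}]}`
)

func main() {
	keys, err := changedKeys(sgAt1800, sgAt2200)
	if err != nil {
		panic(err)
	}
	fmt.Println("changed:", keys)
}
```

This only tells you which top-level field to look at. For the actual before and after values, a plain text diff of the two pretty-printed rows is usually enough.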

What clew doesn't do yet

Being honest about scope:

Only the vpc view is implemented. The internal model is type-agnostic, so adding iam, org, kms, or network (peering/TGW) views is a matter of writing the type list and the configuration-edge extractors, not a redesign.
The HTML output pulls Cytoscape.js / dagre / cytoscape-dagre from a CDN at view time. That is fine on most machines and useless on a fully air-gapped laptop. An --embed-js mode using go:embed is the obvious follow-up.
No special tuning for very large snapshots. Inserts are straightforward prepared statements. For tens of thousands of resources you'd want the DuckDB Appender API or parallel ingest.
Cross-account / cross-region edges only exist if AWS Config explicitly listed them. Reconstructing organization-wide topology needs more wiring (likely an Organizations + Config Aggregator integration).

Why "clew"?

clew is an old English word for a ball of thread, and the etymological root of the modern word clue. In Greek mythology it's the thread Ariadne gave Theseus so he could find his way back out of the Labyrinth.

Production AWS environments tend to drift toward labyrinth shape, especially after years of "just one more security group / one more subnet / one more peering connection." Most of the time it's fine. Then 2am happens, and the same environment that felt familiar yesterday looks like a maze.

clew is named for the thread you wish you had on the way in.

Try it

go install github.com/nishikawaakira/clew@v0.1.0
clew --help

Repository: https://github.com/nishikawaakira/clew (MIT)

If you want to see the output without setting up real AWS Config, the repo ships a testdata/sample_snapshot.json that exercises every piece of the rendering pipeline:

clew import --input testdata/sample_snapshot.json --db /tmp/demo.duckdb
clew render --db /tmp/demo.duckdb --view vpc --format html --output /tmp/demo.html
open /tmp/demo.html

This is still a PoC — there are sharp edges and obvious gaps. Issues, PRs and "here's the view I actually need" feature requests are all welcome.

Thanks for reading. If this means one fewer engineer is writing jq queries at 2am next month, the project has paid for itself.
