Why your vulnerability dashboard is lying to you (and how to fix it)

#security #devsecops #aws #python

You open your vulnerability dashboard on a Monday morning and see 47 critical CVEs
across 12 assets. By Thursday, your team has patched 11 of the 12 assets. But the
dashboard still shows 40 criticals. What happened?

The assets were patched. The dashboard doesn't know that, because the record the
vulnerability scanner updated is not the record your team was tracking. The same
server shows up in your tools under four different identifiers:

| Tool | Identifier |
| --- | --- |
| AWS | `i-0a1b2c3d4e5f` |
| CrowdStrike | `prod-api-07.internal` |
| Tenable | `10.0.4.22` (scan-time IP) |
| Qualys | `10.0.4.23` (different scan window, NATted) |

When Tenable reports the CVE patched on 10.0.4.22, your dashboard doesn't
automatically know that 10.0.4.22 is the same machine as prod-api-07.internal.
So it still shows the finding as open on the CrowdStrike record.

This is the asset identity problem. Most security teams have it. Almost nobody
talks about it.

The standard approaches — and why they fall short

"We use the hostname" — Hostnames are normalized differently by every tool.
Tenable might see prod-api-07, CrowdStrike sees prod-api-07.internal,
ServiceNow has PRODAPI007 from a manual entry made 8 months ago.

"We use the IP address" — IPs change. NAT means the scanner sees a different
IP than the one the EDR agent reports. A host that was 10.0.4.22 last week might
be 10.0.4.31 today.

"We have a CMDB" — Great, how fresh is it? Most CMDBs are 30–60% stale within
6 months of implementation. And you still need to write the correlation logic to
feed it.

A layered matching approach

The core insight is that no single identifier is reliable across tools, but
combining multiple identifiers with explicit confidence scoring gets you very far.

Here's the priority order:

Layer 1 — Hard IDs (confidence: 0.95–1.0)

Match on instanceId, EDR agentId, or MAC address. These are tool-native stable
identifiers. If two records share a hard ID, they're the same asset with near-certainty.
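
Here is a minimal sketch of the Layer 1 check. It assumes each record is a plain dict, and the field names `instance_id`, `agent_id`, and `mac` are hypothetical, not any tool's real schema. Scoring a MAC match at 0.95 and the other IDs at 1.0 is also my assumption; the only constraint from the range above is that both land in 0.95–1.0.

```python
import re

# Hypothetical field names; real tool exports will use different keys.
HARD_ID_FIELDS = ("instance_id", "agent_id", "mac")


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC address and drop separators such as ':', '-' and '.'."""
    return re.sub(r"[^0-9a-f]", "", mac.lower())


def hard_id_match(a: dict, b: dict) -> float:
    """Return a Layer 1 confidence, or 0.0 if the records share no hard ID."""
    for field in HARD_ID_FIELDS:
        va, vb = a.get(field), b.get(field)
        if not va or not vb:
            continue  # one side lacks this ID, so it cannot confirm anything
        if field == "mac":
            va, vb = normalize_mac(va), normalize_mac(vb)
        if va == vb:
            # Assumption: MACs can be cloned or reused, so they score slightly lower.
            return 0.95 if field == "mac" else 1.0
    return 0.0
```

Normalizing the MAC matters because one tool may export `0A:1B:2C:3D:4E:5F` while another exports `0a-1b-2c-3d-4e-5f`.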

Layer 2 — Hostname (confidence: 0.45–0.85)

Normalize first: strip .local, .internal, case-fold, drop -prod/-dev
suffixes. Then match. Confidence scales with how unique the hostname looks.
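
The normalization and scoring might look roughly like this. The suffix lists come straight from the text above, but the "how unique does it look" heuristic (digits plus number of name segments) is an assumption of mine for illustration, and so are the three score steps inside the 0.45–0.85 range.

```python
import re

# Domain suffixes named above; extend for your environment.
DOMAIN_SUFFIXES = (".local", ".internal")
# Environment suffixes named above.
ENV_SUFFIX = re.compile(r"-(prod|dev)$")


def normalize_hostname(name: str) -> str:
    """Case-fold, strip one known domain suffix, then drop a trailing -prod/-dev."""
    host = name.strip().casefold()
    for suffix in DOMAIN_SUFFIXES:
        if host.endswith(suffix):
            host = host[: -len(suffix)]
            break
    return ENV_SUFFIX.sub("", host)


def hostname_score(a: str, b: str) -> float:
    """Return 0.0 on mismatch, else 0.45-0.85 based on how specific the name looks."""
    na, nb = normalize_hostname(a), normalize_hostname(b)
    if not na or na != nb:
        return 0.0
    # Assumed heuristic: names with digits and several segments rarely collide.
    has_digit = any(ch.isdigit() for ch in na)
    segments = len(re.split(r"[-.]", na))
    if has_digit and segments >= 3:
        return 0.85
    if has_digit or segments >= 2:
        return 0.65
    return 0.45
```

Note that exact matching after normalization will not link `PRODAPI007` to `prod-api-07`. Hand-typed names like that need fuzzy matching or another layer to catch them.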

Layer 3 — IP address (confidence: 0.60–0.75)

Public IPs get higher confidence than private IPs. Apply a staleness decay: an IP
seen 30 days ago is worth less than one seen yesterday. Private IPs in NAT-heavy
environments are unreliable and scored conservatively.

Layer 4 — Metadata (confidence: up to 0.50)

OS family + cloud region + account ID. Useful as a tie-breaker. Not enough alone.
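
One simple way to score this layer is to count agreeing fields and scale to the 0.50 cap. The field names and the equal weighting below are assumptions for illustration.

```python
# Hypothetical field names; equal weighting is an assumption.
METADATA_FIELDS = ("os_family", "region", "account_id")
METADATA_CAP = 0.50  # ceiling given above


def metadata_score(a: dict, b: dict) -> float:
    """Return up to 0.50, proportional to how many metadata fields agree."""
    matched = sum(
        1 for f in METADATA_FIELDS
        if a.get(f) and a.get(f) == b.get(f)  # missing values never count as a match
    )
    return METADATA_CAP * matched / len(METADATA_FIELDS)
```

Because the score tops out at 0.50, it only matters when two candidate matches are otherwise tied.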

When no hard ID matches, combine layers 2 and 3:
composite = 0.60 × hostname_score + 0.40 × ip_score. Merge if the composite score
is ≥ 0.70. Flag for human review if it is ≥ 0.50 but below 0.70. Create a new
canonical record if it is < 0.50.
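
The weights and thresholds translate directly into a small decision function. The weights and cut-offs are the ones stated above; the `Decision` enum and function names are my own, chosen for the sketch.

```python
from enum import Enum

HOSTNAME_WEIGHT = 0.60
IP_WEIGHT = 0.40
MERGE_THRESHOLD = 0.70
REVIEW_THRESHOLD = 0.50


class Decision(Enum):
    MERGE = "merge"
    REVIEW = "review"
    NEW_RECORD = "new_record"


def composite_score(hostname_score: float, ip_score: float) -> float:
    """Weighted blend of the Layer 2 and Layer 3 scores."""
    return HOSTNAME_WEIGHT * hostname_score + IP_WEIGHT * ip_score


def decide(hostname_score: float, ip_score: float) -> Decision:
    """Map a composite score to merge, human review, or a new canonical record."""
    score = composite_score(hostname_score, ip_score)
    if score >= MERGE_THRESHOLD:
        return Decision.MERGE
    if score >= REVIEW_THRESHOLD:
        return Decision.REVIEW  # ambiguous matches go to a human, never auto-merged
    return Decision.NEW_RECORD
```

For example, a strong hostname match (0.85) plus a private-IP match (0.60) gives 0.51 + 0.24 = 0.75, which merges. The same hostname match with no IP match gives 0.51, which goes to review.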

The key design principle: ambiguous matches are never silently merged. A 50%
confident merge that turns out to be wrong fuses two different machines into one
record, which is worse than no merge at all.

Building the canonical record

Once you've matched records, you merge them. But "merge" has a lot of edge cases:

Which hostname wins when AWS says prod-api-07 and the EDR says prod-api-07.internal? The EDR's: it is more authoritative for hostnames, while AWS is more authoritative for region.
What about IP addresses? Union them — an asset can have both a private and public IP.
What if two sources report different OS names? Log the conflict with both values, both sources, and the resolution taken.

Every field disagreement should be logged with full lineage. Conflicts are data.
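
The rules above can be sketched as a per-field authority table plus a conflict log. The source names (`edr`, `aws`, `scanner`), the record shape, and the priority order for `os` are assumptions for illustration; only the hostname and region rules come from the text.

```python
from dataclasses import dataclass, field

# Assumed per-field source priority. Hostname and region follow the rules
# above; the order for "os" is a placeholder you should set yourself.
FIELD_AUTHORITY = {
    "hostname": ["edr", "aws", "scanner"],
    "region": ["aws", "edr", "scanner"],
    "os": ["edr", "aws", "scanner"],
}


@dataclass
class Canonical:
    fields: dict = field(default_factory=dict)
    ips: set = field(default_factory=set)
    conflicts: list = field(default_factory=list)


def merge(records: dict) -> Canonical:
    """Merge per-source views of one asset; `records` maps source name to its dict."""
    out = Canonical()
    for rec in records.values():
        out.ips.update(rec.get("ips", []))  # union the IPs, never overwrite
    for name, priority in FIELD_AUTHORITY.items():
        # Built in priority order, so the first key is the most authoritative source.
        seen = {src: records[src][name] for src in priority
                if src in records and records[src].get(name)}
        if not seen:
            continue
        winner = next(iter(seen))
        out.fields[name] = seen[winner]
        if len(set(seen.values())) > 1:
            # Keep every value, its source, and which source won.
            out.conflicts.append({"field": name, "values": seen, "resolved_to": winner})
    return out
```

Each conflict entry keeps all reported values keyed by source, so you can later audit why the canonical record says what it says.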

The open-source implementation

I've written this glue layer at multiple companies. Last week I open-sourced it.


```bash
pip install security-asset-correlator
```

https://github.com/apurvtyagi/security-asset-correlator
