Why Checkov catches the WHAT but not the WHY

#aws #devops #security #terraform

Checkov is excellent. I use it. You should use it. This article is not an attack on Checkov.

But there's a specific class of infrastructure risk that Checkov - and every rule-based IaC linter - structurally cannot catch. Understanding the difference changes how you think about infrastructure reviews.

Let's call it the WHAT vs WHY problem.

What rule-based linters do

Checkov, Trivy, tfsec, KICS - all of them operate on the same principle: encode a known-bad pattern as a rule, then flag any resource that matches it.

This is powerful. It's also inherently scoped to individual resources.

A rule that says "RDS instances should have Multi-AZ enabled" looks at a single 'aws_db_instance' block. It doesn't know what depends on that database. It doesn't know what your SLA commitment is. It doesn't know that the Lambda function reading from it handles payment webhooks.

The rule catches the WHAT: Multi-AZ is disabled.

It can't catch the WHY: that this specific database going down takes out your entire revenue pipeline.
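
To make that concrete, here is a minimal sketch of what a per-resource rule sees. This is not Checkov's actual API; the function name, the dictionary layout, and the two database names are hypothetical stand-ins for illustration.

```python
# Hypothetical, simplified stand-in for a rule-based check (not Checkov's real API).
def check_multi_az(attributes: dict) -> bool:
    """Pass only if the aws_db_instance attributes set multi_az to true."""
    return attributes.get("multi_az", False) is True


# Two made-up databases with identical configuration but very different roles.
resources = {
    "aws_db_instance.payments": {"multi_az": False},
    "aws_db_instance.analytics": {"multi_az": False},
}

# The rule only ever receives one resource's attributes at a time.
failed = [name for name, attrs in resources.items() if not check_multi_az(attrs)]
print(failed)
```

Both databases fail the same check in the same way. Nothing in the rule's input says which one sits behind the payment webhooks.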

An example: the clean Checkov output problem

Here's a real pattern I see in Terraform codebases:

S3 bucket: logging disabled → Checkov flags it ✓
IAM role: wildcard action on that S3 bucket → Checkov flags it ✓
Lambda: reads from the bucket + writes to RDS → not flagged
RDS: Multi-AZ disabled, no automated backups enabled → Checkov flags both ✓

Checkov catches four individual issues. What it doesn't catch: that these four issues together create a system where a compromised Lambda can exfiltrate the entire data tier, and if the RDS instance fails during the exfiltration, there's no backup and no audit trail.

This isn't four separate findings. It's one architecture finding: your data tier has no defence in depth.

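Rule-based tools report each flag independently. Some, including Checkov with its graph-based checks, can match patterns across connected resources, but they still have no model of what those relationships mean for your workload.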
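
One way to picture the difference is to put the findings on a graph. The sketch below is an illustration under stated assumptions, not how any particular tool works: the resource names, the edge list, and the `reachable` and `correlate` helpers are all hypothetical.

```python
from collections import defaultdict

# Per-resource findings, as a linter would report them (illustrative labels).
findings = {
    "aws_s3_bucket.data": ["access logging disabled"],
    "aws_iam_role.lambda_exec": ["wildcard action on S3 bucket"],
    "aws_db_instance.main": ["Multi-AZ disabled", "automated backups disabled"],
}

# Hypothetical relationships: (source, target) means source uses or can reach target.
edges = [
    ("aws_lambda_function.worker", "aws_iam_role.lambda_exec"),
    ("aws_iam_role.lambda_exec", "aws_s3_bucket.data"),
    ("aws_lambda_function.worker", "aws_s3_bucket.data"),
    ("aws_lambda_function.worker", "aws_db_instance.main"),
]


def reachable(start, edges):
    """Return every resource reachable from `start` by following edges."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def correlate(entry_point, edges, findings):
    """Group the findings on everything a single entry point can touch."""
    touched = reachable(entry_point, edges)
    return {name: findings[name] for name in sorted(touched) if name in findings}


combined = correlate("aws_lambda_function.worker", edges, findings)
for name, issues in combined.items():
    print(name, issues)
```

The Lambda itself has no finding, yet all four flagged issues sit on resources it can reach. That grouping is the architecture finding; the flat list of flags is not.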

What architecture reasoning adds

The AWS Well-Architected Framework is the closest thing the industry has to a structured architecture reasoning system. Its six pillars (Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability) are explicitly about systems-level thinking, not individual resource configuration.

When a senior engineer reviews Terraform, they're not running through a checklist. They're asking:

What is the blast radius if this component fails?
What does the audit trail look like under a security incident?
Where are the single points of failure?
What happens at 10x current load?

These questions require knowing how components relate to each other — which is precisely what a rule engine cannot do.
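
The blast radius and single-point-of-failure questions can be sketched as a walk over a dependency map. This is a toy model, assuming you already have such a map; the resource names, `blast_radius`, and `single_points_of_failure` are hypothetical.

```python
# Hypothetical dependency map: each resource lists what it depends on.
depends_on = {
    "aws_lambda_function.webhooks": ["aws_db_instance.main"],
    "aws_api_gateway_rest_api.payments": ["aws_lambda_function.webhooks"],
    "aws_lambda_function.reports": ["aws_s3_bucket.exports"],
    "aws_db_instance.main": [],
    "aws_s3_bucket.exports": [],
}


def blast_radius(failed, depends_on):
    """Return every resource that directly or transitively depends on `failed`."""
    impacted = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for resource, deps in depends_on.items():
            if current in deps and resource not in impacted:
                impacted.add(resource)
                frontier.append(resource)
    return impacted


def single_points_of_failure(depends_on, threshold):
    """List resources whose failure impacts at least `threshold` other resources."""
    return sorted(
        name for name in depends_on
        if len(blast_radius(name, depends_on)) >= threshold
    )


print(sorted(blast_radius("aws_db_instance.main", depends_on)))
print(single_points_of_failure(depends_on, threshold=2))
```

In this toy map the database takes the webhook Lambda and the API in front of it down with it, while the exports bucket only affects reporting. A per-resource rule sees neither fact.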

The gap in practice

In real infrastructure reviews, the conversation that matters is almost never about the flagged resource. It's about what the flagged resource connects to.

"Your S3 bucket logging is disabled" is the start of the conversation, not the finding.

The finding is: "Your S3 bucket is the source of truth for your data pipeline, logging is disabled, and there's no alternative audit trail — which means a data breach would be undetectable until a customer reported missing data."

That's architecture reasoning. It requires:

Knowing what the S3 bucket does in context (not just its configuration)
Understanding the downstream impact (the pipeline, the data classification, the SLA)
Synthesising multiple resource attributes into a single architectural narrative

This is the analysis a senior DevOps engineer runs through in their head. It's not in any rule file.

How ArchGuard approaches this

ArchGuard takes your Terraform code and a brief description of your workload, then generates structured findings across the AWS Well-Architected pillars using architectural reasoning, not rule matching.

Each finding has three parts:

Evidence: the specific resource, attribute, and value
Why: the architectural impact - what breaks, what's at risk, at what scale
Recommendation: a concrete action, not a link to the docs
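
As a rough illustration of that three-part shape, here is a small sketch. It is not ArchGuard's actual schema or output format; the `Finding` class, its field names, and the example text are assumptions made for this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical three-part finding; not ArchGuard's real data model."""
    evidence: str        # the specific resource, attribute, and value
    why: str             # the architectural impact
    recommendation: str  # a concrete action

    def render(self) -> str:
        return (
            f"Evidence: {self.evidence}\n"
            f"Why: {self.why}\n"
            f"Recommendation: {self.recommendation}"
        )


finding = Finding(
    evidence="aws_db_instance.main: multi_az = false",
    why="The payment webhook Lambda reads from this database, so an outage here stops the revenue pipeline.",
    recommendation="Set multi_az = true on aws_db_instance.main.",
)
print(finding.render())
```

The evidence line is what a linter already gives you. The other two fields are where the workload description has to come in.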

This is complementary to Checkov, not a replacement. Run your linter first. Checkov will catch the configuration issues efficiently, at scale, for free. Then run ArchGuard to understand what those findings mean in the context of your architecture.

WHAT → Checkov. WHY → ArchGuard.

Try it

ArchGuard is in early access at archguard.io. Free during beta. You share your Terraform, describe your workload, and get a structured findings report back.

If you run infrastructure reviews for clients — or you've inherited an AWS environment and want to understand what you're actually looking at — I'd genuinely like your feedback.