The recoverability gap is the structural distance between what your recovery plan assumes and what your infrastructure can actually deliver under failure conditions. Most organizations don't discover it during planning. They discover it on day three of an incident.
The ransomware tabletop passed.
The backup dashboard was green.
The recovery runbook was approved.
Day three of the incident is when the team discovered Active Directory was part of the blast radius.
Day four is when they discovered the recovery runbook assumed Active Directory was available.
Day zero was the ransomware event. Day three was when the recoverability gap became visible.
What the Recoverability Gap Actually Is
Most organizations invest in recovery planning and assume recoverability follows. The recoverability gap is the distance between those two things.
Recoverability is not the same thing as recoverability planning. A recovery plan describes what you intend to do. Recoverability describes what the architecture is capable of doing when the environment it was written for no longer exists. These are different properties, and they require different disciplines to produce.
Framework #148 defines the Recoverability Gap as the structural mismatch between recovery plan assumptions and recovery architecture capability — specifically across three survivability layers: data, execution, and authority.
The gap is not a documentation problem. You cannot close it by improving the runbook. It is a design property. The decisions that determine recoverability are made at build time, not during the incident.
Framework #148 — Recoverability Formula
Recoverability = Data Survivability + Execution Survivability + Authority Survivability
Read the plus signs as a dependency, not as arithmetic: recoverability is only as strong as its weakest survivability layer, and failure in any one layer reduces the result to zero. Most recovery programs measure only the first variable.
This is a dependency chain, not a maturity score. An organization with immutable backups, a pre-provisioned recovery environment, and no pre-authorized recovery authority has a recoverability score of zero. The chain breaks at authority, and the other two investments become irrelevant.
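One way to make the weakest-link reading concrete is to score each layer and take the minimum. The sketch below assumes a hypothetical 0.0 to 1.0 scale per layer; the scale and the `SurvivabilityScores` type are illustrations, not part of Framework #148. It scores the example from the previous paragraph and contrasts it with a naive additive average.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurvivabilityScores:
    """Each layer scored 0.0 (fails) to 1.0 (fully survivable).
    The 0-1 scale is a hypothetical illustration."""
    data: float
    execution: float
    authority: float


def recoverability(s: SurvivabilityScores) -> float:
    """Weakest-link reading: the result is capped by the weakest layer,
    and any layer at zero zeroes the result."""
    return min(s.data, s.execution, s.authority)


# Immutable backups, a pre-provisioned recovery environment,
# and no pre-authorized recovery authority.
example = SurvivabilityScores(data=1.0, execution=1.0, authority=0.0)
print(recoverability(example))  # 0.0

# A naive additive maturity score reports two-thirds "done" for the same program.
naive = (example.data + example.execution + example.authority) / 3
print(round(naive, 2))  # 0.67
```

`min` is the simplest function with both properties the text describes. A product of the three scores would also zero out on any failed layer, but it would additionally penalize two partially weak layers more heavily than one.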
The Signals That Make Organizations Think They're Recoverable
The problem with the recoverability gap is that the signals organizations use to assess recovery readiness are all valid control indicators — and none of them prove recoverability.
Ordered from weakest to strongest evidence:
Recovery plan approved. Documents exist. Intent is recorded. This proves nothing about the environment that plan will execute against.
Immutable backups enabled. Data survivability is partially addressed. The backup can survive the event. Whether the backup is reachable from the recovery environment is a separate question.
Cross-region replication active. Data exists in a second location. Cross-region replication is not resilience: it survives only if the replication infrastructure itself was out of scope during the attack, an assumption that is frequently wrong.
Annual tabletop completed. Procedures were walked through in a controlled environment. Most disaster recovery tests don't test recovery: tabletops test decision-making and communication, not execution against degraded infrastructure.
Recovery test passed (against a clean, isolated environment that didn't reflect production blast radius). This is the most dangerous signal. A successful recovery test against a clean environment proves that the recovery procedure is correct in theory. It does not prove that the execution environment will be available when needed. Testing is not the same as proving recoverability.
None of these prove recoverability. They prove specific controls exist.
⚠ Common Mistake: A recovery test run against a clean, isolated environment confirms that the procedure works under ideal conditions. It does not confirm that the execution environment will be available when the incident occurs. These are different tests, and most DR programs run only the first one.
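The claim that none of these signals proves recoverability can be written as a simple predicate: a signal only counts if it exercises all three survivability layers against a degraded environment. The layer mapping below is one illustrative reading of the list, not a definitive classification, and the `Signal` type and its field values are hypothetical.

```python
from dataclasses import dataclass

LAYERS = frozenset({"data", "execution", "authority"})


@dataclass(frozen=True)
class Signal:
    name: str
    layers: frozenset    # layers the signal gives at least partial evidence for (illustrative)
    tested_against: str  # "none", "clean", or "degraded"


def proves_recoverability(signal: Signal) -> bool:
    """True only if every layer was exercised against a degraded environment."""
    return signal.layers == LAYERS and signal.tested_against == "degraded"


# One possible reading of the five signals, weakest to strongest.
SIGNALS = [
    Signal("Recovery plan approved", frozenset(), "none"),
    Signal("Immutable backups enabled", frozenset({"data"}), "none"),
    Signal("Cross-region replication active", frozenset({"data"}), "none"),
    Signal("Annual tabletop completed", frozenset({"authority"}), "clean"),
    Signal("Recovery test passed", frozenset({"data", "execution"}), "clean"),
]

for s in SIGNALS:
    print(f"{s.name}: proves recoverability = {proves_recoverability(s)}")
```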
Why Ransomware Makes the Gap Visible
The recoverability gap exists in every recovery program that hasn't been designed explicitly against adversarial conditions. Ransomware doesn't create it. It reveals it under time pressure, in a degraded environment, with the clock running.
Modern ransomware attacks are not backup attacks. They are recovery path attacks. The objective is to make recovery as expensive and slow as possible — which means targeting the infrastructure that recovery depends on: backup management consoles, identity systems, management networks, and administrative tooling. These are the same systems your recovery runbook assumes are available.
This is why backup blast radius framing matters. Your backup system is part of the blast radius not because the attacker necessarily targets it first, but because the backup management plane sits on the same identity and network infrastructure that is in scope during the attack. When AD is compromised, backup consoles that authenticate against AD are compromised. The backup data may survive. The tooling to orchestrate recovery may not. Most backup isolation architectures fail for exactly this reason — the air gap is connected to the same identity and management plane it was supposed to be isolated from.
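The propagation described above is a reachability problem over dependencies. This sketch assumes a hypothetical environment where each component lists what it authenticates against or is managed through, and treats any dependency on a compromised component as putting the dependent in scope (a deliberate simplification). Component names are illustrative.

```python
from collections import deque

# Hypothetical environment: component -> components it depends on for
# authentication or management access.
DEPENDS_ON = {
    "active_directory": [],
    "production_network": [],
    "backup_console": ["active_directory", "production_network"],
    "hypervisor_mgmt": ["active_directory"],
    "bastion_host": ["active_directory", "production_network"],
    "immutable_backup_data": [],  # object-locked data, no interactive login
}


def blast_radius(compromised: set[str], depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that transitively depends on a compromised one."""
    # Invert the graph: component -> components that depend on it.
    dependents: dict[str, list[str]] = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    in_scope = set(compromised)
    queue = deque(compromised)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in in_scope:
                in_scope.add(dependent)
                queue.append(dependent)
    return in_scope


print(sorted(blast_radius({"active_directory"}, DEPENDS_ON)))
# ['active_directory', 'backup_console', 'bastion_host', 'hypervisor_mgmt']
```

The backup data stays out of scope, but the console that would orchestrate its restore does not, which is the split the paragraph describes.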
The second mechanism is RTO calculation failure. Ransomware recovery time is an architecture problem, not a backup problem. Recovery time objectives are calculated against a known starting state: a functioning identity platform, available management tooling, reachable infrastructure. When the incident environment doesn't match that assumed state, the RTO calculation becomes meaningless. The estimate was accurate. The assumptions weren't.
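A minimal sketch of why the estimate stops being meaningful: each runbook step records the starting-state assumptions it depends on, and an RTO figure is only reported when the actual state satisfies all of them. Step names, durations, and system labels are hypothetical placeholders, not benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunbookStep:
    name: str
    minutes: int
    assumes: frozenset  # systems the step assumes are available


def estimate_rto(steps, available):
    """Return (total_minutes, broken_assumptions).
    If any assumption is violated, total_minutes is None."""
    broken = sorted({a for step in steps for a in step.assumes if a not in available})
    if broken:
        return None, broken
    return sum(step.minutes for step in steps), []


# Hypothetical runbook.
RUNBOOK = [
    RunbookStep("Log in to backup console", 15,
                frozenset({"active_directory", "backup_console"})),
    RunbookStep("Restore domain controllers", 120,
                frozenset({"backup_console", "hypervisor_mgmt"})),
    RunbookStep("Restore tier-1 applications", 240,
                frozenset({"active_directory", "production_network"})),
]

planned_state = {"active_directory", "backup_console", "hypervisor_mgmt", "production_network"}
incident_state = {"hypervisor_mgmt", "production_network"}  # AD and backup console lost

print(estimate_rto(RUNBOOK, planned_state))   # (375, [])
print(estimate_rto(RUNBOOK, incident_state))  # (None, ['active_directory', 'backup_console'])
```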
Three Forms of the Recoverability Gap
01 — Backup Gap
The data doesn't exist, isn't current, or can't be accessed from outside the failure domain. This is the expected gap — most DR programs are designed to close it. Immutable storage and air-gapped vaults address the data survivability layer. Most organizations close the backup gap and believe they are recoverable.
02 — Execution Gap
The data exists. Recovery cannot execute. The management plane is encrypted. Active Directory is compromised. Bastion hosts are down. The runbook is procedurally correct, but it was written for an execution environment that no longer exists. This gap is unexpected — and it is the gap that produces the day-four discovery in the opening scenario.
03 — Authority Gap
The data exists. The recovery environment exists. Nobody has pre-authorized the decision to activate it. Who can declare DR? Who has authority to approve out-of-band spend? Who can execute recovery without the normal approval chain when the normal approval chain is unavailable? This is the most dangerous gap because the other two investments become irrelevant until it is resolved.
The progression matters. Organizations that close only the backup gap feel recoverable. Organizations that close the backup gap and execution gap feel confident. The authority gap is the one that stops recovery after everything else is in place.
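The three forms can be checked in the order the text presents them. Returning every open gap, rather than only the first, shows the progression: closing an earlier gap only exposes the next one. `RecoveryState` and its boolean fields are an illustrative simplification of each layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryState:
    data_reachable: bool           # backup exists, is current, reachable from outside the failure domain
    execution_available: bool      # an environment that can actually run the restore
    authority_preauthorized: bool  # someone can activate recovery without the normal approval chain


def open_gaps(state: RecoveryState) -> list:
    """List open gaps in order: backup, execution, authority."""
    gaps = []
    if not state.data_reachable:
        gaps.append("backup gap")
    if not state.execution_available:
        gaps.append("execution gap")
    if not state.authority_preauthorized:
        gaps.append("authority gap")
    return gaps


# Backups survived; execution and authority were never designed.
print(open_gaps(RecoveryState(True, False, False)))  # ['execution gap', 'authority gap']
# Execution gap closed: recovery now stops at authority.
print(open_gaps(RecoveryState(True, True, False)))   # ['authority gap']
```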
Closing the Recoverability Gap Requires Design Decisions, Not Better Plans
The recoverability gap cannot be closed at incident time. The decisions that determine recoverability are made during architecture and build — before the event, against the assumption that the event will be adversarial.
Survivable Data
Can the backup survive the failure domain? Survivable data means the backup exists in a location that is architecturally isolated from the primary environment — not just geographically separate, but network-isolated, auth-isolated, and admin-isolated. Immutable storage closes part of this — but object lock alone isn't enough. Out-of-band access that does not depend on the primary identity platform closes the rest. Cross-region replication is not survivable data if the replication management plane is in scope during the attack.
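The isolation axes named above can be checked pairwise against the primary environment. This sketch assumes each environment can be described by hypothetical identifiers for region, network, identity provider, and admin group, and reports every axis on which the backup still shares a failure domain with production.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Placement of a system along the isolation axes (hypothetical identifiers)."""
    region: str
    network: str
    identity_provider: str
    admin_group: str


def isolation_failures(primary: Domain, backup: Domain) -> list[str]:
    """List every axis on which the backup shares a failure domain with primary."""
    failures = []
    if backup.network == primary.network:
        failures.append("network")
    if backup.identity_provider == primary.identity_provider:
        failures.append("auth")
    if backup.admin_group == primary.admin_group:
        failures.append("admin")
    if backup.region == primary.region:
        failures.append("region")
    return failures


primary = Domain("us-east", "prod-net", "corp-ad", "domain-admins")
# Geographically separate, but still on production network, identity, and admins.
replica = Domain("us-west", "prod-net", "corp-ad", "domain-admins")
vault = Domain("us-west", "vault-net", "vault-idp", "vault-operators")

print(isolation_failures(primary, replica))  # ['network', 'auth', 'admin']
print(isolation_failures(primary, vault))    # []
```

The replica passes the geographic check and fails the other three, which is the distinction the paragraph draws.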
Survivable Execution
Can recovery execute from an environment that assumes the primary infrastructure is fully adversarial? Survivable execution means the execution environment is pre-provisioned — not provisioned during the incident. Jump hosts with out-of-band access. Identity that does not depend on production AD. Management tooling that can reach backup infrastructure without traversing the production network. You cannot provision your way to execution survivability under time pressure after the incident starts.
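The last condition, reaching backup infrastructure without traversing the production network, is a path search that refuses to enter adversarial nodes. The topology below is hypothetical, and the sketch checks only network reachability, not identity or provisioning.

```python
from collections import deque

# Hypothetical network adjacency.
LINKS = {
    "recovery_jump_host": ["oob_mgmt_net", "prod_network"],
    "oob_mgmt_net": ["backup_vault"],
    "prod_network": ["backup_vault", "prod_servers"],
    "backup_vault": [],
    "prod_servers": [],
}


def reachable_without(links, start, target, forbidden):
    """Breadth-first search from start to target that never enters a forbidden node."""
    if start in forbidden:
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in links.get(node, []):
            if nxt not in seen and nxt not in forbidden:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Treat the production network as fully adversarial: paths through it don't count.
print(reachable_without(LINKS, "recovery_jump_host", "backup_vault", {"prod_network"}))  # True

# Without the out-of-band link, the only remaining path crosses production.
no_oob = {**LINKS, "recovery_jump_host": ["prod_network"]}
print(reachable_without(no_oob, "recovery_jump_host", "backup_vault", {"prod_network"}))  # False
```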
Survivable Authority
Is the recovery decision pre-authorized? Survivable authority means the decision to declare DR, activate alternate infrastructure, and approve out-of-band spend has been made in advance — documented, signed, and accessible without the normal approval chain. The D3 Cyber Vault Architecture stage builds the isolation foundation that survivable execution depends on.
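Survivable authority can be audited like any other control: list the decisions that must be made in advance and flag any without a usable record. The `Authorization` fields mirror the text's "documented, signed, and accessible without the normal approval chain"; the holders and record structure are hypothetical.

```python
from dataclasses import dataclass

# The decisions the text says must be made in advance.
REQUIRED_DECISIONS = ("declare_dr", "activate_alternate_infrastructure", "approve_oob_spend")


@dataclass(frozen=True)
class Authorization:
    decision: str
    holder: str                            # named role (hypothetical)
    documented: bool
    signed: bool
    accessible_without_normal_chain: bool  # usable when the normal approval chain is unavailable


def authority_gaps(records: list) -> list:
    """Return required decisions with no documented, signed, accessible pre-authorization."""
    valid = {
        r.decision
        for r in records
        if r.documented and r.signed and r.accessible_without_normal_chain
    }
    return [d for d in REQUIRED_DECISIONS if d not in valid]


records = [
    Authorization("declare_dr", "CIO", documented=True, signed=True,
                  accessible_without_normal_chain=True),
    # Signed, but only usable through the normal approval chain.
    Authorization("activate_alternate_infrastructure", "Head of Infrastructure",
                  documented=True, signed=True, accessible_without_normal_chain=False),
]

print(authority_gaps(records))  # ['activate_alternate_infrastructure', 'approve_oob_spend']
```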
Diagnostic: "If your primary identity platform, backup management console, and production network were unavailable simultaneously, what recovery actions could your team execute within the first four hours?"
Most architects can answer this question in theory. The gap is whether the answer is operational.
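The diagnostic can be turned into a rough tabletop calculation: take the three systems offline, drop every action that depends on them, and see what fits in 240 minutes. The actions, durations, and system names below are hypothetical, and the model assumes actions run one after another in listed order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryAction:
    name: str
    minutes: int
    requires: frozenset  # systems the action cannot run without


# The diagnostic's simultaneous outage.
OUTAGE = {"primary_identity", "backup_console", "production_network"}
WINDOW_MINUTES = 4 * 60

# Hypothetical actions in the order the team would attempt them.
ACTIONS = [
    RecoveryAction("Declare DR via out-of-band call tree", 30, frozenset()),
    RecoveryAction("Log in to backup console", 15,
                   frozenset({"primary_identity", "backup_console"})),
    RecoveryAction("Authenticate to vault with break-glass identity", 20,
                   frozenset({"vault_identity"})),
    RecoveryAction("Restore domain controllers into isolated environment", 180,
                   frozenset({"vault_identity", "isolated_env"})),
    RecoveryAction("Restore tier-1 applications", 240,
                   frozenset({"primary_identity", "production_network"})),
]


def executable_in_window(actions, outage, window):
    """Return names of actions that can run, in order, within the window."""
    elapsed = 0
    done = []
    for action in actions:
        if action.requires & outage:
            continue  # blocked: depends on a system that is down
        if elapsed + action.minutes > window:
            break  # actions run sequentially, so the window is exhausted
        elapsed += action.minutes
        done.append(action.name)
    return done


for name in executable_in_window(ACTIONS, OUTAGE, WINDOW_MINUTES):
    print(name)
```

If the result is empty, or contains only declaration steps, the answer to the diagnostic is still theoretical rather than operational.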
Architect's Verdict
Recovery plans that have never been stress-tested against an adversarial execution environment aren't recovery plans — they're aspirations.
The recoverability gap is not revealed by auditing the plan. It is revealed by auditing the architecture against the assumption that the execution environment will be degraded, the identity platform will be unavailable, and the clock will be running. Most recovery programs are designed against the absence of failure. Adversarial recovery requires designing against the presence of it.
Ransomware doesn't reveal whether backups exist. It reveals whether recoverability was ever designed into the architecture.
Originally published at rack2cloud.com