---
title: "I Almost Missed a SOC2 Audit Deadline. Here's the Python Script That Saved Me."
published: false
tags: [security, python, devops, compliance]
---
Last quarter, our CTO dropped a message in Slack at 9 AM on a Tuesday:

> "Auditors arrive in 3 weeks. We need SOC2 evidence packages. All of them."

I had three weeks to manually comb through CloudTrail logs, access control
spreadsheets, and incident tickets across four different systems. I lasted
about two days before I started looking for a better way.
That better way was ComplianceWeave — and in this tutorial, I'll show you
exactly how I automated the entire evidence collection pipeline in Python.
By the end, you'll have a script that scans your infrastructure, surfaces
gaps, and generates audit-ready reports without the spreadsheet death march.
## What We're Building
A three-stage Python pipeline that:
1. **Triggers** a compliance scan across your infrastructure
2. **Polls** for the completed report
3. **Auto-remediates** flagged issues where possible
We'll handle real-world messiness: rate limits, partial failures, and the
delightful ambiguity of "scan still running" responses.
## Prerequisites
- Python 3.9+
- A ComplianceWeave account (API key from your dashboard)
- `requests` and `python-dotenv` installed
```bash
pip install requests python-dotenv
```
Store your credentials safely — never hardcode API keys:
```bash
# .env
COMPLIANCEWEAVE_API_KEY=your_key_here
COMPLIANCEWEAVE_BASE_URL=https://api.complianceweave.io/v1
```
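A missing variable otherwise surfaces later as a confusing request to `None/compliance/scan`, so it can help to fail fast at startup. Here is a minimal sketch of that check. `load_config` is a hypothetical helper of my own, not part of any ComplianceWeave SDK, and it only assumes the two variable names from the `.env` file above:

```python
import os


def load_config(env=None):
    """Return (base_url, api_key), failing fast if either variable is missing."""
    env = os.environ if env is None else env
    required = ("COMPLIANCEWEAVE_BASE_URL", "COMPLIANCEWEAVE_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    # Strip a trailing slash so f"{base_url}/compliance/scan" never doubles it.
    base_url = env["COMPLIANCEWEAVE_BASE_URL"].rstrip("/")
    return base_url, env["COMPLIANCEWEAVE_API_KEY"]
```

Passing a plain dict instead of `os.environ` makes the check easy to unit test without touching your real environment.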
---
## Step 1: Trigger Your First Compliance Scan
The `POST /compliance/scan` endpoint kicks off infrastructure analysis
against your chosen frameworks. Here's a clean wrapper function with
error handling that'll tell you *what* went wrong, not just *that*
something went wrong:
```python
import os
import time

import requests
from dotenv import load_dotenv

load_dotenv()

BASE_URL = os.getenv("COMPLIANCEWEAVE_BASE_URL")
API_KEY = os.getenv("COMPLIANCEWEAVE_API_KEY")

HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}


def trigger_scan(frameworks: list[str], scope: str = "full") -> dict:
    """
    Initiate a compliance scan.

    Args:
        frameworks: List of compliance frameworks to check against.
            Options: "SOC2", "GDPR", "HIPAA", "ISO27001"
        scope: "full" scans all connected infrastructure.
            "quick" targets high-risk controls only.

    Returns:
        dict with scan_id and estimated_duration_seconds
    """
    payload = {
        "frameworks": frameworks,
        "scope": scope,
        "notify_on_complete": False,  # We'll poll manually
    }

    try:
        response = requests.post(
            f"{BASE_URL}/compliance/scan",
            json=payload,
            headers=HEADERS,
            timeout=30,
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        # Map the status codes we expect to more specific exceptions
        status = e.response.status_code
        if status == 401:
            raise PermissionError("Invalid API key. Check your .env file.") from e
        elif status == 422:
            raise ValueError(
                "Invalid framework specified. "
                f"Response: {e.response.json()}"
            ) from e
        elif status == 429:
            raise RuntimeError("Rate limit hit. Wait 60s before retrying.") from e
        else:
            raise RuntimeError(f"Scan failed with status {status}: {e}") from e
    except requests.exceptions.Timeout as e:
        raise TimeoutError(
            "Request timed out. ComplianceWeave may be under load."
        ) from e


# --- Run it ---
if __name__ == "__main__":
    scan_result = trigger_scan(
        frameworks=["SOC2", "GDPR"],
        scope="full",
    )
    print(scan_result)
```
**Expected output:**
```json
{
  "scan_id": "scan_8f3a2c1d",
  "status": "running",
  "frameworks": ["SOC2", "GDPR"],
  "estimated_duration_seconds": 180,
  "created_at": "2024-11-12T09:14:33Z"
}
```
Save that `scan_id`. You'll need it in the next step.
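The 429 branch above tells you to wait 60 seconds and retry, but it leaves the retrying to you. One way to automate it is a small wrapper like the sketch below. It assumes you change the 429 branch to raise a dedicated exception; `RateLimited` and `call_with_retry` are names I made up for illustration, not ComplianceWeave APIs.

```python
import time


class RateLimited(Exception):
    """Hypothetical marker exception for an HTTP 429 response."""


def call_with_retry(func, attempts=3, wait_seconds=60, sleep=time.sleep):
    """Call func(); on RateLimited, wait and try again, up to `attempts` calls."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except RateLimited:
            if attempt == attempts:
                raise  # Out of attempts: surface the error to the caller
            sleep(wait_seconds)
```

Usage would look something like `call_with_retry(lambda: trigger_scan(["SOC2"]))`. The `sleep` parameter exists only so tests can swap in a fake and avoid real waiting.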
---
## Step 2: Poll for the Report (Without Hammering the API)
Scans take 2–5 minutes depending on infrastructure size. The naive
approach — a tight `while True` loop — will get you rate-limited fast.
Instead, use exponential backoff with a ceiling:
```python
def fetch_report(scan_id: str, max_wait_seconds: int = 600) -> dict:
    """
    Poll for a completed compliance report with exponential backoff.

    Args:
        scan_id: The ID returned from trigger_scan()
        max_wait_seconds: Give up after this many seconds of waiting

    Returns:
        Completed report dict, or raises TimeoutError
    """
    interval = 10      # Start polling every 10 seconds
    max_interval = 60  # Cap at 60 seconds between polls
    elapsed = 0        # Counts time spent sleeping, not time spent in requests

    print(f"Polling for scan {scan_id}...")

    while elapsed < max_wait_seconds:
        try:
            response = requests.get(
                f"{BASE_URL}/compliance/reports",
                params={"scan_id": scan_id},
                headers=HEADERS,
                timeout=30,
            )
            response.raise_for_status()
            report = response.json()
        except requests.exceptions.RequestException as e:
            print(f" ⚠ Poll attempt failed: {e}. Retrying...")
            time.sleep(interval)
            elapsed += interval
            continue

        status = report.get("status")
        if status == "complete":
            print(f" ✓ Report ready! ({elapsed}s elapsed)")
            return report
        elif status == "failed":
            reason = report.get("failure_reason", "unknown")
            raise RuntimeError(f"Scan failed on ComplianceWeave's end: {reason}")
        elif status == "running":
            print(f" … Still scanning. Waiting {interval}s. ({elapsed}s elapsed)")
        else:
            print(f" ? Unexpected status '{status}'. Continuing to poll.")

        time.sleep(interval)
        elapsed += interval
        # Exponential backoff; int() keeps the waits in whole seconds
        interval = min(int(interval * 1.5), max_interval)

    raise TimeoutError(
        f"Scan {scan_id} didn't complete within {max_wait_seconds}s. "
        "Check the ComplianceWeave dashboard for status."
    )


# --- Run it ---
report = fetch_report(scan_result["scan_id"])
print(f"\nFrameworks checked: {report['frameworks_checked']}")
print(f"Controls passed: {report['summary']['passed']}")
print(f"Controls failed: {report['summary']['failed']}")
print(f"Report URL: {report['report_url']}")
```
**Expected output** (shortened here; a real scan logs more polling lines before it completes):
```plaintext
Polling for scan scan_8f3a2c1d...
 … Still scanning. Waiting 10s. (0s elapsed)
 … Still scanning. Waiting 15s. (10s elapsed)
 ✓ Report ready! (25s elapsed)

Frameworks checked: ['SOC2', 'GDPR']
Controls passed: 47
Controls failed: 8
Report URL: https://app.complianceweave.io/reports/scan_8f3a2c1d
```
That report URL is your shareable, auditor-ready artifact. But 8 failures?
Let's not just document them — let's fix them.
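Before moving on, one aside on the polling schedule. If you want to see how many polls a given time budget allows without sleeping through it, you can compute the waits offline. This is a rough sketch using the same 10-second start, 1.5 multiplier, and 60-second cap as `fetch_report()`. `backoff_schedule` is my own helper name, and it truncates each wait to whole seconds.

```python
def backoff_schedule(start=10, factor=1.5, ceiling=60, max_wait_seconds=600):
    """Yield (elapsed, wait) pairs for each poll until the time budget is spent."""
    elapsed, wait = 0, start
    while elapsed < max_wait_seconds:
        yield elapsed, wait
        elapsed += wait
        # Grow the wait geometrically, but never past the ceiling
        wait = min(int(wait * factor), ceiling)


schedule = list(backoff_schedule())
print(schedule[:4])   # [(0, 10), (10, 15), (25, 22), (47, 33)]
print(len(schedule))  # 13
```

With the defaults, that works out to 13 polls inside the 600-second budget, compared with 60 polls at a fixed 10-second interval.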
---
## Step 3: Auto-Remediate Flagged Issues
Not every compliance gap can be auto-fixed (some require human policy
decisions), but ComplianceWeave flags which ones are *programmatically
remediable*. Here's how to act on those:
```python
def remediate_failures(report: dict, dry_run: bool = True) -> dict:
    """
    Attempt auto-remediation on eligible failed controls.

    Args:
        report: The completed report from fetch_report()
        dry_run: If True, preview changes without applying them.
            ALWAYS test with dry_run=True first.

    Returns:
        Summary of remediation actions taken/previewed
    """
    # Keep only failed controls that ComplianceWeave marks as auto-remediable
    failed_controls = [
        control for control in report.get("controls", [])
        if control["status"] == "failed" and control["auto_remediable"]
    ]

    if not failed_controls:
        print("No auto-remediable failures found.")
        return {"remediated": 0, "skipped": 0, "errors": []}

    control_ids = [c["control_id"] for c in failed_controls]
    print(
        f"{'[DRY RUN] Would remediate' if dry_run else 'Remediating'} "
        f"{len(control_ids)} control(s): {control_ids}"
    )

    payload = {
        "scan_id": report["scan_id"],
        "control_ids": control_ids,
        "dry_run": dry_run,
    }

    try:
        response = requests.post(
            f"{BASE_URL}/compliance/remediate",
            json=payload,
            headers=HEADERS,
            timeout=60,  # Remediation can take longer
        )
        response.raise_for_status()
        result = response.json()
    except requests.exceptions.HTTPError as e:
        if e.response.status_code == 403:
            raise PermissionError(
                "Your API key lacks remediation permissions. "
                "Check your ComplianceWeave role settings."
            ) from e
        raise

    # Log each action for your own audit trail
    for action in result.get("actions", []):
        icon = "✓" if action["result"] == "success" else "✗"
        print(f" {icon} [{action['control_id']}] {action['description']}")

    return result


# --- Run it ---
# Always preview first
preview = remediate_failures(report, dry_run=True)

# If the preview looks right, apply for real:
# remediate_failures(report, dry_run=False)
```
**Expected output (dry run):**
```plaintext
[DRY RUN] Would remediate 3 control(s): ['CC6.1', 'CC6.3', 'P3.2']
 ✓ [CC6.1] Would enable MFA enforcement on 2 IAM users
 ✓ [CC6.3] Would rotate API key older than 90 days (key: prod_deploy_*)
 ✓ [P3.2] Would add missing data retention tag to S3 bucket: user-uploads-prod
```
> **⚠ Important:** Always review dry-run output before applying.
> Auto-remediation changes real infrastructure. When in doubt,
> use the report URL to fix things manually with full context.
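Since only some failures are auto-remediable, it helps to see up front which ones will still need a human. The sketch below splits a report's failed controls into those two groups, using the same `controls`, `status`, `auto_remediable`, and `control_id` fields as `remediate_failures()`. The helper name `split_failures` and the sample data are mine, invented for illustration.

```python
def split_failures(report):
    """Return (auto_fixable_ids, manual_ids) for the failed controls in a report."""
    auto_fixable, manual = [], []
    for control in report.get("controls", []):
        if control.get("status") != "failed":
            continue
        # Treat a missing flag as "needs a human" rather than raising KeyError
        bucket = auto_fixable if control.get("auto_remediable") else manual
        bucket.append(control["control_id"])
    return auto_fixable, manual


# Made-up sample data, shaped like the report fields used in this post
sample_report = {
    "controls": [
        {"control_id": "CC6.1", "status": "failed", "auto_remediable": True},
        {"control_id": "CC7.2", "status": "failed", "auto_remediable": False},
        {"control_id": "CC8.1", "status": "passed", "auto_remediable": False},
        {"control_id": "P3.2", "status": "failed"},
    ]
}
print(split_failures(sample_report))  # (['CC6.1'], ['CC7.2', 'P3.2'])
```

The second list is the one to hand to whoever owns policy decisions, alongside the report URL.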
---
## Putting It All Together
Here's the function that ties the three stages together. Add it to the
same file as the functions above, then wire the script into your CI/CD
system to run compliance checks before major releases:
```python
def run_compliance_pipeline(
    frameworks: list[str],
    auto_remediate: bool = False,
) -> None:
    """End-to-end compliance scan, report, and optional remediation."""
    print("=== ComplianceWeave Pipeline Starting ===\n")

    # Stage 1: Scan
    print("Stage 1/3: Triggering scan...")
    scan = trigger_scan(frameworks=frameworks)
    print(f" Scan ID: {scan['scan_id']}\n")

    # Stage 2: Report
    print("Stage 2/3: Collecting evidence...")
    report = fetch_report(scan["scan_id"])
    passed = report["summary"]["passed"]
    failed = report["summary"]["failed"]
    print(f" Result: {passed} passed, {failed} failed\n")

    # Stage 3: Remediate
    print("Stage 3/3: Remediation...")
    if auto_remediate and failed > 0:
        remediate_failures(report, dry_run=False)
    elif failed > 0:
        print(f" {failed} failure(s) need attention.")
        print(f" Review: {report['report_url']}")
    else:
        print(" Nothing to remediate. ✓")

    print("\n=== Pipeline Complete ===")
    print(f"Audit-ready report: {report['report_url']}")


if __name__ == "__main__":
    run_compliance_pipeline(
        frameworks=["SOC2", "GDPR", "HIPAA", "ISO27001"],
        auto_remediate=False,  # Flip to True once you trust your setup
    )
```
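For CI, the pipeline also needs to fail the build when controls fail, which the function above doesn't do on its own since it returns `None`. One possible approach is to turn the report summary into a process exit code. This is only a sketch: `exit_code_for` and its `max_failed` budget are my own invention, and it assumes nothing beyond the `summary.failed` field shown earlier.

```python
def exit_code_for(report, max_failed=0):
    """Return 0 if the failed-control count is within budget, 1 otherwise."""
    failed = report["summary"]["failed"]
    return 0 if failed <= max_failed else 1


# Stand-in for the dict fetch_report() returns, using the counts from the
# example output earlier in this post
example_report = {"summary": {"passed": 47, "failed": 8}}
print(exit_code_for(example_report))                 # 1
print(exit_code_for(example_report, max_failed=10))  # 0
```

In a real job you would have `run_compliance_pipeline()` return the report and pass the result of `exit_code_for(report)` to `sys.exit()`, so a non-zero code stops the release.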
---
## What I Learned (The Hard Way)
Three things I wish I'd known before that audit:
**Scan early, scan often.** Don't wait for auditors. Run this pipeline
weekly in CI and treat compliance failures like test failures — fix them
before they accumulate.
**Dry runs are not optional.** Auto-remediation touching production IAM
or S3 policies without a preview step is how you cause an incident while
trying to prevent one.
**The report URL is the deliverable.** Auditors don't want your Python
script. They want the signed, timestamped report. ComplianceWeave's
generated URL gives them exactly that — no more emailing spreadsheets.
---
We went from "3 weeks of manual evidence collection" to a 5-minute
automated pipeline. The auditors got their package. I got my Tuesday back.
What frameworks are you targeting first? Drop a comment — especially if
you're navigating HIPAA, which has its own delightful quirks I could
write a whole separate post about.