We passed our SOC 2 Type II audit on the second attempt. We failed the first. And the things that tripped us up were not the things any blog post had warned us about.
Everyone writes about "implement access controls" and "encrypt data at rest." Those are the obvious ones. Here are the five non-obvious things that almost sank our audit, and that I've since heard tripped up multiple other startups the same way.
1. Your Audit Logs Don't Prove Anything
I wrote a whole separate post about this, but it's worth mentioning here because it was our single biggest failure point.
We had audit logs. We had millions of them in ELK. But when the auditor asked "can you demonstrate that these logs are complete and unmodified," we couldn't.
The auditor's specific concern was CC7.2 from the AICPA Trust Services Criteria: "The entity monitors system components and the operation of those components for anomalies that are indicative of malicious acts, natural disasters, and errors affecting the entity's ability to meet its objectives."
The word "monitors" is key. It's not enough to have logs. You need to demonstrate that you actively monitor them and that they have integrity controls. We had logging but no monitoring, no alerting, and no integrity verification.
What fixed it: We implemented hash-chained audit logging with daily integrity verification checks. We also set up alerts for suspicious patterns (multiple failed logins, privilege escalations, data exports over a threshold). The auditor wanted to see both the technical implementation AND evidence that alerts had been triggered and responded to during the observation period.
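To make the hash-chaining idea concrete, here's a minimal sketch of what an append-only hash chain with an integrity check can look like. The entry fields and function names are illustrative, not our actual implementation; each entry's hash covers its contents plus the previous entry's hash, so modifying or deleting any past entry breaks every link after it.

```typescript
import { createHash } from "crypto";

// Illustrative log entry shape; field names are hypothetical.
interface LogEntry {
  timestamp: string;
  actor: string;
  action: string;
  prevHash: string; // hash of the previous entry ("GENESIS" for the first)
  hash: string;     // SHA-256 over this entry's contents + prevHash
}

function entryHash(timestamp: string, actor: string, action: string, prevHash: string): string {
  return createHash("sha256")
    .update(`${timestamp}|${actor}|${action}|${prevHash}`)
    .digest("hex");
}

function appendEntry(chain: LogEntry[], timestamp: string, actor: string, action: string): LogEntry {
  const prevHash = chain.length ? chain[chain.length - 1].hash : "GENESIS";
  const entry: LogEntry = {
    timestamp, actor, action, prevHash,
    hash: entryHash(timestamp, actor, action, prevHash),
  };
  chain.push(entry);
  return entry;
}

// The daily integrity check: recompute every hash and verify the links.
function verifyChain(chain: LogEntry[]): boolean {
  let prevHash = "GENESIS";
  for (const e of chain) {
    if (e.prevHash !== prevHash) return false;
    if (e.hash !== entryHash(e.timestamp, e.actor, e.action, e.prevHash)) return false;
    prevHash = e.hash;
  }
  return true;
}
```

The important property for the auditor is that `verifyChain` runs on a schedule and its results are themselves recorded, so you can show the check operating throughout the observation period.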
2. Your Employee Offboarding Process Has Gaps
This one surprised us. We thought our offboarding was solid. When someone leaves, we disable their Okta account. Done, right?
Nope. The auditor asked for a list of every system the departed employee had access to and evidence that access was revoked in each one. Turns out, disabling Okta covers maybe 60% of access. The other 40%:
- Personal API keys they generated (did you revoke those?)
- AWS IAM credentials
- Database connection strings they might have saved locally
- SSH keys on servers
- Third-party tools with separate logins (Figma, Notion, etc.)
- GitHub deploy keys
- Service account credentials they knew about
```typescript
// Offboarding checklist - what the auditor actually wanted to see
interface OffboardingChecklist {
  employeeId: string;
  terminationDate: Date;
  steps: {
    system: string;
    accessType: string;
    revokedDate: Date | null;
    revokedBy: string | null;
    verified: boolean;
    evidence: string; // screenshot URL, API response, etc.
  }[];
}

// They wanted EVIDENCE for each step.
// Not just a checkbox saying "done": actual screenshots or
// API confirmations showing access was removed.
```
The auditor also checked for timing. Was access revoked on the termination date or two weeks later? If there's a gap between when someone leaves and when their access is revoked, that's a finding.
What fixed it: We built an offboarding automation that queries every integrated system and produces an evidence report. It takes about 2 hours per departing employee instead of the 20 minutes we used to spend.
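One cheap win from structuring the checklist as data: you can scan it for exactly the findings the auditor looks for. This is a hypothetical sketch (the step shape and one-day grace period are assumptions, not audit requirements) that flags unrevoked access, unverified revocations, and timing gaps.

```typescript
// Minimal step shape matching the checklist above (illustrative).
interface OffboardingStep {
  system: string;
  revokedDate: Date | null;
  verified: boolean;
}

// Flag steps that would become audit findings: access never revoked,
// revocation without evidence, or revocation later than `maxDays`
// after the termination date.
function auditFindings(terminationDate: Date, steps: OffboardingStep[], maxDays = 1): string[] {
  const findings: string[] = [];
  const deadline = terminationDate.getTime() + maxDays * 86_400_000;
  for (const s of steps) {
    if (!s.revokedDate) {
      findings.push(`${s.system}: access never revoked`);
    } else if (!s.verified) {
      findings.push(`${s.system}: revocation not verified`);
    } else if (s.revokedDate.getTime() > deadline) {
      const days = Math.round((s.revokedDate.getTime() - terminationDate.getTime()) / 86_400_000);
      findings.push(`${s.system}: revoked ${days} days after termination`);
    }
  }
  return findings;
}
```

Running this after every offboarding turns "we think we got everything" into a report you can hand the auditor.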
3. Your Change Management Is "We Review PRs"
We told the auditor our change management process was code review via GitHub pull requests. Every change is reviewed before merging.
The auditor asked: "Can you show me the approval for this specific production deploy on February 14th?"
We could show the PR was reviewed. But we couldn't show that the PR corresponded to the specific deployment, that the deployment was authorized by someone with the right role, or that the production environment matched what was tested in staging.
SOC 2 CC8.1 requires that changes to system components are "authorized, designed, developed, configured, documented, tested, approved, and implemented."
That's a lot more than "someone clicked Approve on the PR."
```yaml
# What we added to our CI/CD pipeline:
# a deployment manifest that links everything together
deployment:
  # Links to the PR/change request
  change_request: "PR #1234"
  approved_by: "jane@company.com"
  approval_timestamp: "2026-02-14T10:30:00Z"

  # Links to test evidence
  test_results:
    unit_tests: "passing - 847/847"
    integration_tests: "passing - 123/123"
    staging_deploy: "deploy_stg_abc123"
    staging_verification: "QA sign-off by mike@company.com"

  # Production deployment details
  production:
    deployer: "ci-bot (automated)"
    deploy_timestamp: "2026-02-14T14:00:00Z"
    commit_sha: "a1b2c3d4"
    rollback_plan: "Revert commit a1b2c3d4, run db:rollback"
```
What fixed it: We added deployment manifests to every production deploy that link the PR, approval, test results, and deployment together in a single auditable record. The auditor could now trace any production change back to its authorization.
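A manifest only helps if CI refuses to deploy when it's incomplete or inconsistent. Here's a hypothetical validation gate (field names mirror the YAML above; the "approval must predate deployment" rule is an assumption about what a reasonable gate checks, not a SOC 2 requirement):

```typescript
// Illustrative manifest shape mirroring the YAML example above.
interface DeploymentManifest {
  changeRequest: string;
  approvedBy: string;
  approvalTimestamp: string;
  stagingDeploy: string;
  deployTimestamp: string;
  commitSha: string;
  rollbackPlan: string;
}

// CI gate: refuse to deploy unless the manifest is complete and the
// approval happened before the deploy (i.e. was not backfilled after).
function validateManifest(m: DeploymentManifest): string[] {
  const errors: string[] = [];
  for (const [key, value] of Object.entries(m)) {
    if (!value) errors.push(`missing field: ${key}`);
  }
  if (
    m.approvalTimestamp && m.deployTimestamp &&
    new Date(m.approvalTimestamp) >= new Date(m.deployTimestamp)
  ) {
    errors.push("approval must predate deployment");
  }
  return errors;
}
```

Failing the pipeline on a non-empty error list is what makes the control "operating", not just "designed".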
4. Your Vendor Management Is Nonexistent
The auditor asked: "Which third-party services process or store your customers' data? What are their security certifications? When did you last review their security posture?"
We used about 15 third-party services (AWS, Stripe, SendGrid, Datadog, etc.). We had never formally documented which ones had access to customer data, never verified their SOC 2 reports, and never done a vendor risk assessment.
This falls under CC9.2: "The entity assesses and manages risks associated with vendors and business partners."
```typescript
// What the auditor expected us to have
interface VendorAssessment {
  vendor: string;
  dataTypes: string[];      // What customer data do they access?
  certifications: string[]; // SOC 2, ISO 27001, etc.
  lastReviewed: Date;
  riskLevel: 'low' | 'medium' | 'high' | 'critical';
  contractHasSecurityTerms: boolean;
  contractHasDataProcessingAgreement: boolean;
  subprocessors: string[];  // Their vendors who touch our data
  incidentNotificationSLA: string;
}
```
What fixed it: We created a vendor inventory spreadsheet (honestly a Google Sheet works fine for this), collected SOC 2 reports from all critical vendors, and established a quarterly review cadence. Boring but necessary. According to NIST's Cybersecurity Supply Chain Risk Management guidance, vendor risk management should be proportional to the data sensitivity involved.
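Even with a Google Sheet as the source of truth, a small script can enforce the review cadence. This sketch assumes a cadence of quarterly for high/critical vendors and yearly otherwise; those intervals and the record shape are illustrative choices, not prescribed by SOC 2.

```typescript
// Minimal vendor record (a subset of the assessment fields above).
interface VendorRecord {
  vendor: string;
  riskLevel: "low" | "medium" | "high" | "critical";
  lastReviewed: Date;
}

// Assumed cadence: quarterly for high/critical risk, yearly otherwise.
const REVIEW_DAYS: Record<VendorRecord["riskLevel"], number> = {
  low: 365,
  medium: 365,
  high: 90,
  critical: 90,
};

// Return the vendors whose last review is older than their cadence allows.
function overdueVendors(vendors: VendorRecord[], today: Date): string[] {
  return vendors
    .filter(v => (today.getTime() - v.lastReviewed.getTime()) / 86_400_000 > REVIEW_DAYS[v.riskLevel])
    .map(v => v.vendor);
}
```

Wiring this into a scheduled job that files a ticket per overdue vendor is what turns "quarterly review cadence" from an intention into evidence.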
5. Your Incident Response Plan Has Never Been Tested
We had an incident response plan. It was a Google Doc someone wrote 18 months ago. It listed steps like "identify the incident" and "contain the threat" and "notify affected parties."
The auditor asked: "When was this plan last tested? Can you show me records of the test?"
Silence.
Having a plan is not enough. SOC 2 requires evidence that the plan has been tested and that lessons from the test were incorporated. CC7.4 specifically addresses "The entity responds to identified security incidents by executing a defined incident response program."
What fixed it: We ran a tabletop exercise. This is a meeting where you present a hypothetical security incident and walk through the response. No actual systems are affected. You just talk through: Who gets notified? What gets shut down? How do you communicate with customers? When do you involve legal?
We found 3 major gaps in our plan during the exercise:
- Our escalation contact list had someone who left the company 6 months ago
- Nobody knew how to rotate production database credentials in an emergency
- Our customer notification template referenced a product name we'd rebranded away from
The tabletop exercise took 2 hours and was genuinely useful. We now run one quarterly.
The Pattern
Notice a pattern? None of these are technical security failures. Our encryption was fine. Our access controls were fine. Our infrastructure was properly configured.
The failures were all in processes, documentation, and evidence. SOC 2 isn't really a technical audit. It's a process audit that happens to involve technology.
The auditor wants to see three things for every control:
- Design: Is the control designed to address the risk?
- Implementation: Is the control actually implemented?
- Operating effectiveness: Has the control been operating consistently during the observation period?
Most engineering teams focus on #2 (implementation) and forget about #1 (documentation) and #3 (evidence that it actually runs over time).
We eventually passed on our second attempt. The experience taught me that SOC 2 preparation is maybe 30% technical work and 70% process and documentation work. I wish someone had told us that before we spent 3 months only doing the technical part.