Carl Ward

Posted on Jun 3

I cloned Formbricks and Documenso cold from GitHub and ran an AI spec audit. Here's what it found.

#ai #neo4j #claudecode #testing

I built a tool called SysEdge that models requirements, tests, and architecture standards in a Neo4j graph. It ships two AI-powered audit commands:

coverage-review --uc UC-xxx — evaluates a use case spec against 7 AS-REQ dimensions: actor completeness, authorization boundaries, exception coverage, ontology alignment, testability, scope definition, and bidirectional traceability with execution
audit-test --uc UC-xxx --file path/to/test.ts — evaluates a test file against 7 AS-TEST-UC dimensions: element inventory, role visibility, happy path, error paths, equivalence partitions, semantic correctness, and specification derivation

I ran both on two production open-source codebases, cloned cold. Here's what they found.

Formbricks — survey platform (TypeScript/Next.js)

The defect: survey response exports include PII when anonymisation is enabled.

Why it existed: running coverage-review on the export use case shows the behaviour was never specified.

$ python3 cli/sys_graph.py coverage-review --uc UC-FBK-005

[FAIL]  AS-REQ-003  Exception Coverage
        UC describes happy path only. No exception flow for anonymisation —
        what happens when anonymisation is enabled is completely unspecified.

[FAIL]  AS-REQ-006  Scope Definition
        Out-of-scope section is missing. Data masking, PII redaction, and
        anonymization are never mentioned — no decision recorded on whether
        privacy filtering is in or out of scope.

3 FAIL · 4 WARN · 0 PASS

The requirement was never written. Code review has no diff to evaluate.

A separate test-gaps check confirmed it: F-ANLYS-003 (Export responses) had zero tests at every V-model tier. No unit test, no API contract test, no Playwright UI flow.
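To make the test-gaps idea concrete, here is a minimal sketch of the kind of check it describes: given features and the tests linked to them, report every tier that has no test. The tier keys, the function name, and the second feature id are assumptions made up for illustration. SysEdge keeps this data in Neo4j, and its actual schema and query are not shown in this post.

```python
from collections import defaultdict

# Hypothetical tier keys, named after the three tiers mentioned above.
TIERS = ("unit", "api_contract", "ui_flow")


def find_test_gaps(features, tests):
    """Return {feature_id: [tiers with zero tests]} for features with any gap.

    features: iterable of feature ids
    tests: iterable of (feature_id, tier) pairs, one per linked test
    """
    covered = defaultdict(set)
    for feature_id, tier in tests:
        covered[feature_id].add(tier)

    gaps = {}
    for feature_id in features:
        missing = [tier for tier in TIERS if tier not in covered[feature_id]]
        if missing:
            gaps[feature_id] = missing
    return gaps


# Illustrative data only: F-ANLYS-003 has no linked tests; the second,
# made-up feature has a unit test and nothing else.
features = ["F-ANLYS-003", "F-EXAMPLE-001"]
tests = [("F-EXAMPLE-001", "unit")]

gaps = find_test_gaps(features, tests)
print(gaps)
```

A feature that has any test at all still shows up if one tier is empty, which is the difference between this kind of report and a single coverage percentage.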

Token measurement (actual Anthropic API counts via --output-format json, not estimates):

Session            Tokens (in+out)    Cache-read tokens
Without SysEdge    14,512             1,797,494
With SysEdge       4,141              473,649
Saving             71%                74%

The without-SysEdge session loaded the full codebase via prompt cache to orient on the defect. The with-SysEdge session used graph queries. Both cache figures are disclosed — the saving holds whether you count input+output or total tokens including cache reads.
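The percentages can be re-derived from the four raw counts in the table. The short calculation below does that, and also computes the combined figure (input+output plus cache reads), which comes out at roughly 74% as well. The variable and function names are mine; the numbers are copied from the table.

```python
def saving(without, with_tool):
    """Fractional reduction going from `without` to `with_tool`."""
    return 1 - with_tool / without


# Figures copied from the table above.
io_without, io_with = 14_512, 4_141
cache_without, cache_with = 1_797_494, 473_649

io_saving = saving(io_without, io_with)
cache_saving = saving(cache_without, cache_with)
total_saving = saving(io_without + cache_without, io_with + cache_with)

print(f"input+output: {io_saving:.1%}")
print(f"cache reads:  {cache_saving:.1%}")
print(f"combined:     {total_saving:.1%}")
```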

Documenso — e-signature platform (TypeScript/Next.js)

github.com/documenso/documenso — open-source DocuSign alternative.

Setup: cloned cold, built a domain seed from the Prisma schema, started Neo4j on a fresh port.

docker run -d --name documenso-neo4j \
  -e NEO4J_AUTH=neo4j/documenso123 \
  -p 7700:7687 neo4j:5.26

python3 cli/sys_graph.py init
python3 cli/sys_graph.py seed documenso-seed.json

Time from clone to all findings: ~15 minutes. Total AI cost: ~$0.30 (Haiku model).

Before we get to the findings: Documenso has 94 Playwright e2e tests and exactly 3 unit test files in the entire repo — two for webhook URL validation, one for CSS sanitization. That's the complete unit/integration test surface.

Finding 1: The Inngest job handlers that process envelope expiration have no tests

Documenso uses Inngest for background jobs. The job system lives in packages/lib/jobs/. That directory contains zero test files.

The expiration job implementation:

packages/lib/jobs/definitions/internal/
  expire-recipients-sweep.handler.ts     ← sweeps for expired recipients, fans out
  process-recipient-expired.handler.ts   ← processes each expired recipient

These handlers: query the database for pending envelopes past their expiresAt, set status to EXPIRED, invalidate signing tokens, trigger the ENVELOPE_EXPIRED webhook, and dispatch notification emails.

There is a Playwright e2e test: envelope-expiration-signing.spec.ts. It's a good test — it seeds a recipient with expiresAt in the past and verifies that clicking the signing link shows an "expired" error page. It passes and it's correct.

But it tests the UI response to an expired state. It does not test whether the Inngest job that creates that state runs correctly, handles partial failures, or fires the right webhook payload.

$ python3 cli/sys_graph.py audit-test \
  --uc UC-DOC-004 \
  --file packages/app-tests/e2e/envelopes/envelope-expiration-signing.spec.ts

[FAIL]  AS-TEST-UC-003  UC Test: Happy Path
        No test exercises the main UC success scenario: scheduled job runs,
        queries for envelopes with expiresAt in the past, sets status EXPIRED,
        invalidates signing tokens, fires ENVELOPE_EXPIRED webhook, sends
        notification emails to sender.
        → Create a test that seeds a PENDING envelope with expiresAt in the past,
          invokes the expiration job, and asserts all post-conditions.

If the sweep job silently fails, envelopes never reach the EXPIRED state. Unless the signing flow enforces expiresAt on its own, time-limited agreements could be signed after their legal deadline.
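The recommended test has a simple shape: seed, invoke the job, assert every post-condition. The sketch below shows that shape in Python with in-memory fakes so it runs on its own. Documenso is TypeScript, so a real test would live in its own test runner and call the real handlers. Everything here (Envelope, FakeSideEffects, run_expiration_sweep) is a hypothetical stand-in, not Documenso's code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Envelope:
    id: str
    status: str
    expires_at: datetime
    token_valid: bool = True


@dataclass
class FakeSideEffects:
    """Records webhooks and emails instead of sending them."""
    webhooks: list = field(default_factory=list)
    emails: list = field(default_factory=list)


def run_expiration_sweep(envelopes, effects, now):
    """Hypothetical stand-in for the job under test, not Documenso's code."""
    for env in envelopes:
        if env.status == "PENDING" and env.expires_at < now:
            env.status = "EXPIRED"
            env.token_valid = False
            effects.webhooks.append(("ENVELOPE_EXPIRED", env.id))
            effects.emails.append(env.id)


def test_sweep_expires_overdue_pending_envelope():
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    overdue = Envelope("env-1", "PENDING", now - timedelta(days=1))
    not_due = Envelope("env-2", "PENDING", now + timedelta(days=1))
    effects = FakeSideEffects()

    run_expiration_sweep([overdue, not_due], effects, now)

    # Every post-condition from the UC main flow gets its own assertion.
    assert overdue.status == "EXPIRED"
    assert overdue.token_valid is False
    assert effects.webhooks == [("ENVELOPE_EXPIRED", "env-1")]
    assert effects.emails == ["env-1"]
    # The envelope that is not yet due must be left untouched.
    assert not_due.status == "PENDING" and not_due.token_valid


test_sweep_expires_overdue_pending_envelope()
```

The point is the assertion list: the existing e2e test checks one consequence of the expired state, while a job-level test checks that the state and all of its side effects are produced in the first place.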

Finding 2: The expiration specification describes success only

$ python3 cli/sys_graph.py coverage-review --uc UC-DOC-004

[FAIL]  AS-REQ-003  Exception Coverage
        No behaviour specified for:
        - Inngest job exception during status update
        - Webhook endpoint unreachable or returning 5xx
        - Email service unavailable
        - Database transaction failure mid-batch (partial token invalidation —
          some recipients expired, some not)

[FAIL]  AS-REQ-002  Authorization Boundaries
        No roles named. Can an admin manually trigger the expiration sweep?
        Can a recipient see that their envelope expired before receiving
        notification? No CASL permission string derivable.

For a legally binding e-signature platform, the partial-failure case is material: if the database write succeeds but the webhook fails, did the envelope expire? If the sweep processes 500 of 1000 expired recipients before throwing, what state are the remaining 500 in?
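One answer a specification could record for the mid-batch question is per-recipient isolation: a failure on one recipient is logged and the sweep continues. The sketch below illustrates that policy and the kind of assertion that pins it down. It is one possible design, not a description of how Documenso's handlers behave, and expire_one is a hypothetical callable.

```python
def sweep_with_isolation(recipient_ids, expire_one):
    """Expire each recipient independently and report what happened.

    expire_one is a hypothetical callable that raises on failure.
    """
    expired, failed = [], []
    for rid in recipient_ids:
        try:
            expire_one(rid)
            expired.append(rid)
        except Exception as exc:  # isolate the failure to this recipient
            failed.append((rid, str(exc)))
    return expired, failed


def flaky_expire(rid):
    # Simulated failure for one recipient in the middle of the batch.
    if rid == "r3":
        raise RuntimeError("db write failed")


expired, failed = sweep_with_isolation(["r1", "r2", "r3", "r4"], flaky_expire)
print(expired)
print(failed)
```

The alternative policy, all-or-nothing inside one transaction, is equally defensible. What matters for the audit is that the spec names one of them so a test can assert it.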

Finding 3: Void envelope atomicity is undefined

$ python3 cli/sys_graph.py coverage-review --uc UC-DOC-003

[FAIL]  AS-REQ-003  Exception Coverage
        Void UC covers the happy path (status → VOIDED, tokens invalidated,
        webhook fired) but does not specify:
        - Behaviour when envelope is already VOIDED or EXPIRED
        - Whether void is atomic (webhook fires even if DB write fails?)
        - Minimum/maximum length of void reason field
        - Whether voiding is reversible

The only tests that mention void live in enterprise-feature-restrictions.spec.ts, inside a block explicitly marked test.describe.skip, so none of them run.
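Questions like these become testable once the spec commits to answers. As a sketch, the table below encodes one possible set of decisions (re-voiding is an idempotent no-op, expired envelopes cannot be voided, the reason has length bounds). The rules and the 1 to 500 character bounds are assumptions chosen for illustration, not Documenso's actual behaviour.

```python
# Hypothetical transition rules: one possible set of answers to the
# questions above, not Documenso's actual behaviour.
VOID_RULES = {
    "PENDING": "void",    # normal case: transition to VOIDED
    "VOIDED": "noop",     # already voided: idempotent, no second webhook
    "EXPIRED": "reject",  # expired envelopes cannot be voided
}


def void_envelope(status, reason, min_len=1, max_len=500):
    """Return (new_status, fire_webhook), or raise ValueError."""
    if not (min_len <= len(reason.strip()) <= max_len):
        raise ValueError("void reason length out of bounds")
    rule = VOID_RULES.get(status)
    if rule == "void":
        return "VOIDED", True
    if rule == "noop":
        return status, False
    raise ValueError(f"cannot void envelope in status {status}")


print(void_envelope("PENDING", "sent to wrong recipient"))
print(void_envelope("VOIDED", "sent to wrong recipient"))
```

Each row of the table maps directly to one test case, which is what an exception-coverage check is asking the spec to make possible.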

Finding 4: The signing certificate test checks appearance, not validity

include-document-certificate.spec.ts exists and passes. It seeds a document, has a recipient sign it, and verifies that a certificate page appears in the completed PDF.

What it does not test: whether the signature hash is cryptographically valid, whether the TSA timestamp (if configured) is correctly embedded, or whether the certificate fields match the actual signing event data.

[FAIL]  AS-TEST-UC-003  UC Test: Happy Path
        Test verifies certificate appearance in PDF but does not assert
        signature validity, TSA timestamp correctness, or certificate
        field accuracy against signing event data.

The cryptographic pipeline — packages/signing/transports/local.ts, packages/signing/helpers/tsa.ts — has zero unit tests.

Finding 5: Zero specification derivation comments in existing tests

[FAIL]  AS-TEST-UC-007  Specification Derivation
        No test function or test class in the expiration suite includes a
        comment tracing it to a UC step, precondition, or acceptance criterion.
        Test names use ENVELOPE_EXPIRATION labels but none reference
        UC-DOC-004 main flow steps.

For a system where legal validity of signatures may be subject to audit, "this test verifies specification clause X" is the evidence chain that matters. None of the existing tests establish that chain.
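A derivation comment can be as small as one line above each test naming the UC step it verifies. The sketch below is a simple lint-style approximation of the check: it scans Playwright-style source for test titles and flags any test without a UC reference in the comment line directly above it. SysEdge's audit is AI-powered and is not implemented this way; the regexes, the function name, and the sample test titles are all made up for illustration.

```python
import re

# Matches Playwright-style test titles: test('title', ...) or test("title", ...)
TEST_TITLE = re.compile(r"""\btest\(\s*(['"])(?P<title>.*?)\1""")
# Matches a spec reference such as "UC-DOC-004" inside a // comment.
UC_REF = re.compile(r"//.*\bUC-[A-Z]+-\d+\b")


def untraced_tests(source):
    """Return titles of tests with no UC reference in the comment line above."""
    lines = source.splitlines()
    missing = []
    for i, line in enumerate(lines):
        match = TEST_TITLE.search(line)
        if not match:
            continue
        previous = lines[i - 1] if i > 0 else ""
        if not UC_REF.search(previous):
            missing.append(match.group("title"))
    return missing


# Illustrative TypeScript source; the titles and step number are invented.
sample = """\
// UC-DOC-004 step 5: recipient opens an expired signing link
test('[ENVELOPE_EXPIRATION]: expired link shows error page', async () => {});

test('[ENVELOPE_EXPIRATION]: another case', async () => {});
"""

print(untraced_tests(sample))
```

A check this crude only proves a reference exists, not that the test actually verifies the clause it cites, which is why the post's audit treats derivation as one dimension among seven.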

Why code review can't find these

Every finding has the same structure: something was never written down.

There is no diff for a missing Inngest test. There is no diff for an unspecified exception path. There is no diff for a use case that was never written. Code review sees what changed — these gaps exist in absence.

The standard toolset produces false confidence: tests exist, they pass, coverage percentage is non-zero. None of those signals show whether the job handlers are tested, whether the specification covers failure states, or whether the tests can be traced to the specification.

Real-world effectiveness from 11 parallel sessions

After shipping these audit commands across our own 12-instance codebase, the instances reported back. Highlights:

audit-test forced a full-walk test, which revealed a missing UI feature the API already supported. The readiness edit control didn't exist in CandidateDetailPanel: candidates had readiness set at creation only. The API supported PUT, but the UI never exposed it. audit-test's "happy path" dimension requires exercising the complete UC flow, and doing so exposed the gap. Neither code review nor manual testing would have caught it: the API worked, the UI just didn't have the button.

coverage-review found an acceptance criterion cut off mid-sentence. The stored text was literally "They can add a new OKR (objective text, key result" — truncated. Data quality gap invisible to all other review mechanisms.

audit-test AS-TEST-UC-002 (role visibility) gaps led to a real access control finding. Testing denied roles via service token revealed the service role has read:* regardless of email scope — a real architectural gap, documented as a defect.

coverage-review AS-REQ-002 (auth boundaries) failed on every use case, so all use cases now have explicit CASL permission strings. 62 use cases were updated in one session. Code review doesn't check whether specs name permission strings.

How it works

# 1. Clone any repo
git clone https://github.com/documenso/documenso ~/documenso-demo

# 2. Start a fresh Neo4j (use a free port)
docker run -d --name documenso-neo4j \
  -e NEO4J_AUTH=neo4j/documenso123 \
  -p 7700:7687 neo4j:5.26

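# 3. Build a seed JSON from the domain model, then initialise and seed the graph.
#    Export the URI so the later commands reach the same Neo4j instance.
export SYSGRAPH_NEO4J_URI=bolt://localhost:7700
python3 cli/sys_graph.py init
python3 cli/sys_graph.py seed documenso-seed.json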

# 4. Run the spec audit
python3 cli/sys_graph.py coverage-review --uc UC-DOC-004 --model haiku

# 5. Run the test quality audit against a real test file
python3 cli/sys_graph.py audit-test \
  --uc UC-DOC-004 \
  --file packages/app-tests/e2e/envelopes/envelope-expiration-signing.spec.ts

The AI calls use the Claude Code session tokens, so no separate API key is needed if you're running inside Claude Code. Haiku is fast enough for this workload, and cheap: the full Documenso run was ~$0.30 total.

The full case studies

Documenso findings (with the actual commands and outputs): org-edge.com/sysedge-documenso-review.html
Formbricks + token measurement: org-edge.com/sysedge-token-savings.html
SysEdge free CLI: github.com/org-edge/sysedge

MIT + Commons Clause. Runs inside Claude Code with no API key configuration required.

DEV Community