<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Carl Ward</title>
    <description>The latest articles on DEV Community by Carl Ward (@carl_ward_2d0e6fee9693587).</description>
    <link>https://dev.to/carl_ward_2d0e6fee9693587</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3960028%2Fbcbdb814-bd54-42cf-9c13-8ff904e64982.jpg</url>
      <title>DEV Community: Carl Ward</title>
      <link>https://dev.to/carl_ward_2d0e6fee9693587</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/carl_ward_2d0e6fee9693587"/>
    <language>en</language>
    <item>
      <title>I cloned Formbricks and Documenso cold from GitHub and ran an AI spec audit. Here's what it found.</title>
      <dc:creator>Carl Ward</dc:creator>
      <pubDate>Wed, 03 Jun 2026 13:17:44 +0000</pubDate>
      <link>https://dev.to/carl_ward_2d0e6fee9693587/i-cloned-formbricks-and-documenso-cold-from-github-and-ran-an-ai-spec-audit-heres-what-it-found-11ag</link>
      <guid>https://dev.to/carl_ward_2d0e6fee9693587/i-cloned-formbricks-and-documenso-cold-from-github-and-ran-an-ai-spec-audit-heres-what-it-found-11ag</guid>
      <description>&lt;p&gt;I built a tool called SysEdge that models requirements, tests, and architecture standards in a Neo4j graph. It ships two AI-powered audit commands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;coverage-review --uc UC-xxx&lt;/code&gt;&lt;/strong&gt; — evaluates a use case spec against 7 AS-REQ dimensions: actor completeness, authorization boundaries, exception coverage, ontology alignment, testability, scope definition, and bidirectional traceability with execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;audit-test --uc UC-xxx --file path/to/test.ts&lt;/code&gt;&lt;/strong&gt; — evaluates a test file against 7 AS-TEST-UC dimensions: element inventory, role visibility, happy path, error paths, equivalence partitions, semantic correctness, and specification derivation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I ran both on two production open-source codebases, cloned cold. Here's what they found.&lt;/p&gt;




&lt;h2&gt;
  
  
  Formbricks — survey platform (TypeScript/Next.js)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The defect:&lt;/strong&gt; survey response exports include PII when anonymisation is enabled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it existed:&lt;/strong&gt; running &lt;code&gt;coverage-review&lt;/code&gt; on the export use case shows that the behaviour was never specified.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;python3 cli/sys_graph.py coverage-review &lt;span class="nt"&gt;--uc&lt;/span&gt; UC-FBK-005
&lt;span class="go"&gt;
[FAIL]  AS-REQ-003  Exception Coverage
        UC describes happy path only. No exception flow for anonymisation —
        what happens when anonymisation is enabled is completely unspecified.

[FAIL]  AS-REQ-006  Scope Definition
        Out-of-scope section is missing. Data masking, PII redaction, and
        anonymization are never mentioned — no decision recorded on whether
        privacy filtering is in or out of scope.

3 FAIL · 4 WARN · 0 PASS
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The requirement was never written. Code review has no diff to evaluate.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;test-gaps&lt;/code&gt; command confirmed the gap: F-ANLYS-003 (Export responses) had zero tests at every V-model tier, with no unit test, no API contract test, and no Playwright UI flow.&lt;/p&gt;
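
&lt;p&gt;For a sense of what a tier-by-tier gap check looks like as a graph query, here is a minimal sketch. It is not the actual &lt;code&gt;test-gaps&lt;/code&gt; implementation: it assumes features and tests are stored as &lt;code&gt;SysFeature&lt;/code&gt; and &lt;code&gt;SysTest&lt;/code&gt; nodes joined by a &lt;code&gt;VERIFIES&lt;/code&gt; relationship, and that each test records its tier in a &lt;code&gt;testType&lt;/code&gt; property using the four values listed in the query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Hypothetical sketch: which V-model tiers have no verifying test for one feature?
// Assumes the tier is stored on SysTest.testType; the tier names are assumptions.
MATCH (f:SysFeature {id: 'F-ANLYS-003'})
UNWIND ['unit', 'integration', 'use-case', 'e2e'] AS tier
WITH f, tier
WHERE NOT EXISTS {
  MATCH (t:SysTest)-[:VERIFIES]-&amp;gt;(f)
  WHERE t.testType = tier
}
RETURN f.id AS feature, collect(tier) AS missingTiers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under those assumptions, a feature with no tests at all comes back with every tier in &lt;code&gt;missingTiers&lt;/code&gt;, and a feature covered at every tier returns no row.&lt;/p&gt;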

&lt;p&gt;&lt;strong&gt;Token measurement&lt;/strong&gt; (actual Anthropic API counts via &lt;code&gt;--output-format json&lt;/code&gt;, not estimates):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Tokens (in+out)&lt;/th&gt;
&lt;th&gt;Cache-read tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Without SysEdge&lt;/td&gt;
&lt;td&gt;14,512&lt;/td&gt;
&lt;td&gt;1,797,494&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;With SysEdge&lt;/td&gt;
&lt;td&gt;4,141&lt;/td&gt;
&lt;td&gt;473,649&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Saving&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;71%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;74%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The without-SysEdge session loaded the full codebase via prompt cache to orient on the defect. The with-SysEdge session used graph queries. Both cache figures are disclosed — the saving holds whether you count input+output or total tokens including cache reads.&lt;/p&gt;




&lt;h2&gt;
  
  
  Documenso — e-signature platform (TypeScript/Next.js)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/documenso/documenso" rel="noopener noreferrer"&gt;github.com/documenso/documenso&lt;/a&gt; — open-source DocuSign alternative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup:&lt;/strong&gt; cloned cold, built a domain seed from the Prisma schema, started Neo4j on a fresh port.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; documenso-neo4j &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;NEO4J_AUTH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;neo4j/documenso123 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 7700:7687 neo4j:5.26

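&lt;span class="c"&gt;# Point the CLI at the mapped Bolt port (same variable as in "How it works")&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SYSGRAPH_NEO4J_URI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;bolt://localhost:7700
python3 cli/sys_graph.py init
python3 cli/sys_graph.py seed documenso-seed.json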
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Time from clone to all findings: ~15 minutes. Total AI cost: ~$0.30 (haiku model).&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before we get to the findings: Documenso has 94 Playwright e2e tests and exactly 3 unit test files in the entire repo — two for webhook URL validation, one for CSS sanitization. That's the complete unit/integration test surface.&lt;/p&gt;




&lt;h3&gt;
  
  
  Finding 1: The Inngest job handlers that process envelope expiration have no tests
&lt;/h3&gt;

&lt;p&gt;Documenso uses &lt;a href="https://inngest.com/" rel="noopener noreferrer"&gt;Inngest&lt;/a&gt; for background jobs. The job system lives in &lt;code&gt;packages/lib/jobs/&lt;/code&gt;. That directory contains zero test files.&lt;/p&gt;

&lt;p&gt;The expiration job implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;packages/lib/jobs/definitions/internal/
  expire-recipients-sweep.handler.ts     ← sweeps for expired recipients, fans out
  process-recipient-expired.handler.ts   ← processes each expired recipient
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These handlers query the database for pending envelopes past their &lt;code&gt;expiresAt&lt;/code&gt;, set their status to EXPIRED, invalidate signing tokens, trigger the ENVELOPE_EXPIRED webhook, and dispatch notification emails.&lt;/p&gt;

&lt;p&gt;There is a Playwright e2e test: &lt;code&gt;envelope-expiration-signing.spec.ts&lt;/code&gt;. It's a good test — it seeds a recipient with &lt;code&gt;expiresAt&lt;/code&gt; in the past and verifies that clicking the signing link shows an "expired" error page. It passes and it's correct.&lt;/p&gt;

&lt;p&gt;But it tests the UI response to an expired state. It does not test whether the Inngest job that &lt;em&gt;creates&lt;/em&gt; that state runs correctly, handles partial failures, or fires the right webhook payload.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;python3 cli/sys_graph.py audit-test &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--uc&lt;/span&gt; UC-DOC-004 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--file&lt;/span&gt; packages/app-tests/e2e/envelopes/envelope-expiration-signing.spec.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[FAIL]  AS-TEST-UC-003  UC Test: Happy Path
        No test exercises the main UC success scenario: scheduled job runs,
        queries for envelopes with expiresAt in the past, sets status EXPIRED,
        invalidates signing tokens, fires ENVELOPE_EXPIRED webhook, sends
        notification emails to sender.
        → Create a test that seeds a PENDING envelope with expiresAt in the past,
          invokes the expiration job, and asserts all post-conditions.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the sweep job silently fails, envelopes never expire. Time-limited agreements can be signed after their legal deadline.&lt;/p&gt;




&lt;h3&gt;
  
  
  Finding 2: The expiration specification describes success only
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;python3 cli/sys_graph.py coverage-review &lt;span class="nt"&gt;--uc&lt;/span&gt; UC-DOC-004
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[FAIL]  AS-REQ-003  Exception Coverage
        No behaviour specified for:
        - Inngest job exception during status update
        - Webhook endpoint unreachable or returning 5xx
        - Email service unavailable
        - Database transaction failure mid-batch (partial token invalidation —
          some recipients expired, some not)

[FAIL]  AS-REQ-002  Authorization Boundaries
        No roles named. Can an admin manually trigger the expiration sweep?
        Can a recipient see that their envelope expired before receiving
        notification? No CASL permission string derivable.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a legally binding e-signature platform, the partial-failure case is material: if the database write succeeds but the webhook fails, did the envelope expire? If the sweep processes 500 of 1000 expired recipients before throwing, what state are the remaining 500 in?&lt;/p&gt;




&lt;h3&gt;
  
  
  Finding 3: Void envelope atomicity is undefined
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;python3 cli/sys_graph.py coverage-review &lt;span class="nt"&gt;--uc&lt;/span&gt; UC-DOC-003
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[FAIL]  AS-REQ-003  Exception Coverage
        Void UC covers the happy path (status → VOIDED, tokens invalidated,
        webhook fired) but does not specify:
        - Behaviour when envelope is already VOIDED or EXPIRED
        - Whether void is atomic (webhook fires even if DB write fails?)
        - Minimum/maximum length of void reason field
        - Whether voiding is reversible
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only tests that mention void are in &lt;code&gt;enterprise-feature-restrictions.spec.ts&lt;/code&gt;, which is explicitly marked &lt;code&gt;test.describe.skip&lt;/code&gt;, so they never run.&lt;/p&gt;




&lt;h3&gt;
  
  
  Finding 4: The signing certificate test checks appearance, not validity
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;include-document-certificate.spec.ts&lt;/code&gt; exists and passes. It seeds a document, has a recipient sign it, and verifies that a certificate page appears in the completed PDF.&lt;/p&gt;

&lt;p&gt;What it does not test: whether the signature hash is cryptographically valid, whether the TSA timestamp (if configured) is correctly embedded, or whether the certificate fields match the actual signing event data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[FAIL]  AS-TEST-UC-003  UC Test: Happy Path
        Test verifies certificate appearance in PDF but does not assert
        signature validity, TSA timestamp correctness, or certificate
        field accuracy against signing event data.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cryptographic pipeline — &lt;code&gt;packages/signing/transports/local.ts&lt;/code&gt;, &lt;code&gt;packages/signing/helpers/tsa.ts&lt;/code&gt; — has zero unit tests.&lt;/p&gt;




&lt;h3&gt;
  
  
  Finding 5: Zero specification derivation comments in existing tests
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[FAIL]  AS-TEST-UC-007  Specification Derivation
        No test function or test class in the expiration suite includes a
        comment tracing it to a UC step, precondition, or acceptance criterion.
        Test names use ENVELOPE_EXPIRATION labels but none reference
        UC-DOC-004 main flow steps.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a system where legal validity of signatures may be subject to audit, "this test verifies specification clause X" is the evidence chain that matters. None of the existing tests establish that chain.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why code review can't find these
&lt;/h2&gt;

&lt;p&gt;Every finding has the same structure: &lt;strong&gt;something was never written down.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There is no diff for a missing Inngest test. There is no diff for an unspecified exception path. There is no diff for a use case that was never written. Code review sees what changed — these gaps exist in absence.&lt;/p&gt;

&lt;p&gt;The standard toolset produces false confidence: tests exist, they pass, and the coverage percentage is non-zero. None of those signals checks whether the &lt;em&gt;job handlers&lt;/em&gt; are tested, whether the &lt;em&gt;specification&lt;/em&gt; covers failure states, or whether the &lt;em&gt;test&lt;/em&gt; can be traced to the &lt;em&gt;specification&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-world effectiveness from 11 parallel sessions
&lt;/h2&gt;

&lt;p&gt;After shipping these audit commands across our own 12-instance codebase, the instances reported back. Highlights:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;audit-test forced a full-walk test → revealed a missing UI feature the API already supported.&lt;/strong&gt; The readiness edit control didn't exist in CandidateDetailPanel — candidates had readiness set at creation only. The API supported &lt;code&gt;PUT&lt;/code&gt;, the UI never exposed it. audit-test's "happy path" dimension requires exercising the complete UC flow, which revealed the gap. Neither code review nor manual testing would have caught it — the API worked, the UI just didn't have the button.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;coverage-review found an acceptance criterion cut off mid-sentence.&lt;/strong&gt; The stored text was literally &lt;code&gt;"They can add a new OKR (objective text, key result"&lt;/code&gt; — truncated. Data quality gap invisible to all other review mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;audit-test AS-TEST-UC-002 (role visibility) gaps led to a real access control finding.&lt;/strong&gt; Testing denied roles via service token revealed the service role has &lt;code&gt;read:*&lt;/code&gt; regardless of email scope — a real architectural gap, documented as a defect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;coverage-review AS-REQ-002 (auth boundaries) failing universally → all use cases now have explicit CASL permission strings.&lt;/strong&gt; 62 use cases updated in one session. Code review doesn't check whether specs name permission strings.&lt;/p&gt;




&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Clone any repo&lt;/span&gt;
git clone https://github.com/documenso/documenso ~/documenso-demo

&lt;span class="c"&gt;# 2. Start a fresh Neo4j (use a free port)&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; documenso-neo4j &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;NEO4J_AUTH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;neo4j/documenso123 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 7700:7687 neo4j:5.26

&lt;span class="c"&gt;# 3. Build a seed JSON from the domain model, then:&lt;/span&gt;
&lt;span class="nv"&gt;SYSGRAPH_NEO4J_URI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;bolt://localhost:7700 &lt;span class="se"&gt;\&lt;/span&gt;
python3 cli/sys_graph.py seed documenso-seed.json

&lt;span class="c"&gt;# 4. Run the spec audit&lt;/span&gt;
python3 cli/sys_graph.py coverage-review &lt;span class="nt"&gt;--uc&lt;/span&gt; UC-DOC-004 &lt;span class="nt"&gt;--model&lt;/span&gt; haiku

&lt;span class="c"&gt;# 5. Run the test quality audit against a real test file&lt;/span&gt;
python3 cli/sys_graph.py audit-test &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--uc&lt;/span&gt; UC-DOC-004 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--file&lt;/span&gt; packages/app-tests/e2e/envelopes/envelope-expiration-signing.spec.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI calls use the Claude Code session tokens — no separate API key needed if you're running inside Claude Code. Haiku is fast enough for this workload: the full Documenso run was ~$0.30 total.&lt;/p&gt;




&lt;h2&gt;
  
  
  The full case studies
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Documenso findings&lt;/strong&gt; (with the actual commands and outputs): &lt;a href="https://www.org-edge.com/sysedge-documenso-review.html" rel="noopener noreferrer"&gt;org-edge.com/sysedge-documenso-review.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Formbricks + token measurement&lt;/strong&gt;: &lt;a href="https://www.org-edge.com/sysedge-token-savings.html" rel="noopener noreferrer"&gt;org-edge.com/sysedge-token-savings.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SysEdge free CLI&lt;/strong&gt;: &lt;a href="https://github.com/org-edge/sysedge" rel="noopener noreferrer"&gt;github.com/org-edge/sysedge&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MIT + Commons Clause. Runs inside Claude Code with no API key configuration required.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>neo4j</category>
      <category>claudecode</category>
      <category>testing</category>
    </item>
    <item>
      <title>Requirements and code as a Neo4j ontology: reproducible token savings and multi-agent coordination</title>
      <dc:creator>Carl Ward</dc:creator>
      <pubDate>Sat, 30 May 2026 13:57:26 +0000</pubDate>
      <link>https://dev.to/carl_ward_2d0e6fee9693587/modelling-a-codebase-as-a-requirements-ontology-in-neo4j-keeping-ai-coding-agents-oriented-1db</link>
      <guid>https://dev.to/carl_ward_2d0e6fee9693587/modelling-a-codebase-as-a-requirements-ontology-in-neo4j-keeping-ai-coding-agents-oriented-1db</guid>
      <description>&lt;p&gt;AI coding agents have an expensive habit: before they write a single line, they re-read source files to work out what already exists — which modules there are, what each one provides, what's tested, and what's currently being changed. On a small repo that's tolerable. Run several agents in parallel on one codebase and it becomes both a token sink and a coordination problem: two agents start the same feature, a test gets added that nobody can map to a requirement, an architectural decision made in one session is invisible to the others.&lt;/p&gt;

&lt;p&gt;I kept hitting this running multiple Claude Code sessions against a single codebase, and ended up solving it the way you'd expect on a graph-shaped problem: model the codebase's &lt;em&gt;requirements chain&lt;/em&gt; as an ontology in Neo4j, and let the agents query the graph instead of re-reading the source.&lt;/p&gt;

&lt;p&gt;This post is about the data model and the queries — and why a graph is the right tool here rather than a table.&lt;/p&gt;

&lt;h2&gt;
  
  
  The model
&lt;/h2&gt;

&lt;p&gt;The core idea is full requirements traceability, from a user story down to the unit test that verifies a routine. Every artefact is a node; the relationships carry the meaning.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:SysUserStory&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:REALIZED_BY&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:SysUseCase&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:SysUseCase&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:REQUIRES&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:SysFeature&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:SysModule&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:PROVIDES&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:SysFeature&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:SysModule&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:CONTAINS_SYMBOL&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:SysSymbol&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;// :SysSymbol / :SysEndpoint -[:IMPLEMENTS]-&amp;gt;(:SysFeature)&lt;/span&gt;
&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:SysTest&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:VERIFIES&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:SysFeature&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;                &lt;span class="c1"&gt;// tier via t.testType, or (:SysTestPackage {testCategory})-[:CONTAINS_TEST]-&amp;gt;(:SysTest)&lt;/span&gt;
&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:SysArchDecision&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:ADDRESSES&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:SysArchStd&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A &lt;code&gt;SysFeature&lt;/code&gt; isn't a ticket — it's a capability a &lt;code&gt;SysModule&lt;/code&gt; provides. A &lt;code&gt;SysUseCase&lt;/code&gt; isn't a description — it's a user-visible flow that realises a story. Every test carries its V-model tier — component, integration, use-case, or e2e — and a &lt;code&gt;VERIFIES&lt;/code&gt; edge tying it to the feature it covers. So "is this feature covered at every tier?" stops being a judgement call and becomes a reachability question.&lt;/p&gt;

&lt;p&gt;Reachability — and its mirror, &lt;em&gt;absence&lt;/em&gt; — is exactly what a graph answers cheaply, and it's the whole reason this lives in Neo4j rather than a table.&lt;/p&gt;
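
&lt;p&gt;To make the shape concrete, the following sketch seeds one traceability chain using the labels and relationship types above. Every id and name in it is invented for illustration, and the property names (&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;testType&lt;/code&gt;) are assumed from the queries later in this post rather than taken from the real seed format.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Illustrative data only: all ids and names below are hypothetical.
MERGE (s:SysUserStory {id: 'US-EXAMPLE-001'})
MERGE (uc:SysUseCase {id: 'UC-EXAMPLE-001'})
MERGE (f:SysFeature {id: 'F-EXAMPLE-001', name: 'Example feature'})
MERGE (m:SysModule {id: 'M-EXAMPLE-001'})
MERGE (t:SysTest {id: 'T-EXAMPLE-001', testType: 'integration'})
// Story down to feature
MERGE (s)-[:REALIZED_BY]-&amp;gt;(uc)
MERGE (uc)-[:REQUIRES]-&amp;gt;(f)
// Module and test both point at the feature
MERGE (m)-[:PROVIDES]-&amp;gt;(f)
MERGE (t)-[:VERIFIES]-&amp;gt;(f)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because every clause is a &lt;code&gt;MERGE&lt;/code&gt;, re-running the statement matches the existing nodes and edges instead of creating duplicates.&lt;/p&gt;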

&lt;h2&gt;
  
  
  Why a graph, not a table
&lt;/h2&gt;

&lt;p&gt;The questions you actually want to ask of a codebase's requirements are reachability and absence questions, and those are one traversal in Cypher and an awkward pile of &lt;code&gt;NOT EXISTS&lt;/code&gt; joins in SQL.&lt;/p&gt;

&lt;p&gt;Which features have no use case covering them?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;f:&lt;/span&gt;&lt;span class="n"&gt;SysFeature&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt; &lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:SysUseCase&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:REQUIRES&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;f.id&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f.name&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which architecture standards have no decision addressing them — i.e. the genuine architecture gaps?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;std:&lt;/span&gt;&lt;span class="n"&gt;SysArchStd&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt; &lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:SysArchDecision&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:ADDRESSES&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;std.id&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std.name&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which features are missing integration-tier coverage?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;m:&lt;/span&gt;&lt;span class="n"&gt;SysModule&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:PROVIDES&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;f:&lt;/span&gt;&lt;span class="n"&gt;SysFeature&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="ow"&gt;NOT&lt;/span&gt; &lt;span class="ow"&gt;EXISTS&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;t:&lt;/span&gt;&lt;span class="n"&gt;SysTest&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:VERIFIES&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;t.testType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'integration'&lt;/span&gt;
     &lt;span class="n"&gt;OR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;:SysTestPackage&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;testCategory:&lt;/span&gt;&lt;span class="s1"&gt;'integration'&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:CONTAINS_TEST&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="ss"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;f.id&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f.name&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A gap is just a node with no incoming edge of a given type (here, no verifying test in a given tier). That framing is what makes "is this actually tested?" a query rather than an opinion — the &lt;code&gt;VERIFIES&lt;/code&gt; edge either exists or it doesn't. Run a one-day pass of agents over a real codebase and you can watch coverage fill in as a &lt;em&gt;shape&lt;/em&gt; across the four V-model tiers (unit → integration → use-case → e2e), not as a single misleading percentage.&lt;/p&gt;
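
&lt;p&gt;That per-tier shape can be read straight off the graph. The sketch below is an illustration under stated assumptions, not a query shipped with the tool: it treats the four tier names from the paragraph above as the literal values of &lt;code&gt;testType&lt;/code&gt;, and for brevity it ignores the alternative &lt;code&gt;SysTestPackage&lt;/code&gt; route to a tier.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Hypothetical sketch: features covered per V-model tier.
// Tier names are assumed values of SysTest.testType.
MATCH (f:SysFeature)
UNWIND ['unit', 'integration', 'use-case', 'e2e'] AS tier
OPTIONAL MATCH (t:SysTest {testType: tier})-[:VERIFIES]-&amp;gt;(f)
// One row per (tier, feature) with the number of verifying tests in that tier
WITH tier, f, count(t) AS tests
RETURN tier,
       count(f) AS features,
       sum(CASE WHEN tests &amp;gt; 0 THEN 1 ELSE 0 END) AS covered
ORDER BY tier
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each row gives one tier, the number of features considered, and how many of them have at least one verifying test in that tier. Rows are sorted alphabetically by tier name, not in V-model order.&lt;/p&gt;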

&lt;h2&gt;
  
  
  Feeding the graph to the agent
&lt;/h2&gt;

&lt;p&gt;Here's the part that matters for the agents. Instead of letting a session open a 2,800-line handler file to orient, it runs a query and gets a compact briefing — coverage by module, open work, what's in progress — serialised to a few hundred tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Coverage-by-module briefing for one agent's scope&lt;/span&gt;
&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;m:&lt;/span&gt;&lt;span class="n"&gt;SysModule&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;instance:&lt;/span&gt;&lt;span class="n"&gt;$instance&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:PROVIDES&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;f:&lt;/span&gt;&lt;span class="n"&gt;SysFeature&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
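&lt;span class="c1"&gt;// coalesce keeps features whose status is unset; NOT (null IN [...]) would drop them&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="ow"&gt;NOT&lt;/span&gt; &lt;span class="nf"&gt;coalesce&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f.status&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;IN&lt;/span&gt; &lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'Superseded'&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'Deprecated'&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;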
&lt;span class="k"&gt;OPTIONAL&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;t:&lt;/span&gt;&lt;span class="n"&gt;SysTest&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:VERIFIES&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;m.id&lt;/span&gt;            &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
       &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
       &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;m.id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
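
&lt;p&gt;The open-work half of the briefing can come from the same kind of query. A minimal sketch, assuming &lt;code&gt;SysEnhancement&lt;/code&gt; nodes carry the same &lt;code&gt;instance&lt;/code&gt; property as modules and that &lt;code&gt;'open'&lt;/code&gt; is a valid status next to &lt;code&gt;'in-progress'&lt;/code&gt; (both are assumptions for illustration, not confirmed schema):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Hypothetical: open and in-progress work items for one instance
// (the instance property on SysEnhancement and the 'open' status are assumed)
MATCH (e:SysEnhancement {instance: $instance})
WHERE e.status IN ['open', 'in-progress']
RETURN e.id        AS item,
       e.status    AS status,
       e.startedAt AS startedAt
ORDER BY status, item
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each row is a handful of tokens, so even a long worklog stays far smaller than the source files it stands in for.&lt;/p&gt;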



&lt;p&gt;This is GraphRAG, just pointed at a codebase's specification instead of a document corpus: the graph is the retrieval layer, and what it returns is structured, current, and small.&lt;/p&gt;

&lt;p&gt;I measured the effect on the open-source &lt;a href="https://github.com/formbricks/formbricks" rel="noopener noreferrer"&gt;Formbricks&lt;/a&gt; repo. Closing a real defect took roughly &lt;strong&gt;71% fewer input+output tokens&lt;/strong&gt; with the graph than without: 14,512 -&amp;gt; 4,141 actual Anthropic API tokens, or ~73% savings if you count total tokens including cache reads. Method and figures: &lt;a href="https://www.org-edge.com/sysgraph.html" rel="noopener noreferrer"&gt;https://www.org-edge.com/sysgraph.html&lt;/a&gt;. Because the repo is public, you can clone it and re-run the comparison yourself.&lt;/p&gt;
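
&lt;p&gt;The headline percentage is the relative drop between the two token counts quoted above. As a quick check, the arithmetic can be run in any Cypher shell (the variable names and the rounding to one decimal place are choices made for this sketch):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Relative saving = (without - with) / without, as a percentage
WITH 14512 AS withoutGraph, 4141 AS withGraph
RETURN round(1000.0 * (withoutGraph - withGraph) / withoutGraph) / 10.0 AS pctSaved
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That works out to about 71.5, in line with the roughly 71% figure.&lt;/p&gt;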

&lt;h2&gt;
  
  
  Coordination falls out of shared state
&lt;/h2&gt;

&lt;p&gt;The multi-agent win is almost a side effect. Because the graph is shared, mutable state, marking a work item in progress is a write that every other agent sees on its next query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;e:&lt;/span&gt;&lt;span class="n"&gt;SysEnhancement&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;id:&lt;/span&gt;&lt;span class="n"&gt;$id&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;e.status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'in-progress'&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e.startedAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;$now&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The next agent sees that item flagged &lt;code&gt;in-progress&lt;/code&gt; in its worklog and routes to something else, so two sessions don't build the same thing. No human reconciling a dozen context files. The graph is the source of truth, and "what's left to do?" is a query.&lt;/p&gt;
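
&lt;p&gt;One caveat: the write shown above is unconditional, so two sessions claiming the same item at the same moment would both succeed. A possible guard, sketched under the assumption that the graph runs with Neo4j's default read-committed isolation, is to take the node's write lock by touching a throwaway property before checking the status. The &lt;code&gt;_claimLock&lt;/code&gt; and &lt;code&gt;claimedBy&lt;/code&gt; properties, the &lt;code&gt;$agent&lt;/code&gt; parameter, and the &lt;code&gt;'open'&lt;/code&gt; default are hypothetical names, not part of the schema described here:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Hypothetical conditional claim: yields a row only if the item was free
MATCH (e:SysEnhancement {id: $id})
SET e._claimLock = true        // writing first takes the node's write lock
REMOVE e._claimLock            // the lock is still held until the transaction ends
WITH e
WHERE coalesce(e.status, 'open') &amp;lt;&amp;gt; 'in-progress'
SET e.status = 'in-progress', e.startedAt = $now, e.claimedBy = $agent
RETURN e.id AS claimed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An empty result means another session got there first, and the caller can move on to the next item in its worklog.&lt;/p&gt;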

&lt;h2&gt;
  
  
  Notes from running it
&lt;/h2&gt;

&lt;p&gt;A few things that surprised me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Coverage as &lt;code&gt;VERIFIES&lt;/code&gt; edges per tier, not a single percentage,&lt;/strong&gt; meant gaps couldn't hide. A feature with integration tests but no component tests shows the hole rather than reporting "covered" — agents reported catching gaps they'd otherwise have rationalised away ("integration tests exist, so it feels covered").&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MERGE-only writes&lt;/strong&gt; for the agents, with destructive operations kept entirely out of their reach, were non-negotiable once multiple sessions shared one graph.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The seed is the cost.&lt;/strong&gt; Mapping an existing codebase into the initial node set is the one real setup step; everything compounds after that.&lt;/li&gt;
&lt;/ul&gt;
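
&lt;p&gt;To make the first two notes concrete, here is a sketch of a per-tier gap query. It assumes each &lt;code&gt;SysTest&lt;/code&gt; node has a &lt;code&gt;tier&lt;/code&gt; property and that the tier names arrive as a &lt;code&gt;$tiers&lt;/code&gt; list parameter such as &lt;code&gt;['component', 'integration']&lt;/code&gt;. Both are illustrative and may not match how SysEdge actually stores tiers:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Hypothetical: list the tiers in which each live feature has no verifying test
MATCH (f:SysFeature)
WHERE NOT f.status IN ['Superseded','Deprecated']
UNWIND $tiers AS tier
OPTIONAL MATCH (t:SysTest {tier: tier})-[:VERIFIES]-&amp;gt;(f)
WITH f, tier, count(t) AS tests   // count(t) ignores the nulls from OPTIONAL MATCH
WHERE tests = 0
RETURN f.id AS feature, collect(tier) AS missingTiers
ORDER BY feature
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;And an idempotent write of the kind an agent could safely be allowed to run, again with hypothetical parameter names (&lt;code&gt;$featureId&lt;/code&gt;, &lt;code&gt;$testId&lt;/code&gt;). Running it twice leaves one test node and one edge, so a retried or duplicated session cannot pile up copies:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;// Hypothetical MERGE-only write: register a test against an existing feature
MATCH (f:SysFeature {id: $featureId})
MERGE (t:SysTest {id: $testId})
MERGE (t)-[:VERIFIES]-&amp;gt;(f)
RETURN t.id AS test, f.id AS feature
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;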

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;The CLI that drives this is free and source-available (Neo4j Community under the hood): &lt;a href="https://github.com/org-edge/sysedge" rel="noopener noreferrer"&gt;https://github.com/org-edge/sysedge&lt;/a&gt;. If you're doing anything with LLM agents on a real codebase, I'd genuinely like to hear how others are modelling this — the ontology here is opinionated and I'm sure it can be sharpened.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built on Neo4j + a thin Python CLI. Works across Go, TypeScript, Python, Java, and C#.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>neo4j</category>
      <category>claudecode</category>
      <category>devtools</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
