Hriday Vig

Posted on May 19

I built a workflow-aware verification layer for AI coding agents — open source, MCP-native

#mcp #ai #architecture #opensource

TL;DR

Autonomous coding agents are good at writing code. They are bad at knowing what's actually risky about the code they just wrote.

I built Veris -- an MCP-native verification intelligence layer. You point it at a repo, it builds a behavioral graph, groups functions into semantic workflows (Authentication, Payments, Webhooks, Caching, Queue, etc.), and emits concrete adversarial probes per workflow.

Install + run:

npx veris-core analyze

That's it. Open the HTML dashboard, see what could break.

MIT. Sponsor-funded. No telemetry. Local SQLite. Repo: github.com/vighriday/Veris.

Why I built it

Every coding agent I use -- Claude Code, Cursor, Aider, you name it -- has the same blind spot.

I ask it to "add a new Stripe charge flow." It writes the code, runs the unit tests, says it's done. Tests pass. The PR merges. Three days later, prod has duplicate charges because there's no idempotency key.

The agent didn't miss a bug. It missed an entire category of failure that doesn't show up in unit tests: idempotency under retry. There is no test for "the same request hits us twice in 500ms because of network retry." Until prod surfaces it.

Veris exists to make that category visible before the PR merges.

How it works

1. Behavioral graph

Veris parses every TS/JS/JSX/TSX/MJS/CJS file in your repo via ts-morph. For each file, it extracts symbols (classes, functions, methods, top-level arrow assignments, module.exports.X = function, Foo.prototype.method = function) and resolves cross-module imports + invocations into edges.

On Express (141 files) it produces 93 nodes and 71 edges in under a second. On Next.js (2,444 files) it also builds the graph in under a second, by pre-indexing file basenames and filtering down to local imports.

nodes  = symbols (Class / Method / Function)
edges  = DependsOn (file import) | Invokes (call)
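
To make the node and edge vocabulary concrete, here is a minimal TypeScript sketch of such a graph, plus one plausible reading of "basename pre-indexing": index files by basename once, then resolve each relative import with a map lookup instead of a scan over every file. The type names and the helpers buildBasenameIndex and resolveLocalImport are illustrative assumptions, not Veris internals.

```typescript
// Illustrative sketch only: these types and helpers are assumptions,
// not Veris's real data model.
type NodeKind = "Class" | "Method" | "Function";
type EdgeKind = "DependsOn" | "Invokes";

interface GraphNode { id: string; kind: NodeKind; file: string; name: string }
interface GraphEdge { from: string; to: string; kind: EdgeKind }

const SOURCE_EXT = /\.(ts|tsx|js|jsx|mjs|cjs)$/;

// Strip directory and extension: "src/payments/charge.ts" -> "charge".
function basenameOf(path: string): string {
  const last = path.split("/").pop() ?? path;
  return last.replace(SOURCE_EXT, "");
}

// Index every file by basename once, so each import becomes a map lookup.
function buildBasenameIndex(files: string[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const file of files) {
    const key = basenameOf(file);
    const bucket = index.get(key);
    if (bucket) bucket.push(file);
    else index.set(key, [file]);
  }
  return index;
}

// Resolve a relative import specifier to a file in the repo.
// Package imports ("stripe", "express") are filtered out up front.
function resolveLocalImport(
  fromFile: string,
  specifier: string,
  index: Map<string, string[]>,
): string | undefined {
  if (!specifier.startsWith(".")) return undefined; // local-import filter
  const parts = fromFile.split("/").slice(0, -1); // directory of the importer
  for (const seg of specifier.split("/")) {
    if (seg === "..") parts.pop();
    else if (seg !== ".") parts.push(seg);
  }
  const wanted = parts.join("/").replace(SOURCE_EXT, "");
  const candidates = index.get(basenameOf(wanted)) ?? [];
  return candidates.find((f) => f.replace(SOURCE_EXT, "") === wanted);
}

const demoFiles = ["src/payments/charge.ts", "src/payments/ledger.ts", "src/auth/login.ts"];
const demoIndex = buildBasenameIndex(demoFiles);

const demoNodes: GraphNode[] = [
  { id: "charge#processPayment", kind: "Function", file: demoFiles[0], name: "processPayment" },
];
const demoEdges: GraphEdge[] = [];
const resolved = resolveLocalImport("src/payments/charge.ts", "./ledger", demoIndex);
if (resolved) demoEdges.push({ from: "src/payments/charge.ts", to: resolved, kind: "DependsOn" });
```

This sketch ignores directory imports (./ledger/index) and path aliases; a real resolver would need both.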

2. Semantic workflow grouping

This is the moat. Raw graphs are noise. Veris classifies each node into one of 25 workflow domains using a weighted vote across three signals:

Path tokens -- src/payments/charge.ts becomes strongly Payments.
Import tokens -- import stripe from 'stripe' becomes Payments-adjacent.
Symbol tokens -- processPayment, refund match via word-boundary + camelCase.

Exact-segment path matches outrank import-token matches, which outrank symbol matches. Scores from test/sample/fixture dirs are multiplied by 0.3, so a real src/auth/login.ts always beats tests/auth.spec.ts.
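
A rough sketch of how such a weighted vote could work is below. The 3/2/1 weights and the tiny rule table are assumptions made up for illustration (the real rules live in data/workflow-rules.json, whose schema is not shown here); only the 0.3 scaling for test-like directories follows the description above.

```typescript
// Illustrative sketch: the weights (3/2/1) and the rule table are assumptions.
interface DomainRule { pathTokens: string[]; importTokens: string[]; symbolTokens: string[] }

const RULES: Record<string, DomainRule> = {
  Payments: { pathTokens: ["payments", "billing"], importTokens: ["stripe"], symbolTokens: ["payment", "refund", "charge"] },
  Authentication: { pathTokens: ["auth"], importTokens: ["jsonwebtoken"], symbolTokens: ["login", "token"] },
};

const WEIGHTS = { path: 3, import: 2, symbol: 1 };
const TEST_DIRS = new Set(["test", "tests", "__tests__", "fixtures", "samples"]);
const TEST_MULTIPLIER = 0.3;

// Split camelCase / snake_case into lowercase words:
// "processPayment" -> ["process", "payment"].
function words(symbol: string): string[] {
  return symbol
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .split(/[^A-Za-z0-9]+/)
    .filter(Boolean)
    .map((w) => w.toLowerCase());
}

interface NodeSignals { file: string; imports: string[]; symbol: string }

function scoreDomains(node: NodeSignals): Map<string, number> {
  const dirs = node.file.toLowerCase().split("/").slice(0, -1);
  const symbolWords = words(node.symbol);
  const scale = dirs.some((d) => TEST_DIRS.has(d)) ? TEST_MULTIPLIER : 1;

  const scores = new Map<string, number>();
  for (const [domain, rule] of Object.entries(RULES)) {
    let score = 0;
    if (rule.pathTokens.some((t) => dirs.includes(t))) score += WEIGHTS.path; // exact directory segment
    if (rule.importTokens.some((t) => node.imports.includes(t))) score += WEIGHTS.import;
    if (rule.symbolTokens.some((t) => symbolWords.includes(t))) score += WEIGHTS.symbol;
    if (score > 0) scores.set(domain, score * scale);
  }
  return scores;
}

// Pick the highest-scoring domain, or undefined when nothing matched.
function classify(node: NodeSignals): string | undefined {
  let best: string | undefined;
  let bestScore = 0;
  for (const [domain, score] of scoreDomains(node)) {
    if (score > bestScore) { best = domain; bestScore = score; }
  }
  return best;
}

const payNode: NodeSignals = { file: "src/payments/charge.ts", imports: ["stripe"], symbol: "processPayment" };
const realAuthNode: NodeSignals = { file: "src/auth/login.ts", imports: [], symbol: "login" };
const testAuthNode: NodeSignals = { file: "tests/auth.spec.ts", imports: [], symbol: "loginTest" };
console.log(classify(payNode), scoreDomains(realAuthNode), scoreDomains(testAuthNode));
```

With these made-up weights, the real login file scores well above the spec file for Authentication, which is the ordering the multiplier is meant to guarantee.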

Rules live in data/workflow-rules.json. Override per repo at .veris/data/workflow-rules.json. Add new domains via .veris/plugins/*.js.

3. Adversarial probe templates

Each workflow has a deck of concrete probes. Examples:

Payments:

Submit charge twice with the same idempotency key inside a 500ms window. Expected: exactly one ledger entry; second call returns the first result.
Capture succeeds at gateway, response times out before reaching us. Expected: reconciliation eventually marks order paid; no orphan charge.

Webhooks:

Replay a 24-hour-old signed payload with the original signature. Expected: replay rejected by timestamp window even though signature is valid.

Caching:

Mass cache expiry triggers thundering herd on origin. Expected: single-flight or jittered refresh; origin not overwhelmed.

Veris never runs the probe. It emits the directive. Your agent (or human) runs it and calls report_execution via MCP to feed results back into the confidence model.
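
As an illustration of what an agent might do with the first Payments directive, the sketch below runs a double-submit against a toy in-memory charge service. ChargeService, its charge method, and the result shape are hypothetical stand-ins for your own code. The payload that report_execution expects is left out because this post does not specify it.

```typescript
// Hypothetical system under test: a charge service that deduplicates
// on idempotency key. Nothing here is part of Veris.
interface ChargeResult { chargeId: string; amountCents: number }

class ChargeService {
  readonly ledger: ChargeResult[] = [];
  private readonly byKey = new Map<string, Promise<ChargeResult>>();

  charge(idempotencyKey: string, amountCents: number): Promise<ChargeResult> {
    // Reuse the stored promise so concurrent retries share one result.
    const existing = this.byKey.get(idempotencyKey);
    if (existing) return existing;
    const pending = this.callGateway(amountCents);
    this.byKey.set(idempotencyKey, pending);
    return pending;
  }

  private async callGateway(amountCents: number): Promise<ChargeResult> {
    await new Promise<void>((resolve) => setTimeout(resolve, 20)); // simulated gateway latency
    const result = { chargeId: `ch_${this.ledger.length + 1}`, amountCents };
    this.ledger.push(result);
    return result;
  }
}

// The probe: fire the same request twice without waiting in between,
// then check both expectations from the directive.
async function probeDoubleSubmit(
  service: ChargeService,
): Promise<{ passed: boolean; detail: string }> {
  const [first, second] = await Promise.all([
    service.charge("key-123", 4999),
    service.charge("key-123", 4999),
  ]);
  const oneEntry = service.ledger.length === 1;
  const sameResult = first.chargeId === second.chargeId;
  return {
    passed: oneEntry && sameResult,
    detail: `ledger entries: ${service.ledger.length}, same result: ${sameResult}`,
  };
}

probeDoubleSubmit(new ChargeService()).then((outcome) => console.log(outcome));
```

Removing the byKey lookup from charge makes the probe fail with two ledger entries, which is the duplicate-charge bug described earlier.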

4. Behavioral drift detection

Veris fingerprints each workflow (SHA-256 of sorted edges + members + key signals) and stores the fingerprints in .veris/state.db. Run it again later, and drift is the diff between the old and new fingerprints. A workflow that changed silently (member set identical, edges shifted) is the most dangerous kind of drift because nobody's looking.
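
The fingerprinting idea can be sketched in a few lines of TypeScript using Node's built-in crypto module. The WorkflowSnapshot fields and the canonical JSON encoding are assumptions for illustration; Veris's actual encoding may differ.

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch: field names and canonical encoding are assumptions.
interface WorkflowSnapshot {
  name: string;
  members: string[];              // node ids in the workflow
  edges: Array<[string, string]>; // [from, to] pairs
  signals: string[];              // e.g. matched rule tokens
}

// Hash a canonical form so ordering differences never register as drift.
function fingerprint(w: WorkflowSnapshot): string {
  const canonical = JSON.stringify({
    members: [...w.members].sort(),
    edges: w.edges.map(([from, to]) => `${from}->${to}`).sort(),
    signals: [...w.signals].sort(),
  });
  return createHash("sha256").update(canonical).digest("hex");
}

// Drift = known workflows whose fingerprint differs from the stored one.
function detectDrift(previous: Map<string, string>, current: WorkflowSnapshot[]): string[] {
  return current
    .filter((w) => previous.has(w.name) && previous.get(w.name) !== fingerprint(w))
    .map((w) => w.name);
}

const baseline: WorkflowSnapshot = {
  name: "Payments",
  members: ["charge", "refund", "ledger"],
  edges: [["charge", "ledger"], ["refund", "ledger"]],
  signals: ["path:payments", "import:stripe"],
};
// Same members, but refund now calls charge instead of ledger.
const rerun: WorkflowSnapshot = { ...baseline, edges: [["charge", "ledger"], ["refund", "charge"]] };

const stored = new Map([[baseline.name, fingerprint(baseline)]]);
console.log(detectDrift(stored, [rerun])); // lists Payments: the edge shift changes the hash
```

Sorting before hashing is the important design choice: without it, a change in file traversal order alone would look like drift.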

5. Confidence model

Per-workflow risk score = weighted sum of blast radius, runtime criticality, and dependency fragility. The weights live in data/risk-config.json, so every number is visible and explainable. Confidence decays with a 14-day half-life; execution feedback restores it.
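
For intuition, here is a small sketch of a weighted risk score and half-life decay. The weight values and the assumption that each input is normalized to 0..1 are mine; the real numbers live in data/risk-config.json. Only the 14-day half-life is taken from the text.

```typescript
// Illustrative sketch: weights and the 0..1 input scale are assumptions.
interface RiskInputs {
  blastRadius: number;
  runtimeCriticality: number;
  dependencyFragility: number;
}

const RISK_WEIGHTS: RiskInputs = { blastRadius: 0.5, runtimeCriticality: 0.3, dependencyFragility: 0.2 };
const HALF_LIFE_DAYS = 14;

// Weighted sum of the three signals.
function riskScore(inputs: RiskInputs): number {
  return (
    RISK_WEIGHTS.blastRadius * inputs.blastRadius +
    RISK_WEIGHTS.runtimeCriticality * inputs.runtimeCriticality +
    RISK_WEIGHTS.dependencyFragility * inputs.dependencyFragility
  );
}

// Exponential decay: confidence halves every HALF_LIFE_DAYS without fresh evidence.
function decayedConfidence(confidence: number, daysSinceLastExecution: number): number {
  return confidence * Math.pow(0.5, daysSinceLastExecution / HALF_LIFE_DAYS);
}

const paymentsRisk = riskScore({ blastRadius: 0.9, runtimeCriticality: 1.0, dependencyFragility: 0.4 });
console.log(paymentsRisk.toFixed(2), decayedConfidence(0.8, 14).toFixed(2));
```

Under this model a confidence of 0.8 falls to 0.4 after 14 days with no reported executions. How exactly a report_execution call restores confidence is not specified here, so the sketch leaves that step out.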

What you actually see

I ran Veris on a self-contained synthetic app with 17 planted bugs across 11 workflows. (Demo app + ground truth here.)

Planted bug	Workflow detected	Probe fired
JWT expiry check uses `<` not `<=`	Authentication	"Refresh token at the exact expiry boundary while two requests in flight"
Stripe charge has no idempotency key	Payments	"Submit charge twice with the same idempotency key inside a 500ms window"
Webhook handler not idempotent	Webhooks	"Sender delivers 50 retries of the same event id within 1 minute"
`updateProduct` doesn't invalidate cache	Caching	"Invalidation event arrives out of order with the write"
Worker side-effects not idempotent	Queue	"Worker crashes after side effect but before ack"
N+1 in `getOrdersWithItems`	Persistence	"Two transactions update the same row; commit order non-deterministic"
`/admin/users` has no auth middleware	Routing	"Middleware order changes -- unauthenticated request reaches handler"

Every planted bug got a matching probe (the table shows seven of the 17). Veris doesn't read function bodies to find < vs <= -- it surfaces the workflow and the probe directive. The agent (or human) runs the probe.

Validated on real OSS repos

Repo	Nodes	Edges	Workflows	Probes
Express	93	71	6	11
Next.js	2,400+	~30k	13	19
Prisma	3,696	25,046	13	21
NestJS	3,712	31,890	14	21
Strapi	6,982	40,027	21	25

Each one surfaced real bugs in Veris itself that I then fixed. The shakedown is the dev loop -- running Veris on Express revealed missing CommonJS extraction; Next.js revealed an O(N^2) edge explosion; Prisma revealed a Windows MAX_PATH crash; NestJS revealed AI false positives from CLI prompt scaffolding. All shipped fixes are in the CHANGELOG.

MCP integration

17 tools exposed via stdio:

analyze_repository           export_behavioral_graph        analyze_pr_behavior
generate_verification_plan   identify_unverified_behaviors
list_workflows               analyze_workflow               detect_drift
generate_adversarial_probes  allocate_budget                what_if_revert
report_execution             confidence_history             node_history
export_onboarding            cross_repo_snapshot            register_repo

Wire into any MCP-compatible client:

{
  "mcpServers": {
    "veris": {
      "command": "npx",
      "args": ["-y", "veris-core", "mcp"]
    }
  }
}

Then ask the agent: "List the workflows in this repo affected by my current PR. For the highest-risk one, give me the adversarial probes I should run before merging."

Also discoverable via the official MCP Registry as io.github.vighriday/veris, and via npx skills add vighriday/Veris for the skills.sh ecosystem.

Privacy + posture

MIT. No paid tier. No license gating. No telemetry endpoints.
VERIS_STATE_DISABLED=1 for zero-retention mode (skips all SQLite writes).
Local-first. No network calls. State lives at <projectRoot>/.veris/state.db.
No analytics. No phone-home.

Funding: GitHub Sponsors when I get them. Until then, my own time.

What's next

More language adapters (Python next, then Go).
More workflow domains via community plugins (.veris/plugins/*.js).
Tighter Cursor integration.
Public registry of community probe libraries.

If this resonates: star the repo, file issues with your false positives, or contribute a workflow rule.

If you find a planted bug in the demo app that Veris missed -- open an issue with the workflow + missing probe. That's the loop.

Repo: github.com/vighriday/Veris

NPM: npmjs.com/package/veris-core

MCP Registry: io.github.vighriday/veris
