TL;DR
Autonomous coding agents are good at writing code. They are bad at knowing what's actually risky about the code they just wrote.
I built Veris -- an MCP-native verification intelligence layer. You point it at a repo, it builds a behavioral graph, groups functions into semantic workflows (Authentication, Payments, Webhooks, Caching, Queue, etc.), and emits concrete adversarial probes per workflow.
Install + run:
npx veris-core analyze
That's it. Open the HTML dashboard, see what could break.
MIT. Sponsor-funded. No telemetry. Local SQLite. Repo here.
Why I built it
Every coding agent I use -- Claude Code, Cursor, Aider, you name it -- has the same blind spot.
I ask it to "add a new Stripe charge flow." It writes the code, runs the unit tests, says it's done. Tests pass. The PR merges. Three days later, prod has duplicate charges because there's no idempotency key.
The agent didn't miss a bug. It missed an entire category of failure that doesn't show up in unit tests: idempotency under retry. There is no test for "the same request hits us twice in 500ms because of network retry." Until prod surfaces it.
Veris exists to make that category visible before the PR merges.
How it works
1. Behavioral graph
Veris parses every TS/JS/JSX/TSX/MJS/CJS file in your repo via ts-morph. For each, it extracts symbols (classes, functions, methods, top-level arrow assignments, module.exports.X = function, Foo.prototype.method = function) and resolves cross-module imports + invocations into edges.
On Express (141 files) it produces 93 nodes and 71 edges in under a second. On Next.js (2,444 files) it also builds the graph in under a second, thanks to basename pre-indexing and local-import filtering.
nodes = symbols (Class / Method / Function)
edges = DependsOn (file import) | Invokes (call)
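To make the node/edge schema concrete, here is a minimal sketch of how such a graph could be held in memory. The type names, the id format, and the helper functions are assumptions for illustration; the post does not show Veris's internal types.

```typescript
// Hypothetical shapes; only the kind names come from the schema above.
type NodeKind = "Class" | "Method" | "Function";
type EdgeKind = "DependsOn" | "Invokes";

interface GraphNode {
  id: string; // assumed id format: "<file>#<symbol>"
  kind: NodeKind;
  file: string;
}

interface GraphEdge {
  from: string; // source node id
  to: string; // target node id
  kind: EdgeKind;
}

interface BehavioralGraph {
  nodes: Map<string, GraphNode>;
  edges: GraphEdge[];
}

function createGraph(): BehavioralGraph {
  return { nodes: new Map(), edges: [] };
}

function addNode(graph: BehavioralGraph, node: GraphNode): void {
  graph.nodes.set(node.id, node);
}

// Keeps an edge only when both endpoints are known nodes, so references
// to symbols outside the repo (e.g. npm packages) are dropped.
function addEdge(graph: BehavioralGraph, edge: GraphEdge): boolean {
  if (!graph.nodes.has(edge.from) || !graph.nodes.has(edge.to)) return false;
  graph.edges.push(edge);
  return true;
}

// Returns the ids of every node that invokes the given node.
function callersOf(graph: BehavioralGraph, id: string): string[] {
  return graph.edges
    .filter((e) => e.kind === "Invokes" && e.to === id)
    .map((e) => e.from);
}
```

A query like `callersOf(graph, "src/payments/charge.ts#processPayment")` is the kind of lookup that blast-radius style scoring would lean on.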
2. Semantic workflow grouping
This is the moat. Raw graphs are noise. Veris classifies each node into one of 25 workflow domains using a weighted vote across three signals:
- Path tokens -- src/payments/charge.ts becomes strongly Payments.
- Import tokens -- import stripe from 'stripe' becomes Payments-adjacent.
- Symbol tokens -- processPayment, refund match via word-boundary + camelCase.
Exact-segment path matches outrank import-token matches, which outrank symbol matches. Test/sample/fixture dirs get a 30% multiplier so a real src/auth/login.ts always beats tests/auth.spec.ts.
Rules live in data/workflow-rules.json. Override per repo at .veris/data/workflow-rules.json. Add new domains via .veris/plugins/*.js.
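The weighted vote can be sketched roughly like this. The rule shape, the numeric weights (3/2/1), and the list of test directory names are assumptions chosen to reproduce the ordering described above; the real values live in data/workflow-rules.json and are not shown in this post.

```typescript
// Hypothetical rule shape; not the actual workflow-rules.json schema.
interface DomainRule {
  domain: string;
  pathSegments: string[];
  importTokens: string[];
  symbolTokens: string[];
}

interface NodeSignals {
  path: string; // repo-relative file path
  imports: string[]; // module specifiers imported by the file
  symbol: string; // function / class / method name
}

interface Vote {
  domain: string;
  score: number;
}

// Assumed weights: path match > import match > symbol match.
const WEIGHTS = { pathSegment: 3, importToken: 2, symbolToken: 1 };
const TEST_DIR_MULTIPLIER = 0.3;
const TEST_DIRS = new Set(["test", "tests", "__tests__", "fixtures", "samples"]);

// Splits "processPayment" into ["process", "payment"] (camelCase + word boundaries).
function splitSymbol(name: string): string[] {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .split(/[^A-Za-z0-9]+/)
    .filter(Boolean)
    .map((w) => w.toLowerCase());
}

function classify(node: NodeSignals, rules: DomainRule[]): Vote | null {
  // Only directory segments count as exact-segment path matches.
  const dirs = node.path.toLowerCase().split("/").slice(0, -1);
  const words = splitSymbol(node.symbol);
  const inTestDir = dirs.some((d) => TEST_DIRS.has(d));
  let best: Vote | null = null;
  for (const rule of rules) {
    let score = 0;
    if (rule.pathSegments.some((s) => dirs.includes(s))) score += WEIGHTS.pathSegment;
    if (rule.importTokens.some((t) => node.imports.includes(t))) score += WEIGHTS.importToken;
    if (rule.symbolTokens.some((t) => words.includes(t))) score += WEIGHTS.symbolToken;
    if (inTestDir) score *= TEST_DIR_MULTIPLIER;
    if (score > 0 && (best === null || score > best.score)) {
      best = { domain: rule.domain, score };
    }
  }
  return best;
}

// Illustrative rules only.
const EXAMPLE_RULES: DomainRule[] = [
  { domain: "Payments", pathSegments: ["payments"], importTokens: ["stripe"], symbolTokens: ["payment", "refund"] },
  { domain: "Authentication", pathSegments: ["auth"], importTokens: ["jsonwebtoken"], symbolTokens: ["login", "token"] },
];
```

With these example rules, a `login` function in src/auth/login.ts scores well above the same symbol in tests/auth.spec.ts, because the test file gets no path match and its score is scaled by 0.3.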
3. Adversarial probe templates
Each workflow has a deck of concrete probes. Examples:
Payments:
- Submit charge twice with the same idempotency key inside a 500ms window. Expected: exactly one ledger entry; second call returns the first result.
- Capture succeeds at gateway, response times out before reaching us. Expected: reconciliation eventually marks order paid; no orphan charge.
Webhooks:
- Replay a 24-hour-old signed payload with the original signature. Expected: replay rejected by timestamp window even though signature is valid.
Caching:
- Mass cache expiry triggers thundering herd on origin. Expected: single-flight or jittered refresh; origin not overwhelmed.
Veris never runs the probe. It emits the directive. Your agent (or human) runs it and calls report_execution via MCP to feed results back into the confidence model.
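For a sense of what running one of these directives might look like, here is a sketch of the first Payments probe against a toy in-memory service. `PaymentService`, `probeDoubleSubmit`, and the simulated gateway are all hypothetical; Veris only emits the directive text, and how you execute it depends on your stack.

```typescript
interface ChargeResult {
  chargeId: string;
  amountCents: number;
}

// Toy service: calls with the same idempotency key share one gateway call.
// The key map is never pruned here, which a real service would need to handle.
class PaymentService {
  readonly ledger: ChargeResult[] = [];
  private readonly byKey = new Map<string, Promise<ChargeResult>>();

  charge(idempotencyKey: string, amountCents: number): Promise<ChargeResult> {
    const existing = this.byKey.get(idempotencyKey);
    if (existing) return existing; // second call returns the first result
    const pending = this.callGateway(amountCents);
    this.byKey.set(idempotencyKey, pending);
    return pending;
  }

  private async callGateway(amountCents: number): Promise<ChargeResult> {
    // Simulated gateway latency, so both probe calls overlap in time.
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    const result: ChargeResult = {
      chargeId: `ch_${this.ledger.length + 1}`,
      amountCents,
    };
    this.ledger.push(result);
    return result;
  }
}

// Probe: submit the same charge twice concurrently with one key.
// Passes when there is exactly one ledger entry and both calls agree.
async function probeDoubleSubmit(service: PaymentService): Promise<boolean> {
  const [first, second] = await Promise.all([
    service.charge("key-1", 500),
    service.charge("key-1", 500),
  ]);
  return service.ledger.length === 1 && first.chargeId === second.chargeId;
}
```

Remove the key lookup in `charge` and the probe fails with two ledger entries, which is the duplicate-charge scenario from the intro. The pass/fail result is what you would hand back through report_execution.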
4. Behavioral drift detection
Veris fingerprints each workflow (SHA-256 of sorted edges + members + key signals) and stores fingerprints in .veris/state.db. Run again later -- drift is the diff of fingerprints. A workflow that silently changed (member set identical, edges shifted) is the most dangerous kind because nobody's looking.
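Here is a minimal sketch of the fingerprint-and-diff idea using Node's built-in crypto module. The `WorkflowSnapshot` shape and the JSON canonicalization are assumptions, and the "key signals" input is left out for brevity, so this is not Veris's exact hashing scheme.

```typescript
import { createHash } from "node:crypto";

// Hypothetical snapshot shape for one workflow.
interface WorkflowSnapshot {
  name: string;
  members: string[]; // node ids in the workflow
  edges: Array<[string, string]>; // [from, to] pairs
}

// SHA-256 over sorted members and sorted edges, so the hash does not
// depend on the order in which the graph was traversed.
function fingerprint(workflow: WorkflowSnapshot): string {
  const members = [...workflow.members].sort();
  const edges = workflow.edges.map(([from, to]) => `${from}->${to}`).sort();
  const canonical = JSON.stringify({ members, edges });
  return createHash("sha256").update(canonical).digest("hex");
}

// Returns the names of workflows whose fingerprint differs from the stored one.
// Workflows with no stored fingerprint are treated as new, not drifted.
function detectDrift(
  stored: Map<string, string>,
  current: WorkflowSnapshot[],
): string[] {
  return current
    .filter((w) => {
      const previous = stored.get(w.name);
      return previous !== undefined && previous !== fingerprint(w);
    })
    .map((w) => w.name);
}
```

Note that a workflow with identical members but one rerouted edge still produces a different hash, which is exactly the "silent change" case described above.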
5. Confidence model
Per-workflow risk score = weighted blast radius + runtime criticality + dependency fragility. Math weights live in data/risk-config.json. Every number visible and explainable. Confidence decays with a 14-day half-life; execution feedback restores it.
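As a rough illustration of the arithmetic, the sketch below reads the risk score as a weighted sum of the three signals and models decay as a standard exponential half-life. The field names, the 0..1 signal range, and the weighted-sum interpretation are assumptions; the actual weights and formula are in data/risk-config.json.

```typescript
// Hypothetical signal and weight shapes, each signal assumed to be in 0..1.
interface RiskSignals {
  blastRadius: number;
  runtimeCriticality: number;
  dependencyFragility: number;
}

type RiskWeights = RiskSignals; // one weight per signal

// Assumed interpretation: risk is a plain weighted sum of the three signals.
function riskScore(signals: RiskSignals, weights: RiskWeights): number {
  return (
    weights.blastRadius * signals.blastRadius +
    weights.runtimeCriticality * signals.runtimeCriticality +
    weights.dependencyFragility * signals.dependencyFragility
  );
}

const HALF_LIFE_DAYS = 14;

// Exponential decay: confidence halves every 14 days without new evidence.
function decayedConfidence(initial: number, daysSinceVerified: number): number {
  return initial * Math.pow(0.5, daysSinceVerified / HALF_LIFE_DAYS);
}
```

Under this model, a workflow verified at 0.8 confidence sits at 0.4 after two weeks and 0.2 after four, unless a reported probe execution refreshes it. How much a single execution restores is not modelled here.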
What you actually see
I ran Veris on a self-contained synthetic app with 17 planted bugs across 11 workflows. (Demo app + ground truth here.)
| Planted bug | Workflow detected | Probe fired |
|---|---|---|
| JWT expiry check uses < not <= | Authentication | "Refresh token at the exact expiry boundary while two requests in flight" |
| Stripe charge has no idempotency key | Payments | "Submit charge twice with the same idempotency key inside a 500ms window" |
| Webhook handler not idempotent | Webhooks | "Sender delivers 50 retries of the same event id within 1 minute" |
| updateProduct doesn't invalidate cache | Caching | "Invalidation event arrives out of order with the write" |
| Worker side-effects not idempotent | Queue | "Worker crashes after side effect but before ack" |
| N+1 in getOrdersWithItems | Persistence | "Two transactions update the same row; commit order non-deterministic" |
| /admin/users has no auth middleware | Routing | "Middleware order changes -- unauthenticated request reaches handler" |
Every planted bug got a matching probe. Veris doesn't read function bodies to find < vs <= -- it surfaces the workflow and the probe directive. The agent (or human) runs the probe.
Validated on real OSS repos
| Repo | Nodes | Edges | Workflows | Probes |
|---|---|---|---|---|
| Express | 93 | 71 | 6 | 11 |
| Next.js | 2,400+ | ~30k | 13 | 19 |
| Prisma | 3,696 | 25,046 | 13 | 21 |
| NestJS | 3,712 | 31,890 | 14 | 21 |
| Strapi | 6,982 | 40,027 | 21 | 25 |
Each one surfaced real bugs in Veris itself that I then fixed. The shakedown is the dev loop -- running Veris on Express revealed missing CommonJS extraction; Next.js revealed an O(N^2) edge explosion; Prisma revealed a Windows MAX_PATH crash; NestJS revealed AI false positives from CLI prompt scaffolding. All shipped fixes are in the CHANGELOG.
MCP integration
17 tools exposed via stdio:
analyze_repository export_behavioral_graph analyze_pr_behavior
generate_verification_plan identify_unverified_behaviors
list_workflows analyze_workflow detect_drift
generate_adversarial_probes allocate_budget what_if_revert
report_execution confidence_history node_history
export_onboarding cross_repo_snapshot register_repo
Wire into any MCP-compatible client:
{
  "mcpServers": {
    "veris": {
      "command": "npx",
      "args": ["-y", "veris-core", "mcp"]
    }
  }
}
Then ask the agent: "List the workflows in this repo affected by my current PR. For the highest-risk one, give me the adversarial probes I should run before merging."
Also discoverable via the official MCP Registry as io.github.vighriday/veris, and via npx skills add vighriday/Veris for the skills.sh ecosystem.
Privacy + posture
- MIT. No paid tier. No license gating. No telemetry endpoints.
- VERIS_STATE_DISABLED=1 for zero-retention mode (skips all SQLite writes).
- Local-first. No network calls. State lives at <projectRoot>/.veris/state.db.
- No analytics. No phone-home.
Funding: GitHub Sponsors when I get them. Until then, my own time.
What's next
- More language adapters (Python next, then Go).
- More workflow domains via community plugins (.veris/plugins/*.js).
- Tighter Cursor integration.
- Public registry of community probe libraries.
If this resonates: star the repo, file issues with your false positives, or contribute a workflow rule.
If you find a planted bug in the demo app that Veris missed -- open an issue with the workflow + missing probe. That's the loop.
Repo: github.com/vighriday/Veris
NPM: npmjs.com/package/veris-core
MCP Registry: io.github.vighriday/veris