Let's be direct: most security tooling in CI pipelines is theater. Your SAST scanner fires off 400 warnings per week, your team mutes the Slack channel, and the one real IDOR that could've let an attacker read every customer's order history slips into production because it was buried on page three of a report full of false positives.
We are all tired of this. So I built `sentinai-core` — an open-source npm package that runs an autonomous, three-agent AI pipeline against your GitHub pull request diffs and produces context-aware, validated security findings, each with a step-by-step exploit proof-of-concept.
Here's exactly how it works.
## The Problem with Traditional SAST
Static analysis tools operate on pattern matching. They're fast and cheap, which is why they're everywhere — but they have a structural limitation: they are context-blind.
Consider an IDOR (Insecure Direct Object Reference). A scanner can tell you that a route handler accepts a dynamic `:id` parameter. It cannot tell you that the controller skips the `req.user.id === resource.userId` ownership check that exists on every other route in the file. It doesn't understand that the new middleware you added to `app.use()` globally doesn't actually apply to this specific sub-router because it was mounted before the middleware was registered.
The result: false positive rates around 80% are common on modern Node.js/Express codebases. Teams tune their noise thresholds so aggressively that real findings disappear along with the noise.
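To make that concrete, here is a hypothetical Express-style order lookup, reduced to pure functions so the ownership check stands out. All names and shapes are invented for illustration; this is not code from `sentinai-core` or any real scanner test case:

```typescript
// Hypothetical illustration of the IDOR pattern described above.
interface Order { id: number; userId: number; details: string }

const ordersDb: Order[] = [
  { id: 202, userId: 102, details: "User B's order" },
];

// Context-blind view: the handler accepts a dynamic id and queries by it.
// A pattern matcher sees a parameterized lookup here and nothing more.
function getOrderVulnerable(_requesterId: number, orderId: number): Order | undefined {
  return ordersDb.find((o) => o.id === orderId); // no ownership assertion
}

// The ownership check the scanner cannot reason about:
// the requesting user must own the resource being fetched.
function getOrderSafe(requesterId: number, orderId: number): Order | undefined {
  const order = ordersDb.find((o) => o.id === orderId);
  return order && order.userId === requesterId ? order : undefined;
}
```

Both functions look identical to a signature-level scanner; only the second one enforces the invariant that matters.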
RBAC flaws are even worse. A scanner has no idea that `role: "admin"` is set client-side and trusted on the backend without re-verification. That requires reading business logic.
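A minimal sketch of that failure mode, with invented names: the broken check trusts whatever role the client claims in the request payload, while the fixed one re-derives the role from server-side state keyed by the authenticated user:

```typescript
// Hypothetical RBAC illustration; names and data are invented.
type Role = 'user' | 'admin';

// Server-side source of truth (in practice, a DB or session store).
const serverSideRoles: Record<string, Role> = { alice: 'user', bob: 'admin' };

// Broken: trusts a role field supplied in the request body.
function canDeleteTrustingClient(claimedRole: Role): boolean {
  return claimedRole === 'admin';
}

// Fixed: re-verifies the role server-side for the authenticated user.
function canDeleteVerified(username: string): boolean {
  return serverSideRoles[username] === 'admin';
}
```

Any client can claim `admin` in the first version; only the second consults state the attacker doesn't control.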
## The Solution: A Multi-Agent AI Pipeline
`sentinai-core` treats security review as a multi-step adversarial reasoning problem — not a pattern match.
The library ships as a single function:
```typescript
import { runOrchestrator } from 'sentinai-core';

const findings = await runOrchestrator(diff, (msg) => console.log(msg));
```
It takes a raw PR diff string (up to 80,000 characters — a hard guard against context window overflow) and a logger callback, then internally dispatches three specialized AI agents in sequence. Each agent has a different model, a different thinking budget, and a different adversarial role.
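A minimal sketch of what that input guard could look like, assuming the library rejects oversized diffs rather than silently truncating (the actual behaviour isn't documented here, so this is a reconstruction, not the library's code):

```typescript
// Cap taken from the article; the guard itself is a hypothetical sketch.
const MAX_DIFF_CHARS = 80_000;

function assertDiffWithinBudget(diff: string): string {
  if (diff.length > MAX_DIFF_CHARS) {
    throw new Error(
      `Diff is ${diff.length} chars; max is ${MAX_DIFF_CHARS}. ` +
      `Split the PR or analyze it per-file.`
    );
  }
  return diff;
}
```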
## The Architecture: Three Agents, One Pipeline
Here's the full data flow from a GitHub PR event to a validated security report:
```mermaid
flowchart LR
    A(["GitHub PR\nDiff"]) --> B["🏗️ Architect\ngemini-3.1-flash-lite-preview\nthinking: low"]
    B -->|"Access control map\n+ vulnerability surface"| C["🥷 Adversary\ngemini-3-flash-preview\nthinking: medium"]
    C -->|"Up to 3 exploit\nwalkthrough reports"| D["🛡️ Guardian\ngemini-3-flash-preview\nthinking: high"]
    D -->|"Confidence scored\n+ OWASP mapped findings"| E(["PR Review\nComment"])
```
Each stage is sequential by design — the Adversary needs the Architect's typed map as context, and the Guardian needs both to arbitrate. The cost of sequencing is latency; the benefit is that no agent is flying blind.
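The sequencing can be sketched as a plain async chain, with stubbed agents standing in for the real prompts (everything inside the stubs is invented; only the data flow mirrors the pipeline described above):

```typescript
// Stub pipeline: each stage consumes the output of the previous one.
type Logger = (msg: string) => void;

async function architect(diff: string): Promise<string> {
  return `map(${diff.length} chars)`; // placeholder for the structural map
}
async function adversary(_diff: string, map: string): Promise<string[]> {
  return [`exploit derived from ${map}`]; // placeholder for exploit reports
}
async function guardian(_diff: string, _map: string, exploits: string[]): Promise<string[]> {
  return exploits; // placeholder for validation/arbitration
}

async function runPipeline(diff: string, log: Logger): Promise<string[]> {
  const map = await architect(diff);                     // stage 1: structural map
  log('architect done');
  const exploits = await adversary(diff, map);           // stage 2: needs the map
  log('adversary done');
  const findings = await guardian(diff, map, exploits);  // stage 3: needs both
  log('guardian done');
  return findings;
}
```

The wall-clock cost is the sum of the three awaits, which is exactly the latency trade-off the article describes.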
### Agent 1: The Architect — gemini-3.1-flash-lite-preview
The Architect runs first, on a low thinking budget (`thinkingLevel: 'low'`). Its job is cheap, fast, and structural — not deep reasoning.
It reads the diff and produces a machine-readable map:
```json
{
  "endpoints": ["GET /api/orders/:id", "PATCH /api/orders/:id/status"],
  "auth_middleware": ["isAuthenticated on /api/orders/:id", "MISSING ownership check"],
  "rbac_mapping": "Admin can set any status; User role should only read own orders",
  "vulnerability_surface": "Route ID parameter flows directly into DB query with no userId assertion"
}
```
This isn't just for human readability — it becomes the structured intelligence report fed into the next agent. You're not passing raw text; you're passing typed, reasoned context.
Critically, the system prompt includes a prompt injection defence: the diff is wrapped in `<source_diff_for_analysis>` XML tags, and the model is explicitly instructed to treat everything inside those tags as untrusted raw data. An attacker who embeds `IGNORE ALL PREVIOUS INSTRUCTIONS` in a comment inside the PR gets that text analysed as data, not obeyed as an instruction.
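That wrapping step might look something like this. The tag name comes from the article; the stripping of attacker-supplied closing tags is my own defensive embellishment, not confirmed library behaviour:

```typescript
// Wrap an untrusted diff in the delimiter tags the prompt treats as a data boundary.
function wrapUntrustedDiff(diff: string): string {
  // Defensive addition (hypothetical): remove any attacker-supplied copies of
  // the delimiter so the data boundary can't be escaped from inside the diff.
  const sanitized = diff.replace(/<\/?source_diff_for_analysis>/gi, '');
  return `<source_diff_for_analysis>\n${sanitized}\n</source_diff_for_analysis>`;
}
```

Without the sanitization line, a PR containing its own `</source_diff_for_analysis>` could prematurely close the data region and smuggle instructions into the trusted part of the prompt.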
### Agent 2: The Adversary — gemini-3-flash-preview
The Adversary is a red teamer. It receives the Architect's map plus the full diff and its sole objective is to find exploitable paths and prove them.
It runs on a medium thinking budget and is instructed to find up to three distinct vulnerabilities, ordered by severity, and produce a step-by-step exploit walkthrough for each — actual HTTP requests and predicted responses included:
```json
{
  "attack_vector": "Parameter tampering — IDOR on order endpoint",
  "exploit_steps": [
    {
      "step": 1,
      "action": "Attacker authenticates as User A (ID: 101)",
      "request": "POST /api/auth/login {\"email\": \"userA@test.com\", \"password\": \"...\"}",
      "expected_response": "200 OK, JWT token issued"
    },
    {
      "step": 2,
      "action": "Attacker requests User B's order by guessing sequential ID",
      "request": "GET /api/orders/202 (with User A's JWT)",
      "expected_response": "200 OK, full order details for User B returned"
    }
  ],
  "bypass_technique": "Controller queries DB by req.params.id only. No userId comparison against req.user.id.",
  "affected_endpoint": "GET /api/orders/:id"
}
```
The 3-finding cap is a deliberate resource exhaustion guard — not an arbitrary limit. An adversarial PR with hundreds of endpoints could otherwise drive up token costs and latency to the point of making the tool unusable in CI.
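Under that constraint, the cap amounts to a sort-then-slice over whatever the model surfaced, keeping the worst findings first. A minimal sketch (the severity ranking and shapes are my own reconstruction, not the library's exports):

```typescript
// Hypothetical cap: keep at most `max` findings, highest severity first.
type Severity = 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';
const rank: Record<Severity, number> = { CRITICAL: 3, HIGH: 2, MEDIUM: 1, LOW: 0 };

function capFindings<T extends { severity: Severity }>(findings: T[], max = 3): T[] {
  return [...findings]
    .sort((a, b) => rank[b.severity] - rank[a.severity]) // worst first
    .slice(0, max);                                      // resource exhaustion guard
}
```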
### Agent 3: The Guardian — gemini-3-flash-preview (High Thinking Budget)
This is where the false positive filtering happens. The Guardian uses the same gemini-3-flash-preview model as the Adversary, but is given a high thinking budget — the most expensive reasoning level in the pipeline. The distinction isn't about a different model; it's about giving the validation step the longest chain-of-thought to scrutinise the Adversary's work before anything reaches the output.
It receives both the Architect's structural map and the Adversary's exploit report, then cross-examines the diff against a specific validation checklist:
- Does any `app.use()` global middleware apply to this route that would block the attack?
- Does the ORM/framework provide implicit ownership filtering (e.g., Prisma's `where: { userId }` in a shared query)?
- Is the exploit logically consistent with the actual code — not just the route signature?
- Assign a confidence score from 0–100.
- Map to the appropriate OWASP Top 10 category.
- Produce a concrete code fix.
The output is a `GuardianReport`:

```typescript
interface GuardianReport {
  vulnerability: string;         // "IDOR on Order Endpoint"
  severity: 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';
  confidence_score: number;      // 0–100
  reasoning: string;             // Why this is real, not noise
  false_positive_risk: string;   // What could make this wrong
  owasp_category: string;        // "A01:2021 – Broken Access Control"
  exploit_simulation: ExploitStep[];
  affected_endpoint: string;
  suggested_fix: string;         // Actual corrected code snippet
}
```
Any finding below the `MIN_CONFIDENCE` threshold (configurable, defaulting to 40%) is suppressed before results are returned. The entire pipeline produces zero output on a clean diff.
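A hedged sketch of that suppression step, reusing the field names from the `GuardianReport` interface (the filter itself is my reconstruction, not the library's code):

```typescript
// Default threshold from the article; the library makes it configurable.
const MIN_CONFIDENCE = 40;

interface ScoredFinding { vulnerability: string; confidence_score: number }

// Drop anything the Guardian wasn't confident enough about.
function suppressLowConfidence<T extends ScoredFinding>(reports: T[]): T[] {
  return reports.filter((r) => r.confidence_score >= MIN_CONFIDENCE);
}
```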
## The Tech: Vercel AI SDK + Robust JSON Extraction
The stack is TypeScript with the Vercel AI SDK (`ai` and `@ai-sdk/google`). The SDK's `generateText()` abstraction handles the API surface cleanly, and the `providerOptions.google.thinkingConfig.thinkingLevel` parameter controls the reasoning depth per-agent.
The gnarliest engineering challenge wasn't the prompting — it was making JSON parsing bulletproof. LLMs are inconsistent about how they return structured data. Sometimes you get clean JSON. Sometimes it arrives wrapped in a Markdown `json` code fence. Sometimes you get two paragraphs of explanation followed by the JSON buried in the middle.
`sentinai-core` uses a three-strategy fallback:

````typescript
function extractJSON(raw: string): string {
  // Strategy 1: Direct parse — ideal path
  try { JSON.parse(raw.trim()); return raw.trim(); } catch {}

  // Strategy 2: Strip a single outermost ```json ... ``` fence
  const fenceMatch = raw.trim().match(/^```(?:json)?\s*\n?([\s\S]*?)\n?```\s*$/i);
  if (fenceMatch) {
    try { JSON.parse(fenceMatch[1].trim()); return fenceMatch[1].trim(); } catch {}
  }

  // Strategy 3: Walk the string to find the first balanced { } or [ ] block
  const startIdx = raw.trim().search(/[{[]/);
  if (startIdx !== -1) {
    // ... depth counter to find matching close bracket ...
  }

  return 'null'; // Caller handles graceful degradation
}
````
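The depth counter in Strategy 3 is elided above. Here is one way it could be implemented; the guard for brackets inside string literals is my own addition, not confirmed library behaviour:

```typescript
// One possible implementation of the elided depth counter: scan for the first
// balanced {...} or [...] block, ignoring brackets inside JSON string values.
function extractBalancedJSON(raw: string): string | null {
  const s = raw.trim();
  const start = s.search(/[{[]/);
  if (start === -1) return null;
  const open = s[start];
  const close = open === '{' ? '}' : ']';
  let depth = 0;
  let inString = false;
  for (let i = start; i < s.length; i++) {
    const ch = s[i];
    if (inString) {
      if (ch === '\\') i++;               // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === open) depth++;
    else if (ch === close) {
      depth--;
      if (depth === 0) {
        const candidate = s.slice(start, i + 1);
        try { JSON.parse(candidate); return candidate; } catch { return null; }
      }
    }
  }
  return null; // never balanced: caller degrades gracefully
}
```

The string-awareness matters because a brace inside a JSON string value (say, a suggested fix containing `}`) would otherwise throw the counter off.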
If all three strategies fail, each agent has its own graceful fallback — either a safe empty result or a low-confidence rejection — so a single bad LLM response never crashes the pipeline.
## Deploying in a Real CI Pipeline
The library is built to slot into a GitHub App webhook handler. When a PR is opened or synchronized, you fetch the diff from the GitHub API, pass it through `runOrchestrator`, and post the results as a PR review comment with severity badges.
The production SentinAI platform runs on Cloud Run (for cost-effective auto-scaling to zero) with Vertex AI as the model backend, which gives you enterprise data residency guarantees. The `getModel()` function in the core automatically switches between Google AI Studio and Vertex AI based on environment:
```typescript
// Development: GEMINI_API_KEY → Google AI Studio
// Production:  USE_VERTEX=true → Vertex AI (Cloud Run service account auth)
```
The architecture handles multi-tenancy cleanly — each GitHub App installation gets its own analysis scope, and Supabase RLS enforces tenant isolation at the database level.
## Getting Started
```bash
npm install sentinai-core
```
```typescript
import { runOrchestrator } from 'sentinai-core';

const diff = `diff --git a/src/routes/orders.ts ...`; // your raw PR diff
const findings = await runOrchestrator(diff, (msg) => console.log(msg));

if (findings.length === 0) {
  console.log('✅ No confirmed vulnerabilities found.');
} else {
  for (const f of findings) {
    console.log(`[${f.severity}] ${f.vulnerability} — Confidence: ${f.confidence_score}%`);
    console.log(`OWASP: ${f.owasp_category}`);
    console.log(`Fix: ${f.suggested_fix}`);
  }
}
```
Set your `GEMINI_API_KEY` environment variable and you're running a three-agent security pipeline in minutes.
On execution time: the full pipeline runs sequentially, so wall-clock time is the sum of three LLM round trips. On a standard PR diff (a few hundred lines), expect roughly 15–30 seconds end-to-end. For a large diff near the 80,000-character cap with multiple findings for the Guardian to validate independently, budget up to 45–60 seconds. That's acceptable for a post-push CI check; it would be too slow for a pre-commit hook.
## What's Next
The current pipeline is sequential — Architect → Adversary → Guardian. The next architectural evolution is parallel Adversary runs: spin up three independent red-team agents simultaneously, each primed with a different attack category (access control, injection, business logic), and let the Guardian arbitrate across all findings.
I'm also working on SARIF output support so findings can be ingested directly by GitHub's Security tab as code scanning alerts — no custom UI required.
The full source, test suite, and a demo diff are available at github.com/itxDeeni/SentinAI-Core.
⭐ Star the repo to track progress on SARIF output and parallel Adversary runs. Try it on your next PR, and if you want to contribute patterns to the vulnerability database — the threat model evolves faster than any one team can keep up with — open an issue. The more patterns in the library, the sharper the Architect's initial surface map gets.