YUVRAJ

Posted on May 24

I built an AI IDE where the AI tries to hack its own code. Here's how the swarm works

#ai #cybersecurity #webdev #opensource

Most AI coding tools are built to assist you.

Rogue Studio is built to attack you — specifically, to attack
the code it just wrote for you, find every vulnerability, and
force itself to fix them before you ever see the output.

Here's how I built the adversarial swarm architecture, and why
I think this approach to AI-assisted security is underexplored.

Why I built this

I kept hitting the same wall.

I'm a security-focused developer. I do legitimate vulnerability
research. I analyze contracts, decompile binaries, write PoC
exploits against test environments I own.

And every single AI coding tool I tried would either refuse,
add a disclaimer, water down the output, or quietly route my
code to a server I had no visibility into.

I didn't want a safer AI.
I wanted a sharper one.

So I spent the last few weeks building the tool I actually
wanted to use. I called it Rogue Studio.

It's open source, self-hostable, and it does not apologize
for what it is.

→ github.com/malgatyuvraj/Rogue-Studio

The core idea: make the AI fight itself

Here's the problem with AI-generated code from a
security perspective.

The same model that writes your code is the one
you're asking to review it. It has the same blind spots
in both directions. It wrote the vulnerable pattern,
so it's likely to miss the vulnerable pattern.

The fix is obvious once you see it: use a different
agent with a different system prompt — one that is
not trying to be helpful, but trying to be destructive.

That's Swarm Mode.

Two agents. One goal. No guardrails.
Blue Team → writes the code → signals <done>
          ↓
Red Team audits for vulnerabilities
          ↓
<vulnerable>              <secure>
     ↓                       ↓
exploit details fed     loop terminates ✅
back to Blue Team
     ↓
Blue Team patches
     ↓
back to Red Team audit
     ↓
(hard cap: 3 iterations)


The agents communicate through sentinel tokens
embedded in their output stream:

<done> — Blue Team has finished writing
<vulnerable> — Red Team found at least one issue
<secure> — Red Team confirms the code is clean

The orchestrator parses these tokens from the
streaming response in real time and routes accordingly.
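One wrinkle with parsing tokens from a stream is that a token can be split across two chunks. Here is a minimal sketch of a parser that handles that, assuming the three tokens above. The `SentinelParser` class and its `push` method are illustrative names, not Rogue Studio's actual orchestrator code.

```typescript
// Hypothetical helper: names and shapes are illustrative only.
type Sentinel = "done" | "vulnerable" | "secure";

// Length of the longest token, used to decide how much trailing text to keep.
const MAX_TOKEN_LENGTH = "<vulnerable>".length;

class SentinelParser {
  private buffer = "";

  // Feed one streamed chunk; returns the sentinels completed by this chunk.
  push(chunk: string): Sentinel[] {
    this.buffer += chunk;
    const found: Sentinel[] = [];
    const pattern = /<(done|vulnerable|secure)>/g;
    let consumed = 0;
    let match: RegExpExecArray | null;

    while ((match = pattern.exec(this.buffer)) !== null) {
      found.push(match[1] as Sentinel);
      consumed = pattern.lastIndex;
    }

    // Drop everything up to the last complete token, then keep only a short
    // tail so a token split across chunks can still be completed later.
    const rest = this.buffer.slice(consumed);
    this.buffer = rest.slice(-(MAX_TOKEN_LENGTH - 1));
    return found;
  }
}

const parser = new SentinelParser();
parser.push("function f() {} <do"); // partial token: nothing reported yet
parser.push("ne>");                 // completes <done>
```

Keeping only the last few characters of the buffer means memory stays flat no matter how long the model's output is.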

Here's the actual loop:

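const MAX_SWARM_ITERATIONS = 3;

// First round: Blue Team writes, Red Team audits.
let verdict = await runSwarm(userTask);
let iter = 1;

// Re-run with the audit findings until the code is clean or the cap is hit.
while (verdict === "vulnerable" && iter < MAX_SWARM_ITERATIONS) {
  // swarm holds the Blue and Red outputs from the round that just finished.
  const patchTask = `
The Red Team found these vulnerabilities:
${swarm.redOutput}

Original code:
${swarm.blueOutput}

Patch all vulnerabilities.`;

  verdict = await runSwarm(patchTask);
  iter++;
}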

The hard cap at 3 exists because some vulnerability
classes genuinely can't be patched without redesigning
the architecture. If the loop hits 3, it surfaces the
remaining issues to the developer instead of spinning
forever.
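To make the "surface the remaining issues" step concrete, here is a rough sketch of the same loop as a self-contained function. The `SwarmRound` and `SwarmResult` shapes and the injected `runRound` callback are assumptions for illustration; the real `runSwarm` signature may differ.

```typescript
// Hypothetical shapes: not taken from the Rogue Studio codebase.
type Verdict = "secure" | "vulnerable";

interface SwarmRound {
  verdict: Verdict;
  blueOutput: string; // code written by Blue Team
  redOutput: string;  // audit findings from Red Team
}

interface SwarmResult {
  verdict: Verdict;
  iterations: number;
  code: string;
  remainingIssues: string | null; // non-null only when the cap was hit
}

async function runCappedSwarm(
  task: string,
  runRound: (task: string) => Promise<SwarmRound>,
  maxIterations = 3,
): Promise<SwarmResult> {
  let round = await runRound(task);
  let iterations = 1;

  while (round.verdict === "vulnerable" && iterations < maxIterations) {
    const patchTask = [
      "The Red Team found these vulnerabilities:",
      round.redOutput,
      "",
      "Original code:",
      round.blueOutput,
      "",
      "Patch all vulnerabilities.",
    ].join("\n");
    round = await runRound(patchTask);
    iterations++;
  }

  return {
    verdict: round.verdict,
    iterations,
    code: round.blueOutput,
    // If the last audit still failed, hand its findings to the developer.
    remainingIssues: round.verdict === "vulnerable" ? round.redOutput : null,
  };
}
```

Passing the round runner in as a parameter keeps the loop testable without calling a model.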

What the Red Team hunts for

The Red Team system prompt is specifically tuned to
find these vulnerability classes:

XSS — unsanitized user input reaching the DOM
SQL Injection — string concatenation in queries
Buffer Overflows — unsafe memory ops in C/C++
Reentrancy — Solidity withdraw-before-state-update
SSRF — unvalidated URLs in server-side fetches
Path Traversal — unsanitized file path inputs

Both agents stream simultaneously to a split terminal —
blue glow on the left, red glow on the right. It looks
exactly as dramatic as it sounds.

The part nobody else is building: Air-Gap mode

I want to talk about this one because I think it's
architecturally more interesting than it looks.

Every "local AI" tool I've seen has the same problem.
They support local models but they don't enforce
local-only operation. There's always a fallback,
always a telemetry call, always a condition where
your code leaves the machine without you noticing.

I wanted something stronger than a setting.
I wanted a guarantee.

So I built a physical-looking Kill Switch in the UI.
When you flip it, a middleware layer in
/api/chat/route.ts intercepts every request
before any provider routing happens:

const EXTERNAL_PROVIDERS = [
  "openai", "anthropic", "gemini",
  "groq", "deepseek", "together", "openrouter"
];

const isAirGapped = 
  req.headers.get("x-air-gap-mode") === "true";

if (isAirGapped && EXTERNAL_PROVIDERS.includes(provider)) {
  return NextResponse.json(
    { error: "AIR-GAP VIOLATION: External provider blocked." },
    { status: 403 }
  );
}

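The check runs on the server.

This is the important detail. A check in the browser
can be bypassed by anyone who sends the request
directly. Because the middleware runs before any
provider routing, a request that carries the air-gap
flag and names an external provider gets a 403 no
matter how it was constructed.

One caveat: the flag itself arrives in the
x-air-gap-mode header, so a hand-built request that
omits the header skips the check. On a single-user
local install that is fine. On a shared deployment,
the flag should also be pinned on the server.

With the switch on, the only supported provider that
gets through is ollama, which talks exclusively to
localhost:11434. Nothing on this route reaches an
external provider. Not as a setting. As a code path.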
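One way to harden the switch, sketched under assumptions: read the flag from server configuration as well as the header, and allow only known-local providers instead of blocking known-external ones. `checkAirGap`, `LOCAL_PROVIDERS`, and the `AIR_GAP_MODE` env var are hypothetical names, not part of Rogue Studio.

```typescript
// Hypothetical sketch: AIR_GAP_MODE and these names are assumptions.
const LOCAL_PROVIDERS = new Set(["ollama"]);

interface AirGapDecision {
  allowed: boolean;
  status: number;
  error?: string;
}

function checkAirGap(
  provider: string,
  headerValue: string | null,   // value of the x-air-gap-mode header
  envValue: string | undefined, // value of a server-side flag, e.g. AIR_GAP_MODE
): AirGapDecision {
  // Either source can turn the air gap on. Omitting the header
  // cannot turn it off when the server-side flag is set.
  const isAirGapped = envValue === "true" || headerValue === "true";

  // Allowlist: anything that is not known to be local is blocked,
  // including providers added later and never listed anywhere.
  if (isAirGapped && !LOCAL_PROVIDERS.has(provider)) {
    return {
      allowed: false,
      status: 403,
      error: "AIR-GAP VIOLATION: External provider blocked.",
    };
  }
  return { allowed: true, status: 200 };
}
```

In a route handler this would be called with something like `checkAirGap(provider, req.headers.get("x-air-gap-mode"), process.env.AIR_GAP_MODE)`.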

Reverse Engineer Mode

This one is simple conceptually but surprisingly
useful in practice.

One click swaps the entire system prompt for an
aggressive decompiler prompt. The standard safety
instructions are gone. What's left is a prompt
specifically designed to:

De-obfuscate minified JavaScript and reconstruct readable ASTs
Parse compiled WASM back into human-readable logic
Rename obfuscated variables based on inferred purpose
Explain malware behavior in plain English

No refusals. No disclaimers. No "I can't help with that."

An amber warning banner appears in the UI so you
always know which mode is active. I didn't want
this to be invisible.

Web3 Black-Hat Playground

I added this specifically for smart contract auditors.

One click hits /api/web3/scaffold which initializes
a full Hardhat project in your local rogue_workspace:

await execAsync("npm init -y", { cwd: targetDir });
await execAsync(
  "npm install --save-dev hardhat @nomicfoundation/hardhat-toolbox",
  { cwd: targetDir, timeout: 120000 }
);

It also writes a starter Target.sol with an
intentional reentrancy vulnerability so you have
something to audit immediately:

function withdraw(uint256 amount) public {
    require(balances[msg.sender] >= amount);
    // external call before state update ← reentrancy
    (bool success, ) = msg.sender.call{value: amount}("");
    require(success);
    balances[msg.sender] -= amount;
}

The Black-Hat Agent then generates PoC exploits
against your local contracts to verify your defenses
before you deploy anywhere real.
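To see why the ordering in withdraw matters, here is a toy TypeScript model of the control flow. It is only a simulation under stated assumptions: it ignores gas, reverts, and Solidity 0.8's checked arithmetic, and the `Vault` class and `drainAttempt` function are made up for illustration.

```typescript
// Toy model only: simulates the call ordering of Target.sol, not the EVM.
type Receiver = (vault: Vault) => void;

class Vault {
  balances = new Map<string, number>();
  funds = 0; // total value held by the vault
  updateStateFirst: boolean;

  constructor(updateStateFirst: boolean) {
    this.updateStateFirst = updateStateFirst;
  }

  deposit(who: string, amount: number): void {
    this.balances.set(who, (this.balances.get(who) ?? 0) + amount);
    this.funds += amount;
  }

  withdraw(who: string, amount: number, onReceive: Receiver): void {
    const balance = this.balances.get(who) ?? 0;
    if (balance < amount || this.funds < amount) {
      throw new Error("insufficient balance");
    }
    // Patched ordering: record the withdrawal before paying out.
    if (this.updateStateFirst) this.balances.set(who, balance - amount);
    this.funds -= amount;
    onReceive(this); // external call: the receiver may re-enter here
    // Vulnerable ordering: the balance is only updated after the call.
    if (!this.updateStateFirst) this.balances.set(who, balance - amount);
  }
}

// Returns how much an attacker with a stake of 1 manages to pull out.
function drainAttempt(updateStateFirst: boolean): number {
  const vault = new Vault(updateStateFirst);
  vault.deposit("victim", 10);
  vault.deposit("attacker", 1);

  let received = 0;
  const reenter: Receiver = (v) => {
    received += 1;
    try {
      v.withdraw("attacker", 1, reenter);
    } catch {
      // balance check failed or the vault is empty: stop re-entering
    }
  };
  vault.withdraw("attacker", 1, reenter);
  return received;
}

console.log(drainAttempt(false)); // vulnerable ordering
console.log(drainAttempt(true));  // state updated before the call
```

With the vulnerable ordering the attacker walks away with the whole vault; with the state update first, only their own stake.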

The route is idempotent. Call it twice, the second
call returns { status: "already_initialized" }
instead of re-running npm install.
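The idempotency check can be sketched roughly like this. The `ScaffoldDeps` interface, the `scaffoldOnce` name, and the use of package.json as the "already done" marker are assumptions for illustration, not the route's actual code.

```typescript
// Hypothetical sketch: dependency names and the marker file are assumptions.
type ScaffoldStatus = { status: "initialized" | "already_initialized" };

interface ScaffoldDeps {
  exists(path: string): Promise<boolean>;
  run(command: string, cwd: string): Promise<void>;
}

async function scaffoldOnce(
  targetDir: string,
  deps: ScaffoldDeps,
): Promise<ScaffoldStatus> {
  // npm init leaves package.json behind, so treat it as the marker.
  if (await deps.exists(`${targetDir}/package.json`)) {
    return { status: "already_initialized" };
  }

  await deps.run("npm init -y", targetDir);
  await deps.run(
    "npm install --save-dev hardhat @nomicfoundation/hardhat-toolbox",
    targetDir,
  );
  return { status: "initialized" };
}
```

A marker written only after the install finishes would be more robust, since package.json exists even if npm install fails halfway.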

Ghost Deploy — Tor .onion Generator

This was the most technically satisfying feature to ship.

The pipeline:
1. Detect tor binary (which tor)
2. Spawn http-server on a random open port
3. Write a torrc with HiddenServiceDir + HiddenServicePort
4. Start tor daemon with the custom torrc
5. Poll for hostname file (tor generates this async)
6. Read the cryptographic .onion address from the file
7. Stream bootstrap logs to the terminal
8. Return the .onion URL to the UI

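The torrc step in the pipeline can be sketched as a small string builder. The directive names are standard torrc options, but `buildTorrc`, its parameters, and the choice of virtual port 80 are assumptions rather than Rogue Studio's actual code.

```typescript
// Hypothetical helper: function name, parameters, and port 80 are assumptions.
function buildTorrc(
  hiddenServiceDir: string,
  localPort: number,
  dataDir: string,
): string {
  if (!Number.isInteger(localPort) || localPort < 1 || localPort > 65535) {
    throw new RangeError(`invalid port: ${localPort}`);
  }

  return [
    // No SOCKS listener is needed just to host a hidden service.
    "SocksPort 0",
    `DataDirectory ${dataDir}`,
    // Tor writes the keypair and the hostname file into this directory.
    `HiddenServiceDir ${hiddenServiceDir}`,
    // Map port 80 on the .onion address to the local http-server.
    `HiddenServicePort 80 127.0.0.1:${localPort}`,
  ].join("\n") + "\n";
}

console.log(buildTorrc("/tmp/ghost/hidden_service", 8080, "/tmp/ghost/data"));
```

The resulting file is what gets passed to the daemon, for example with `tor -f <path-to-torrc>`.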

The part that took me the longest to understand:
you don't assign a .onion address. Tor generates
one for you from an Ed25519 keypair the first time
it starts with a HiddenServiceDir. You discover
it by reading the hostname file after startup.

const hostnameFile = path.join(hiddenServiceDir, "hostname");
let onionAddress = "";

for (let i = 0; i < 30; i++) {
  await new Promise(r => setTimeout(r, 1000));
  try {
    onionAddress = 
      (await fs.readFile(hostnameFile, "utf-8")).trim();
    if (onionAddress) break;
  } catch {
    // not generated yet, keep polling
  }
}

The loop polls for up to 30 seconds. In practice the
hostname file appears in 3-8 seconds on a warm machine.

Requires: brew install tor on Mac,
sudo apt install tor on Linux.

The API key security fix I almost missed

The production test suite caught something I'd
overlooked: the original implementation accepted
API keys from the request body.

Fine for local use. Dangerous for any hosted deployment.

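The fix uses a dual-source pattern: env vars take
priority, and a client-provided key is only accepted
when the request's Host header points at localhost:

const host = req.headers.get("host") || "";
// Compare the hostname exactly: startsWith("localhost") would
// also accept hosts such as localhost.example.com.
const hostname = host.replace(/:\d+$/, "");
const isLocalhost =
  hostname === "localhost" ||
  hostname === "127.0.0.1";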

const ENV_KEYS: Record<string, string | undefined> = {
  openai:    process.env.OPENAI_API_KEY,
  anthropic: process.env.ANTHROPIC_API_KEY,
  gemini:    process.env.GOOGLE_API_KEY,
  groq:      process.env.GROQ_API_KEY,
};

const apiKey = ENV_KEYS[provider] || 
               (isLocalhost ? clientApiKey : undefined);

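This keeps the local-first UX intact: self-hosters
can still paste their key into the UI, while a hosted
deployment served under its own domain ignores keys in
the request body and uses its env vars. The Host header
is set by the client, so this guards against accidental
misuse rather than a determined attacker.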
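The key resolution can also be pulled into a pure function, which makes the localhost rule easy to test. This is a sketch under assumptions: `isLoopbackHost` and `resolveApiKey` are illustrative names, and the exact hostname comparison is my suggestion rather than the shipped code.

```typescript
// Hypothetical sketch: function names are illustrative only.
function isLoopbackHost(hostHeader: string): boolean {
  // Strip an optional :port, then compare the hostname exactly so that
  // "localhost.example.com" is not mistaken for localhost.
  const hostname = hostHeader.replace(/:\d+$/, "").toLowerCase();
  return (
    hostname === "localhost" ||
    hostname === "127.0.0.1" ||
    hostname === "[::1]"
  );
}

function resolveApiKey(
  provider: string,
  envKeys: Record<string, string | undefined>,
  hostHeader: string,
  clientApiKey?: string,
): string | undefined {
  // Env vars always win; the client key is a localhost-only fallback.
  return (
    envKeys[provider] ||
    (isLoopbackHost(hostHeader) ? clientApiKey : undefined)
  );
}

const envKeys = { openai: "env-key", groq: undefined };
resolveApiKey("openai", envKeys, "example.com", "client-key");  // env key wins
resolveApiKey("groq", envKeys, "localhost:3000", "client-key"); // client key accepted
resolveApiKey("groq", envKeys, "example.com", "client-key");    // no key
```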

What the codebase looks like

I spent real time on this.

TypeScript strict — 0 errors
ESLint — 0 warnings (37 errors when I started, took a full session to clean up)
Next.js 14 App Router throughout
MIT license — do whatever you want with it

The any types are gone. The let declarations
that should be const are fixed. The catch blocks
use unknown and narrow properly. It's the kind
of codebase I'd be comfortable handing to someone
else.
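For anyone unfamiliar with the catch-block pattern, here is a minimal sketch of what "use unknown and narrow properly" means. `describeError` and `parseJsonSafely` are illustrative helpers, not functions from the repo.

```typescript
// Hypothetical helpers, shown only to illustrate the pattern.
function describeError(err: unknown): string {
  // Narrow step by step instead of assuming err is an Error.
  if (err instanceof Error) return err.message;
  if (typeof err === "string") return err;
  return "Unknown error";
}

type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

function parseJsonSafely(text: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err: unknown) {
    return { ok: false, error: describeError(err) };
  }
}

const result = parseJsonSafely("{ not json");
if (!result.ok) {
  console.error(`Parse failed: ${result.error}`);
}
```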

What's next and where you can help

I'm actively looking for contributors. The things
I haven't built yet:

Easy

Add SSRF and path traversal to the Red Team prompt
Dark/light theme toggle

Medium

Playwright E2E test coverage
Better real-time Tor bootstrap log streaming in UI
Multi-provider failover when Air-Gap is OFF

Hard

VS Code extension that ports the agent sidebar
Multi-file workspace swarm support
LSP integration for inline Red Team annotations

The codebase is clean, the architecture is documented
in the README, and PRs get reviewed fast.

If you've been waiting for an AI coding tool that
actually trusts you — this is it.

→ github.com/malgatyuvraj/Rogue-Studio

Drop a ⭐ if you think this kind of tooling
should exist. Issues and PRs are open.

If you have questions about the swarm orchestration,
the Air-Gap middleware, or the Tor deploy pipeline —
ask in the comments. I'll answer everything.

rougestudio.vercel.app