5I Projects

Giving Red Teamers a Tactical AI Copilot (Without Losing Control)

If you’ve ever had five terminals open, your brain at half capacity, and ChatGPT confidently telling you to “check for SUID binaries,” you already know the pain. Red teaming is fast, fluid, and full of moments where the next best move isn’t obvious. The tools we use are powerful—but they’re built for execution, not decision-making.

That’s the problem PhantomShift is trying to solve: not by replacing the operator, but by acting as a situationally aware copilot that helps you think faster and move smarter under pressure.

Why We Started Building

In the middle of an engagement, you don’t always get a second opinion. You’re juggling toolchains, shell history, tradeoffs, and stealth constraints—while racing the clock. Even if you have a teammate, they’re not watching your session scroll by line-for-line or tracking what commands you’ve already tried.

We’ve learned from GPT models ourselves. They can be helpful. But too often, the advice is too vague, too risky, or totally disconnected from the real environment. Most AI tools don’t know where you are in the kill chain. They don’t know your access level, your tooling, or what got you here.

So we started experimenting. Could we build something that works with red teamers instead of around them? Something that pays attention to the shell, understands the engagement, and offers suggestions rooted in actual context?

That became the seed for PhantomShift.

What PhantomShift Does

PhantomShift runs in a containerized Kali Linux environment with a built-in terminal, AI chat panel, and mission timeline. It monitors the session in real time—tracking terminal output, privilege level, host details, and operator commands—and uses that evolving context to provide guidance during the op.
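To make the idea of "evolving context" concrete, here is a minimal sketch of what a session snapshot could look like. This is not PhantomShift’s actual code: the `SessionContext` class, its fields, and the naive `id`-based privilege check are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SessionContext:
    """Hypothetical rolling snapshot of an operator session (illustration only)."""
    hostname: str
    os_name: str
    privilege: str = "user"                      # assumed labels: "user" or "root"
    commands: list[str] = field(default_factory=list)
    output_tail: list[str] = field(default_factory=list)
    max_output_lines: int = 50                   # keep only the most recent output

    def record(self, command: str, output: str) -> None:
        """Store a command and keep a bounded tail of terminal output."""
        self.commands.append(command)
        self.output_tail.extend(output.splitlines())
        self.output_tail = self.output_tail[-self.max_output_lines:]
        # Naive heuristic (an assumption): `id` output starting with uid=0 means root.
        if command.strip() == "id" and output.startswith("uid=0("):
            self.privilege = "root"

    def summary(self) -> dict:
        """Compact view of the session that later steps could feed to a model."""
        return {
            "host": self.hostname,
            "os": self.os_name,
            "privilege": self.privilege,
            "recent_commands": self.commands[-5:],
        }


ctx = SessionContext(hostname="lab-target", os_name="linux")
ctx.record("whoami", "www-data\n")
ctx.record("id", "uid=33(www-data) gid=33(www-data) groups=33(www-data)\n")
print(ctx.summary())
```

A real implementation would need far more robust parsing than a single string check, but the shape is the point: a small, bounded record of where the operator is right now.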

[Figure: High-level diagram]

The magic behind it is a retrieval-augmented generation (RAG) pipeline. Instead of asking the LLM to come up with answers from scratch, PhantomShift feeds it real context: what OS you’re on, what techniques have already been used, what kind of network constraints you’re under. That information is matched against a curated knowledge base of exploits, playbooks, detection rules, and post-op writeups.
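As a rough sketch of the matching step, the example below ranks knowledge base entries by tag overlap with the current context. Jaccard similarity over tags is a deliberately simple stand-in we picked for illustration; a real RAG pipeline would more likely use embeddings. The `KbEntry` type, the `retrieve` function, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KbEntry:
    """Hypothetical knowledge base record: a title plus descriptive tags."""
    title: str
    tags: frozenset[str]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Overlap between two tag sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(context_tags: frozenset[str], kb: list[KbEntry], top_k: int = 2) -> list[KbEntry]:
    """Return up to top_k entries that share at least one tag with the context."""
    scored = [(jaccard(context_tags, entry.tags), entry) for entry in kb]
    scored = [(score, entry) for score, entry in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]


# Illustrative entries only; titles and tags are made up.
KB = [
    KbEntry("Linux privilege escalation playbook", frozenset({"linux", "privesc", "user"})),
    KbEntry("Windows lateral movement writeup", frozenset({"windows", "lateral-movement"})),
    KbEntry("EDR detection rules for Linux hosts", frozenset({"linux", "detection", "edr"})),
]

context_tags = frozenset({"linux", "user", "privesc", "egress-filtered"})
results = retrieve(context_tags, KB)
for entry in results:
    print(entry.title)
```

Here the Windows writeup shares no tags with the session, so it never reaches the model, which is the behavior we want from context-aware retrieval.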

When you ask a question—like how to escalate privileges, or whether lateral movement is safe—the system doesn’t just throw out generic suggestions. It constructs a prompt that reflects your current situation and pulls supporting evidence to help the model reason toward a tactically sound answer. Not just an answer that “sounds good,” but one that fits the moment. The output includes rationale, potential detection risks, and fallback paths. And just as importantly, it explains why it suggested what it did—so the operator can trust, tweak, or reject the recommendation.
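Prompt construction along these lines can be sketched as a function that stitches the session context and retrieved evidence into a single request. The `build_prompt` function, the section wording, and the sample inputs are assumptions for illustration, not the prompts PhantomShift actually uses.

```python
def build_prompt(question: str, context: dict, evidence: list[str]) -> str:
    """Assemble a context-grounded prompt (hypothetical layout)."""
    context_lines = [f"- {key}: {value}" for key, value in context.items()]
    evidence_lines = [f"[{i}] {item}" for i, item in enumerate(evidence, start=1)]
    sections = [
        "You are assisting an authorized red team operator.",
        "Current session context:",
        *context_lines,
        "Supporting evidence from the knowledge base:",
        # Make a missing retrieval result explicit instead of leaving a silent gap.
        *(evidence_lines or ["(none retrieved)"]),
        f"Operator question: {question}",
        "Respond with: recommendation, rationale, detection risks, fallback paths.",
        "Cite evidence by number, and say so if the evidence is insufficient.",
    ]
    return "\n".join(sections)


prompt = build_prompt(
    question="What is a low-noise next step for privilege escalation?",
    context={"os": "linux", "privilege": "user", "network": "egress filtered"},
    evidence=["Linux privilege escalation playbook", "EDR detection rules for Linux hosts"],
)
print(prompt)
```

Asking for rationale, detection risks, and fallback paths as explicit fields is one way to keep the answer reviewable, so the operator can accept, adjust, or reject it.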

Where We’re At Now

This is still a prototype. It parses session context, pulls relevant tactical information, and structures prompt logic in a way that’s starting to feel useful. We’re currently refining how we match context to similar past cases, how we weight risk in suggestion scoring, and how we tune prompts to avoid hallucinations. We’ve also built in a feedback loop: if an operator thumbs down a suggestion, the context stack updates in real time.
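One way to picture risk-weighted scoring and the thumbs-down loop together is the sketch below. The scoring formula (relevance minus a weighted detection risk), the `ContextStack` and `Suggestion` classes, and the sample values are all assumptions for illustration; the real weighting is still being refined.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    name: str
    relevance: float       # 0..1, how well it matches the current context
    detection_risk: float  # 0..1, estimated chance of being noticed


@dataclass
class ContextStack:
    """Hypothetical holder for operator feedback and scoring preferences."""
    risk_weight: float = 0.5
    rejected: set[str] = field(default_factory=set)

    def thumbs_down(self, suggestion: Suggestion) -> None:
        """Remember a rejected suggestion so it is not offered again."""
        self.rejected.add(suggestion.name)

    def score(self, suggestion: Suggestion) -> float:
        """Assumed formula: relevance penalized by weighted detection risk."""
        return suggestion.relevance - self.risk_weight * suggestion.detection_risk

    def rank(self, suggestions: list[Suggestion]) -> list[Suggestion]:
        """Drop rejected suggestions, then sort the rest by score, best first."""
        candidates = [s for s in suggestions if s.name not in self.rejected]
        return sorted(candidates, key=self.score, reverse=True)


# Placeholder names and made-up numbers.
options = [
    Suggestion("option-a", relevance=0.9, detection_risk=0.8),
    Suggestion("option-b", relevance=0.7, detection_risk=0.2),
    Suggestion("option-c", relevance=0.4, detection_risk=0.1),
]

stack = ContextStack()
before = [s.name for s in stack.rank(options)]
stack.thumbs_down(options[1])
after = [s.name for s in stack.rank(options)]
print(before, after)
```

With these numbers the quieter `option-b` outranks the more relevant but noisier `option-a` until the operator rejects it, at which point the ranking changes on the next call.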

It’s not perfect. But it’s promising. And we’d rather build it in the open with real feedback than guess at what people want.

Let’s Build Something Red Teamers Actually Want

We’re not here to automate ops. We’re trying to build something that actually helps when your brain’s tired, your notes are scattered, and the next step matters. If you’ve ever found yourself wishing for a second brain during an op (one that doesn’t forget, doesn’t sleep, and doesn’t push garbage payloads), we think PhantomShift might be up your alley.

We’re offering early demos now and inviting folks in the red teaming, offensive security, and hacker-adjacent communities to kick the tires. Tell us what works. Tell us what doesn’t. Tell us what you wish your tools understood about how you operate.

You can get more info on PhantomShift, give us some feedback, or just drop in for a chat: check it out here.
