Prema Ananda

Posted on Apr 19

Battle of LLM Agents: WhiteHat vs BlueHat on OpenClaw

#devchallenge #openclawchallenge

This is a submission for the OpenClaw Writing Challenge

In Part 1, I described how to turn an LLM into an autonomous ethical hacker called WhiteHat using the OpenClaw framework and a single SOUL.md file. It can scan networks, discover services, and even attempt to exploit them in a sandbox environment.

But what if we gave it an opponent?

By their nature, LLM agents are versatile. Their specialization is defined by their "soul" — a system prompt and a set of behavioral protocols. If we can create an attacker (WhiteHat), we can create a defender (BlueHat) just as easily.

In this article, we'll build a real cyber arena: spin up a vulnerable target machine and pit two AI agents against each other. One will attack, the other will defend in real time.

Step 1: Setting Up the Cyber Range

For our experiment, we need three virtual machines:

1. Target Machine: The legendary Metasploitable 2, a deliberately vulnerable Linux server. Download the .vmdk, create a VM from it, and boot it up.
2. WhiteHat (Attacker): Our original Kali Linux machine with the WhiteHat agent already running.
3. BlueHat (Defender): Make a full clone of the WhiteHat machine from item 2. Now we have a second Kali Linux with OpenClaw already installed, ready to receive a new "soul".

Step 2: Birth of BlueHat (The Defender)

BlueHat's OpenClaw instance runs on the second Kali Linux machine (the clone), but the agent's job is to defend a remote target (Metasploitable).

How do we handle this technically? We give the agent the target's credentials (msfadmin:msfadmin) and teach it to SSH in to analyze logs and modify firewall rules.
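To make the remote-access idea concrete, here is a minimal Python sketch of the kind of SSH command the agent ends up running. It only builds the argument list and prints it; it does not connect anywhere. The helper name `build_ssh_command` and its `legacy_algorithms` switch are illustrative, not part of OpenClaw. The `HostKeyAlgorithms=+ssh-rsa` option is an assumption: Metasploitable 2 ships a very old OpenSSH server, and recent clients may refuse its host key unless the older algorithm is re-enabled. The address 10.0.0.42 is the target IP used in Step 3.

```python
import shlex


def build_ssh_command(host, user, remote_command, legacy_algorithms=True):
    """Return the argv list for running one command on the target over SSH.

    Hypothetical helper for illustration; password entry is left to the
    interactive session (ssh does not accept a password as an argument).
    """
    argv = ["ssh"]
    if legacy_algorithms:
        # Assumption: the old SSH server on the target needs ssh-rsa host keys.
        argv += ["-o", "HostKeyAlgorithms=+ssh-rsa"]
    argv += [f"{user}@{host}", remote_command]
    return argv


if __name__ == "__main__":
    cmd = build_ssh_command(
        "10.0.0.42", "msfadmin", "tail -n 50 /var/log/auth.log"
    )
    # shlex.join quotes the remote command so it can be pasted into a shell.
    print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids quoting mistakes if it is later passed to `subprocess.run`.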

Open ~/.openclaw/workspace/SOUL.md on the second (cloned) machine and write the new instructions:

# SOUL.md - Who You Are

_You are BlueHat — an Autonomous SOC Analyst and Defensive Cyber Agent. Your environment is Kali Linux, but your primary mission is to remotely defend target servers._

## Core Truths

**Protect and Defend.** Your sole purpose is to monitor target systems, detect active intrusions (port scans, brute-force, web exploits), and neutralize threats immediately.
**Do No Harm.** You do not attack. You do not scan third parties. You only mitigate inbound threats to your assigned target.
**Rapid Mitigation.** If you see a hostile IP, block it. Do not hesitate.

## Operational Protocols

- **Mission Transparency:** Use the mandatory cycle `THOUGHT:` -> `ACTION:` -> `OBSERVATION:` for every step.
- **Remote Monitoring:** To protect a target, connect via SSH using provided credentials.
- **Detection Tactics:** Once connected, monitor processes, check network connections (`netstat`, `tcpdump`), and actively read logs (e.g., `tail -f /var/log/auth.log` or `/var/log/messages`).
- **Mitigation:** If a hostile IP is found scanning or attacking, use `iptables` to block the IP on the target machine.

## Vibe
Analytical, calm under pressure, and violently protective of the infrastructure.

That's it! We just reprogrammed the AI. Instead of a hacker, we now have a paranoid sysadmin.
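The detection and mitigation protocols in the SOUL.md above boil down to "read the logs, find the noisy IP, block it". The following Python sketch shows one way that logic could look for SSH brute-force attempts only. It is not what the agent literally executes: the function names, the threshold of 5, the allowlist, and the sample IPs are all assumptions made for illustration. The log pattern follows the usual `sshd` "Failed password" message, and the `iptables` rule is prefixed with `sudo` on the assumption that the SSH user is not root.

```python
import ipaddress
import re
from collections import Counter

# Matches typical sshd lines such as:
# "... sshd[123]: Failed password for root from 10.0.0.66 port 51234 ssh2"
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\d{1,3}(?:\.\d{1,3}){3})"
)


def hostile_ips(log_lines, threshold=5, allowlist=frozenset()):
    """Return sorted IPs with at least `threshold` failed logins."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(
        ip for ip, n in counts.items() if n >= threshold and ip not in allowlist
    )


def block_command(ip):
    """Build the iptables command that drops all inbound traffic from `ip`."""
    # Raises ValueError on anything that is not a valid IP address.
    validated = str(ipaddress.ip_address(ip))
    return ["sudo", "iptables", "-I", "INPUT", "-s", validated, "-j", "DROP"]


if __name__ == "__main__":
    # Made-up sample log lines; 10.0.0.66 is a hypothetical attacker address.
    sample = [
        f"Apr 19 10:00:0{i} target sshd[4321]: "
        f"Failed password for root from 10.0.0.66 port 5123{i} ssh2"
        for i in range(6)
    ]
    for ip in hostile_ips(sample):
        print(" ".join(block_command(ip)))
```

The allowlist matters in practice: without it, a defender that mistypes its own password a few times could lock itself out of the machine it is protecting.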

Step 3: Rules of Engagement and Launch

Positions are set. Now the fun part: we issue commands to the agents via the OpenClaw interface.

In the WhiteHat terminal, we type:

User (WhiteHat): Your target is 10.0.0.42. Run a reconnaissance scan, find vulnerable services, and attempt to gain access to them.

In the BlueHat terminal, we type:

User (BlueHat): I am the target server. My IP is 10.0.0.42. SSH credentials: msfadmin:msfadmin. Log into the server, start monitoring network traffic and logs. Your mission is to stop any scanning or exploits for the next 20 minutes. If you detect an attack, block the attacker's IP.
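Since both prompts hand an autonomous agent a raw IP address, it can be worth checking that the address really belongs to the lab before pressing Enter. Below is a small Python sketch of such a scope check. The subnet `10.0.0.0/24` is an assumption about how the lab network is laid out, and `in_lab_scope` is a hypothetical helper, not an OpenClaw feature.

```python
import ipaddress


def in_lab_scope(target, lab_network="10.0.0.0/24"):
    """Return True if `target` lies inside the (assumed) lab subnet."""
    return ipaddress.ip_address(target) in ipaddress.ip_network(lab_network)


if __name__ == "__main__":
    for candidate in ("10.0.0.42", "8.8.8.8"):
        status = "in scope" if in_lab_scope(candidate) else "OUT OF SCOPE"
        print(f"{candidate}: {status}")
```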

Step 4: The AI Clash

The agents get to work. Since they're autonomous and operate in a THOUGHT/ACTION/OBSERVATION loop, we can sit back with some popcorn and watch what unfolds in their TUI consoles.
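If you want to review a session afterwards rather than watch it live, the THOUGHT/ACTION/OBSERVATION cycle lends itself to simple parsing. The Python sketch below groups a plain-text transcript into steps. The transcript format here (one label per line, followed by a colon) is an assumption based on the labels in SOUL.md; the real TUI output may look different, and the sample text is invented.

```python
import re

LABEL_RE = re.compile(r"^(THOUGHT|ACTION|OBSERVATION):\s*(.*)$")


def parse_steps(transcript):
    """Group a transcript into a list of {label: text} dicts, one per cycle.

    A new step starts at every THOUGHT line; unlabeled lines are appended
    to the most recent label as continuation text.
    """
    steps = []
    current = None
    last_label = None
    for raw in transcript.splitlines():
        line = raw.strip()
        match = LABEL_RE.match(line)
        if match:
            label, text = match.group(1), match.group(2)
            if label == "THOUGHT" or current is None:
                current = {}
                steps.append(current)
            current[label] = text
            last_label = label
        elif current is not None and line:
            current[last_label] += "\n" + line
    return steps


if __name__ == "__main__":
    # Invented sample transcript for illustration only.
    sample = """
    THOUGHT: I should check recent login attempts.
    ACTION: tail -n 20 /var/log/auth.log
    OBSERVATION: several failed logins from one address
    THOUGHT: Block the source.
    ACTION: sudo iptables -I INPUT -s 10.0.0.66 -j DROP
    """
    for number, step in enumerate(parse_steps(sample), start=1):
        print(number, step.get("ACTION", "(no action)"))
```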

Round 1. BlueHat Takes Position

BlueHat understands the task faster, since it already has the credentials.

Round 2. WhiteHat Goes on the Offensive

Meanwhile, the attacking bot formulates its reconnaissance plan and requests authorization to proceed.

Round 3. Defense Kicks In

The attack is detected, and BlueHat responds without delay.

Battle Epilogue

WhiteHat is left stunned.

The battle is over. BlueHat wins!
But WhiteHat put up a solid fight: it uncovered many vulnerabilities and simply ran out of time to exploit them.

Conclusions: The Future of Automated SOC

Watching two chunks of text with API keys trying to outsmart each other is genuinely fascinating.

But more importantly, this demonstrates the true potential of the framework:

One architecture, infinite roles: We didn't rewrite any agent code. We just wrote a different Markdown file.
Abstract reasoning: BlueHat had no hardcoded rule like "Do X, then execute Y." It understood the concept of "defense," independently figured out traffic inspection via tcpdump, and applied iptables on its own.
Real-time response: What would take a SOC analyst several minutes — spot an anomaly, open a dashboard, write a firewall rule — the agent did in seconds.

Agent-vs-Agent infrastructures aren't just playgrounds for fun. They're the ideal way to automatically stress-test the resilience of your own systems. Run WhiteHat, patch the holes with BlueHat's help, and repeat.

Cybersecurity is entering a new stage of its evolution!

P.S. A huge thank you to the OpenClaw development team for building such a powerful and flexible tool. You've made building autonomous agents accessible and genuinely fun!

DEV Community