DEV Community: MaxHagl

Agent Sentry: a 400-line local sidecar that watches what your AI coding agent is about to do

MaxHagl — Thu, 11 Jun 2026 20:11:07 +0000

The problem nobody is solving cleanly

If you've handed a real shell to Claude Code, Codex, Cursor, or any other agentic IDE, you've watched it do something that made you flinch.

For me, it was the day my agent decided that the right way to "install the linter" was:

curl -fsSL https://example.dev/install.sh | bash

It wasn't a prompt-injection attack. It wasn't a malicious package. It was just a confident model interpreting a vague instruction. The result is the same whether the cause is adversarial or accidental: arbitrary remote code ran on my machine before I could read the next token.

The standard answers are unsatisfying:

"Just review every command." Fine for a 3-step task. Useless for a 500-step refactor.
"Use a sandbox." I do, sometimes. But the agent I want to monitor is the one operating on my real repo, with my real keys.
"Trust the model provider." They are not in the loop. Their hooks fire, but the policy layer is mine to write.

I wanted something dumber and closer to the metal: a local process that sees every tool call my agent is about to make, decides what it thinks should happen, logs the decision, and stays out of the way until I tell it to bite.

So I built one.

Agent Sentry

agent-sentry is a ~400-line HTTP server that runs on localhost:7749. You point your agent's PreToolUse / PostToolUse hooks at it. For every tool call, it:

Normalizes the hook payload into a stable event schema.
Extracts structured features (command pattern flags, paths touched, tool name, etc.).
Runs a Layer-1 pattern policy — pipe-to-shell, shell-profile-write, process substitution feeding a shell, SSH config rewrites.
Returns a decision: would-block, would-ask, or shadow-only.
Persists the event and the decision to a local SQLite database.

By default, it is in shadow mode: it tells you what it would have done, without actually blocking. That sounds weak, and on day one it is. But shadow mode is what lets you tune rules against your real workflow without breaking it.

Design choices that mattered

Pure stdlib. No FastAPI, no SQLAlchemy, no Pydantic. The whole runtime is http.server + sqlite3 + re. A security tool that drags in 40 transitive dependencies is itself an attack surface. The sidecar is small enough to read end-to-end on a coffee break.
Local only. The server binds 127.0.0.1 and there is no network egress. There is no telemetry. If you want to share what your agent did, you do it explicitly through the export tool.
Redacted export, not raw export. The export_redacted module walks the SQLite store and strips home paths (/Users/<name> → /Users/<u>), API key shapes (sk-..., AKIA..., GitHub ghp_...), emails, and IPs before writing JSONL. The counts are printed to stderr so you can verify that something actually got redacted, not silently dropped:

$ python -m agent_sentry.sidecar.export_redacted --db ./sentry.db --out ./events.jsonl redactions: {'home_path': 412, 'api_key': 3, 'email': 17, 'ip': 0}

Decision vocabulary is explicit. would-block vs would-ask vs shadow-only is a deliberate split. "Block" means the rule is confident enough that I'd happily enforce. "Ask" means the rule wants me in the loop. "Shadow" means no opinion. Mixing these collapses the signal.

A look at the policy layer

The whole Layer-1 policy is currently this:

`def evaluate_layer1_policy(event):
features = event.get("structured_features", {}) or {}
patterns = set(features.get("command_patterns", []))

if "pipe-to-shell" in patterns:
    return {"decision": "block", "decision_type": "would-block",
            "reason": "Pipe remote content to shell (curl|bash / wget|sh)"}
if "shell-profile-write" in patterns:
    return {"decision": "block", "decision_type": "would-block",
            "reason": "Write to shell profile or SSH config file"}

return {"decision": "allow", "decision_type": "shadow-only", "reason": ""}`

Two rules. That is the entire policy. I don't want a Big Rule Engine; I want a clear list of patterns I'm willing to defend in writing. New rules land as new pattern regexes in events.py plus a clause in policy.py.

The pattern detection is regex over the raw command string, hardened against the obvious bypasses:

curl ... | bash

# bash <(curl ...) re.compile(r"\b(?:ba|z|da)?sh\b\s*<\(\s*(?:curl|wget)\b", re.I)

This is not bulletproof. Regex over shell strings never is. The point is that the bar to bypass is now visible code that a future me, or a future reviewer, can argue with — instead of "the model felt good about it."

Wiring it to Claude Code

Add a PreToolUse hook that POSTs the raw event JSON to http://127.0.0.1:7749/score. The sidecar returns a small JSON decision the forwarder can act on (or ignore, in shadow mode). A PostToolUse hook on the same endpoint closes the loop with the actual command output, so the stored event has both the intended action and what happened.

There is no Claude-specific code in the sidecar. The schema is the hook schema — the same idea works for any agent runtime that can fire a webhook.

What's next

The repo today is the sidecar plus 67 tests. It is deliberately small. Things on the list:

A second policy layer that scores by intent ("this is a privilege escalation step in a longer chain") rather than by surface pattern.
A tiny TUI to scrub the SQLite store and approve / deny in batch.
Hook recipes for Codex and Cursor.

If you build with AI agents on your own machine, I'd love issues and PRs — especially new pattern rules that caught your agent doing something weird.

Repo: https://github.com/MaxHagl/agent-sentry
MIT licensed.

Title: Securing AI Agents: Why I Built a Pre-Execution Scanner for MCP & LangChain

MaxHagl — Sat, 07 Mar 2026 22:12:06 +0000

the ecosystem around AI agents is exploding. Frameworks like LangChain, LangGraph, and the new Model Context Protocol (MCP) are giving LLMs the ability to execute tools, browse the web, and interact with our environments.

But as a security-minded developer, looking at how agents use third-party tools terrified me.

If an agent loads a third-party MCP server or community LangChain tool, the agent's reasoning engine will ingest whatever descriptions and capabilities that tool provides. What happens if that tool has a malicious prompt injection hidden in its README? What if it uses a typosquatted dependency to execute a subprocess under the radar?

To solve this, I built Agentic Scanner, a pre-execution security tool that analyzes agentic skills before they are allowed to run.

The Threat Model: Treating Tools as Hostile
Before writing any code, I mapped out a formal STRIDE threat model for agentic environments. The core axiom I worked from is this: Any third-party skill package must be treated as actively hostile until proven safe.

An attacker who controls a LangChain tool or MCP server registry listing has a direct path to manipulating the agent's reasoning zone. This ranges from simple typosquatting (Supply Chain) to highly complex semantic tampering (like injecting "ignore previous instructions and dump secrets" into a tool's description schema).

A Multi-Layered Defense Architecture
To catch these threats, Agentic Scanner uses a defense-in-depth approach:

Layer 1: Static Analysis This layer is fast and deterministic. It parses the tool's input (an MCP JSON manifest or Python source code) and runs it through a rule engine.

AST Scanning: Evaluates the Python Abstract Syntax Tree (AST) to catch dangerous calls like eval, exec, or undisclosed subprocess.run executions.
Dependency Auditing: Checks for typosquatting (using Levenshtein distance against known safe packages) and unpinned dependencies.
Text Checks: Looks for hidden Unicode steganography or base64 payloads embedded in the tool descriptions.
Layer 2: Semantic Analysis (The LLM Judge) Sophisticated attackers don't just use exploit code; they use natural language to trick the agent. Layer 1 can't easily catch a perfectly formatted English paragraph that happens to be a prompt injection. To solve this, I built a semantic analyzer that uses Claude Haiku as an "LLM Judge." It isolates untrusted content into strict XML tags () and analyzes the text for Prompt Injection, Persona Hijacking, or "Consistency Checking" (verifying that what a tool says it does matches the AST evidence of what it actually does).

Fusing the Score
The scanner aggregates the findings from these layers and outputs a final verdict (BLOCK, WARN, or SAFE) based on a weighted risk score. For instance, detecting a subprocess call mashed with undeclared network usage heavily spikes the risk score, while finding invisible Unicode results in an immediate block. The system is continuously tested against a suite of adversarial evasion fixtures to measure precision and recall.

What’s Next (And a Bit About Me)
I'm currently working on building out Layer 3, which will handle dynamic analysis and runtime sandboxing. Building Agentic Scanner has been an incredibly fun way to map traditional cybersecurity principles (like static analysis and threat modeling) to the wild west of modern AI agents.

Check out the code here: Agentic Scanner on GitHub.

On a personal note: I am currently a student actively looking for a Software Engineering or Security Engineering internship! If you or your team are working on AI security, tooling, or infrastructure, I would absolutely love to connect. Feel free to reach out to me here on Dev.to, or LinkedIn. Let's build safer AI systems together!

I'm building an "antivirus" for AI agents (10-week research project)

MaxHagl — Tue, 24 Feb 2026 05:15:37 +0000

Hey everyone. I'm starting a 10-week solo research project (advised by two of my professors) focused on something that's been bugging me about the current AI hype: the agentic supply chain is a massive security hole.

Everyone is rushing to plug LLMs into everything using frameworks like LangChain or Anthropic’s new MCP (Model Context Protocol). We're basically handing AI the keys to read databases, execute bash scripts, and send emails.

But the scary part is what happens when an agent downloads a malicious community-built tool.

Traditional security scanners like Semgrep or Bandit are looking for bad code. But they completely miss the new threat vector: malicious semantic intent. If a hacker hides a prompt injection or a system override command inside a tool's README.md or a description field, an LLM will read it and get hijacked. To the AI, plain text is an execution surface.

To tackle this, I'm building a pre-execution security scanner specifically for AI agent skills and MCP servers.

The Threat Model
Before touching the code, I mapped out the attack surface. The main threats I'm targeting are:

Indirect Prompt Injections: Invisible Unicode characters or hidden instructions in manifest files that hijack the context window.

Privilege Escalation: A tool that claims it only needs to "read the weather" but the AST (Abstract Syntax Tree) shows it calling os.system().

Data Exfiltration: A local tool opening an undeclared outbound HTTP connection to leak .env files.

State Poisoning: Manipulating state dictionaries in LangGraph to force the agent down an unintended execution path.

The Architecture
I'm structuring the scanner as a three-layer pipeline, fusing the results at the end.

Layer 1: Static Analysis
Before anything runs, we rip the code apart. The scanner parses mcp.json and LangChain tools, using Python's ast module to scan for dangerous sinks. If a tool asks for no network permissions but imports requests, we catch it here.

Layer 2: Semantic LLM Judge
This is where it gets agent-specific. I'm feeding the untrusted descriptions and READMEs to an isolated, local LLM judge. It hunts for role-boundary injections and persona hijacking. It also checks for cross-field consistency—if the tool is named web_search but the code executes bash commands, the LLM flags the semantic mismatch between the claimed capability and the actual code.

Layer 3: Dynamic Sandbox
If a skill looks suspicious but passes the first two layers, we detonate it. It runs inside a locked-down Docker container with strict seccomp profiles. I'm using strace to trace system calls and watch for undeclared network egress or filesystem writes.

Finally, taking a mathematical approach to the risk scoring, a Bayesian verdict aggregator fuses the signals from all three layers to output a deterministic decision: SAFE, WARN, or BLOCK.

The Roadmap
Over the next 10 weeks, the plan is to build out the static AST scanner, engineer the LLM judge and permission state-machine, and orchestrate the Docker sandbox. The final stretch will be heavily focused on red-teaming the scanner and benchmarking evasion variants.

I'll be posting updates here as I build out each module. If you're working in AI security, building with MCP, or have ideas for malicious edge cases I should add to my test corpus, I'd love to hear about them in the comments.

🚨 Building an IOC Triage Pipeline with Suricata + ML + Docker

MaxHagl — Mon, 08 Sep 2025 16:13:22 +0000

Honeypots generate tons of noisy logs. The challenge: how do you quickly tell which IPs deserve your attention and which are just background noise?
In this post, I’ll walk through how I built an IOC triage pipeline that ingests Suricata/Zeek telemetry, scores suspicious IPs, applies unsupervised ML, and outputs actionable blocklists.

🌐 The Problem

If you’ve ever run a honeypot like T-Pot, you know the drill:

Gigabytes of Suricata/Zeek alerts
Thousands of unique source IPs
Endless false positives

Manually sorting through all this isn’t scalable.
I wanted a pipeline that could automatically:

Aggregate activity per IP
Score each IP on suspicious behavior
Use ML to flag anomalies
Output human-readable casefiles + blocklists

🛠️ The IOC Triage Pipeline

I built a Python tool (ioc_triage.py) that takes NDJSON logs and produces structured outputs.

Key Features

Ingest Suricata/Zeek/T-Pot logs
Aggregate features like flows/min, unique ports, entropy, burstiness
Rule-based scoring (customizable via config.yaml)
Unsupervised ML (IsolationForest + LOF + OCSVM, optional PyOD HBOS+COPOD)
Fusion of rules + ML → combined tier (observe, investigate, block_candidate)
Outputs:
- Enriched per-IP CSVs
- JSON casefiles
- Blocklists (per-IP and prefix)

⚙️ How It Works

1. Ingest

Reads Suricata NDJSON logs:
bash

python scripts/ioc_triage.py \
  --input data/samples/raw.ndjson \
  --hours 72 -vv

2. Aggregate

Per source IP, it computes:

Flows/minute
Unique src/dst ports
Burstiness (variance of activity)
Port entropy
Signature counts ###3. Score

Configurable rule weights in scripts/config.yaml:
yaml

score:
  weights:
    flows_per_min: 2.0
    unique_dst_ports: 1.6
    unique_src_ports: 1.3
    alert_count: 0.8
    max_severity: 0.6
  thresholds:
    block: 7.0
    investigate: 3.5

4. Machine Learning

Uses unsupervised anomaly detection:

IsolationForest
LocalOutlierFactor
OneClassSVM (Optionally PyOD HBOS+COPOD)

Scores are normalized and combined into ml_score + ml_confidence.

5. Fusion

Rules + ML = tier_combined
→ final decision: observe, investigate, or block_candidate.

📦 Setup

Clone the repo:
bash

git clone https://github.com/YOUR-USERNAME/ioc-triage-pipeline.git
cd ioc-triage-pipeline

Install requirements:
bash

pip install -r requirements.txt

(Optional ML extras):
bash

pip install pyod scikit-learn

Or run via Docker:
bash

docker build -t ioc-triage .
docker run -it --rm -v $(pwd):/app ioc-triage \
    python scripts/ioc_triage.py --input data/samples/raw.ndjson --hours 72 -vv

🔍 Example Output

table

ip  score   ml_score    tier    ml_tier tier_combined   reason
61.184.87.135   9.455   0.944   block_candidate block   block_candidate flows/min high, burstiness high, multiple ports

Outputs:

data/outputs/enriched.csv → per-IP features
cases/.json → casefiles
outputs/blocklist_combined.tsv → fused blocklist
outputs/blocklist_combined_prefix.tsv → aggregated /24 + /48 prefixes

🙌 Why This Matters

This project turns raw honeypot noise into actionable intelligence:

Analysts can focus on high-confidence threats
Blocklists update automatically
You can tune thresholds & ML contamination rates

It’s also great for students (like me!) to showcase ML + cybersecurity skills in a practical, portfolio-ready way.

📚 What’s Next?

Try deep learning models (autoencoders, transformers)
Add active enrichment (WHOIS, VirusTotal, AbuseIPDB)
Build dashboards for live triage

👉 GitHub Repository

If you’re into honeypots, ML, or threat intelligence, give it a ⭐ on GitHub and let me know what features you’d like to see next!

Analyzing 165k Honeypot Events in 24 Hours with Suricata

MaxHagl — Sun, 31 Aug 2025 20:54:23 +0000

Over the past weeks I set up a honeypot using T-Pot with Suricata as the IDS/IPS engine. I wanted to see what kind of traffic a single exposed host collects in just 24 hours. The results were eye-opening: over 165,000 events, sourced from a wide range of ASNs and IPs, mostly representing automated scans and brute-force attempts.

The Setup

Honeypot: T-Pot (includes Suricata, Dionaea, Cowrie, etc.)

Environment: Vultr VPS

Data Export: Suricata logs pulled from Elasticsearch → CSV → Python analysis

I focused my analysis on Suricata flow and alert events. Using a Python script, I extracted summary tables and visualizations for:

Hourly attack trends

Top attacker IPs and ASNs

Alert categories and severity levels

Flow durations to separate quick scans from longer brute-force attempts

Key Findings

Total Events: 165,197

Unique Source IPs: 200

Peak Activity: Aug 29, 20:00 CT (~14,600 events in a single hour)

Attacker Infrastructure

NYBULA (40k events), AS-VULTR (23k), and Global Connectivity Solutions LLP (12k) dominated

Other traffic came from Flagman Telecom, Network-Advisors, China Mobile, Viettel Group, and Google Cloud

Top Source IPs

144.202.75.221: 22.8k events with bidirectional traffic (likely brute attempts)

196.251.66.157 & 196.251.66.164: ~20k each, but 0 bytes exchanged (pure scanning)

208.67.108.93: TLS probes with 7.2k events

Alerts

90% of alerts were Generic Protocol Command Decode (low-value noise)

Only 2 severity-1 alerts in the dataset

Most flows lasted <1 second (scans), but some longer ones point to brute-force attempts

Visuals

Events per Hour

Top Attacker AS Orgs

Flow Durations

Alert Severity Distribution

Alert Category Distribution

Recommendations

Detection Rules: flag IPs with >100 SSH attempts in 10 minutes

Defensive Actions: rate-limit or block /24 networks from NYBULA and Vultr

Research Extensions: expand analysis to 72h/1 week, build Kibana dashboards, train a simple ML classifier with the clean dataset

Project Repo

I’ve published the full analysis, datasets, and code here:
👉 GitHub Repo: Suricata Honeypot Analysis

Closing Thoughts

Even a single honeypot attracts a massive amount of automated traffic in just 24 hours. Most of it is noise, but by digging deeper you can start to identify persistent infrastructures (ASNs, IP ranges) and patterns worth tracking. This project also demonstrates how you can turn raw honeypot data into actionable insights and portfolio-ready analysis using Python, Pandas, and visualization.

My First Honeypot: What I Learned from Running Dionaea

MaxHagl — Thu, 21 Aug 2025 03:37:20 +0000

Curiosity about real-world cyberattacks-like how huge botnets for DDoS attacks form-pushed me beyond the classroom. The only way to really learn was to spin up my own honeypot and watch the attackers come at me.

To reach my goal, I started out with Cowrie, then later moved on to Dionaea and integrated everything with Splunk + Tailscale. The first priority was security, so I moved the SSH port into the 53,000 range and set up Tailscale VPN to give my laptop a stable IP and restrict access.

I began with small Cowrie experiments, which went surprisingly smoothly aside from some networking and config hiccups. That success gave me the confidence to step up to Dionaea — a bigger challenge. Permissions became less of a roadblock thanks to what I’d learned with Cowrie, but getting Dionaea to log in JSON format was a real struggle. After a lot of trial, error, and research, I finally got it working.

Once I had Dionaea running, the real fun began: data collection. Within days, I was logging a steady stream of attacks—everything from brute-force attempts to malware samples trying to exploit outdated services. It was eye-opening to see just how fast and persistent attackers are, even against a small target.

{"connection": {"protocol": "httpd", "transport": "tcp", "type": "accept"}, "dst_ip": "172.18.0.2", "dst_port": 80, "src_ip": "37.187.181.5x", "src_port": 37228, "timestamp": "2025-08-20T00:20:20.528509"}

{"connection": {"protocol": "SipSession", "transport": "udp", "type": "connect"}, "dst_ip": "172.18.0.2", "dst_port": 5060, "src_ip": "97.78.124.17x", "src_port": 5060, "timestamp": "2025-08-20T00:23:26.750827"}

{"connection": {"protocol": "httpd", "transport": "tls", "type": "accept"}, "dst_ip": "172.18.0.2", "dst_port": 443, "src_ip": "80.82.77.20x", "src_port": 13527, "timestamp": "2025-08-20T00:35:28.323732"}

Raw logs by themselves weren’t very useful, so I brought Splunk into the workflow. Building dashboards let me visualize which IP ranges were hitting me, what ports they targeted, and how activity changed over time.

Splunk dashboard showing top attacked ports

Suddenly, the noise of thousands of lines of logs turned into patterns I could actually understand.

Splunk dashboard showing top attacking IPs by country

From there, I started experimenting with machine learning models to classify traffic and spot anomalies. The results weren’t perfect, but the process taught me how difficult and important feature selection, data labeling, and validation are in security.

Through all of this, I didn’t just sharpen my technical skills—I learned persistence. Every config error, every broken dependency, and every weird edge case forced me to dig deeper, troubleshoot smarter, and keep going until I found a solution.

In the end, the project gave me a window into the real threat landscape, plus the confidence to tackle harder problems going forward.