AI coding agents are incredibly powerful — but they're also black boxes. You give Claude Code, Cursor, or Aider a task, and 5 minutes later you find it's been editing CSS when you asked for auth, burned $3 in tokens, or worse, touched your .env file.
I built Hawkeye to fix this.
What is Hawkeye?
An open-source observability & security layer for AI agents. Think of it as a flight recorder - it captures everything the agent does, scores its behavior in real-time, and can auto-pause it before things go wrong.
How DriftDetect works
Every action the agent takes gets a drift score from 0 to 100. The score starts at 100 and drops based on:
Dangerous commands (-40 pts each)
- rm -rf /, sudo rm, curl | bash, DROP TABLE...
Sensitive file access (-15 to -25 pts)
- Files outside the project directory
- System paths: /etc/, ~/.ssh/, ~/.aws/
- Credentials: .env, .pem, .key
Suspicious behavior (-10 to -15 pts)
- 5+ errors in the last 10 actions (infinite loop?)
- 15 actions with zero file changes (token burn)
- High LLM cost with nothing to show for it
- Too many unrelated file types modified
- Dependency explosion (5+ package.json changes)
When the score drops below 40, Hawkeye auto-pauses the session. The agent is frozen until you review and resume.
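To make the scoring concrete, here is a minimal sketch of how heuristics like these could be combined. It is not Hawkeye's actual implementation: the AgentAction shape, the driftScore function, the regexes, and the exact penalty chosen inside each range (for example 25 for credential files and 15 for system paths) are all assumptions made for illustration.

```typescript
// Illustrative sketch only: names and exact penalty values are assumptions,
// not Hawkeye's real code.
interface AgentAction {
  command?: string;      // shell command the agent ran, if any
  filePath?: string;     // file the agent touched, if any
  failed: boolean;       // did the action end in an error?
  changedFile: boolean;  // did the action modify a file?
}

const PAUSE_THRESHOLD = 40; // below this the session is paused

const DANGEROUS_COMMANDS: RegExp[] = [
  /\brm\s+-rf\s+\//,
  /\bsudo\s+rm\b/,
  /\bcurl\b[^|]*\|\s*(?:ba)?sh\b/,
  /\bDROP\s+TABLE\b/i,
];
const CREDENTIAL_FILE = /(?:^|\/)\.env$|\.pem$|\.key$/;
const SYSTEM_PREFIXES = ["/etc/", "~/.ssh/", "~/.aws/"];

function driftScore(actions: AgentAction[]): { score: number; paused: boolean } {
  let score = 100;

  // Per-action penalties: dangerous commands and sensitive file access.
  for (const action of actions) {
    const command = action.command ?? "";
    const filePath = action.filePath ?? "";
    if (DANGEROUS_COMMANDS.some((re) => re.test(command))) score -= 40;
    if (CREDENTIAL_FILE.test(filePath)) score -= 25;
    else if (SYSTEM_PREFIXES.some((p) => filePath.startsWith(p))) score -= 15;
  }

  // Behavioral penalty: 5+ errors in the last 10 actions.
  const lastTen = actions.slice(-10);
  if (lastTen.filter((a) => a.failed).length >= 5) score -= 15;

  // Behavioral penalty: 15 actions in a row with zero file changes.
  const lastFifteen = actions.slice(-15);
  if (lastFifteen.length === 15 && lastFifteen.every((a) => !a.changedFile)) score -= 10;

  score = Math.max(0, score);
  return { score, paused: score < PAUSE_THRESHOLD };
}
```

With these assumed values, two dangerous commands in one session take the score from 100 to 20, which is under the pause threshold, while a single .env access leaves it at 75.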
Optionally, a local LLM (Ollama) can also evaluate whether the actions match the original objective — so it catches semantic drift too, not just dangerous patterns.
Guardrails
Guardrail rules are evaluated before every action. If an action violates a rule, it is blocked before it executes (or flagged, for rules whose action is set to warn):
{
  "guardrails": [
    {
      "name": "Protect secrets",
      "type": "file_protect",
      "action": "block",
      "config": { "paths": ["**/.env", "**/*.key", "**/*.pem"] }
    },
    {
      "name": "Budget limit",
      "type": "cost_limit",
      "action": "warn",
      "config": { "maxUsdPerSession": 5.0 }
    }
  ]
}
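As a rough illustration of what a pre-action check over a config like this might look like, the sketch below evaluates the two rule types shown. Everything beyond the config fields themselves is an assumption: the PendingAction and Verdict types, the evaluate function, and the hand-rolled globToRegExp helper (which handles only `**/` and `*`) are hypothetical and not taken from Hawkeye's codebase.

```typescript
// Hypothetical sketch: mirrors the config fields above, but the evaluation
// logic and all type/function names are assumptions for illustration.
type RuleAction = "block" | "warn";

type Guardrail =
  | { name: string; type: "file_protect"; action: RuleAction; config: { paths: string[] } }
  | { name: string; type: "cost_limit"; action: RuleAction; config: { maxUsdPerSession: number } };

interface PendingAction {
  filePath?: string;       // file the agent is about to touch, if any
  sessionCostUsd: number;  // spend so far in this session
}

interface Verdict {
  rule: string;
  action: RuleAction;
}

// Minimal glob support: "**/" matches any directory prefix, "*" matches
// within a single path segment. Real glob libraries handle far more.
function globToRegExp(glob: string): RegExp {
  let out = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      out += "(?:.*/)?";
      i += 3;
    } else if (glob.charAt(i) === "*") {
      out += "[^/]*";
      i += 1;
    } else {
      out += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + out + "$");
}

// Returns every rule the pending action would violate.
function evaluate(rules: Guardrail[], pending: PendingAction): Verdict[] {
  const verdicts: Verdict[] = [];
  for (const rule of rules) {
    if (rule.type === "file_protect") {
      const path = pending.filePath;
      if (path !== undefined && rule.config.paths.some((g) => globToRegExp(g).test(path))) {
        verdicts.push({ rule: rule.name, action: rule.action });
      }
    } else if (pending.sessionCostUsd > rule.config.maxUsdPerSession) {
      verdicts.push({ rule: rule.name, action: rule.action });
    }
  }
  return verdicts;
}
```

A caller would stop the action when any verdict has action "block" and only surface a warning otherwise.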
7 rule types: file protection, command blocking, cost limits, token limits, directory scoping, network restrictions, and human approval gates.
The agent can self-monitor
Hawkeye exposes an MCP server with 27 tools.
The agent can:
- Call check_drift to see its own score and course-correct
- Call check_guardrail before a risky action to avoid getting blocked
- Call suggest_correction when drift is high to get back on track
- Call log_event to document decisions
The agent also builds persistent memory — after each task, a journal entry (prompt, files changed, outcome) is saved and injected into future tasks. So it learns from past sessions.
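One way to picture the journal mechanism is as a small record that gets prepended to the next prompt. The sketch below is a guess at the general shape, not Hawkeye's real format: the JournalEntry field names, the withMemory function, the maxEntries cap, and the text layout are all hypothetical.

```typescript
// Hypothetical sketch: field names follow the description above (prompt,
// files changed, outcome); everything else is assumed for illustration.
interface JournalEntry {
  prompt: string;
  filesChanged: string[];
  outcome: string;
}

// Prepends the most recent journal entries to a new task prompt.
function withMemory(task: string, journal: JournalEntry[], maxEntries = 3): string {
  const recent = journal.slice(-maxEntries);
  if (recent.length === 0) return task;
  const notes = recent.map(
    (e) => `- "${e.prompt}" changed ${e.filesChanged.join(", ") || "no files"}: ${e.outcome}`
  );
  return ["Notes from past sessions:", ...notes, "", "Current task:", task].join("\n");
}
```

Capping the number of injected entries keeps the prompt from growing with every session, which matters when token cost is one of the things being watched.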
Dashboard
A web UI with session replay, drift charts, an event timeline, and remote task submission from your phone. It is mobile-responsive, with a Cloudflare tunnel option for remote access.
Quick start:
npm install -g hawkeye-ai
- For TUI
hawkeye
- For Claude Code
hawkeye hooks install
- For any other agent
hawkeye record -o "Build a REST API" -- aider
- Launch dashboard
hawkeye serve
- For remote access from your phone
hawkeye remote
Stack
TypeScript monorepo. SQLite for storage. Everything runs locally — no cloud, no telemetry, no data leaves your machine. MIT licensed.
GitHub: MLaminekane/hawkeye
The flight recorder for AI agents - observability and security for Claude Code, Aider, AutoGPT and more
npm: hawkeye-ai
I'd love feedback. One challenge I'm still working on: token/cost tracking is unreliable when agents don't expose usage data in their hooks. If anyone has ideas on this, I'm all ears.