AI coding agents are incredibly powerful — but they're also black boxes. You give Claude Code, Cursor, or Aider a task, and 5 minutes later you find it's been editing CSS when you asked for auth, burned $3 in tokens, or worse, touched your .env file.
I built Hawkeye to fix this.
What is Hawkeye?
An open-source observability & security layer for AI agents. Think of it as a flight recorder - it captures everything the agent does, scores its behavior in real-time, and can auto-pause it before things go wrong.
How DriftDetect works
Every action the agent takes gets a drift score from 0 to 100. The score starts at 100 and drops based on:
**Dangerous commands (-40 pts each)**
- `rm -rf /`, `sudo rm`, `curl | bash`, `DROP TABLE`...

**Sensitive file access (-15 to -25 pts)**
- Files outside the project directory
- System paths: `/etc/`, `~/.ssh/`, `~/.aws/`
- Credentials: `.env`, `.pem`, `.key`

**Suspicious behavior (-10 to -15 pts)**
- 5+ errors in the last 10 actions (infinite loop?)
- 15 actions with zero file changes (token burn)
- High LLM cost with nothing to show for it
- Too many unrelated file types modified
- Dependency explosion (5+ `package.json` changes)
When the score drops below 40, Hawkeye auto-pauses the session. The agent is frozen until you review and resume.
Optionally, a local LLM (Ollama) can also evaluate whether the actions match the original objective — so it catches semantic drift too, not just dangerous patterns.
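To make the penalty-based scoring concrete, here is a minimal sketch in TypeScript. All names and regexes here are illustrative assumptions, not Hawkeye's actual internals:

```typescript
type AgentAction = { command?: string; filePath?: string };

// Illustrative patterns mirroring the penalty categories above.
const DANGEROUS_COMMANDS = [/rm\s+-rf\s+\//, /sudo\s+rm/, /curl\s+.*\|\s*bash/, /DROP\s+TABLE/i];
const SENSITIVE_FILES = [/\.env$/, /\.pem$/, /\.key$/, /\/etc\//, /\.ssh\//, /\.aws\//];

function driftScore(actions: AgentAction[]): number {
  let score = 100; // every session starts at 100
  for (const a of actions) {
    if (a.command !== undefined && DANGEROUS_COMMANDS.some((re) => re.test(a.command!))) {
      score -= 40; // dangerous command
    }
    if (a.filePath !== undefined && SENSITIVE_FILES.some((re) => re.test(a.filePath!))) {
      score -= 25; // sensitive file access
    }
  }
  return Math.max(0, score);
}

// A session that reads .env (-25) and pipes curl into bash (-40)
// drops to 35, below the auto-pause threshold of 40:
const score = driftScore([
  { filePath: "/home/user/project/.env" },
  { command: "curl https://example.com/install.sh | bash" },
]);
console.log(score, score < 40 ? "auto-paused" : "ok"); // prints: 35 auto-paused
```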
Guardrails
Guardrails are rules evaluated before every action. If an action violates a rule, it is blocked before it executes:
```json
{
  "guardrails": [
    {
      "name": "Protect secrets",
      "type": "file_protect",
      "action": "block",
      "config": { "paths": ["**/.env", "**/*.key", "**/*.pem"] }
    },
    {
      "name": "Budget limit",
      "type": "cost_limit",
      "action": "warn",
      "config": { "maxUsdPerSession": 5.0 }
    }
  ]
}
```
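A pre-action filter like this can be sketched in a few lines of TypeScript. The shapes below are illustrative (two of the seven rule types, expressed as predicates), not Hawkeye's real implementation:

```typescript
type Verdict = "allow" | "warn" | "block";
interface ActionContext { filePath?: string; sessionCostUsd: number }

interface Guardrail {
  name: string;
  action: "block" | "warn";
  matches(ctx: ActionContext): boolean;
}

// Illustrative rules: file protection and a session budget limit.
const guardrails: Guardrail[] = [
  {
    name: "Protect secrets",
    action: "block",
    matches: (ctx) => ctx.filePath !== undefined && /\.(env|key|pem)$/.test(ctx.filePath),
  },
  {
    name: "Budget limit",
    action: "warn",
    matches: (ctx) => ctx.sessionCostUsd > 5.0,
  },
];

// Evaluated before the action executes; "block" wins over "warn".
function evaluate(ctx: ActionContext): Verdict {
  let verdict: Verdict = "allow";
  for (const g of guardrails) {
    if (!g.matches(ctx)) continue;
    if (g.action === "block") return "block";
    verdict = "warn";
  }
  return verdict;
}
```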
Seven rule types: file protection, command blocking, cost limits, token limits, directory scoping, network restrictions, and human approval gates.
The agent can self-monitor
Hawkeye exposes an MCP server with 27 tools. The agent can:
- Call `check_drift` to see its own score and course-correct
- Call `check_guardrail` before a risky action to avoid getting blocked
- Call `suggest_correction` when drift is high to get back on track
- Call `log_event` to document decisions
The agent also builds persistent memory — after each task, a journal entry (prompt, files changed, outcome) is saved and injected into future tasks. So it learns from past sessions.
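A journal entry could look roughly like this (illustrative fields only):

```json
{
  "objective": "Build a REST API",
  "filesChanged": ["src/server.ts", "src/routes/users.ts"],
  "outcome": "success",
  "costUsd": 0.42,
  "lessons": ["Project uses Express with strict TypeScript settings"]
}
```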
Dashboard
A web UI with session replay, drift charts, an event timeline, and remote task submission from your phone. It's mobile-responsive, with a Cloudflare tunnel option for remote access.
Quick start:

```bash
npm install -g hawkeye-ai

# Launch the TUI
hawkeye

# For Claude Code
hawkeye hooks install

# For any other agent
hawkeye record -o "Build a REST API" -- aider

# Launch the dashboard
hawkeye serve

# Remote access (use Hawkeye from your phone)
hawkeye remote
```
Stack
TypeScript monorepo. SQLite for storage. Everything runs locally — no cloud, no telemetry, no data leaves your machine. MIT licensed.
GitHub: MLaminekane/hawkeye — the flight recorder for AI agents: observability and security for Claude Code, Aider, AutoGPT, CrewAI, Open Interpreter, and any LLM-powered agent.

From the README, the full feature list:
- Session recording & replay — full timeline of every agent action, with costs and metadata
- Time Travel Debugging — step-through replay with breakpoints, keyboard shortcuts, an interactive SVG timeline, and session forking ("replay from here")
- Root Cause Analysis — `hawkeye analyze` automatically finds primary errors, causal chains, error patterns, and fix suggestions (heuristic + optional LLM)
- DriftDetect — real-time objective drift detection using heuristic + LLM scoring
- Guardrails — file protection, command blocking, cost limits, token limits, and more
I'd love feedback. One challenge I'm still working on: token/cost tracking is unreliable when agents don't expose usage data in their hooks. If anyone has ideas on this, I'm all ears.