AI coding agents are incredibly powerful — but they're also black boxes. You give Claude Code, Cursor, or Aider a task, and 5 minutes later you find it's been editing CSS when you asked for auth, burned $3 in tokens, or worse, touched your .env file.
I built Hawkeye to fix this.
What is Hawkeye?
An open-source observability & security layer for AI agents. Think of it as a flight recorder - it captures everything the agent does, scores its behavior in real-time, and can auto-pause it before things go wrong.
How DriftDetect works
Every action the agent takes gets a drift score from 0 to 100. The score starts at 100 and drops based on:
**Dangerous commands (-40 pts each)**
- `rm -rf /`, `sudo rm`, `curl | bash`, `DROP TABLE`...

**Sensitive file access (-15 to -25 pts)**
- Files outside the project directory
- System paths: `/etc/`, `~/.ssh/`, `~/.aws/`
- Credentials: `.env`, `.pem`, `.key`

**Suspicious behavior (-10 to -15 pts)**
- 5+ errors in the last 10 actions (infinite loop?)
- 15 actions with zero file changes (token burn)
- High LLM cost with nothing to show for it
- Too many unrelated file types modified
- Dependency explosion (5+ `package.json` changes)
When the score drops below 40, Hawkeye auto-pauses the session. The agent is frozen until you review and resume.
Optionally, a local LLM (Ollama) can also evaluate whether the actions match the original objective — so it catches semantic drift too, not just dangerous patterns.
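To make the penalty-based scoring concrete, here is a minimal sketch in TypeScript. All names and regexes here are illustrative assumptions, not Hawkeye's actual internals:

```typescript
type AgentAction = { command?: string; filePath?: string };

// Illustrative patterns mirroring the penalty categories above.
const DANGEROUS_COMMANDS = [/rm\s+-rf\s+\//, /sudo\s+rm/, /curl\s+.*\|\s*bash/, /DROP\s+TABLE/i];
const SENSITIVE_FILES = [/\.env$/, /\.pem$/, /\.key$/, /\/etc\//, /\.ssh\//, /\.aws\//];

function driftScore(actions: AgentAction[]): number {
  let score = 100; // every session starts at 100
  for (const a of actions) {
    if (a.command !== undefined && DANGEROUS_COMMANDS.some((re) => re.test(a.command!))) {
      score -= 40; // dangerous command
    }
    if (a.filePath !== undefined && SENSITIVE_FILES.some((re) => re.test(a.filePath!))) {
      score -= 25; // sensitive file access
    }
  }
  return Math.max(0, score);
}

// A session that reads .env (-25) and pipes curl into bash (-40)
// drops to 35, below the auto-pause threshold of 40:
const score = driftScore([
  { filePath: "/home/user/project/.env" },
  { command: "curl https://example.com/install.sh | bash" },
]);
console.log(score, score < 40 ? "auto-paused" : "ok"); // prints: 35 auto-paused
```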
Guardrails
Guardrails are rules evaluated before every action. If an action violates a rule, it is blocked before it executes:
```json
{
  "guardrails": [
    {
      "name": "Protect secrets",
      "type": "file_protect",
      "action": "block",
      "config": { "paths": ["**/.env", "**/*.key", "**/*.pem"] }
    },
    {
      "name": "Budget limit",
      "type": "cost_limit",
      "action": "warn",
      "config": { "maxUsdPerSession": 5.0 }
    }
  ]
}
```
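A pre-action filter like this can be sketched in a few lines of TypeScript. The shapes below are illustrative (two of the seven rule types, expressed as predicates), not Hawkeye's real implementation:

```typescript
type Verdict = "allow" | "warn" | "block";
interface ActionContext { filePath?: string; sessionCostUsd: number }

interface Guardrail {
  name: string;
  action: "block" | "warn";
  matches(ctx: ActionContext): boolean;
}

// Illustrative rules: file protection and a session budget limit.
const guardrails: Guardrail[] = [
  {
    name: "Protect secrets",
    action: "block",
    matches: (ctx) => ctx.filePath !== undefined && /\.(env|key|pem)$/.test(ctx.filePath),
  },
  {
    name: "Budget limit",
    action: "warn",
    matches: (ctx) => ctx.sessionCostUsd > 5.0,
  },
];

// Evaluated before the action executes; "block" wins over "warn".
function evaluate(ctx: ActionContext): Verdict {
  let verdict: Verdict = "allow";
  for (const g of guardrails) {
    if (!g.matches(ctx)) continue;
    if (g.action === "block") return "block";
    verdict = "warn";
  }
  return verdict;
}
```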
Seven rule types: file protection, command blocking, cost limits, token limits, directory scoping, network restrictions, and human approval gates.
The agent can self-monitor
Hawkeye exposes an MCP server with 27 tools. The agent can:
- Call `check_drift` to see its own score and course-correct
- Call `check_guardrail` before a risky action to avoid getting blocked
- Call `suggest_correction` when drift is high to get back on track
- Call `log_event` to document decisions
The agent also builds persistent memory — after each task, a journal entry (prompt, files changed, outcome) is saved and injected into future tasks. So it learns from past sessions.
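A journal entry could look roughly like this (illustrative fields only):

```json
{
  "objective": "Build a REST API",
  "filesChanged": ["src/server.ts", "src/routes/users.ts"],
  "outcome": "success",
  "costUsd": 0.42,
  "lessons": ["Project uses Express with strict TypeScript settings"]
}
```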
Dashboard
A web UI with session replay, drift charts, an event timeline, and remote task submission from your phone. It's mobile-responsive, with a Cloudflare tunnel option for remote access.
Quick start:

```bash
npm install -g hawkeye-ai

# Launch the TUI
hawkeye

# For Claude Code
hawkeye hooks install

# For any other agent
hawkeye record -o "Build a REST API" -- aider

# Launch the dashboard
hawkeye serve

# Remote access (use Hawkeye from your phone)
hawkeye remote
```
Stack
TypeScript monorepo. SQLite for storage. Everything runs locally — no cloud, no telemetry, no data leaves your machine. MIT licensed.
GitHub: MLaminekane/hawkeye — the flight recorder for AI agents: observability and security for Claude Code, Aider, AutoGPT, CrewAI, Open Interpreter, and any LLM-powered agent.

From the README, the full feature list:
- Session recording & replay — full timeline of every agent action, with costs and metadata
- Time Travel Debugging — step-through replay with breakpoints, keyboard shortcuts, an interactive SVG timeline, and session forking ("replay from here")
- Root Cause Analysis — `hawkeye analyze` automatically finds primary errors, causal chains, error patterns, and fix suggestions (heuristic + optional LLM)
- DriftDetect — real-time objective drift detection using heuristic + LLM scoring
- Guardrails — file protection, command blocking, cost limits, token limits, and more
I'd love feedback. One challenge I'm still working on: token/cost tracking is unreliable when agents don't expose usage data in their hooks. If anyone has ideas on this, I'm all ears.