DEV Community

mlaminekane


Hawkeye: open-source flight recorder & guardrails for AI agents, before things go wrong

AI coding agents are incredibly powerful, but they're also black boxes. You give Claude Code, Cursor, or Aider a task, and 5 minutes later you discover it has been editing CSS when you asked for auth, has burned $3 in tokens, or, worse, has touched your .env file.

I built Hawkeye to fix this.

What is Hawkeye?

Hawkeye is an open-source observability & security layer for AI agents. Think of it as a flight recorder: it captures everything the agent does, scores its behavior in real time, and can auto-pause the agent before things go wrong.

How DriftDetect works

Every action the agent takes gets a drift score from 0 to 100. The score starts at 100 and drops based on:

Dangerous commands (-40 pts each)

  • rm -rf /, sudo rm, curl | bash, DROP TABLE...

Sensitive file access (-15 to -25 pts)

  • Files outside the project directory
  • System paths: /etc/, ~/.ssh/, ~/.aws/
  • Credentials: .env, .pem, .key

Suspicious behavior (-10 to -15 pts)

  • 5+ errors in the last 10 actions (infinite loop?)
  • 15 actions with zero file changes (token burn)
  • High LLM cost with nothing to show for it
  • Too many unrelated file types modified
  • Dependency explosion (5+ package.json changes)

When the score drops below 40, Hawkeye auto-pauses the session. The agent is frozen until you review and resume.
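The heuristic part of the scoring can be pictured as a simple penalty sum. The TypeScript sketch below is only an illustration, not Hawkeye's implementation: the AgentAction type, the regexes, the exact penalty chosen within each stated range, and the choice to recompute a single cumulative score from the session's full action history are all assumptions. It covers the dangerous-command, sensitive-path, and error-loop rules, then applies the pause threshold of 40.

```typescript
// Hypothetical action record; Hawkeye's real event schema is not shown in this post.
type AgentAction =
  | { kind: "command"; command: string; ok: boolean }
  | { kind: "file"; path: string; ok: boolean };

// Assumed patterns built from the post's examples (not an exhaustive blocklist).
const DANGEROUS_COMMANDS: RegExp[] = [
  /\brm\s+-rf\s+\/(\s|$)/, // rm -rf / (but not rm -rf /tmp/x)
  /\bsudo\s+rm\b/,
  /\bcurl\b[^|]*\|\s*(ba)?sh\b/, // curl ... | sh  or  curl ... | bash
  /\bDROP\s+TABLE\b/i,
];
// Paths are compared as literal strings; "~" is not expanded in this sketch.
const SYSTEM_PATHS = ["/etc/", "~/.ssh/", "~/.aws/"];
const CREDENTIAL_SUFFIXES = [".env", ".pem", ".key"];
const PAUSE_THRESHOLD = 40;

function penaltyFor(action: AgentAction): number {
  if (action.kind === "command") {
    const cmd = action.command;
    return DANGEROUS_COMMANDS.some((re) => re.test(cmd)) ? 40 : 0;
  }
  const path = action.path;
  // Assumed values picked from the post's -15 to -25 range.
  if (SYSTEM_PATHS.some((p) => path.startsWith(p))) return 25;
  if (CREDENTIAL_SUFFIXES.some((s) => path.endsWith(s))) return 15;
  return 0;
}

// Recomputes a cumulative 0-100 score from the whole session history (an assumption).
function driftScore(history: AgentAction[]): number {
  let score = 100;
  for (const action of history) score -= penaltyFor(action);

  // "5+ errors in the last 10 actions" rule; the -15 value is assumed.
  const recentErrors = history.slice(-10).filter((a) => !a.ok).length;
  if (recentErrors >= 5) score -= 15;

  return Math.max(0, Math.min(100, score));
}

function shouldPause(score: number): boolean {
  return score < PAUSE_THRESHOLD;
}
```

For example, a session that runs rm -rf / and then reads ~/.ssh/id_rsa would score 100 - 40 - 25 = 35 under these assumed values, which is below 40, so shouldPause returns true.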

Optionally, a local LLM (via Ollama) can also evaluate whether the actions match the original objective, so Hawkeye catches semantic drift too, not just dangerous patterns.
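A semantic check along these lines could ask a local model for a verdict. This sketch assumes Ollama's /api/generate endpoint on its default port 11434 and a model named llama3.1; the prompt wording, the JSON reply format, and the 0-100 scale are hypothetical. The post does not say how Hawkeye combines the LLM verdict with the heuristic score, so the sketch only returns the model's verdict. Parsing is kept separate so a malformed reply yields null instead of crashing the session.

```typescript
// Hypothetical prompt; Hawkeye's actual prompt is not documented in this post.
function buildDriftPrompt(objective: string, actions: string[]): string {
  return [
    "You are reviewing an AI coding agent.",
    `Original objective: ${objective}`,
    "Recent actions:",
    ...actions.map((a, i) => `${i + 1}. ${a}`),
    'Reply with JSON only: {"score": <0-100, 100 = fully on track>, "reason": "<short>"}',
  ].join("\n");
}

interface SemanticVerdict {
  score: number;
  reason: string;
}

// Returns null when the model's reply is not the expected JSON shape.
function parseVerdict(raw: string): SemanticVerdict | null {
  try {
    const data = JSON.parse(raw) as { score?: unknown; reason?: unknown };
    if (typeof data.score !== "number") return null;
    return {
      score: Math.max(0, Math.min(100, data.score)), // clamp to the assumed 0-100 scale
      reason: typeof data.reason === "string" ? data.reason : "",
    };
  } catch {
    return null; // invalid JSON (or a JSON null, whose property access throws)
  }
}

// Calls Ollama's /api/generate (non-streaming, JSON mode). Needs Node 18+ for global fetch.
// The default model name is an assumption; use whatever model you have pulled.
async function semanticDrift(
  objective: string,
  actions: string[],
  model = "llama3.1",
): Promise<SemanticVerdict | null> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      prompt: buildDriftPrompt(objective, actions),
      stream: false,
      format: "json",
    }),
  });
  if (!res.ok) return null;
  const body = (await res.json()) as { response?: string };
  return parseVerdict(body.response ?? "");
}
```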

Guardrails

Guardrails are rules evaluated before every action. If an action violates a blocking rule, it is stopped before it executes (a rule can also be set to warn instead):

  {
    "guardrails": [
      {
        "name": "Protect secrets",
        "type": "file_protect",
        "action": "block",
        "config": { "paths": ["**/.env", "**/*.key", "**/*.pem"] }
      },
      {
        "name": "Budget limit",
        "type": "cost_limit",
        "action": "warn",
        "config": { "maxUsdPerSession": 5.0 }
      }
    ]
  }
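Evaluating a config like the one above could look roughly like the TypeScript sketch below. It models only the two rule types shown (file_protect and cost_limit); the PendingAction shape and the evaluate and globToRegExp helpers are hypothetical, and the tiny glob matcher supports just "**/" and "*" rather than full glob syntax. A violated blocking rule stops the action immediately, while a violated warn rule lets it through with a warning.

```typescript
// Mirrors the JSON config above, modeling only two of the seven rule types.
type Guardrail =
  | { name: string; type: "file_protect"; action: "block" | "warn"; config: { paths: string[] } }
  | { name: string; type: "cost_limit"; action: "block" | "warn"; config: { maxUsdPerSession: number } };

// Hypothetical description of the action about to run.
interface PendingAction {
  filePath?: string;
  sessionCostUsd: number;
}

type Verdict = { decision: "allow" } | { decision: "block" | "warn"; rule: string };

// Minimal glob matcher: "**/" matches zero or more directories, "*" matches within
// one path segment; other glob syntax (?, [], {}) is treated literally.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    if (glob.startsWith("**/", i)) {
      re += "(?:.*/)?";
      i += 2; // skip the rest of "**/"
    } else if (glob[i] === "*") {
      re += "[^/]*";
    } else {
      re += glob[i].replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// Hypothetical evaluator: the first violated "block" rule wins;
// otherwise the first violated "warn" rule is reported.
function evaluate(rules: Guardrail[], action: PendingAction): Verdict {
  let verdict: Verdict = { decision: "allow" };
  for (const rule of rules) {
    let violated = false;
    if (rule.type === "file_protect") {
      const path = action.filePath;
      violated = path !== undefined && rule.config.paths.some((g) => globToRegExp(g).test(path));
    } else {
      violated = action.sessionCostUsd > rule.config.maxUsdPerSession;
    }
    if (!violated) continue;
    if (rule.action === "block") return { decision: "block", rule: rule.name };
    if (verdict.decision === "allow") verdict = { decision: "warn", rule: rule.name };
  }
  return verdict;
}
```

Under these assumptions, writing config/.env is blocked outright, while an ordinary edit in a session that has already spent $6 passes with a "Budget limit" warning.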

7 rule types: file protection, command blocking, cost limits, token limits, directory scoping, network restrictions, and human approval gates.

The agent can self-monitor

Hawkeye exposes an MCP server with 27 tools.

The agent can:

  • Call check_drift to see its own score and course-correct
  • Call check_guardrail before a risky action to avoid getting blocked
  • Call suggest_correction when drift is high to get back on track
  • Call log_event to document decisions
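Because these are MCP tools, each call travels as a JSON-RPC 2.0 tools/call request whose params carry the tool name and its arguments, as defined by the Model Context Protocol specification. The sketch below builds such requests by hand for illustration; in practice the agent's MCP client does this framing. The tool names come from this post, but the argument names (such as command for check_guardrail) are hypothetical, since the post does not document the tools' input schemas.

```typescript
// JSON-RPC 2.0 request for an MCP tool call, following the Model Context Protocol spec.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Tool results carry a list of content items; text items hold the readable answer.
interface ToolCallResult {
  content: { type: string; text?: string }[];
  isError?: boolean;
}

function toolCall(id: number, name: string, args: Record<string, unknown> = {}): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Concatenates the text items of a tool result, ignoring other content types.
function resultText(result: ToolCallResult): string {
  return result.content
    .filter((item) => item.type === "text" && item.text !== undefined)
    .map((item) => item.text)
    .join("\n");
}

// Example requests; the check_guardrail argument name is hypothetical.
const driftRequest = toolCall(1, "check_drift");
const guardrailRequest = toolCall(2, "check_guardrail", { command: "rm -rf build/" });
console.log(JSON.stringify(driftRequest));
console.log(JSON.stringify(guardrailRequest));
```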

The agent also builds persistent memory: after each task, a journal entry (prompt, files changed, outcome) is saved and injected into future tasks, so the agent can learn from past sessions.
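The memory mechanism can be sketched as prepending recent journal entries to the next task prompt. The JournalEntry fields follow the post (prompt, files changed, outcome), but the withMemory helper, the cap of five entries, and the preamble format are assumptions; the post does not say how many entries Hawkeye injects or how it phrases them.

```typescript
// Fields follow the post; the exact schema Hawkeye stores may differ.
interface JournalEntry {
  prompt: string;
  filesChanged: string[];
  outcome: string;
}

// Hypothetical helper: prepends up to maxEntries recent journal entries to a new task.
function withMemory(task: string, journal: JournalEntry[], maxEntries = 5): string {
  // Guard against slice(-0), which would return the whole journal.
  const recent = maxEntries > 0 ? journal.slice(-maxEntries) : [];
  if (recent.length === 0) return task;
  const notes = recent.map(
    (e, i) =>
      `${i + 1}. Task: ${e.prompt} | Files: ${e.filesChanged.join(", ") || "none"} | Outcome: ${e.outcome}`,
  );
  return ["Notes from previous sessions:", ...notes, "", `Current task: ${task}`].join("\n");
}
```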

Dashboard

A web UI with session replay, drift charts, an event timeline, and remote task submission from your phone. It is mobile-responsive and offers a Cloudflare tunnel option for remote access.

Quick start:

  npm install -g hawkeye-ai

  # Interactive TUI
  hawkeye

  # Claude Code: install the hooks
  hawkeye hooks install

  # Any other agent (here, Aider)
  hawkeye record -o "Build a REST API" -- aider

  # Launch the dashboard
  hawkeye serve

  # Remote access: use Hawkeye from your phone
  hawkeye remote

Stack

TypeScript monorepo. SQLite for storage. Everything runs locally — no cloud, no telemetry, no data leaves your machine. MIT licensed.

GitHub:

MLaminekane / hawkeye

The flight recorder for AI agents - observability and security for Claude Code, Aider, AutoGPT and more

Hawkeye

The flight recorder for AI agents
Open-source observability & security for Claude Code · Aider · AutoGPT · CrewAI · Open Interpreter · any LLM-powered agent




What is Hawkeye?

Hawkeye is a flight recorder for AI agents. It captures every action an agent performs — terminal commands, file operations, LLM calls, API requests — and provides:

  • Session recording & replay — Full timeline of every agent action with costs and metadata
  • DriftDetect — Real-time objective drift detection using heuristic + LLM scoring
  • Guardrails — File protection, command blocking, cost limits, token limits, directory scoping
  • Visual dashboard — Mobile-responsive web UI with session explorer, drift charts, and settings management
  • Remote tasks — Submit prompts from your phone via dashboard, with image attachments, auto-approve, and persistent agent memory
  • Interactive TUI — Terminal-responsive CLI…






I'd love feedback. One challenge I'm still working on: token/cost tracking is unreliable when agents don't expose usage data in their hooks. If anyone has ideas on this, I'm all ears.
