DEV Community: mlaminekane

Hawkeye update: multi-agent orchestration, remote tasks, and local model support

mlaminekane — Mon, 30 Mar 2026 04:16:02 +0000

A few weeks ago, I posted Hawkeye here for the first time.

At the time, the core idea was simple:

Hawkeye is a flight recorder for AI agents.

It records what an agent does in a repo, helps detect drift, adds guardrails, and gives you a dashboard to inspect what happened when a run goes wrong.

Since then, I have shipped a much bigger update, and Hawkeye now feels less like a passive recorder and more like an operational layer for agent workflows.

What Hawkeye is trying to solve

Once agents start doing real work in a repository, basic questions become surprisingly hard to answer:

What exactly did the agent do?
When did it start drifting?
Why did it fail?
What files did it touch?
How much did that run cost?
Was this run better or worse than the last one?

That is the problem Hawkeye is built for.

It is still local-first, still SQLite-backed, and still focused on making agent work inspectable instead of magical.

What changed since the first post

1. Multi-agent orchestration

This is probably the biggest shift.

Hawkeye now has a real Swarm mode for coordinating multiple agents in parallel.

Instead of treating each run as an isolated session, Hawkeye now lets you work with multiple agents at once and monitor them from one place.

What that unlocks:

multiple agents working in parallel
isolated responsibilities
live output monitoring
live drift and cost visibility
conflict-aware orchestration
a better overview of what the whole room is doing

The interesting part is not just “spawn more agents”.
It is being able to see what each one is doing and keep the system legible while they are running.

That became a recurring theme while building Hawkeye:
the moment you have more than one agent, visibility matters more than raw generation quality.
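One ingredient of "conflict-aware orchestration" is noticing when two agents plan to edit the same file before they run in parallel. The post does not describe how Swarm mode does this, so the following is a minimal sketch under an assumption: each agent declares the files it intends to touch, and any file claimed by more than one agent is reported. `AgentPlan`, `Conflict`, and `findConflicts` are hypothetical names, not Hawkeye's API.

```typescript
// Hypothetical shape: each agent in a swarm declares the files it plans to edit.
interface AgentPlan {
  agent: string;
  files: string[];
}

// A file claimed by more than one agent, plus the agents claiming it.
interface Conflict {
  file: string;
  agents: string[];
}

// Reports every file that two or more agents plan to touch, sorted by path.
function findConflicts(plans: AgentPlan[]): Conflict[] {
  // Prototype-free object so file names like "constructor" are safe keys.
  const owners: Record<string, string[]> = Object.create(null);
  for (const plan of plans) {
    for (const file of plan.files) {
      if (!owners[file]) owners[file] = [];
      // Ignore an agent that lists the same file twice.
      if (!owners[file].includes(plan.agent)) owners[file].push(plan.agent);
    }
  }
  return Object.keys(owners)
    .filter((file) => owners[file].length > 1)
    .sort()
    .map((file) => ({ file, agents: owners[file] }));
}

// Example: "auth" and "api" both plan to edit src/routes.ts.
const conflicts = findConflicts([
  { agent: "auth", files: ["src/auth.ts", "src/routes.ts"] },
  { agent: "api", files: ["src/routes.ts", "src/api.ts"] },
  { agent: "docs", files: ["README.md"] },
]);
```

Here `conflicts` holds a single entry, `src/routes.ts`, claimed by `auth` and `api`. An orchestrator could then refuse to launch, run those two agents sequentially, or just surface the overlap in the UI; the post does not say which policy Hawkeye applies.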

2. Remote tasks + daemon

I also added a stronger task workflow.

You can now queue tasks, let them run in the background or overnight, and inspect the results from the dashboard.

That includes:

a task daemon
retry flows
cancel support
better failure reporting
more useful output handling
cleaner separation between live execution and finished results

This matters because a lot of real agent usage is not interactive.
Sometimes you want to fire off a job, come back later, and understand what happened without tailing a terminal the whole time.
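The queue, retry, and cancel flows above can be pictured as a small state machine. This is a minimal synchronous sketch, assuming a hypothetical `Task` shape and a `Runner` callback that throws on failure; Hawkeye's real daemon is presumably asynchronous and persists tasks in SQLite, which is left out here.

```typescript
// Hypothetical task model; statuses mirror the retry/cancel/failure flows
// described above, not Hawkeye's actual schema.
type TaskStatus = "queued" | "running" | "succeeded" | "failed" | "cancelled";

interface Task {
  id: string;
  prompt: string;
  status: TaskStatus;
  attempts: number;
  output?: string;
  error?: string;
}

// A runner returns the agent's output, or throws if the run failed.
type Runner = (task: Task) => string;

class TaskQueue {
  private readonly tasks: Task[] = [];
  private readonly maxAttempts: number;

  constructor(maxAttempts = 3) {
    this.maxAttempts = maxAttempts;
  }

  enqueue(id: string, prompt: string): Task {
    const task: Task = { id, prompt, status: "queued", attempts: 0 };
    this.tasks.push(task);
    return task;
  }

  // Only tasks that have not started yet can be cancelled.
  cancel(id: string): boolean {
    const task = this.tasks.find((t) => t.id === id);
    if (!task || task.status !== "queued") return false;
    task.status = "cancelled";
    return true;
  }

  // Runs every queued task, retrying failures up to maxAttempts times and
  // keeping the last error message for the failure report.
  drain(run: Runner): Task[] {
    for (const task of this.tasks) {
      if (task.status !== "queued") continue;
      while (task.attempts < this.maxAttempts) {
        task.status = "running";
        task.attempts += 1;
        try {
          task.output = run(task);
          task.status = "succeeded";
          delete task.error;
          break;
        } catch (err) {
          task.status = "failed";
          task.error = err instanceof Error ? err.message : String(err);
        }
      }
    }
    return this.tasks;
  }
}

// Usage: t2 fails twice before succeeding; t3 is cancelled before it runs.
const queue = new TaskQueue(3);
queue.enqueue("t1", "fix lint errors");
queue.enqueue("t2", "flaky integration job");
queue.enqueue("t3", "never runs");
queue.cancel("t3");
let flakyCalls = 0;
const results = queue.drain((task) => {
  if (task.id === "t2" && ++flakyCalls < 3) throw new Error("transient failure");
  return `done: ${task.prompt}`;
});
```

Keeping `output` and `error` on the finished task, rather than only in a live stream, is one way to get the "cleaner separation between live execution and finished results" the post mentions.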

3. Better local model support

Hawkeye now supports Ollama and LM Studio more cleanly across the product.

That includes:

local runtime configuration
local model selection in the relevant CLI flows
better handling in task/agent workflows
cleaner integration with the dashboard settings

This was important to me because I did not want Hawkeye to be tied to one cloud provider or one commercial runtime.

The goal is:
if you want to run a local model, Hawkeye should still give you observability, drift detection, and control.
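Part of "local runtime configuration" is resolving where to send requests. Ollama's server listens on port 11434 by default and exposes a native `/api/chat` route; LM Studio's local server defaults to port 1234 and exposes OpenAI-compatible routes under `/v1`. The settings shape and `resolveLocalRuntime` below are hypothetical, a sketch of one way to normalize the two rather than Hawkeye's actual config format.

```typescript
type LocalProvider = "ollama" | "lmstudio";

// Hypothetical settings shape; baseUrl is an optional override,
// e.g. for a model server running on another machine.
interface LocalRuntimeSettings {
  provider: LocalProvider;
  model: string;
  baseUrl?: string;
}

interface ResolvedRuntime {
  provider: LocalProvider;
  model: string;
  chatUrl: string;
}

// Default local ports: Ollama listens on 11434, LM Studio's server on 1234.
const DEFAULT_BASE_URL: Record<LocalProvider, string> = {
  ollama: "http://localhost:11434",
  lmstudio: "http://localhost:1234",
};

// Ollama's native chat route vs LM Studio's OpenAI-compatible chat route.
const CHAT_PATH: Record<LocalProvider, string> = {
  ollama: "/api/chat",
  lmstudio: "/v1/chat/completions",
};

function resolveLocalRuntime(settings: LocalRuntimeSettings): ResolvedRuntime {
  // Fail early instead of sending a request with no model selected.
  if (!settings.model.trim()) {
    throw new Error(`No model selected for ${settings.provider}`);
  }
  // Strip trailing slashes so the joined URL never contains "//".
  const base = (settings.baseUrl ?? DEFAULT_BASE_URL[settings.provider]).replace(/\/+$/, "");
  return {
    provider: settings.provider,
    model: settings.model,
    chatUrl: base + CHAT_PATH[settings.provider],
  };
}
```

For example, `resolveLocalRuntime({ provider: "ollama", model: "llama3.1" }).chatUrl` is `http://localhost:11434/api/chat`.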

4. The dashboard is much more capable now

A lot of work in the past weeks went into turning the dashboard into something you can actually operate from, not just inspect after the fact.

The biggest improvements were around:

Compare: proper visual comparison between runs, links to sessions, export, top cost files, cleaner highlighting
Firewall: better initial loading, clearer live feed behavior, review feedback, clear/reset actions, less stale state
Tasks: retry, cancel, daemon status, clearer runtime selection, better handling of provider/runtime errors
Agents: follow-ups while running, relaunch/clone flows, better runtime choices, cleaner launch studio
Settings: cleaner structure, more obvious save feedback, better local provider handling
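The Compare view boils down to diffing two run summaries. This is a minimal sketch, assuming a hypothetical `RunSummary` that already attributes a USD cost to each touched file (the post does not say how Hawkeye computes per-file cost); `compareRuns` and `topCostFiles` are illustrative names.

```typescript
// Hypothetical per-run summary; Hawkeye's real session schema may differ.
interface RunSummary {
  id: string;
  driftScore: number;                // 0-100, higher means closer to the objective
  costUsd: number;
  fileCosts: Record<string, number>; // USD attributed to each file the run touched
}

interface RunComparison {
  driftDelta: number; // positive: run B drifted less than run A
  costDelta: number;  // positive: run B cost more than run A
  onlyInA: string[];  // files touched by A but not by B
  onlyInB: string[];  // files touched by B but not by A
}

// The n most expensive files in a run, most expensive first.
function topCostFiles(run: RunSummary, n = 3): Array<[string, number]> {
  return Object.entries(run.fileCosts)
    .sort((x, y) => y[1] - x[1])
    .slice(0, n);
}

function compareRuns(a: RunSummary, b: RunSummary): RunComparison {
  const has = (run: RunSummary, file: string) =>
    Object.prototype.hasOwnProperty.call(run.fileCosts, file);
  return {
    driftDelta: b.driftScore - a.driftScore,
    // Round to cents to avoid floating-point noise such as 0.29999999999999993.
    costDelta: Math.round((b.costUsd - a.costUsd) * 100) / 100,
    onlyInA: Object.keys(a.fileCosts).filter((f) => !has(b, f)).sort(),
    onlyInB: Object.keys(b.fileCosts).filter((f) => !has(a, f)).sort(),
  };
}
```

Files that appear in only one run are often the quickest signal that a run wandered off, which is why the sketch reports them separately from the cost ranking.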

A lot of this work was not “new feature” work.
It was cleanup.
Removing confusion.
Making the product more coherent when you actually use it every day.

5. GitHub PR reporting

I also pushed the GitHub reporting side further.

hawkeye ci can now post a structured report back to a PR, including things like:

drift
cost
files touched
run summary

That closes an important loop for me:
not just what happened while the agent ran, but also how that run gets communicated back into a normal engineering workflow.
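A structured PR report is essentially a Markdown document built from the run summary. This is a minimal sketch of the rendering step, assuming a hypothetical `CiRunReport` shape with the four fields listed above; the output format is illustrative, not what hawkeye ci actually posts. A CI job could then send the string as the body of GitHub's REST "create an issue comment" endpoint (`POST /repos/{owner}/{repo}/issues/{issue_number}/comments`), which also accepts pull request numbers.

```typescript
// Hypothetical report fields, mirroring what the post lists:
// drift, cost, files touched, run summary.
interface CiRunReport {
  objective: string;
  driftScore: number; // 0-100
  costUsd: number;
  filesTouched: string[];
  summary: string;
}

// The score below which DriftDetect auto-pauses a session (the 40-point threshold).
const AUTO_PAUSE_THRESHOLD = 40;

// Renders a Markdown comment body; long file lists are truncated to keep the PR readable.
function renderPrReport(r: CiRunReport, maxFiles = 10): string {
  const shown = r.filesTouched.slice(0, maxFiles);
  const hidden = r.filesTouched.length - shown.length;
  const driftNote = r.driftScore < AUTO_PAUSE_THRESHOLD ? " (below auto-pause threshold)" : "";
  return [
    "## Hawkeye run report",
    "",
    `**Objective:** ${r.objective}`,
    "",
    "| Metric | Value |",
    "| --- | --- |",
    `| Drift score | ${r.driftScore}/100${driftNote} |`,
    `| Cost | $${r.costUsd.toFixed(2)} |`,
    `| Files touched | ${r.filesTouched.length} |`,
    "",
    "### Files",
    ...shown.map((f) => `- \`${f}\``),
    ...(hidden > 0 ? [`- ...and ${hidden} more`] : []),
    "",
    "### Summary",
    r.summary,
  ].join("\n");
}
```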

6. Cleaner runtime story

Another thing I spent time on was making the runtime choices more honest.

Some agent runtimes are great for broad repo understanding.
Some are better for focused patching.
Some local models work fine for one task and fall apart for another.

A lot of recent work went into making Hawkeye expose those tradeoffs more cleanly instead of pretending all runtimes behave the same.

That also meant removing or de-emphasizing flows that were creating more confusion than value.

What did not change

The core philosophy is still the same:

local-first
SQLite
no Hawkeye cloud dependency
useful even if the underlying agent changes

Right now Hawkeye is most aligned with:

Claude Code
Codex
Cline
custom agent CLIs

That feels like a better, cleaner foundation than trying to support everything equally at once.

What I learned while building this

A few things became clearer over time.

Observability matters more than autonomy

People talk a lot about how capable agents are becoming.

But once an agent starts touching a real codebase, debuggability becomes just as important as capability.

Raw autonomy without visibility stops being impressive very quickly.

Local-first is still worth it

Keeping Hawkeye local-first forced some tradeoffs, but I still think it is the right call.

A lot of people experimenting seriously with agent workflows do not want another hosted black box sitting on top of their existing black boxes.

UX matters a lot in agent tooling

A huge amount of this update was not about adding “AI”.
It was about removing friction:

too much noise
stale state
misleading labels
unclear provider behavior
heavy layouts
confusing runtime defaults

That work is less flashy, but it matters more than it looks.

What still needs work

There is still a lot I want to improve.

A few areas I am actively thinking about:

even better runtime selection defaults
stronger live output UX across all task types
more polished reporting/export flows
further CLI cleanup
more consistency between task, agent, and session workflows

So this is definitely not “done”.
But it feels much more like a real product now than it did in the first post.

Try it

GitHub:

github.com/MLaminekane/hawkeye
npm:

https://www.npmjs.com/package/hawkeye-ai

Install:

  npm install -g hawkeye-ai

Hawkeye - open source flight recorder & guardrails for AI agents before things go wrong

mlaminekane — Fri, 13 Mar 2026 21:04:46 +0000

AI coding agents are incredibly powerful — but they're also black boxes. You give Claude Code, Cursor, or Aider a task, and 5 minutes later you find it's been editing CSS when you asked for auth, burned $3 in tokens, or worse, touched your .env file.

I built Hawkeye to fix this.

What is Hawkeye?

It is an open-source observability and security layer for AI agents. Think of it as a flight recorder: it captures everything the agent does, scores its behavior in real time, and can auto-pause the agent before things go wrong.

How DriftDetect works

Every action the agent takes gets a drift score from 0 to 100. The score starts at 100 and drops based on:

Dangerous commands (-40 pts each)

rm -rf /, sudo rm, curl | bash, DROP TABLE...

Sensitive file access (-15 to -25 pts)

Files outside the project directory
System paths: /etc/, ~/.ssh/, ~/.aws/
Credentials: .env, .pem, .key

Suspicious behavior (-10 to -15 pts)

5+ errors in the last 10 actions (infinite loop?)
15 actions with zero file changes (token burn)
High LLM cost with nothing to show for it
Too many unrelated file types modified
Dependency explosion (5+ package.json changes)

When the score drops below 40, Hawkeye auto-pauses the session. The agent is frozen until you review and resume.
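The heuristic scoring above can be sketched as a running penalty over the session's actions. This is a simplified illustration, not Hawkeye's implementation: it covers only dangerous commands and sensitive file access, the exact penalty picked within each documented range (25 for system paths and credentials, 15 for other files outside the project) is an assumption, and the regex lists are illustrative rather than exhaustive. The behavioral heuristics (error loops, token burn, dependency explosions) are omitted.

```typescript
// Simplified sketch of heuristic drift scoring based on the penalties listed above.
type AgentAction =
  | { kind: "command"; command: string }
  | { kind: "file"; path: string };

// Illustrative patterns for the dangerous commands named in the post.
const DANGEROUS_COMMANDS: RegExp[] = [
  /\brm\s+-rf\s+\/(\s|$)/,       // rm -rf /
  /\bsudo\s+rm\b/,               // sudo rm ...
  /\bcurl\b[^|]*\|\s*(ba)?sh\b/, // curl ... | bash
  /\bDROP\s+TABLE\b/i,           // DROP TABLE ...
];

const PAUSE_THRESHOLD = 40;

// System paths and credential files from the post's list.
function isSensitivePath(p: string): boolean {
  const systemPath = p.startsWith("/etc/") || p.includes("/.ssh/") || p.includes("/.aws/");
  const credential = /(^|\/)\.env$/.test(p) || /\.(pem|key)$/.test(p);
  return systemPath || credential;
}

function penaltyFor(action: AgentAction, projectRoot: string): number {
  if (action.kind === "command") {
    const cmd = action.command;
    return DANGEROUS_COMMANDS.some((re) => re.test(cmd)) ? 40 : 0;
  }
  if (isSensitivePath(action.path)) return 25; // assumed: top of the -15 to -25 range
  const outside = action.path.startsWith("/") && !action.path.startsWith(projectRoot + "/");
  return outside ? 15 : 0; // assumed: bottom of the range for other out-of-project files
}

// Starts at 100, subtracts each action's penalty, and flags the session for auto-pause below 40.
function scoreSession(actions: AgentAction[], projectRoot: string): { score: number; paused: boolean } {
  let score = 100;
  for (const action of actions) {
    score = Math.max(0, score - penaltyFor(action, projectRoot));
  }
  return { score, paused: score < PAUSE_THRESHOLD };
}
```

With these assumed values, piping a script from curl into bash (-40) and then reading the project's .env (-25) leaves a score of 35, enough on its own to trigger the pause.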

Optionally, a local LLM (Ollama) can also evaluate whether the actions match the original objective — so it catches semantic drift too, not just dangerous patterns.
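The semantic check can be sketched as a prompt sent to Ollama's `POST /api/generate` endpoint, where `stream: false` returns one JSON object whose `response` field holds the model's text and `format: "json"` asks the model to reply in JSON. The prompt wording and the `{ score, reason }` reply shape are assumptions for illustration, not Hawkeye's actual prompt. The caller would POST `body` to `url` (for example with `fetch`) and pass the parsed reply to `parseDriftVerdict`.

```typescript
// Hypothetical verdict shape requested from the local model.
interface SemanticVerdict {
  score: number; // 0-100, 100 = actions fully serve the objective
  reason: string;
}

// Builds a request for Ollama's POST /api/generate endpoint.
function buildDriftRequest(model: string, objective: string, recentActions: string[]) {
  const prompt = [
    `Objective: ${objective}`,
    "Recent agent actions:",
    ...recentActions.map((a, i) => `${i + 1}. ${a}`),
    'Reply with JSON only: {"score": <0-100, how well the actions serve the objective>, "reason": "<one sentence>"}',
  ].join("\n");
  return {
    url: "http://localhost:11434/api/generate",
    body: { model, prompt, stream: false, format: "json" },
  };
}

// Parses the "response" field of Ollama's reply.
// Returns null when the model did not follow the requested format.
function parseDriftVerdict(ollamaReply: { response: string }): SemanticVerdict | null {
  try {
    const data = JSON.parse(ollamaReply.response) as Partial<SemanticVerdict>;
    if (typeof data.score !== "number" || typeof data.reason !== "string") return null;
    // Clamp in case the model returns an out-of-range score.
    return { score: Math.min(100, Math.max(0, data.score)), reason: data.reason };
  } catch {
    return null;
  }
}
```

Returning null instead of guessing matters with small local models, which do not always follow the format; a caller could fall back to the heuristic score alone in that case.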

Guardrails

Guardrails are rules evaluated before every action. If an action violates a rule, Hawkeye blocks it (or only warns, depending on the rule's "action" setting) before it executes:

  {
    "guardrails": [
      {
        "name": "Protect secrets",
        "type": "file_protect",
        "action": "block",
        "config": { "paths": ["**/.env", "**/*.key", "**/*.pem"] }
      },
      {
        "name": "Budget limit",
        "type": "cost_limit",
        "action": "warn",
        "config": { "maxUsdPerSession": 5.0 }
      }
    ]
  }
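Evaluating a config like the one above can be sketched as a pre-action check. This is a minimal sketch, not Hawkeye's evaluator: it covers only the two rule types shown (`file_protect` and `cost_limit`), uses a tiny hand-rolled glob matcher supporting just `**/` and `*`, and assumes a hypothetical `PendingAction` shape.

```typescript
// Rule shape copied from the JSON config above; the matching logic is an assumption.
type RuleAction = "block" | "warn";

type Guardrail =
  | { name: string; type: "file_protect"; action: RuleAction; config: { paths: string[] } }
  | { name: string; type: "cost_limit"; action: RuleAction; config: { maxUsdPerSession: number } };

// What the agent is about to do, plus the session's spend so far.
interface PendingAction {
  filePath?: string;
  sessionCostUsd: number;
}

interface Verdict {
  decision: "allow" | "warn" | "block";
  rule?: string;
}

// Minimal glob support: "**/" matches any number of directories, "*" matches within one segment.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*" && glob[i + 1] === "*" && glob[i + 2] === "/") {
      re += "(?:.*/)?";
      i += 2;
    } else if (c === "*" && glob[i + 1] === "*") {
      re += ".*";
      i += 1;
    } else if (c === "*") {
      re += "[^/]*";
    } else {
      // Escape regex metacharacters; "?" and character classes are not supported here.
      re += /[\\^$.|?+()[\]{}]/.test(c) ? "\\" + c : c;
    }
  }
  return new RegExp("^" + re + "$");
}

function evaluate(rules: Guardrail[], action: PendingAction): Verdict {
  let verdict: Verdict = { decision: "allow" };
  for (const rule of rules) {
    let violated = false;
    if (rule.type === "file_protect") {
      const path = action.filePath;
      violated = path !== undefined && rule.config.paths.some((g) => globToRegExp(g).test(path));
    } else {
      violated = action.sessionCostUsd > rule.config.maxUsdPerSession;
    }
    if (!violated) continue;
    // A blocking rule wins immediately; a warning is kept unless a later rule blocks.
    if (rule.action === "block") return { decision: "block", rule: rule.name };
    verdict = { decision: "warn", rule: rule.name };
  }
  return verdict;
}

// The same two rules as the JSON config above.
const rules: Guardrail[] = [
  { name: "Protect secrets", type: "file_protect", action: "block", config: { paths: ["**/.env", "**/*.key", "**/*.pem"] } },
  { name: "Budget limit", type: "cost_limit", action: "warn", config: { maxUsdPerSession: 5.0 } },
];
```

With these rules, writing `config/.env` is blocked regardless of cost, while an ordinary edit in a session that has spent $6 is allowed through with a "Budget limit" warning.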

There are 7 rule types: file protection, command blocking, cost limits, token limits, directory scoping, network restrictions, and human approval gates.

The agent can self-monitor

Hawkeye exposes an MCP server with 27 tools.

The agent can:

Call check_drift to see its own score and course-correct
Call check_guardrail before a risky action to avoid getting blocked
Call suggest_correction when drift is high to get back on track
Call log_event to document decisions
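Under MCP, an agent invokes a tool by sending a JSON-RPC 2.0 request with method `tools/call` and `params` containing the tool `name` and its `arguments`; the result carries a `content` array of typed items, plus `isError` for failures. This sketch routes such requests to handlers named after three of the tools above. The argument names (`path`, `message`), the handler logic, and the in-memory `session` object are hypothetical stand-ins, not Hawkeye's MCP server.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments?: Record<string, unknown> };
}

interface ToolResult {
  content: Array<{ type: "text"; text: string }>;
  isError?: boolean;
}

type ToolHandler = (args: Record<string, unknown>) => string;

// Hypothetical in-memory state standing in for the session database.
const session = { driftScore: 72, log: [] as string[] };

// Handlers named after tools from the post; their logic is illustrative only.
const tools: Record<string, ToolHandler> = {
  check_drift: () => `drift score: ${session.driftScore}/100`,
  check_guardrail: (args) => {
    const path = String(args.path ?? "");
    return /(^|\/)\.env$/.test(path) ? `blocked: ${path} is protected` : `allowed: ${path}`;
  },
  log_event: (args) => {
    session.log.push(String(args.message ?? ""));
    return "logged";
  },
};

function handleToolCall(req: ToolCallRequest) {
  const name = req.params.name;
  // hasOwnProperty avoids treating inherited keys like "toString" as tools.
  const known = Object.prototype.hasOwnProperty.call(tools, name);
  const result: ToolResult = known
    ? { content: [{ type: "text", text: tools[name](req.params.arguments ?? {}) }] }
    : { content: [{ type: "text", text: `unknown tool: ${name}` }], isError: true };
  return { jsonrpc: "2.0" as const, id: req.id, result };
}
```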

The agent also builds persistent memory — after each task, a journal entry (prompt, files changed, outcome) is saved and injected into future tasks. So it learns from past sessions.
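The journal mechanism can be sketched as storing one entry per task and prepending a short digest of recent entries to the next prompt. The `JournalEntry` fields follow the post (prompt, files changed, outcome) plus an assumed timestamp; the preamble wording and the cap of three entries are hypothetical choices.

```typescript
// Hypothetical journal entry: the three fields the post mentions plus a timestamp.
interface JournalEntry {
  prompt: string;
  filesChanged: string[];
  outcome: "success" | "failure";
  at: string; // ISO 8601 UTC timestamp, so string order equals time order
}

// Builds a short digest of the most recent entries, newest first.
function buildMemoryPreamble(journal: JournalEntry[], maxEntries = 3): string {
  if (journal.length === 0) return "";
  const recent = [...journal]
    .sort((a, b) => (a.at < b.at ? 1 : a.at > b.at ? -1 : 0))
    .slice(0, maxEntries);
  const lines = recent.map(
    (e) => `- [${e.outcome}] ${e.prompt} (files: ${e.filesChanged.join(", ") || "none"})`,
  );
  return ["Notes from previous sessions:", ...lines, ""].join("\n");
}

// Prepends past-session notes to a new task prompt.
function withMemory(taskPrompt: string, journal: JournalEntry[]): string {
  return buildMemoryPreamble(journal) + taskPrompt;
}
```

Capping the number of injected entries keeps the preamble from crowding out the actual task in the model's context window.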

Dashboard

A web UI with session replay, drift charts, an event timeline, and remote task submission from your phone. It is mobile-responsive, with a Cloudflare tunnel option for remote access.

Quick start:

  npm install -g hawkeye-ai

  # For the TUI
  hawkeye

  # For Claude Code
  hawkeye hooks install

  # For any other agent
  hawkeye record -o "Build a REST API" -- aider

  # Launch the dashboard
  hawkeye serve

  # Enable remote access to use Hawkeye on mobile
  hawkeye remote

Stack

TypeScript monorepo. SQLite for storage. Everything runs locally — no cloud, no telemetry, no data leaves your machine. MIT licensed.

GitHub:

MLaminekane / hawkeye

The flight recorder for AI agents - observability and security for Claude Code, Aider, AutoGPT and more

Hawkeye

The flight recorder for AI agents
Open-source observability & security for Claude Code · Aider · AutoGPT · CrewAI · Open Interpreter · any LLM-powered agent


What is Hawkeye?

Hawkeye is a flight recorder for AI agents. It captures every action an agent performs — terminal commands, file operations, LLM calls, API requests — and provides:

Session recording & replay — Full timeline of every agent action with costs and metadata
Time Travel Debugging — Step-through replay with breakpoints, keyboard shortcuts, interactive SVG timeline, session forking ("replay from here")
Root Cause Analysis — Automatic hawkeye analyze finds primary errors, causal chains, error patterns, and fix suggestions (heuristic + optional LLM)
DriftDetect — Real-time objective drift detection using heuristic + LLM scoring
Guardrails — File protection, command blocking, cost limits, token limits…


npm:

npmjs.com/package/hawkeye-ai

I'd love feedback. One challenge I'm still working on: token/cost tracking is unreliable when agents don't expose usage data in their hooks. If anyone has ideas on this, I'm all ears.
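For agents whose hooks report no usage, one common fallback is sketched below: use real token counts when present, otherwise estimate tokens from text length with the rough rule of thumb of about 4 characters per token for English, and flag the result as estimated so it can be labeled in the UI. The per-model prices are hypothetical inputs (real prices vary by provider and change over time), and the estimate can be far off for code or non-English text.

```typescript
// Hypothetical per-model prices in USD per million tokens.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface CostEstimate {
  usd: number;
  estimated: boolean; // true when token counts were guessed from text length
}

// Rough rule of thumb for English text: about 4 characters per token.
const CHARS_PER_TOKEN = 4;

function approxTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Prefers reported usage; falls back to a length-based estimate.
function costOf(price: ModelPrice, usage: Usage | undefined, promptText: string, completionText: string): CostEstimate {
  const estimated = usage === undefined;
  const u = usage ?? { inputTokens: approxTokens(promptText), outputTokens: approxTokens(completionText) };
  const usd = (u.inputTokens * price.inputPerMTok + u.outputTokens * price.outputPerMTok) / 1e6;
  return { usd, estimated };
}
```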