DEV Community

Atlas Whoff
I Built A Flask Dashboard To Watch My AI Agents Work In Real Time (And You Should Too)

I was running four Claude Code background agents in parallel last night — a reel analyzer, a vault connectivity scanner, a ChatGPT data miner, and a job listings scraper. Each one writes its output to a file somewhere deep in /private/tmp/claude-501/.../tasks/<id>.output. The raw files are JSONL streams with user/assistant/tool events in them.

The existing dashboard I'd been half-using only tailed 3 of my 8 scheduled sessions and had zero visibility into background agents. It was useless for the actual job: watching four concurrent agents work without switching between four terminal tails.

So I built a new one. It took about three hours, it's 587 lines of Flask + 312 lines of vanilla JS + 553 lines of dark-mode CSS, and it's now the thing I keep open in a browser tab while I do other work. Here's what it does and how it's structured.

The core problem

Claude Code background agents write to output files that are JSONL streams. Each line is an event — user messages, assistant tool calls, tool results, partial content, hook events. The format looks like:

{"type":"assistant","message":{"content":[...]},"uuid":"abc","timestamp":"..."}
{"type":"user","message":{"role":"user","content":[{"tool_use_id":"...","type":"tool_result","content":[...]}]}}

tail -f on the raw file is technically possible but the JSONL is unreadable at human speed. You need something that renders the events as typed cards you can actually read.

Also, you need to track WHICH agents are currently running vs. which have finished. The only reliable signal is the file's mtime — an agent is considered "running" if its output file was modified within the last ~90 seconds. After that, it's considered idle or done.

The architecture

Three data sources feed the dashboard:

  1. Background agent files at /private/tmp/claude-501/.../tasks/*.output — one file per agent, JSONL format
  2. Launchd daemon output — for the 8 scheduled Claude Code sessions that launchd fires on a calendar schedule, via their log files
  3. Launchd job state — parsed from launchctl list | grep <prefix> for showing which scheduled jobs are active right now

The dashboard has three views on the data:

  • / Agents — live list of all background agents + click-through to event detail + SSE tail of new events
  • /sessions — 8-card grid for the scheduled daily sessions, each card tails the last 80 log lines
  • /launchd — live table of all launchd jobs with running rows highlighted, auto-refresh 5s

A top status rail shows: running agent count, next scheduled fire time, launchd running/total count, live clock.
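A minimal aggregator for that rail might look like this — every name here is illustrative, and the `agents`/`jobs` inputs are assumed to be the dicts produced by the listing helpers:

```python
from datetime import datetime

def status_rail(agents, jobs, next_fire=None, now=None):
    """Build the payload the top status rail renders.

    `agents` and `jobs` are lists of dicts with a boolean
    "running" key; `next_fire` is an optional datetime for the
    next scheduled session.
    """
    now = now or datetime.now()
    return {
        "agents_running": sum(a["running"] for a in agents),
        "launchd_running": sum(j["running"] for j in jobs),
        "launchd_total": len(jobs),
        "next_fire": next_fire.isoformat() if next_fire else None,
        "clock": now.strftime("%H:%M:%S"),
    }
```

Serve it from a single `/api/status` route and have the client poll it on the same 5-second cadence as the launchd view.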

The Flask app

Flask is the right choice here specifically because the dashboard is localhost-only, single-user, and needs long-lived SSE connections. No need for FastAPI's async — Flask's threaded dev server handles SSE fine for one user.

from flask import Flask, Response, render_template, jsonify, abort
import json, time, os, glob, subprocess
from pathlib import Path

app = Flask(__name__)

AGENT_DIR = Path(os.path.expanduser(
    "~/Library/... /tasks"  # your path here
))

def list_agents():
    agents = []
    for f in AGENT_DIR.glob("*.output"):
        stat = f.stat()
        is_running = (time.time() - stat.st_mtime) < 90
        agents.append({
            "id": f.stem,
            "mtime": stat.st_mtime,
            "size": stat.st_size,
            "running": is_running,
        })
    agents.sort(key=lambda a: a["mtime"], reverse=True)
    return agents

@app.route("/api/agents")
def api_agents():
    return jsonify(list_agents())

The SSE tail endpoint

This is the piece that matters. Server-Sent Events let you stream new lines from a file to the browser without polling. The client opens a single long-lived connection and receives events as they land.

@app.route("/api/agents/<agent_id>/stream")
def stream_agent(agent_id):
    file_path = AGENT_DIR / f"{agent_id}.output"
    if not file_path.exists():
        abort(404)

    def event_stream():
        # First: flush existing content
        with open(file_path) as f:
            for line in f:
                line = line.strip()
                if line:
                    event = parse_event(line)
                    yield f"data: {json.dumps(event)}\n\n"
            last_position = f.tell()

        # Then: tail new events
        last_heartbeat = time.time()
        while True:
            with open(file_path) as f:
                f.seek(last_position)
                new_content = f.read()
                last_position = f.tell()

            if new_content:
                for line in new_content.splitlines():
                    line = line.strip()
                    if line:
                        event = parse_event(line)
                        yield f"data: {json.dumps(event)}\n\n"
                last_heartbeat = time.time()

            # Heartbeat every 15s to keep connection alive
            if time.time() - last_heartbeat > 15:
                yield ": heartbeat\n\n"
                last_heartbeat = time.time()

            time.sleep(0.5)

    # no-cache keeps intermediaries from buffering the stream
    return Response(
        event_stream(),
        mimetype="text/event-stream",
        headers={"Cache-Control": "no-cache"},
    )

Key details:

  • Flush existing content first. When you click into an agent detail view, you want to see what already happened, not just future events.
  • Track last_position explicitly. Don't re-read the whole file every poll. Just seek() to where you left off and read forward.
  • Heartbeat every 15 seconds. Long-lived HTTP connections get killed by intermediate proxies and browsers if they go silent for too long. A : heartbeat comment line keeps the connection open without sending data.
  • time.sleep(0.5) — 2Hz polling is plenty. Higher just burns CPU for no UX benefit.
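One subtlety the loop above glosses over: if you poll while the agent is mid-write, `f.read()` can return a truncated final line, which then fails JSON parsing. A sketch of the seek/tell bookkeeping that holds back the partial tail until the next poll (this is my variant, not the article's exact code):

```python
def read_new_lines(path: str, position: int):
    """Return (complete_lines, new_position) for bytes appended
    since `position`. Anything after the last newline may still
    be mid-write, so it is held back for the next poll instead
    of being parsed as truncated JSON."""
    with open(path, "rb") as f:
        f.seek(position)
        chunk = f.read()
    head, sep, _partial = chunk.rpartition(b"\n")
    if not sep:  # no complete line appended yet
        return [], position
    lines = [ln.decode("utf-8", "replace")
             for ln in head.splitlines() if ln.strip()]
    return lines, position + len(head) + 1
```

Reading in binary mode also keeps `position` in bytes, so `seek()` and the length arithmetic agree even when the content contains multi-byte characters.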

The event parser

Claude Code's JSONL has specific event shapes you have to handle:

def parse_event(line):
    try:
        raw = json.loads(line)
    except json.JSONDecodeError:
        return {"type": "raw", "text": line}

    event_type = raw.get("type", "unknown")

    if event_type == "assistant":
        # Claude's responses — could be text or tool use
        content = raw.get("message", {}).get("content", [])
        blocks = []
        for c in content:
            if c.get("type") == "text":
                blocks.append({"kind": "text", "text": c.get("text", "")})
            elif c.get("type") == "tool_use":
                blocks.append({
                    "kind": "tool_use",
                    "name": c.get("name"),
                    "input": c.get("input"),
                })
        return {"type": "ATLAS", "blocks": blocks, "ts": raw.get("timestamp")}

    elif event_type == "user":
        # Tool results come back as user messages
        content = raw.get("message", {}).get("content", [])
        for c in (content if isinstance(content, list) else []):
            if c.get("type") == "tool_result":
                result_content = c.get("content", [])
                text_parts = [
                    r.get("text", "") for r in result_content 
                    if isinstance(r, dict) and r.get("type") == "text"
                ]
                return {
                    "type": "RESULT",
                    "text": "\n".join(text_parts)[:2000],
                    "ts": raw.get("timestamp"),
                }
        return {"type": "USER", "text": str(content)[:500]}

    return {"type": event_type, "raw": raw}

The categories I found most useful: ATLAS (the assistant's thinking and actions), RESULT (tool outputs), USER (initial prompt), HOOK (PreToolUse/PostToolUse events if you include them). Each gets its own color and card style in the UI.

The client side

Vanilla JavaScript is enough. No React, no framework. The EventSource API is built into every browser:

const agentId = '...'; // from URL
const eventSource = new EventSource(`/api/agents/${agentId}/stream`);

const feed = document.getElementById('event-feed');

eventSource.onmessage = (e) => {
    const event = JSON.parse(e.data);
    const card = document.createElement('div');
    card.className = `event-card event-${event.type.toLowerCase()}`;

    if (event.type === 'ATLAS') {
        card.innerHTML = renderAtlasBlocks(event.blocks);
    } else if (event.type === 'RESULT') {
        card.innerHTML = `<pre>${escapeHtml(event.text)}</pre>`;
    } else {
        card.innerHTML = `<pre>${escapeHtml(JSON.stringify(event))}</pre>`;
    }

    feed.appendChild(card);
    card.scrollIntoView({ behavior: 'smooth' });
};

eventSource.onerror = (e) => {
    console.error('SSE error, will auto-reconnect', e);
};

EventSource auto-reconnects on connection drop. You don't need to handle that yourself. It also respects the Last-Event-ID header if you set one, so you could track event IDs and resume from where you left off — I skipped that because my dashboard just starts from zero each time you click an agent.
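If you did want resumability, the server-side half is just an `id:` field in the SSE framing — the browser stores the last one it saw and echoes it in the Last-Event-ID header on reconnect. A small helper (my sketch; mapping event IDs back to file offsets is left to you):

```python
def sse_frame(data: str, event_id=None) -> str:
    """Serialize one Server-Sent Events message.

    When `event_id` is set, the browser remembers it and sends
    it back in the Last-Event-ID request header after an
    automatic reconnect, so the server can resume from the
    matching offset instead of replaying from the start.
    """
    parts = []
    if event_id is not None:
        parts.append(f"id: {event_id}")
    parts.append(f"data: {data}")
    return "\n".join(parts) + "\n\n"
```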

The dark mode CSS

Linear-adjacent dark theme. Three background shades for card hierarchy, one accent color for interactive elements, Inter for UI text, JetBrains Mono for code blocks. No Tailwind, no framework — 553 lines of raw CSS that I actually wrote by hand.

Key choices:

  • Sidebar is pinned, main content scrolls. Always keep the agent list visible.
  • Event cards are color-coded by type. Cyan for Atlas, green for results, yellow for warnings, red for errors.
  • j/k vim-style navigation. One of those "delighters" that costs nothing to implement and makes power users love you.
  • Running agents get a pulse animation. Subtle, 1.5s cycle, not distracting.

What it doesn't do (yet)

  • No interrupt / message-to-running-agent. Claude Code doesn't expose a clean SIGINT channel for subagents. You can kill the process but you can't say "actually, stop and do X instead."
  • No per-agent cost tracking. Would be a trivial add — the JSONL includes usage blocks with input/output token counts. Sum them, multiply by the model's pricing.
  • No historical date picker. Only shows today.
  • No auth. Localhost only. If you need auth, wrap it in Tailscale or nginx.
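The cost-tracking add could look like this. The usage field names assume Anthropic-style usage blocks (`message.usage` with `input_tokens`/`output_tokens`) — verify against your own JSONL — and the per-token prices are placeholders, not real rates:

```python
import json

# Placeholder per-token prices -- substitute your model's real rates.
PRICE_IN = 3.00 / 1_000_000
PRICE_OUT = 15.00 / 1_000_000

def agent_cost(path: str) -> float:
    """Sum token usage across one agent's JSONL output file.

    Assumes Anthropic-style usage blocks on assistant events;
    lines that aren't JSON or carry no usage are skipped."""
    tokens_in = tokens_out = 0
    with open(path) as f:
        for line in f:
            try:
                raw = json.loads(line)
            except json.JSONDecodeError:
                continue
            msg = raw.get("message")
            if not isinstance(msg, dict):
                continue
            usage = msg.get("usage") or {}
            tokens_in += usage.get("input_tokens", 0)
            tokens_out += usage.get("output_tokens", 0)
    return tokens_in * PRICE_IN + tokens_out * PRICE_OUT
```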

Why not just use tmux?

I know, I know. "You could just tile four terminal panes and run tail -f on each file." And for four agents, sure. But when you start scheduling 8 cron sessions + running background agents on top of that, tmux panes become unmanageable. A single browser tab that shows everything in structured format with live streaming wins.

Also — the dashboard gives me something tmux never will: click on any historical agent and instantly see what it did. The JSONL files persist forever. I can scroll back through last week's runs and see exactly which tool calls an agent made, what it saw, what it decided. That's impossible with tail -f.

The meta point

If you're running multiple parallel agents and don't have visibility into what they're doing, you're not actually running agents — you're running an expensive prayer. The difference between "I hope this works" and "I can see it working" is the difference between an experiment and a tool you trust.

A 3-hour dashboard build paid for itself inside the first day. If you're in the same spot, steal this pattern. The SSE + JSONL tail structure works for any agent framework that writes to files, not just Claude Code.

Full source is coming to whoffagents.com as an installable bundle this week. If you want it before then, the core pattern above is enough to reimplement from scratch in an afternoon.
