Prakersh Maheshwari
How I Built a <50MB RAM Daemon in Go That Tracks 7 AI APIs - And Then Gave It a macOS Menu Bar

Last month I shipped a macOS menu bar companion for onWatch — my open-source CLI that tracks AI API quotas across 7 providers. Building it forced me to solve problems I had never touched: spawning a child process from a daemon, bridging Go and Objective-C with CGo, rendering a native popover backed by WKWebView, and coordinating state between two OS processes via SIGUSR1.

This post walks through the architecture decisions, the trade-offs, and the code. Every snippet is from the actual codebase.

What onWatch Does

I use Claude Code, Codex CLI, GitHub Copilot, and several others daily. Each provider has its own dashboard with different billing cycles, different quota types, and different definitions of "usage." I got tired of opening 7 browser tabs every morning, so I built a single CLI that polls them all from one terminal.

onWatch runs as a background daemon. It polls 7 providers in parallel, stores history in SQLite, and serves a Material Design 3 web dashboard. Single binary, <50MB RAM, zero telemetry.

The 7 providers: Anthropic (Claude Code), OpenAI Codex, GitHub Copilot, MiniMax, Antigravity, Synthetic, and Z.ai.

GitHub: github.com/onllm-dev/onwatch

The Goroutine-per-Provider Pattern

Each provider runs as an independent Agent in its own goroutine. If Anthropic's API is slow, Codex keeps polling. If Z.ai returns an error, Copilot is unaffected.

From internal/agent/agent.go:

type Agent struct {
    client       *api.Client
    store        *store.Store
    tracker      *tracker.Tracker
    interval     time.Duration
    logger       *slog.Logger
    sm           *SessionManager
    notifier     *notify.NotificationEngine
    pollingCheck func() bool
}

func (a *Agent) Run(ctx context.Context) error {
    a.logger.Info("Agent started", "interval", a.interval)
    defer func() {
        if a.sm != nil {
            a.sm.Close()
        }
        a.logger.Info("Agent stopped")
    }()

    a.poll(ctx)

    ticker := time.NewTicker(a.interval)
    defer ticker.Stop()

    for {
        select {
        case <-ticker.C:
            a.poll(ctx)
        case <-ctx.Done():
            return nil
        }
    }
}

Clean shutdown via context.Context cancellation. The pollingCheck callback lets the dashboard toggle individual providers on/off at runtime without restarting the daemon.

SQLite as the Only Dependency

No Postgres, no Redis, no message queue. One SQLite file with WAL mode. After weeks of continuous polling across all 7 providers, my database file is 55MB.

The key insight: per-cycle historical usage patterns are more valuable than point-in-time snapshots. Every provider shows you current usage. None of them show you which sessions burn quota fastest, which days you're heaviest, or when exactly your cycle resets.

From internal/store/store.go:

db.SetMaxOpenConns(2)
db.SetMaxIdleConns(1)

pragmas := []string{
    "PRAGMA journal_mode=WAL;",
    "PRAGMA synchronous=NORMAL;",
    "PRAGMA cache_size=-500;",  // 512KB page cache
    "PRAGMA foreign_keys=ON;",
    "PRAGMA busy_timeout=5000;",
}

The comment in the source explains why: "SQLite is single-writer anyway, and each connection allocates its own page cache (~2 MB with default settings). Limiting to 1 connection saves 2-4 MB RSS."

The SQLite driver is modernc.org/sqlite — a pure Go transpilation of SQLite. No CGO required for the main binary. This is what makes cross-compilation trivial and the Docker image tiny.

Memory Tuning: How <50MB Actually Works

My running instance uses ~34MB RSS with all 7 providers polling every 60 seconds. Here is exactly how:

From main.go:

// Memory tuning: GOMEMLIMIT triggers MADV_DONTNEED which actually shrinks RSS.
// Without this, Go uses MADV_FREE on macOS - pages are reclaimable but still
// counted in RSS, causing a permanent ratchet effect.
debug.SetMemoryLimit(40 * 1024 * 1024) // 40 MiB soft limit
debug.SetGCPercent(50)                 // GC at 50% heap growth

Without GOMEMLIMIT, Go on macOS uses MADV_FREE — the kernel marks pages as reclaimable but they still show up in RSS. Your process looks like it is leaking memory when it is not. Setting a memory limit switches Go to MADV_DONTNEED, which actually returns pages to the OS. This one line is the difference between "34MB" and "looks like 80MB."

Other contributing decisions:

  • Bounded queries everywhere: LIMIT 200 for cycles, LIMIT 50 for insights
  • All static assets embedded via embed.FS — zero runtime allocations for serving HTML/JS/CSS
  • Every API client caps response reads: io.ReadAll(io.LimitReader(resp.Body, 1<<16)) — 64KB max per response, preventing memory exhaustion from malformed API payloads
  • No ORM. Parameterized SQL only.

The Single Binary Trick

The entire application — HTML templates, JavaScript, CSS, service worker manifest, favicon, menubar frontend — is embedded using Go's embed.FS:

//go:embed templates/*.html
var templatesFS embed.FS

//go:embed all:static/*
var staticFS embed.FS

Even the version string is embedded:

//go:embed VERSION
var embeddedVersion string

No npm build step, no external files, no runtime file access. You download one ~15MB file and run it. The Docker image is ~10MB: a distroless base plus -ldflags="-s -w" to strip the symbol table and debug info.

The 7 Provider Clients

Each provider lives in internal/api/ with its own client file. Some of the implementation details are genuinely interesting:

Anthropic auto-detects tokens from Claude Code's credential files (~/.claude/.credentials.json or macOS Keychain). The agent calls SetTokenRefresh() before each poll to proactively refresh OAuth tokens before they expire. No manual configuration needed if Claude Code is installed.

Codex has dual-endpoint fallback. buildCodexFallbackBaseURL() tries /backend-api/wham/usage first, falls back to /api/codex/usage on 404, because OpenAI serves different URL shapes in different environments. It also supports multi-account polling — the CodexAgentManager runs per-account agents with per-account visibility toggles.

Antigravity has no static endpoint at all. The client calls detectProcess() to find the Antigravity language server PID via os/exec, then discoverPorts() on that PID, then probes each port to find the Connect RPC endpoint. Zero configuration. It also sets InsecureSkipVerify: true because the local server uses self-signed certificates — acceptable since it only ever talks to localhost.

Z.ai returns HTTP 200 with an error code in the body. The client parses wrapper.Code == 401 from the JSON response to detect auth failures rather than relying on HTTP status codes. This was fun to debug.

MiniMax has shared-quota detection: IsSharedQuota() checks if all active models (M2, M2.1, M2.5) report the same total/remain values. When true, the dashboard renders a single merged card instead of per-model cards.

GitHub Copilot tracks Premium Requests, Chat, and Completions quotas with monthly reset cycles. Auth is via GitHub PAT with the copilot scope.

Synthetic tracks Subscription, Search, and Tool Call quotas via its /v2/quotas endpoint.

The macOS Menu Bar: A Separate Process, Not a Goroutine

This was the most architecturally interesting decision. The menu bar is not a goroutine in the daemon process — it is a separate OS process.

Why? systray.Run() from fyne.io/systray must block the OS main thread on macOS (Cocoa requirement). If we ran it in the daemon process, Cocoa would take ownership of the main goroutine, and the HTTP server would need to run in a background thread. That is the wrong inversion.

Instead, the daemon spawns itself with a different subcommand. The daemon calls startMenubarCompanion(), which runs exec.Command(exe, "menubar", "--port=...", "--db=...") — the same binary, re-invoked as a child process. It writes a PID file to ~/.onwatch/onwatch-menubar.pid and redirects logs to a rotating log file.

The SIGUSR1 Refresh Protocol

The two processes need to stay in sync. When the daemon polls a provider and gets new data, the menu bar should update immediately rather than waiting for its next tick.

The solution is SIGUSR1:

const refreshCompanionSignal = syscall.SIGUSR1

After each successful quota poll, the daemon sends SIGUSR1 to the menu bar PID. The companion's signal goroutine calls controller.refreshStatus() immediately:

func runCompanion(cfg *Config) error {
    controller := &trayController{cfg: cfg}
    signalChan := make(chan os.Signal, 1)
    signal.Notify(signalChan, refreshCompanionSignal)
    defer signal.Stop(signalChan)
    go func() {
        for range signalChan {
            controller.refreshStatus()
        }
    }()
    systray.Run(controller.onReady, controller.onExit)
    return nil
}

The menu bar also runs its own refresh loop as a fallback — if SIGUSR1 is missed (process paused, signal lost), the next tick picks it up.

The Native Popover: Go + Objective-C via CGo

Clicking the tray icon does not open a browser tab. It opens a native NSPanel backed by WKWebView.

From internal/menubar/popover_darwin.m:

The popover is a 360x680px borderless panel. The OnWatchBorderlessPanel subclass overrides three methods:

  • canBecomeKeyWindow returns YES (keyboard accessible)
  • canBecomeMainWindow returns NO (never steals focus from your editor)
  • acceptsFirstMouse returns YES (responds to first click without requiring activation)

The Go layer in webview_darwin.go wraps these Objective-C calls with unsafe.Pointer handles. When CGo is unavailable (non-macOS builds), it falls back to opening a browser tab at http://localhost:9211/menubar.

Build tag isolation keeps this clean:

//go:build menubar && darwin

menubar_stub.go provides no-ops for all other platforms. The standard binary compiles on Linux and Windows without any macOS dependencies.

Three Display Modes for the Tray Title

The tray icon shows live data in three configurable modes.

From internal/menubar/tray_display.go:

func TrayTitle(snapshot *Snapshot, settings *Settings) string {
    // normalized is settings after validation (setup elided in this excerpt)
    switch normalized.StatusDisplay.Mode {
    case StatusDisplayIconOnly:
        return ""
    case StatusDisplayCriticalCount:
        count := snapshot.Aggregate.WarningCount + snapshot.Aggregate.CriticalCount
        return fmt.Sprintf("%d ⚠", count)
    case StatusDisplayMultiProvider:
        parts := multiProviderMetrics(snapshot, normalized.StatusDisplay)
        return joinTrayParts(parts)
    }
    return "" // fallback for unknown modes
}
  • multi_provider — Live percentages per selected quota, separated by │. Example: 67% │ 23% │ 91%
  • critical_count — Count of warning + critical quotas. Example: 2 ⚠
  • icon_only — No text, just the icon

Users configure which quotas appear in the tray via Settings > Menubar. The normalizeStatusSelections() function deduplicates entries, caps at 3, and strips empty values.
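A sketch of what normalizeStatusSelections() is described as doing; the real signature and types may differ:

```go
package main

import "fmt"

// normalizeSelections drops empty entries, dedupes while preserving
// order, and caps the tray at 3 quotas, per the description above.
func normalizeSelections(in []string) []string {
	seen := make(map[string]bool, len(in))
	out := make([]string, 0, 3)
	for _, s := range in {
		if s == "" || seen[s] {
			continue
		}
		seen[s] = true
		out = append(out, s)
		if len(out) == 3 {
			break // the tray has room for at most 3 metrics
		}
	}
	return out
}

func main() {
	got := normalizeSelections([]string{"claude", "", "codex", "claude", "copilot", "zai"})
	fmt.Println(got) // [claude codex copilot]
}
```

Normalizing at read time rather than write time means a hand-edited or stale settings file can never crash the tray renderer.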

Three View Presets for the Popover

The popover frontend (embedded via //go:embed) renders three view presets:

  • minimal — A single circular percentage ring with aggregate status across all providers
  • standard — Individual provider cards with circular quota meters (SVG stroke-dasharray animation)
  • detailed — Expanded cards with sparkline trend lines (SVG polyline charts showing usage over time)

The same JavaScript works in both the native WKWebView and a regular browser — a window.__ONWATCH_MENUBAR_BRIDGE__ object is injected server-side to handle the differences.

Notifications: Push and Email

onWatch sends desktop push notifications (Web Push/VAPID) and email alerts (SMTP) when quotas cross configurable thresholds.

Three notification types: Warning (default 80%), Critical (default 95%), and Reset (quota renewed). Each has a 30-minute cooldown to prevent alert fatigue. Per-quota overrides let you set different thresholds — or switch to absolute values instead of percentages.

The VAPID key generation uses ecdsa.GenerateKey(elliptic.P256(), rand.Reader) — standard Web Push. SMTP passwords are encrypted at rest with AES-256-GCM using HKDF-derived keys.

Security

Some decisions I want to highlight because they are easy to get wrong:

Constant-time credential comparison: subtle.ConstantTimeCompare for both password and session token checks. Timing attacks against local services are unlikely, but the fix is one import.
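The fix really is one import. A minimal sketch:

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// checkToken compares in constant time regardless of where the first
// mismatching byte falls; a naive bytes.Equal or == short-circuits and
// leaks position information through timing.
func checkToken(got, want []byte) bool {
	return subtle.ConstantTimeCompare(got, want) == 1
}

func main() {
	fmt.Println(checkToken([]byte("s3cret"), []byte("s3cret"))) // true
	fmt.Println(checkToken([]byte("s3cres"), []byte("s3cret"))) // false
}
```

One caveat from the stdlib docs: ConstantTimeCompare returns 0 immediately when lengths differ, so length itself is not hidden.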

Password hashing: bcrypt at DefaultCost (10) for new passwords. Legacy SHA-256 hashes are migrated transparently on next login.

SMTP password encryption at rest: AES-256-GCM with HKDF-derived keys. When the admin password changes, ReEncryptAllData() re-encrypts all stored SMTP credentials with the new key. The info string "onwatch-smtp-encryption" provides domain separation.

Login rate limiting: 5 max failures per 5-minute window per IP before blocking.

Bounded response reading: Every API client caps responses at 64KB via io.LimitReader. A malicious or broken API cannot OOM the daemon.

The Numbers

These are from the actual codebase (verified March 2026):

Metric            Value
----------------  ------------------------------------
Go source lines   ~109,000
Test files        98
Test functions    2,473
Binary size       ~15MB (unstripped)
Runtime memory    ~34MB RSS (all 7 providers polling)
Provider clients  7
Docker image      ~10MB (distroless, non-root)
SQLite driver     Pure Go (no CGO for main binary)

CI runs go test -race -coverprofile=coverage.out -covermode=atomic -count=1 ./... on every push. The -race flag catches data races between the 7 concurrent polling goroutines. The menubar CI job on macOS runs an additional Playwright E2E suite.

Try It

One-line install (macOS/Linux):

curl -fsSL https://raw.githubusercontent.com/onllm-dev/onwatch/main/install.sh | bash

Homebrew:

brew install onllm-dev/tap/onwatch

Docker:

cp .env.docker.example .env   # add your API keys
docker-compose up -d

Open-source, GPL-3.0: github.com/onllm-dev/onwatch

Happy to answer questions about any of the architecture decisions. What would you do differently?
