Davron Yuldashev

Posted on Mar 14

I built a process manager with built-in crash detection and AI auto-fix (PM2 alternative)

#rust #zig #devops #opensource

You deploy a service, and it crashes at 3 AM. You get a Sentry alert, SSH in, dig through logs, and try to figure out what happened. And for that you pay $26/month for Sentry and integrate its SDK into every project.

I wanted crash detection and recovery built directly into the process manager — no extra services, no SDK, no dashboards. So I built Velos.

When a process crashes or throws an unhandled error, Velos detects it, runs AI analysis on the logs and stack trace, and sends a Telegram message with two buttons: Fix and Ignore. Tap Fix — an agent reads the logs, inspects the code, diffs recent changes, proposes a fix, and reports back. All without waking up your laptop.

How the crash flow works

Process crashes (or stderr matches panic/traceback/FATAL pattern)
  → Velos daemon detects it
  → AI analysis runs (Anthropic or any OpenAI-compatible API)
  → Telegram notification with [Fix] [Ignore] buttons
  → You tap Fix
  → Agentic loop starts: reads files, inspects git diff, runs commands
  → Result sent back to Telegram

Detection, analysis, and the notification are automatic, but no fix is attempted without your approval. The agent has tools to read and edit files, run commands, and inspect git history, and it only acts when you tap Fix.
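The flow above can be read as a small state machine in which the only path into the fixing stage is an explicit Fix tap. The sketch below is an illustration of that idea, not Velos's actual code; the `Stage` and `Event` names are hypothetical.

```rust
/// Stages of one crash incident (hypothetical names, for illustration only).
#[derive(Debug, PartialEq)]
enum Stage {
    Detected,
    Analyzed,
    AwaitingApproval,
    Fixing,
    Reported,
    Ignored,
}

/// Things that can happen to an incident (also hypothetical).
#[derive(Debug, Clone, Copy)]
enum Event {
    AnalysisDone,
    NotificationSent,
    FixTapped,
    IgnoreTapped,
    AgentFinished,
}

/// Advances the incident. The agent stage is reachable only from
/// AwaitingApproval, and only via FixTapped.
fn step(stage: Stage, event: Event) -> Stage {
    match (stage, event) {
        (Stage::Detected, Event::AnalysisDone) => Stage::Analyzed,
        (Stage::Analyzed, Event::NotificationSent) => Stage::AwaitingApproval,
        (Stage::AwaitingApproval, Event::FixTapped) => Stage::Fixing,
        (Stage::AwaitingApproval, Event::IgnoreTapped) => Stage::Ignored,
        (Stage::Fixing, Event::AgentFinished) => Stage::Reported,
        // Any other combination leaves the stage unchanged.
        (stage, _) => stage,
    }
}

fn main() {
    let events = [
        Event::AnalysisDone,
        Event::NotificationSent,
        Event::AgentFinished, // out of order: ignored, still awaiting approval
        Event::FixTapped,
        Event::AgentFinished,
    ];
    let mut stage = Stage::Detected;
    for ev in events {
        stage = step(stage, ev);
        println!("{ev:?} -> {stage:?}");
    }
}
```

Encoding the approval as a transition rather than a flag means an out-of-order event cannot start the agent by accident.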

This also works for runtime errors that have not (yet) killed the process. Velos watches stderr for panic/traceback/FATAL/ERROR patterns and fires an alert without waiting for the process to die. That is closer to what Sentry does, but with zero SDK and zero extra service.
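To make the stderr watching concrete, here is a minimal sketch of line-level pattern matching. It assumes a plain case-insensitive substring check against the four patterns named above; the real matcher in Velos may well be stricter, and the function name is made up for this example.

```rust
/// Markers to look for in a stderr line, lowercased. The list mirrors the
/// patterns named in the post; it is an assumption about how they are matched.
const PATTERNS: [&str; 4] = ["panic", "traceback", "fatal", "error"];

/// Returns the first pattern found in `line`, compared case-insensitively.
fn match_error_pattern(line: &str) -> Option<&'static str> {
    let lower = line.to_lowercase();
    PATTERNS.iter().copied().find(|p| lower.contains(*p))
}

fn main() {
    let stderr_lines = [
        "listening on :8080",
        "thread 'main' panicked at src/main.rs:10:5",
        "Traceback (most recent call last):",
        "FATAL: could not connect to database",
    ];
    for line in stderr_lines {
        // Only lines that hit a pattern would trigger an alert.
        if let Some(pattern) = match_error_pattern(line) {
            println!("alert ({pattern}): {line}");
        }
    }
}
```

A naive substring check like this also fires on harmless lines such as "0 errors found", so a production matcher would likely anchor on word boundaries or log-level fields.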

Why Zig for the daemon?

Process management is fundamentally a systems problem: fork/exec, Unix sockets, signal handling, CPU/RAM monitoring via syscalls. Zig gives full control with zero runtime overhead — no hidden allocations, no GC, comptime metaprogramming for tight control over the binary.

The result: ~3 MB RAM idle, plus ~65 KB per managed process. PM2 sits at a ~60 MB baseline because it runs on Node.js and carries the V8 runtime.
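As a back-of-the-envelope check on those numbers, the sketch below assumes memory grows linearly with the number of managed processes. That linearity is an assumption made for illustration, not something measured here.

```rust
/// Figures quoted in the post, in KB.
const DAEMON_IDLE_KB: u64 = 3 * 1024; // ~3 MB idle
const PER_PROCESS_KB: u64 = 65; // ~65 KB per managed process

/// Estimated daemon memory for `processes` managed processes,
/// assuming simple linear growth (an assumption, not a benchmark).
fn estimated_rss_kb(processes: u64) -> u64 {
    DAEMON_IDLE_KB + PER_PROCESS_KB * processes
}

fn main() {
    for n in [1, 10, 100] {
        let kb = estimated_rss_kb(n);
        println!("{n:>3} processes: ~{:.1} MB", kb as f64 / 1024.0);
    }
}
```

Under that assumption, 100 managed processes come to 3072 KB + 6500 KB = 9572 KB, roughly 9.3 MB, still well under the ~60 MB PM2 baseline.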

Architecture:

Zig daemon (fork/exec, IPC, monitoring)
    ↓  C ABI FFI
Rust shell (CLI, REST API, MCP server, log engine, AI)

Zig core compiles to a static library (libvelos_core.a), linked into the Rust binary via C ABI FFI. Single binary, zero runtime dependencies.
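For readers who have not crossed a C ABI boundary from Rust, the sketch below shows the general shape: a `#[repr(C)]` struct, a C-ABI function that fills it through an out-pointer, and a safe wrapper on top. Every name here (`ProcStats`, `velos_core_stats`, `stats`) is hypothetical, and the C-ABI function is implemented in Rust as a stand-in so the example runs on its own. In a real build it would be declared in an `extern "C"` block and resolved from the static library at link time.

```rust
use std::os::raw::c_int;

/// Plain-data struct with C layout, so both sides agree on field offsets.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy)]
struct ProcStats {
    pid: c_int,
    rss_kb: u64,
}

/// Stand-in for a function the core library might export (hypothetical).
/// Returns 0 on success and -1 if the out-pointer is null.
extern "C" fn velos_core_stats(pid: c_int, out: *mut ProcStats) -> c_int {
    if out.is_null() {
        return -1;
    }
    // SAFETY: `out` is non-null and the caller guarantees it points to a
    // valid, writable ProcStats.
    unsafe { *out = ProcStats { pid, rss_kb: 65 } };
    0
}

/// Safe wrapper: hides the out-pointer and status code behind a Result.
fn stats(pid: c_int) -> Result<ProcStats, c_int> {
    let mut out = ProcStats::default();
    match velos_core_stats(pid, &mut out) {
        0 => Ok(out),
        code => Err(code),
    }
}

fn main() {
    match stats(1234) {
        Ok(s) => println!("pid {} uses ~{} KB", s.pid, s.rss_kb),
        Err(code) => eprintln!("core returned error code {code}"),
    }
}
```

Keeping the unsafe surface inside one thin wrapper per function lets the rest of the Rust side (CLI, REST API, MCP server) stay in safe code.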

IPC uses a binary protocol — 7-byte header (magic 0xVE10) + MessagePack payload over a Unix socket.
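A fixed header followed by a length-delimited payload can be sketched as follows. The post only gives the header size, so the field layout here is an assumption: 2 magic bytes, 1 message-type byte, and a 4-byte big-endian payload length. The magic bytes are a placeholder, and the payload is treated as opaque bytes that a MessagePack library would produce.

```rust
/// Placeholder magic (ASCII "VE"); the real value and layout are assumptions.
const MAGIC: [u8; 2] = [0x56, 0x45];
/// 2 bytes magic + 1 byte message type + 4 bytes payload length.
const HEADER_LEN: usize = 7;

/// Prepends the 7-byte header to an already-encoded payload.
fn encode_frame(msg_type: u8, payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(HEADER_LEN + payload.len());
    frame.extend_from_slice(&MAGIC);
    frame.push(msg_type);
    frame.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    frame.extend_from_slice(payload);
    frame
}

/// Validates the header and returns (message type, payload bytes).
fn decode_frame(buf: &[u8]) -> Result<(u8, &[u8]), &'static str> {
    if buf.len() < HEADER_LEN {
        return Err("short header");
    }
    if buf[..2] != MAGIC {
        return Err("bad magic");
    }
    let len = u32::from_be_bytes([buf[3], buf[4], buf[5], buf[6]]) as usize;
    let payload = buf
        .get(HEADER_LEN..HEADER_LEN + len)
        .ok_or("truncated payload")?;
    Ok((buf[2], payload))
}

fn main() {
    // MessagePack encoding of the map {"ok": true}.
    let payload: [u8; 5] = [0x81, 0xa2, 0x6f, 0x6b, 0xc3];
    let frame = encode_frame(1, &payload);
    println!("frame: {frame:02x?}");
    let (msg_type, body) = decode_frame(&frame).expect("valid frame");
    println!("type={msg_type} payload_len={}", body.len());
}
```

Carrying the length in the header lets the reader on a stream socket know exactly how many bytes to wait for before handing the payload to the MessagePack decoder.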

Smart Log Engine

Pattern clustering, error rate tracking, anomaly detection — runs fully local, zero LLM cost.

velos logs myapp --summary
# → "47 errors in last 10m, 3 patterns detected:
#    [1] DB connection timeout (×31)
#    [2] Rate limit exceeded (×12)
#    [3] File not found (×4)"

Instead of scrolling through thousands of raw log lines, you get a digest.
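One simple way to get this kind of digest without an LLM is to collapse the variable parts of each line into a template and count the templates. The sketch below does that by replacing digit runs with `#`. It is a swapped-in illustration of pattern clustering in general, not the algorithm Velos uses, and the function names are invented.

```rust
use std::collections::HashMap;

/// Collapses every run of ASCII digits into a single '#', so lines that
/// differ only in ids, ports, or durations map to the same template.
fn template(line: &str) -> String {
    let mut out = String::with_capacity(line.len());
    let mut in_digits = false;
    for ch in line.chars() {
        if ch.is_ascii_digit() {
            if !in_digits {
                out.push('#');
            }
            in_digits = true;
        } else {
            out.push(ch);
            in_digits = false;
        }
    }
    out
}

/// Groups lines by template and returns (template, count), most frequent first.
fn cluster(lines: &[&str]) -> Vec<(String, usize)> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for line in lines {
        *counts.entry(template(line)).or_insert(0) += 1;
    }
    let mut clusters: Vec<_> = counts.into_iter().collect();
    // Sort by count descending, then by template for a deterministic order.
    clusters.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    clusters
}

fn main() {
    let lines = [
        "DB connection timeout after 5000ms",
        "Rate limit exceeded for user 42",
        "DB connection timeout after 5012ms",
        "DB connection timeout after 4998ms",
        "Rate limit exceeded for user 7",
    ];
    for (i, (tpl, n)) in cluster(&lines).iter().enumerate() {
        println!("[{}] {tpl} (x{n})", i + 1);
    }
}
```

Real log clustering usually also masks UUIDs, paths, and quoted strings, but the count-by-template idea stays the same.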

MCP server (for local AI workflows)

Velos also ships with a native MCP server — 13 tools for Claude Desktop or Claude Code to manage your local processes directly:

{
  "mcpServers": {
    "velos": {
      "command": "velos",
      "args": ["mcp", "start"]
    }
  }
}

Tools include: start_process, stop_process, restart_process, get_logs, health_check, get_metrics, search_logs, analyze_crash, and more. Useful for local dev environments where your AI agent is already running on the same machine.
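Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for `get_logs` by hand. The envelope follows the MCP convention, but the argument name `name` is a guess at the tool's schema, which the post does not show, so treat it as hypothetical.

```rust
/// Escapes the characters that must be escaped inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a JSON-RPC 2.0 `tools/call` request. The single `name` argument
/// is a hypothetical schema chosen for this example.
fn tools_call_request(id: u64, tool: &str, process: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"tools/call","params":{{"name":"{}","arguments":{{"name":"{}"}}}}}}"#,
        json_escape(tool),
        json_escape(process)
    )
}

fn main() {
    // What a client might write to the server's stdin to fetch logs for "myapp".
    println!("{}", tools_call_request(1, "get_logs", "myapp"));
}
```

In practice Claude Desktop or Claude Code generates these requests for you; the config snippet above only tells the client how to launch the server.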

Install

curl -fsSL https://releases.velospm.dev/install.sh | bash

Or Homebrew:

brew install Dave93/tap/velos

Current state

Velos is at v0.1.14, production-ready on macOS and Linux. Windows support is on the roadmap.

What's coming next:

XDG Base Directory compliance (tracking: #10)
Plugin system
Web dashboard
Benchmark suite vs PM2/supervisord/systemd

GitHub: Dave93/velos
Docs: velospm.dev

Happy to answer questions about the crash detection system, the Zig/Rust architecture, or the design decisions. What would you have done differently?
