You deploy a service; it crashes at 3 AM. You get an alert, SSH in, dig through logs, and try to figure out what happened. Or you pay $26/month for Sentry and integrate its SDK into every project.
I wanted crash detection and recovery built directly into the process manager — no extra services, no SDK, no dashboards. So I built Velos.
When a process crashes or throws an unhandled error, Velos detects it, runs AI analysis on the logs and stack trace, and sends a Telegram message with two buttons: Fix and Ignore. Tap Fix — an agent reads the logs, inspects the code, diffs recent changes, proposes a fix, and reports back. All without waking up your laptop.
How the crash flow works
Process crashes (or stderr matches panic/traceback/FATAL pattern)
→ Velos daemon detects it
→ AI analysis runs (Anthropic or any OpenAI-compatible API)
→ Telegram notification with [Fix] [Ignore] buttons
→ You tap Fix
→ Agentic loop starts: reads files, inspects git diff, runs commands
→ Result sent back to Telegram
Nothing happens without your approval. The agent has tools to read/edit files, run commands, and inspect git history — but it only acts when you tap Fix.
This also works for runtime errors before a crash: Velos watches stderr for panic/traceback/FATAL/ERROR patterns and fires an alert without waiting for the process to die. That's closer to what Sentry does, but with zero SDK and zero extra service.
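A minimal sketch of such a stderr watcher, assuming simple substring matching (Velos's real matching rules aren't shown in the post, so treat the patterns and structure here as illustrative):

```rust
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};

// Patterns named in the post; real matching in Velos may be more elaborate
// (regexes, case folding, multi-line tracebacks).
const PATTERNS: &[&str] = &["panic", "Traceback", "FATAL", "ERROR"];

fn line_matches(line: &str) -> bool {
    PATTERNS.iter().any(|p| line.contains(p))
}

fn main() -> std::io::Result<()> {
    // Spawn a child process and capture its stderr.
    let mut child = Command::new("sh")
        .args(["-c", "echo 'FATAL: db connection lost' 1>&2"])
        .stderr(Stdio::piped())
        .spawn()?;

    // Scan stderr line by line while the process is still alive.
    let stderr = BufReader::new(child.stderr.take().unwrap());
    for line in stderr.lines() {
        let line = line?;
        if line_matches(&line) {
            // In Velos this would kick off AI analysis and a Telegram alert;
            // here we just print.
            println!("alert: {line}");
        }
    }
    child.wait()?;
    Ok(())
}
```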
Why Zig for the daemon?
Process management is fundamentally a systems problem: fork/exec, Unix sockets, signal handling, CPU/RAM monitoring via syscalls. Zig gives full control with zero runtime overhead — no hidden allocations, no GC, comptime metaprogramming for tight control over the binary.
The result: ~3 MB RAM idle, ~65 KB per managed process. PM2 sits at ~60 MB baseline because of V8.
Architecture:
Zig daemon (fork/exec, IPC, monitoring)
↓ C ABI FFI
Rust shell (CLI, REST API, MCP server, log engine, AI)
Zig core compiles to a static library (libvelos_core.a), linked into the Rust binary via C ABI FFI. Single binary, zero runtime dependencies.
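To make that linkage concrete, here is the rough shape of the boundary on the Rust side. The function name and signature are invented for illustration (the post doesn't list libvelos_core.a's real exported symbols), and the "Zig side" is stubbed in Rust so the snippet compiles standalone:

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int};

// Stand-in for a function the Zig core would export with a C ABI.
// In the real build this would live in libvelos_core.a (exported from Zig),
// and the Rust shell would declare it in an `extern "C"` block instead.
pub extern "C" fn velos_spawn(cmd: *const c_char) -> c_int {
    // The Zig implementation would fork/exec here; this stub just
    // validates the input and returns a pretend pid.
    let cmd = unsafe { CStr::from_ptr(cmd) };
    if cmd.to_bytes().is_empty() { -1 } else { 4242 }
}

fn main() {
    // The Rust shell crosses the FFI boundary with plain C types:
    // a NUL-terminated string in, an integer pid (or -1) out.
    let cmd = CString::new("node server.js").unwrap();
    let pid = velos_spawn(cmd.as_ptr());
    println!("spawned pid {pid}");
}
```

Keeping the boundary to C types (pointers, integers) is what lets a single static archive link into the Rust binary with no runtime glue.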
IPC uses a binary protocol: a 7-byte header (magic 0xVE10) followed by a MessagePack payload, sent over a Unix socket.
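The post doesn't spell out the header's field layout, so here is one plausible framing as a sketch (2-byte magic, 1-byte message type, 4-byte big-endian payload length); the magic constant is a placeholder, not Velos's actual bytes, and the payload is a hand-encoded MessagePack map:

```rust
// Placeholder magic bytes standing in for the "VE10" magic from the post.
const MAGIC: [u8; 2] = [0xE1, 0x10];

/// Frame a payload: 7-byte header + raw payload bytes.
fn frame(msg_type: u8, payload: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(7 + payload.len());
    buf.extend_from_slice(&MAGIC);                                // bytes 0-1: magic
    buf.push(msg_type);                                           // byte 2: message type
    buf.extend_from_slice(&(payload.len() as u32).to_be_bytes()); // bytes 3-6: length
    buf.extend_from_slice(payload);                               // MessagePack body
    buf
}

/// Parse a header, returning (message type, payload length) if the magic matches.
fn parse_header(buf: &[u8]) -> Option<(u8, usize)> {
    if buf.len() < 7 || buf[0..2] != MAGIC {
        return None;
    }
    let len = u32::from_be_bytes([buf[3], buf[4], buf[5], buf[6]]) as usize;
    Some((buf[2], len))
}

fn main() {
    // MessagePack for {"ok": true}: fixmap(1), fixstr(2) "ok", true.
    let msg = frame(1, b"\x81\xa2ok\xc3");
    let (ty, len) = parse_header(&msg).unwrap();
    println!("type={ty} len={len}");
}
```

A fixed-size header with an explicit length field makes stream framing trivial: read 7 bytes, validate the magic, then read exactly `len` more.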
Smart Log Engine
Pattern clustering, error rate tracking, anomaly detection — runs fully local, zero LLM cost.
velos logs myapp --summary
# → "47 errors in last 10m, 3 patterns detected:
# [1] DB connection timeout (×31)
# [2] Rate limit exceeded (×12)
# [3] File not found (×4)"
Instead of scrolling through thousands of raw log lines, you get a digest.
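The clustering idea can be sketched in a few lines: mask the variable parts of each line so repeated error shapes collapse into one pattern, then count. This is a naive stand-in for illustration, not Velos's actual algorithm:

```rust
use std::collections::HashMap;

// Mask digits so "timeout after 31s" and "timeout after 45s"
// normalize to the same pattern.
fn normalize(line: &str) -> String {
    line.chars()
        .map(|c| if c.is_ascii_digit() { '#' } else { c })
        .collect()
}

// Group lines by normalized pattern, most frequent first.
fn cluster<'a>(lines: impl Iterator<Item = &'a str>) -> Vec<(String, usize)> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for line in lines {
        *counts.entry(normalize(line)).or_insert(0) += 1;
    }
    let mut patterns: Vec<_> = counts.into_iter().collect();
    patterns.sort_by(|a, b| b.1.cmp(&a.1));
    patterns
}

fn main() {
    let logs = [
        "DB connection timeout after 31s",
        "DB connection timeout after 45s",
        "Rate limit exceeded",
    ];
    for (pattern, n) in cluster(logs.iter().copied()) {
        println!("[x{n}] {pattern}");
    }
}
```

Because it's just normalization and counting, this kind of digest runs locally on every log line with no LLM in the loop.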
MCP server (for local AI workflows)
Velos also ships with a native MCP server — 13 tools for Claude Desktop or Claude Code to manage your local processes directly:
{
  "mcpServers": {
    "velos": {
      "command": "velos",
      "args": ["mcp", "start"]
    }
  }
}
Tools include: start_process, stop_process, restart_process, get_logs, health_check, get_metrics, search_logs, analyze_crash, and more. Useful for local dev environments where your AI agent is already running on the same machine.
Install
curl -fsSL https://releases.velospm.dev/install.sh | bash
Or Homebrew:
brew install Dave93/tap/velos
Current state
Velos is at v0.1.14 and is production-ready on macOS and Linux. Windows support is on the roadmap.
What's coming next:
- XDG Base Directory compliance (tracking: #10)
- Plugin system
- Web dashboard
- Benchmark suite vs PM2/supervisord/systemd
GitHub: Dave93/velos
Docs: velospm.dev
Happy to answer questions about the crash detection system, the Zig/Rust architecture, or the design decisions. What would you have done differently?