oh-my-agent: five new skills land, plus vault and worktree isolation

#ai #programming #productivity #opensource

326 commits over the last two weeks. The headline: five new skills shipped, secret management moved off shell rc files, and the project landed at #1 on its own 5-axis harness benchmark with 80.6/100.

What's new

Five new skills: oma-market (deterministic market research pipeline across keyless sources), oma-docs (doc drift detection + sync), oma-deepsec (drives Vercel's vulnerability scanner end-to-end), oma-academic-writer (publication-grade prose), oma-voice (local TTS/STT via Voicebox MCP).
oma vault: OS keychain credential store backed by @napi-rs/keyring. Stops ANTHROPIC_API_KEY and friends from sitting in shell rc files where any agent subprocess can read them.
agent:spawn --isolation=worktree: opt-in git worktree per spawned agent. Worktrees are retained on exit; the spawner prints merge or discard commands so you decide.
oma stats cost telemetry: per-vendor token breakdown with USD estimates from a conservative pricing floor. Input-only for now, output tokens tracked next.
oma model:check / probe / propose: detect model registry drift against live vendor sources without API keys, then emit a models.yaml patch for accepted candidates.
Cursor as a first-class vendor: cursor/composer-2, composer-2-fast, and auto register with native dispatch. New cursor-only preset (also renamed to plain cursor, see below).
Three slash workflows: /docs for drift verify and sync, /recap for daily and period summaries, /deepsec for the full scan to triage loop.
PWA on the docs site: manifest with maskable icons, install screenshots, and display_override for desktop title-bar customization.

What's fixed

Serena MCP migrated from ephemeral uvx --from git+... to globally installed serena-agent via uv tool install, with vendor-specific --context per the upstream client matrix.
Hook keyword detector hardened against three bypass classes: NFKC normalization for fullwidth Latin from CJK IMEs, hyphen-rejecting word boundaries, and a two-tier CLI invocation guard so prompts that ARE CLI invocations no longer trigger workflows.
oma doctor CLI version probe bounded with a 1500ms spawn race + SIGTERM escalation, so a GUI binary that ignores --version can no longer hang the probe indefinitely.
i18n translation drift detection via oma docs i18n, plus oma docs lint for CJK em-dash anti-patterns (em-dash in CJK prose now flagged as a content-level lint, not auto-fixed).
Preset rename: claude-only to claude, antigravity to mixed. The -only suffix misrepresented the contract (you can already override per-agent), and antigravity collided with the runtime vendor id. Auto-migrated on oma update; hard-error on legacy values in oma-config.yaml.

What's better

Benchmark: oh-my-agent now scores 80.6/100 on the 5-axis multi-judge harness (Functional, Spec, Visual, Engineering, Efficiency), landing #1 against omc, superpowers, vanilla, and ecc. Multi-judge averaging runs three rounds per axis to drop single-run noise.
oma update prints a "What's new" note with added or removed skills and workflows after the version bump, so the catalog change is visible at upgrade time.
docs/generated/ auto-added to .gitignore on doc write sites via a unified cli/io/gitignore.ts module.
Telemetry opt-in unified across vendors: a single telemetry boolean in oma-config.yaml now drives Claude (DISABLE_TELEMETRY), Gemini, Qwen, and Codex opt-out keys on install and update.
cli/cli-kit/ merged into cli/utils/ so there is one home for shared CLI helpers instead of two overlapping ones.
Em-dash sweep across 173 files in skills, workflows, and docs per the oma-translator anti-AI-pattern rule. Em-dash usage restructured contextually with colons, periods, parens, or restructured sentences, not mechanical substitution.

Installation

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.ps1 | iex

Links

oh-my-agent is built for teams who orchestrate more than they prompt. Next up: closing the spec gap that surfaced in the benchmark by teaching skills the real-API plus deferred-stub pattern at scaffold time, not at fix time.

https://github.com/first-fluke/oh-my-agent

Top comments (2)

foxck016077 • May 18

The oma vault move — pulling ANTHROPIC_API_KEY and friends out of shell rc files and into the OS keychain via @napi-rs/keyring — is the kind of fix that looks obvious in hindsight but almost nobody does it until they get bitten. Agent subprocesses reading the parent shell's env is the cheapest credential-leak vector and I haven't seen another agent toolkit close it as a default. The fact that it's the opt-in path for the existing tools and the default for the vault is also the right migration shape.

Two follow-on questions from someone building agent-orchestrated workflows on a much smaller surface (single Apify Actor instead of the full oma harness):

On agent:spawn --isolation=worktree: when the spawned agent makes commits inside the worktree, do you carry them as actual git commits in the parent repo's reflog (visible in git log --all), or are they ephemeral until the merge/discard decision? I'm wondering whether the worktree commits create the kind of audit trail you'd want for "what did the agent actually touch" forensics, vs. just being scratch space.
On oma stats cost telemetry — input-only for now and output tokens next: do you find input-side cost is already the dominant signal for spotting runaway loops, or does the input/output ratio only become diagnostic once you have both? For Apify-side runs the failure mode is usually a producer flooding the input context with retries; output-side cost catches a different failure mode (the model rambling).

The 326 commits / 2 weeks pace is the kind of velocity that produces this level of subsystem coverage. The fact that the same harness benchmarks itself and lands at #1 is a separate honest disclosure that I respect.

gracefullight • May 18

Both genuine. Receipts inline.

Worktree audit trail: real commits, never ephemeral. oma agent:spawn --isolation=worktree calls git worktree add -b oma/{sessionId}/{agentId} so the spawn gets a real branch off the current HEAD. Objects land in the parent repo's .git/, the branch is reachable, git log --all from the parent shows everything the agent touched. Cleanup is manual by design (cli/io/worktree.ts): after the agent exits we print the path and a one-liner for merge or git worktree remove ... && git branch -D. The trail survives until you explicitly delete the branch, which is exactly the forensics property you want. Auto-merge was out of scope because it puts the LLM back in the judgment seat oma verify exists to avoid.

Cost signal: input-only catches retry storms, not rambling. The dominant runaway signal we see is spawnCount. checkCap evaluates it before tokens because retry floods spike spawns linearly, while a rambling model emits one long-running spawn with normal-looking input. Input-token cap catches the same retry pattern from the other side. Where input-only goes blind is your second failure mode exactly: 50k output tokens on a normal prompt registers as a cheap spawn. Output tracking is the next add, gated on a per-vendor streaming-usage parser (some vendors don't emit usage events in the same shape).

For your single-Actor surface: spawnCount + input tokens should already cover the producer-flood pattern. Add output tracking once you start seeing single long-running spawns that look cheap in input but never terminate.