DEV Community

gracefullight
gracefullight

Posted on

oh-my-agent: five new skills land, plus vault and worktree isolation

326 commits over the last two weeks. The headline: five new skills shipped, secret management moved off shell rc files, and the project landed at #1 on its own 5-axis harness benchmark with 80.6/100.

What's new

  • Five new skills: oma-market (deterministic market research pipeline across keyless sources), oma-docs (doc drift detection + sync), oma-deepsec (drives Vercel's vulnerability scanner end-to-end), oma-academic-writer (publication-grade prose), oma-voice (local TTS/STT via Voicebox MCP).
  • oma vault: OS keychain credential store backed by @napi-rs/keyring. Stops ANTHROPIC_API_KEY and friends from sitting in shell rc files where any agent subprocess can read them.
  • agent:spawn --isolation=worktree: opt-in git worktree per spawned agent. Worktrees are retained on exit; the spawner prints merge or discard commands so you decide.
  • oma stats cost telemetry: per-vendor token breakdown with USD estimates from a conservative pricing floor. Input-only for now, output tokens tracked next.
  • oma model:check / probe / propose: detect model registry drift against live vendor sources without API keys, then emit a models.yaml patch for accepted candidates.
  • Cursor as a first-class vendor: cursor/composer-2, composer-2-fast, and auto register with native dispatch. New cursor-only preset (also renamed to plain cursor, see below).
  • Three slash workflows: /docs for drift verify and sync, /recap for daily and period summaries, /deepsec for the full scan to triage loop.
  • PWA on the docs site: manifest with maskable icons, install screenshots, and display_override for desktop title-bar customization.

What's fixed

  • Serena MCP migrated from ephemeral uvx --from git+... to globally installed serena-agent via uv tool install, with vendor-specific --context per the upstream client matrix.
  • Hook keyword detector hardened against three bypass classes: NFKC normalization for fullwidth Latin from CJK IMEs, hyphen-rejecting word boundaries, and a two-tier CLI invocation guard so prompts that ARE CLI invocations no longer trigger workflows.
  • oma doctor CLI version probe bounded with a 1500ms spawn race + SIGTERM escalation, so a GUI binary that ignores --version can no longer hang the probe indefinitely.
  • i18n translation drift detection via oma docs i18n, plus oma docs lint for CJK em-dash anti-patterns (em-dash in CJK prose now flagged as a content-level lint, not auto-fixed).
  • Preset rename: claude-only to claude, antigravity to mixed. The -only suffix misrepresented the contract (you can already override per-agent), and antigravity collided with the runtime vendor id. Auto-migrated on oma update; hard-error on legacy values in oma-config.yaml.

What's better

  • Benchmark: oh-my-agent now scores 80.6/100 on the 5-axis multi-judge harness (Functional, Spec, Visual, Engineering, Efficiency), landing #1 against omc, superpowers, vanilla, and ecc. Multi-judge averaging runs three rounds per axis to drop single-run noise.
  • oma update prints a "What's new" note with added or removed skills and workflows after the version bump, so the catalog change is visible at upgrade time.
  • docs/generated/ auto-added to .gitignore on doc write sites via a unified cli/io/gitignore.ts module.
  • Telemetry opt-in unified across vendors: a single telemetry boolean in oma-config.yaml now drives Claude (DISABLE_TELEMETRY), Gemini, Qwen, and Codex opt-out keys on install and update.
  • cli/cli-kit/ merged into cli/utils/ so there is one home for shared CLI helpers instead of two overlapping ones.
  • Em-dash sweep across 173 files in skills, workflows, and docs per the oma-translator anti-AI-pattern rule. Em-dash usage restructured contextually with colons, periods, parens, or restructured sentences, not mechanical substitution.

Installation

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.sh | bash
Enter fullscreen mode Exit fullscreen mode
# Windows (PowerShell)
irm https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.ps1 | iex
Enter fullscreen mode Exit fullscreen mode

Links

oh-my-agent is built for teams who orchestrate more than they prompt. Next up: closing the spec gap that surfaced in the benchmark by teaching skills the real-API plus deferred-stub pattern at scaffold time, not at fix time.

https://github.com/first-fluke/oh-my-agent

Top comments (1)

Collapse
 
foxck016077 profile image
foxck016077

The oma vault move — pulling ANTHROPIC_API_KEY and friends out of shell rc files and into the OS keychain via @napi-rs/keyring — is the kind of fix that looks obvious in hindsight but almost nobody does it until they get bitten. Agent subprocesses reading the parent shell's env is the cheapest credential-leak vector and I haven't seen another agent toolkit close it as a default. The fact that it's the opt-in path for the existing tools and the default for the vault is also the right migration shape.

Two follow-on questions from someone building agent-orchestrated workflows on a much smaller surface (single Apify Actor instead of the full oma harness):

  1. On agent:spawn --isolation=worktree: when the spawned agent makes commits inside the worktree, do you carry them as actual git commits in the parent repo's reflog (visible in git log --all), or are they ephemeral until the merge/discard decision? I'm wondering whether the worktree commits create the kind of audit trail you'd want for "what did the agent actually touch" forensics, vs. just being scratch space.

  2. On oma stats cost telemetry — input-only for now and output tokens next: do you find input-side cost is already the dominant signal for spotting runaway loops, or does the input/output ratio only become diagnostic once you have both? For Apify-side runs the failure mode is usually a producer flooding the input context with retries; output-side cost catches a different failure mode (the model rambling).

The 326 commits / 2 weeks pace is the kind of velocity that produces this level of subsystem coverage. The fact that the same harness benchmarks itself and lands at #1 is a separate honest disclosure that I respect.