DEV Community: gracefullight

oh-my-agent: cross-context reviews and trigger accuracy testing

gracefullight — Thu, 16 Jul 2026 02:01:44 +0000

When you ask a model to review its own design, it usually gives itself an A. We merged a fix for authorship bias this week.

What's new

Trigger accuracy harness: oma verify triggers now measures keyword false-fire and missed-fire rates against a 167-entry labeled corpus with zero LLM calls.
Standards emission: oma emit generates conformant Agent Skills spec folders, a .claude-plugin marketplace manifest, and an AGENTS.md index directly from the .agents single source of truth.
Skills linting: oma skills lint enforces deterministic authoring-smell checks based on Anatomy-to-Smells research.
Bundle sprawl warnings: oma skills audit now flags skills that exceed 20 reference files or 25,000 characters.

We also wired up SessionStart hooks for Claude Code to reload skills dynamically and handle session compaction without dropping context.

What's fixed

Keyword matching now ranks by specificity. Longest keyword wins, then multi-word, then declaration order. This resolves self-suppression collisions.
Relayed inter-agent messages no longer trigger false workflow keyword detections.
Windows atomicWriteJson failures (EPERM) are resolved by opening the temp file with read-write permissions.
The opencode bridge is completely rewritten to honor the new plugin API mutation contract, restoring true persistent mode.

What's better

Cross-context reviews: All 11 ultrawork review steps now dispatch to fresh reviewer subagents that receive only durable artifacts and the review guide. Same-session self-review is removed because isolated reviews catch more defects.
High-stakes blind review: Brainstorm workflows now escalate to independent subagents for design artifact critique, completely removing authorship bias.
Marketplace consolidation: The repository root .claude-plugin directory is now the single source of truth for the Claude plugin marketplace.

Installation

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.ps1 | iex

oh-my-agent is built for teams who orchestrate more than they prompt. Next up: native Antigravity plugin emission.

https://github.com/first-fluke/oh-my-agent

oh-my-agent: Angular support and stateful configuration merges

gracefullight — Thu, 09 Jul 2026 03:03:54 +0000

Shared tool configurations drift when developers run local agents. Adding a new MCP server to a team setup usually fails to reach existing local configurations, leaving developers with outdated toolsets. We resolved this in our latest CLI release by introducing stateful configuration back-filling. The update merges new servers into local environments while preserving custom developer adjustments.

What's new

Angular stack integration: Added frontend domain detection for angular.json and @angular/* packages in the /stack-set command. The oma-frontend skill now includes angular-rules.md to enforce standalone components, OnPush change detection, and signals.
API evolution patterns: Added API lifecycle patterns based on the MAP framework to oma-architecture. This includes Sajaniemi's 11 variable-role taxonomy to guide naming rules in oma-refactor.
Windows scheduling updates: The schtasks adapter now maps weekly cron ranges like 1-5 or lists like 1,3,5 directly to Windows task scheduler formats.
Model validation: Added vendor validation to the schedule:add command. The CLI now rejects unknown models at registration time rather than failing during execution.

Keeping local environments synchronized across diverse OS targets requires strict validation. These fixes ensure configuration changes flow correctly without disrupting developer-specific settings.

What's fixed

MCP server synchronization: Fixed an issue where SSOT servers added to .agents/mcp.json were only copied if .mcp.json was entirely absent. The CLI now reads the source of truth on every run and merges missing entries.
Test execution reliability: Restructured the project root resolution tests to mock the filesystem walk. This isolates test runs from ambient files on CI runners and avoids false failures.
Market diversity flags: Corrected the --diversity-threshold flag documentation to reflect that the default threshold is not enforced unless the flag is explicitly set.

Cleaning up obsolete protocols reduces token usage and limits agent distraction. Moving to automated dependency management ensures developers stay on current agent APIs without manual maintenance.

What's better

Serena auto-update defaults: Changed serena.auto_update to default to true. The system automatically upgrades the agent via uv on update commands, keeping the memories CLI fallback active.
Codebase cleanup: Removed the unused evaluator-tuning.md protocol file. Project instances are migrated automatically through database migration 016 to reduce project directory clutter.

Installation

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.ps1 | iex

oh-my-agent is built for teams who orchestrate more than they prompt. Next up: deeper telemetry integration for local agent runs.

https://github.com/first-fluke/oh-my-agent

oh-my-agent: Bluesky broadcasts and Claude workspace auto-trust

gracefullight — Thu, 02 Jul 2026 03:02:26 +0000

When you tell an agent to handle a complex backend migration, hitting a hard cutoff 20 steps in ruins the workflow. We bumped the ceiling and smoothed out the integration edges.

What's new

Bluesky syndication: Release announcements now fan out to Bluesky with rich text tags and external link cards.
Claude auto-trust: Project linking now directly mutates your ~/.claude.json to pre-accept workspace trust, skipping the interactive dialog entirely.
Serena auto-install: The Serena MCP binary now self-installs during project setup and is actively probed by the doctor command.

What's fixed

Premature cutoffs: We raised the max agent turns from 20 up to 40 for backend, frontend, and mobile implementation agents to prevent mid-task failures.
Monorepo state leaks: Running commands from inside a sub-package no longer leaves stray .agents/ directories behind. State is correctly anchored to the project root.
Fragile JSON parsing: Agent publishing steps now use jsonrepair with a 3x retry loop, preventing trailing commas or unclosed braces from aborting a release.
OpenCode dispatch: Subagents now properly inherit the user's default model instead of a hardcoded slug, and routing correctly falls back to native task tools.

What's better

Smarter MCP resolution: Serena now uses --project-from-cwd to gracefully handle execution from nested subdirectories.
Installation UX: The install flow detects native OpenCode tasks and gracefully defaults to a mixed preset when a strict single-vendor lock is unnecessary.
Context isolation: The Antigravity MCP transport now correctly stamps its own context instead of bleeding the Claude Code defaults.

Installation

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.ps1 | iex

oh-my-agent is built for teams who orchestrate more than they prompt. Next up: expanding our deepsec vulnerability scanner capabilities.

https://github.com/first-fluke/oh-my-agent

oh-my-agent: ZCode workflows and zero-prompt Claude Code trust

gracefullight — Fri, 26 Jun 2026 14:36:55 +0000

oh-my-agent CLI 10.5.0 just shipped. We added native support for ZCode and automated the workspace trust flow for Claude Code.

What's new

ZCode integration: ZCode is now registered as a workflow vendor. Workspace slash-commands in .agents/workflows/*.md are automatically symlinked to .zcode/commands/<name>.md.
Claude Code zero-prompt trust: Project-level installs surgically inject hasTrustDialogAccepted=true into ~/.claude.json. Claude Code applies your workspace permissions immediately instead of blocking on a trust dialog.
Serena binary auto-install: oma install now automatically provisions serena-agent via uv if missing. oma doctor explicitly probes the binary state.
OpenCode picker: OpenCode is available directly in the oma link picker (and you can skip selection entirely if you just want the CLI).

These integrations remove friction for teams spinning up new projects from scratch.

What's fixed

OpenCode model fallback: Removed the hardcoded model slug. OpenCode variants now inherit your configured default model.
OpenCode tools validation: Fixed a ConfigInvalidError on bootstrap by omitting empty tools arrays.
Serena subdirectory execution: Replaced the hardcoded CWD flag with --project-from-cwd so agents can run reliably from any subfolder.
Antigravity MCP context: Prevented the Claude Code context from leaking into the Antigravity runtime during tool execution.
Persistent mode crossover: Fixed a bug where untracked sessions shared state hooks under a generic unknown session ID.

We also gave agents more breathing room to complete complex tasks without premature termination.

What's better

Turn limits raised: Implementation agents (backend, frontend, mobile) now have a 40-turn cap to prevent stopping in the middle of long tasks.
Contract separation: Generated API contracts are written to docs/plans/contracts/ to keep the skill directories pristine.
Cache exclusion: The .serena/cache directory is excluded from default project scans to reduce UTF-8 decoding noise.

Installation

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.ps1 | iex

oh-my-agent is built for teams who orchestrate more than they prompt. Next up: deeper control over persistent state execution.

https://github.com/first-fluke/oh-my-agent

oh-my-agent: cross-vendor scheduling, Kimi and OpenCode land

gracefullight — Wed, 17 Jun 2026 13:34:11 +0000

Two new vendors and an OS-level scheduler merged into oh-my-agent this week, which means your agents can now run on a clock instead of only when you prompt them. 135 commits, and the theme underneath most of them is the same: stop pinning the agent to a single runtime, and stop leaking resources between sessions.

oh-my-agent is a cross-vendor harness. The point is that a workflow, a skill, or a subagent dispatch should not care which CLI is underneath it. This week pushed hard on that promise.

What's new

Kimi Code CLI is now a first-class vendor: OAuth/KIMI_API_KEY auth, TOML hook install into ~/.kimi-code/config.toml, mode-aware Serena and chrome-devtools MCP, and external dispatch via kimi -p.
OpenCode lands as an extension-class vendor with in-process plugin bridging. Subagent dispatch runs through opencode run --agent <id>, and model slugs are validated against opencode models rather than a hardcoded catalog.
oma schedule:* adds time-based recurring agent jobs that fire independently of any runtime. One SchedulerPort abstracts launchd, systemd --user, crontab, and Windows schtasks, with --cron or natural-language --every intervals.
oma serena reap kills idle language servers. Serena keeps a per-project LSP stack warm with no idle shutdown, so a few open projects pin 1.5GB or more; the reaper trims them and Serena respawns on the next tool call.
oma memory:gc prunes project-local session state and aged Serena run artifacts (defaults: keep 100 sessions, 50 days), while curated knowledge like decisions and designs is never touched.
Two new agents: refactor-engineer (budget-funded, behavior-preserving refactoring) and research-explorer (cited, trust-labeled cross-source synthesis), backed by the new oma-refactor skill.
oma-mobile gains full Flutter and React Native variants, each with a mandated repository-layer response cache (Drift offline-first, TanStack Query plus MMKV).

What's fixed

oma update no longer bulk-deletes vendor skill symlinks on every run, so a mid-update download failure can no longer leave the skills directory empty.
Updates stopped pruning skills that a shipped agent depends on, closing a gap where refactor-engineer could arrive without oma-refactor.
cleanupPeriodDays moved to the top level of Claude Code settings, where it was a no-op nested under env.
Antigravity hooks registered through the /hooks UI are now preserved instead of clobbered on link and update.

What's better

Docs verification got far less noisy: full-repo oma docs verify drops from 6,611 to 394 broken refs (and the web/docs subset from 491 to 29), so the remainder is real drift.
oma hook fires on every prompt, so it now takes an argv fast path and lazy-loads the command tree: invocation went from about 0.54s to 0.32s.
UserPromptSubmit handler budgets were retuned against a p95 of 373ms, dropping the aggregate timeout ceiling from 21s to 15s while keeping headroom over the 2s AgentMemory recall budget.
A large structural pass split all 28 non-test files over 500 lines into focused modules (largest remaining is 491) and consolidated duplicated helpers (type guards, safe writes, vendor unions, exit codes) with no behavior change.
AgentMemory recall now drops facts older than 30 days by default, so long-resolved decisions stop rehydrating into the boundary snapshot.

Gemini CLI is on its sunset path (June 18, 2026), so GEMINI.md generation and the standalone gemini preset were removed, with legacy configs soft-redirecting to antigravity.

Installation

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.ps1 | iex

Links

oh-my-agent is built for teams who run the same workflows across whichever CLI they happen to have authenticated. Next up: deeper per-agent model routing across the newly added vendors.

https://github.com/first-fluke/oh-my-agent

oh-my-agent 9.0: the explore rename, a hook ABI, and two new agents

gracefullight — Fri, 12 Jun 2026 14:13:02 +0000

oh-my-agent crossed 9.0 this week, and it carries the project's first breaking change: the agent slot formerly called retrieval is now explore. The name finally matches the work class, and migration 014 rewrites your oma-config.yaml automatically on install or update. Around that rename, 175 commits landed in seven days, taking the CLI from 8.42.0 to 9.0.2.

What's new

Two new agents: refactor-engineer owns budget-funded, behavior-preserving refactoring, and research-explorer traverses oma-search, oma-market, and oma-scholar with cited, trust-labeled synthesis.
oma-refactor skill: smell, SATD, and hotspot targeting with characterization-test safety nets. Utility eval measured a +57.1% lift over baseline across 7 tasks.
oma hook ABI: vendor hooks no longer copy and patch per-vendor bun scripts. A single oma hook --vendor <v> --event <e> entry point now dispatches for all 8 hook-model vendors, with embedded routes and fail-open semantics.
New dispatch vendors: commandcode joins the registry as an opt-in vendor, and pi (Earendil's multi-provider proxy) is now a full per-agent dispatch target via oma agent:spawn <agent> -m pi.
oma ralph:verify: the ralph workflow's anti-circumvention gate moved from prose instructions to a deterministic CLI verdict with structured JSON output and non-zero exit on failure.
serena-primer: a per-session prompt hook that reminds the model to load Serena's symbolic tools in Serena-activated projects, instead of silently falling back to grep.

The through-line is mechanical enforcement. Prose instructions get rationalized away by agents; CLI verdicts and dispatch ABIs do not.

What's fixed

Docs reference verification cut false positives from 6,611 to 394 broken refs repo-wide (491 to 29 on the docs subset) by tightening extraction rules and adding a git-backed suffix resolver.
oma update no longer prunes skills that shipped agents depend on, which previously could deliver an agent without its required skill.
Antigravity installs stop clobbering user-registered hooks in hooks.json; oma-managed entries now merge instead of overwrite.
oma hook caps its stdin read at 2 seconds with fail-open dispatch. Codex was holding the pipe open and stalling prompts for 18 to 21 seconds.
Discussing ultrawork or ralph by name no longer activates the persistent workflow; the keyword detector now distinguishes commands from mentions and compound tokens like ralph.md.
A crashed install or update lock is reclaimed in 60 seconds instead of 10 minutes when the holding pid is confirmed dead.

What's better

oma hook invocation dropped from roughly 0.54s to 0.32s user time via a lazy-loaded command tree and an argv fast path, which matters because it fires on every prompt.
All 28 non-test CLI files over 500 lines were split into focused modules with public APIs unchanged; the largest remaining file is 491 lines, verified by 3,043 passing tests.
On-disk backups consolidated from 5+ scattered conventions into a single .agents/backup/ root, covered by one gitignore line and cleared after a successful update.
A security pass replaced shell-interpolated execSync calls with argv arrays, added SSRF guards and path-containment checks, and closed an XSS vector in slide font inlining.
The cross-slice import boundary gate went from permanently red (24 false violations) to green and now gates CI.

Installation

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.ps1 | iex

Links

oh-my-agent is built for teams who orchestrate more than they prompt. Next up: deepening the pi dispatch path and graduating commandcode from opt-in.

https://github.com/first-fluke/oh-my-agent

oh-my-agent: skills now measure and optimize their own utility

gracefullight — Fri, 05 Jun 2026 05:20:04 +0000

Most skill libraries grow by accretion. You add a SKILL.md, it sounds useful, and it lives forever because nobody can prove it helps or hurts. This week oh-my-agent closed that gap: oma skills eval measures whether loading a skill actually improves held-out task outcomes, and oma skills opt rewrites the skill to push that number up. 194 commits landed, CLI is at 8.41.0, but the eval-to-opt loop is the part worth your attention.

What's new

oma skills eval: measures utilityLift (treatment vs baseline) on held-out tasks. --mock replays recorded rollouts deterministically, --live spawns two read-only agentic arms per task, --record captures the rollouts. Default checker is judge (an LLM grades output against a rubric); assert and regex are opt-in deterministic checks.
oma skills opt: an optimizer LLM proposes bounded add/delete/replace edits to a SKILL.md, re-scores each candidate through eval, and accepts only when held-out validation lift strictly improves with no negative-transfer regression (SkillOpt, arXiv:2605.23904). --dry-run is the default; --apply writes through atomic temp+rename with a .bak backup.
Negative-transfer sampling: --neg-transfer checks whether loading one skill regresses unrelated same-domain tasks from other skills' eval sets.
Scaling-law audit checks: oma skills audit now flags black-hole skills (overly generic routing hijackers) and warns past a calibrated library-size routing-decay threshold (Chen et al., arXiv:2605.16508).
oma-video skill and /video workflow: key-optional 3-tier generation (9:16 shorts, 16:9 explainer, demo capture of any URL) composing narration, visuals, captions, and a vendored Remotion compositor. Every provider degrades to a deterministic fallback, so a run completes with zero API keys.
Swift native iOS in oma-mobile: a swift-ios variant (SwiftUI + @Observable, Apple swift-openapi-generator, App/Core/Features/Shared layout). /stack-set now detects Swift, Flutter, and React Native and routes to the resolved skill; oma verify mobile runs swift build / swift test by stack manifest.
oma intel: a local-first product intelligence pipeline that collects GitHub README, releases, and issues, runs an adversarial multi-lens review gate, and splits output into a PRD and a gap report.
Three new runtimes: Kiro CLI, Pi (Earendil, via in-process .pi/extensions), and full Antigravity (agy) hook integration through .agents/hooks.json.

What's fixed

runAction clobbered positional operands: it overwrote args[0] with the merged options object, so oma state:emit decision.made '{...}' recorded the kind as {category:"main",...} and state:verify always reported the decision missing. Options are now replaced by position so operands survive.
--yes never reached handlers: the wrapper passed command.opts() (which drops globally-parsed flags), so oma skills eval --live --yes still blocked at the cost-preview prompt. Switched to optsWithGlobals(), making live skill-eval runnable in CI.
AgentMemory leaked into project dirs: the iii engine wrote a cwd-relative ./data/ store into whatever project launched it. The daemon cwd is now pinned to ~/.agentmemory, and daemon stop invokes agentmemory stop so no orphaned engine keeps port 3111.
Keyless market sources silently 403'd: anonymous reddit search.json and bluesky's public search endpoint both returned 403, dropping two of the default sources. Reddit now routes through pullpush.io, bluesky through api.bsky.app, taking keyless default coverage from 2/4 to 4/4.
agy headless stdout was empty: Antigravity emits nothing on stdout under --print against a non-TTY, so spawned subagent capture was blank. Subagents now run under a PTY (script(1)) so their output is captured.

What's better

Workflows are symlinked directly: each workflow file carries its own name + disable-model-invocation frontmatter and is exposed by symlinking straight at .agents/workflows/<wf>.md. This removed 18 committed wrapper skills and fixed a pdf/oma-pdf audit false positive; the real skill count is now 30.
harvest.ts split: the 1.4k-line market harvest file became endpoints / normalizers / sources modules, and a ~400-line fetchSource conditional became a source-handler registry. The public facade is unchanged.
Print stylesheet stopped fighting the cascade: the slide PDF export dropped its avoidable !important overrides by fixing the source of the conflict (scoped #slide-NN resets emitted after author styles) instead of forcing the win. The only remaining !important is the prefers-reduced-motion a11y reset.
Centralized paths and hashing: install, state, and recap now share .agents path constants and agree on full SHA-256 for manifest checksums.
Default effort lowered xhigh to high for install-time Claude settings and the Anthropic auto-default; existing higher settings are preserved.

Installation

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.ps1 | iex

oh-my-agent is built for teams who treat a skill library as a measured asset, not a junk drawer. Next up: feeding oma skills opt accepted edits back through the eval fixtures so the library self-tunes on every release.

https://github.com/first-fluke/oh-my-agent

oh-my-agent: Antigravity becomes a first-class vendor

gracefullight — Tue, 26 May 2026 11:28:02 +0000

Antigravity's agy CLI just landed as a first-class vendor in oh-my-agent, three weeks before Google retires the Gemini CLI for unpaid tiers on June 18, 2026. If your model_preset still points at gemini, the CLI now warns you at runtime and on every oma update so you can switch before requests start failing.

That migration set the theme for the week: 139 commits, cli shipped through 8.9.0, and the vendor layer grew a real backbone.

What's new

Antigravity (agy) as a first-class vendor: a dedicated auth probe, two model entries (gemini-3.1-pro, gemini-3.5-flash), a real built-in preset, native dispatch via agy --dangerously-skip-permissions -p, a doctor probe, an install-picker option, MCP config, HUD wiring, keyless image generation, and a recap transcript parser.
Global install mode: oma install --global writes to ~/.agents/ instead of the project tree, backed by an install-context singleton that resolves OMA_HOME > --global > cwd, a sudo-refusal guard, WSL guidance, and an atomic install lock.
oma uninstall [--global] with a dry-run preview that partitions oma-owned files from user-authored content and never touches the latter.
oma skills audit: a zero-dependency TF-IDF boundary check that flags confusable skill descriptions in a warn band (>=0.60) and a fail band (>=0.75), surfaced as a warning inside oma doctor.
L1 state snapshots: a new oma state command plus state-boundary, state-emit, and decision-verifier hooks that enforce decision snapshots during a run.
Cursor as a vendor: plugin and marketplace manifests, Composer transcript ingestion for recap, auth:status wiring, and composer-2.5 presets.

What's fixed

npm publish had been running unauthenticated and 404ing on the 8.x releases. The publish step now sets both NODE_AUTH_TOKEN and NPM_CONFIG_TOKEN so neither bun code path can fall back to an empty token.
Windows CI hung inside vitest on 13 of the last 15 runs. We capped the job at 20 minutes and skipped the vitest step on Windows (lint, typecheck, build, and the PowerShell installer smoke test still run) while we bisect the hanging test.
verify's closure check reported real artifacts as missing because Bun's globSync does not match through hidden segments like .agents/. It now uses a deterministic readdir-based segment walker.
Forced LF line endings repo-wide via .gitattributes, so Windows checkouts stop tripping the Biome format check.

What's better

link() is now the single vendor reconciliation kernel. oma install, oma update, and oma link each duplicated the per-vendor write pipeline; consolidating it removed roughly 110 lines from update.ts and 130 from install.ts, and closed a drift where the Antigravity HUD was never wired during normal install or update.
Vendor functions follow one convention now: apply<Vendor><Subject> for writes, needs<Vendor><Subject>Update for idempotency guards.
New safeWriteJson / safeReadJson helpers do atomic write-and-rename with backup rotation, plus a guard that refuses to clobber vendor state files like ~/.claude.json.
Workflow skills are unified under one SSOT at .agents/skills/ (Migration 011); vendor surfaces are symlinks into it instead of copies.
The Windows installer (install.ps1) now bootstraps serena via uv tool, matching the Unix path.

Installation

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.ps1 | iex

oh-my-agent is built for teams who run more than one agent vendor and orchestrate more than they prompt. Next up: bisecting the Windows vitest hang so the full suite runs on every OS again.

https://github.com/first-fluke/oh-my-agent

oh-my-agent: five new skills land, plus vault and worktree isolation

gracefullight — Mon, 18 May 2026 07:20:55 +0000

326 commits over the last two weeks. The headline: five new skills shipped, secret management moved off shell rc files, and the project landed at #1 on its own 5-axis harness benchmark with 80.6/100.

What's new

Five new skills: oma-market (deterministic market research pipeline across keyless sources), oma-docs (doc drift detection + sync), oma-deepsec (drives Vercel's vulnerability scanner end-to-end), oma-academic-writer (publication-grade prose), oma-voice (local TTS/STT via Voicebox MCP).
oma vault: OS keychain credential store backed by @napi-rs/keyring. Stops ANTHROPIC_API_KEY and friends from sitting in shell rc files where any agent subprocess can read them.
agent:spawn --isolation=worktree: opt-in git worktree per spawned agent. Worktrees are retained on exit; the spawner prints merge or discard commands so you decide.
oma stats cost telemetry: per-vendor token breakdown with USD estimates from a conservative pricing floor. Input-only for now, output tokens tracked next.
oma model:check / probe / propose: detect model registry drift against live vendor sources without API keys, then emit a models.yaml patch for accepted candidates.
Cursor as a first-class vendor: cursor/composer-2, composer-2-fast, and auto register with native dispatch. New cursor-only preset (also renamed to plain cursor, see below).
Three slash workflows: /docs for drift verify and sync, /recap for daily and period summaries, /deepsec for the full scan to triage loop.
PWA on the docs site: manifest with maskable icons, install screenshots, and display_override for desktop title-bar customization.

What's fixed

Serena MCP migrated from ephemeral uvx --from git+... to globally installed serena-agent via uv tool install, with vendor-specific --context per the upstream client matrix.
Hook keyword detector hardened against three bypass classes: NFKC normalization for fullwidth Latin from CJK IMEs, hyphen-rejecting word boundaries, and a two-tier CLI invocation guard so prompts that ARE CLI invocations no longer trigger workflows.
oma doctor CLI version probe bounded with a 1500ms spawn race + SIGTERM escalation, so a GUI binary that ignores --version can no longer hang the probe indefinitely.
i18n translation drift detection via oma docs i18n, plus oma docs lint for CJK em-dash anti-patterns (em-dash in CJK prose now flagged as a content-level lint, not auto-fixed).
Preset rename: claude-only to claude, antigravity to mixed. The -only suffix misrepresented the contract (you can already override per-agent), and antigravity collided with the runtime vendor id. Auto-migrated on oma update; hard-error on legacy values in oma-config.yaml.

What's better

Benchmark: oh-my-agent now scores 80.6/100 on the 5-axis multi-judge harness (Functional, Spec, Visual, Engineering, Efficiency), landing #1 against omc, superpowers, vanilla, and ecc. Multi-judge averaging runs three rounds per axis to drop single-run noise.
oma update prints a "What's new" note with added or removed skills and workflows after the version bump, so the catalog change is visible at upgrade time.
docs/generated/ auto-added to .gitignore on doc write sites via a unified cli/io/gitignore.ts module.
Telemetry opt-in unified across vendors: a single telemetry boolean in oma-config.yaml now drives Claude (DISABLE_TELEMETRY), Gemini, Qwen, and Codex opt-out keys on install and update.
cli/cli-kit/ merged into cli/utils/ so there is one home for shared CLI helpers instead of two overlapping ones.
Em-dash sweep across 173 files in skills, workflows, and docs per the oma-translator anti-AI-pattern rule. Em-dash usage restructured contextually with colons, periods, parens, or restructured sentences, not mechanical substitution.

Installation

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.ps1 | iex

Links

oh-my-agent is built for teams who orchestrate more than they prompt. Next up: closing the spec gap that surfaced in the benchmark by teaching skills the real-API plus deferred-stub pattern at scaffold time, not at fix time.

https://github.com/first-fluke/oh-my-agent

oh-my-agent: 9 new skills, cursor as first-class vendor, 80/100 benchmark

gracefullight — Mon, 11 May 2026 14:39:20 +0000

When you tell an agent to scaffold a Next.js app, it picks the wrong version, ignores your lint config, and ships a save button with no storage backing it. The last four weeks of oh-my-agent were spent fixing exactly that, plus shipping a benchmark that actually measures it.

What's new

Nine new skills: oma-deepsec (Vercel deepsec driver), oma-docs (doc reference drift detection), oma-observability (33 files routing MELT+P signals across L3/L4/mesh/L7), oma-academic-writer, oma-hwp (HWP/HWPX to Markdown via kordoc), oma-image (multi-vendor generation across codex, pollinations, gemini), oma-scholar (Knows sidecar paper records), oma-search (intent-based search with trust scoring), oma-skill-creator
cursor promoted to a first-class vendor with cursor-only preset, composer-2 routing, and --yolo auto-approve
New /docs and /deepsec workflows, both detected via keyword triggers in 11 languages
oma model:check, model:probe, model:propose diff the local registry against OpenRouter and cursor agent --list-models, then scaffold a models.yaml patch you can paste in
Auto-update CLI on outdated install, gated by auto_update_cli in oma-config.yaml
Windows support via install.ps1, with junction and hardlink fallbacks when symlinks raise EPERM

What's fixed

Hook keyword detector: NFKC normalization closes fullwidth IME bypasses (ｐａｒａｌｌｅｌ no longer slips past), hyphen-rejecting word boundary, two-tier CLI invocation guard so claude review this code still routes to workflows but claude exec --foo does not
codex exec -i is variadic in clap and was swallowing the prompt as a second reference image. Argv now terminates with -- before the instruction
Path traversal closed on --out and --reference paths; magic-byte MIME validation on reference images
Windows path separators normalized across oma doctor, oma link, and the gitignore IO layer

What's better

New 5-axis benchmark in benchmarks/: Functional (35), Spec (15), Visual (20), Engineering (20), Efficiency (10). oh-my-agent scored 80.6 first place, ahead of omc 74.1, superpowers 72.9, vanilla 70.7, ecc 70.2. Multi-judge averaging across 3 rounds, not single-shot
Config consolidated to one oma-config.yaml with model_preset and inline agents map. Five built-in presets ship in the CLI; legacy agent_cli_mapping auto-migrates on install via migration 008
1492 tests passing, CI matrix now covers ubuntu/macos/windows with pre-push and pre-commit hooks
oma docs i18n detects translation drift across 11 locales; oma docs lint flags em-dashes and wrong-language placeholders. Used internally to restructure ~1900 em-dashes across 173 files per CJK typography conventions
Skills auto-register from .agents/skills/ frontmatter at build time, so adding a skill no longer requires touching a hardcoded SKILLS map

Installation

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.ps1 | iex

oh-my-agent is built for teams running parallel agents across multiple vendor CLIs and getting tired of each one drifting. Next up: deeper PR-gate integration for /deepsec and a reciprocal-skill pass on oma-observability.

https://github.com/first-fluke/oh-my-agent

oh-my-agent is Now Official on Homebrew-core: A New Milestone for Multi-Agent Orchestration

gracefullight — Thu, 16 Apr 2026 10:42:14 +0000

Are you tired of searching for effective agent teams and skills only to find they don't support your favorite AI IDE? Now is the time to use oh-my-agent, and it is now more accessible than ever.

I am happy to announce that oh-my-agent (OMA) has officially merged into the Homebrew-core repository. This milestone marks the transition of OMA from a specialized developer tool to a globally recognized, first-class CLI utility.

Elevating the CLI Experience

While my previous writing focused on how OMA functions as a harness within AI-powered IDEs, this Homebrew release highlights the power of the OMA CLI (oma) as a standalone engine.

A global installation provides a unified interface to manage your AI engineering team:

Universal Agent Spawning: Trigger specialized agents (Backend, QA, Architecture) directly from the terminal, regardless of which editor or AI IDE you are currently using.
Real-time Monitoring: Use oma dashboard to observe agent reasoning, tool calls, and progress in a dedicated terminal UI.
Environment Diagnostics: The oma doctor utility ensures your entire multi-vendor stack (Claude Code, Gemini CLI, etc.) is healthy and properly configured.

This update ensures that whether you are working in a team or across multiple machines, your AI specialists are always just one command away.

Explore the project on GitHub: https://github.com/first-fluke/oh-my-agent

I look forward to seeing how this expanded accessibility helps developers build more robust and reliable AI-driven workflows.

oh-my-agent — A Production-Grade Multi AI IDE Agent Harness

gracefullight — Wed, 25 Mar 2026 08:22:17 +0000

When you tell an agent to "build a TODO app," it does build something. The problem is that it often builds the wrong thing, drifts out of scope, and repeats the same mistakes.

To address this, structural approaches like AGENTS.md and, more recently, Skills have emerged. But looking at the skills actually being shared, a few recurring problems stand out:

The most critical piece — library version information — is missing.
Role descriptions end at hollow declarations like "You are a Senior engineer."
Content that could be covered by a few keywords gets padded into lengthy prose, wasting tokens.

As a result, these skills are poorly followed by models, burn context for nothing, and over time become dead code that nobody wants to open.

[Approach]

With oh-my-agent, we wanted to solve this through process, not prompts. Instead of simply telling the agent to "redo it" when something goes wrong, we record why it went wrong and feed that back into the next run.

The core mechanism is Clarification Debt (CD) Scoring. When the agent misinterprets a requirement or drifts out of scope, points accumulate:

clarify: +10 — simple confirmation question
correct: +25 — direction change due to misunderstood intent
redo: +40 — rollback and restart due to scope deviation
Starting work without checking the Charter: +15
Modifying files outside the allowed scope: +20
Repeating the same error: x1.5 multiplier

Above 50 points, writing a Root Cause Analysis (RCA) is mandatory. Above 80, the session is halted. Lessons extracted are accumulated in lessons-learned.md and reflected from the very next session. Even with simple prompts, the process compensates.

Beyond that, several common protocols keep the agent from going rogue:

Clarification Protocol — Requirement ambiguity is classified as LOW / MEDIUM / HIGH. LOW means proceed, MEDIUM means present options, HIGH means stop and clarify first.

Difficulty Guide — Tasks are categorized as Simple / Medium / Complex, adjusting the required protocol depth accordingly.

Context Budget — Token budgets are set per model to reduce unnecessary context consumption.

This approach aligns with the Harness Engineering concept discussed by OpenAI. Getting the most out of agents isn't a one-liner prompt problem — it's about what control structure you wrap around them.

[Project Structure]

oh-my-agent manages all of this within the project directory.

.agents/ = SSOT — Skills, workflows, and configurations live under .agents/ as the single source of truth. No dependency on any specific IDE.

Role-based agent team — Core roles include PM, QA, Frontend, Backend, Mobile, and Debug, with DB Agent and TF Infra Agent newly added.

DB Agent: SQL / NoSQL / Vector DB modeling, including ISO 27001 security recommendations
TF Infra Agent: Multi-cloud Terraform, OPA / Sentinel policies, ISO 42000 series control guidance

Workflow-centric orchestration — Planning, review, debug, and parallel execution form the default flow. The newly added /brainstorm workflow explores design before writing code: codebase analysis → clarification questions → approach proposal → user approval → design document saved, then followed by /plan → implementation.

[Two Orchestration Modes]

/coordinate is built for speed — iterate fast, fix problems as they surface. The PM breaks down tasks, dispatches agents, and QA runs a single review pass. If CRITICAL/HIGH issues appear, the affected task is re-run. It's a lightweight 7-step loop.

/ultrawork emphasizes quality gates. It's divided into five phases — PLAN → IMPL → VERIFY → REFINE → SHIP — each with a gate that blocks progression until passed. Of the 17 steps, 11 are reviews. The REFINE phase handles file splitting, deduplication, side-effect analysis, and dead code removal.

It might seem like overkill, but as programming abstraction climbs from machine language to high-level languages and now to natural language, verification only becomes more critical — a point that's hard to argue with.

[Expansion Background]

A month ago, this project launched as oh-my-ag, an orchestrator exclusive to Antigravity. Since then, multiple AI IDEs started adopting .agents/skills/ as the project skill path, and there was no longer a reason to keep it locked to a single IDE. So it was expanded into a universal harness format and became oh-my-agent.

[Getting Started]

curl -fsSL https://raw.githubusercontent.com/first-fluke/oh-my-agent/refs/heads/main/cli/install.sh | bash

Supports all major AI IDEs: Antigravity, Claude Code, Codex CLI, Cursor, and more.

If you're already using an AI IDE, give it a try. At the end of the day, the developer's goal is to hit QCD (Quality, Cost, Delivery) all at once. Agent-driven development is no exception — and that's the mindset behind this project.

🔗 GitHub: first-fluke/oh-my-agent