Most skill libraries grow by accretion. You add a SKILL.md, it sounds useful, and it lives forever because nobody can prove it helps or hurts. This week oh-my-agent closed that gap: oma skills eval measures whether loading a skill actually improves held-out task outcomes, and oma skills opt rewrites the skill to push that number up. 194 commits landed and the CLI is at 8.41.0, but the eval-to-opt loop is the part worth your attention.
What's new
- `oma skills eval`: measures `utilityLift` (treatment vs baseline) on held-out tasks. `--mock` replays recorded rollouts deterministically, `--live` spawns two read-only agentic arms per task, `--record` captures the rollouts. Default checker is `judge` (an LLM grades output against a rubric); `assert` and `regex` are opt-in deterministic checks.
- `oma skills opt`: an optimizer LLM proposes bounded add/delete/replace edits to a `SKILL.md`, re-scores each candidate through eval, and accepts only when held-out validation lift strictly improves with no negative-transfer regression (SkillOpt, arXiv:2605.23904). `--dry-run` is the default; `--apply` writes through atomic temp+rename with a `.bak` backup.
- Negative-transfer sampling: `--neg-transfer` checks whether loading one skill regresses unrelated same-domain tasks from other skills' eval sets.
- Scaling-law audit checks: `oma skills audit` now flags black-hole skills (overly generic routing hijackers) and warns past a calibrated library-size routing-decay threshold (Chen et al., arXiv:2605.16508).
- `oma-video` skill and `/video` workflow: key-optional 3-tier generation (9:16 shorts, 16:9 explainer, demo capture of any URL) composing narration, visuals, captions, and a vendored Remotion compositor. Every provider degrades to a deterministic fallback, so a run completes with zero API keys.
- Swift native iOS in `oma-mobile`: a `swift-ios` variant (SwiftUI + `@Observable`, Apple `swift-openapi-generator`, App/Core/Features/Shared layout). `/stack-set` now detects Swift, Flutter, and React Native and routes to the resolved skill; `oma verify mobile` runs `swift build` / `swift test` by stack manifest.
- `oma intel`: a local-first product intelligence pipeline that collects GitHub README, releases, and issues, runs an adversarial multi-lens review gate, and splits output into a PRD and a gap report.
- Three new runtimes: Kiro CLI, Pi (Earendil, via in-process `.pi/extensions`), and full Antigravity (`agy`) hook integration through `.agents/hooks.json`.
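The eval and opt entries above hinge on two small decisions: how lift is computed, and when a candidate edit is accepted. The sketch below is one plausible reading, not the oh-my-agent implementation. It assumes lift is the treatment pass rate minus the baseline pass rate, and that "no negative-transfer regression" means the candidate's lift on unrelated tasks does not fall below the current skill's. All type and function names here (`TaskResult`, `CandidateScore`, `shouldAccept`) are hypothetical.

```typescript
// Hypothetical shapes: the release notes do not show the real oma types.
interface TaskResult {
  taskId: string;
  baselinePassed: boolean; // arm run without the skill loaded
  treatmentPassed: boolean; // arm run with the skill loaded
}

// Assumption: lift = treatment pass rate minus baseline pass rate.
function utilityLift(results: TaskResult[]): number {
  if (results.length === 0) return 0;
  const rate = (pick: (r: TaskResult) => boolean): number =>
    results.filter(pick).length / results.length;
  return rate((r) => r.treatmentPassed) - rate((r) => r.baselinePassed);
}

interface CandidateScore {
  validationLift: number; // lift on held-out validation tasks
  negTransferLift: number; // lift on unrelated same-domain tasks
}

// Accept only on a strict validation improvement, and only if the
// negative-transfer lift has not dropped (an assumed reading of the rule).
function shouldAccept(current: CandidateScore, candidate: CandidateScore): boolean {
  const improves = candidate.validationLift > current.validationLift;
  const noRegression = candidate.negTransferLift >= current.negTransferLift;
  return improves && noRegression;
}

// Usage: the skill flips one task from fail to pass out of four.
const results: TaskResult[] = [
  { taskId: "t1", baselinePassed: false, treatmentPassed: true },
  { taskId: "t2", baselinePassed: true, treatmentPassed: true },
  { taskId: "t3", baselinePassed: false, treatmentPassed: false },
  { taskId: "t4", baselinePassed: true, treatmentPassed: true },
];
const lift = utilityLift(results); // 3/4 - 2/4 = 0.25
```

The strict `>` matters: a candidate that merely ties the current skill is rejected, which keeps the optimizer from churning the file with edits that change nothing measurable.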
What's fixed
- `runAction` clobbered positional operands: it overwrote `args[0]` with the merged options object, so `oma state:emit decision.made '{...}'` recorded the kind as `{category:"main",...}` and `state:verify` always reported the decision missing. Options are now replaced by position so operands survive.
- `--yes` never reached handlers: the wrapper passed `command.opts()` (which drops globally-parsed flags), so `oma skills eval --live --yes` still blocked at the cost-preview prompt. Switched to `optsWithGlobals()`, making live skill-eval runnable in CI.
- AgentMemory leaked into project dirs: the iii engine wrote a cwd-relative `./data/` store into whatever project launched it. The daemon cwd is now pinned to `~/.agentmemory`, and `daemon stop` invokes `agentmemory stop` so no orphaned engine keeps port 3111.
- Keyless market sources silently 403'd: anonymous reddit `search.json` and bluesky's public search endpoint both returned 403, dropping two of the default sources. Reddit now routes through pullpush.io, bluesky through `api.bsky.app`, taking keyless default coverage from 2/4 to 4/4.
- agy headless stdout was empty: Antigravity emits nothing on stdout under `--print` against a non-TTY, so spawned subagent capture was blank. Subagents now run under a PTY (`script(1)`) so their output is captured.
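The `runAction` fix above is easy to picture with a toy version of the argument handling. This is a minimal sketch, not the project's code: it assumes the handler arguments arrive as positional operands followed by a trailing options object, and the function names `mergeOptionsBuggy` and `mergeOptionsFixed` are hypothetical.

```typescript
type Options = Record<string, unknown>;

// Buggy shape: always writes the merged options into slot 0,
// which destroys the first positional operand.
function mergeOptionsBuggy(args: unknown[], merged: Options): unknown[] {
  const next = [...args];
  next[0] = merged;
  return next;
}

// Fixed shape: replace the options where they actually sit.
// Assumption: the options object is the last element of args.
function mergeOptionsFixed(args: unknown[], merged: Options): unknown[] {
  const next = [...args];
  next[next.length - 1] = merged;
  return next;
}

// Usage: two operands followed by the parsed options.
const rawArgs: unknown[] = ["decision.made", '{"category":"main"}', {}];
const mergedOptions: Options = { yes: true };

const broken = mergeOptionsBuggy(rawArgs, mergedOptions);
const repaired = mergeOptionsFixed(rawArgs, mergedOptions);
// broken[0] is now the options object, so the event kind is lost.
// repaired[0] is still "decision.made" and repaired[2] holds the options.
```

Copying the array before writing keeps the caller's `args` untouched, which makes the difference between the two variants easy to assert in a unit test.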
What's better
- Workflows are symlinked directly: each workflow file carries its own `name` + `disable-model-invocation` frontmatter and is exposed by symlinking straight at `.agents/workflows/<wf>.md`. This removed 18 committed wrapper skills and fixed a `pdf`/`oma-pdf` audit false positive; the real skill count is now 30.
- `harvest.ts` split: the 1.4k-line market harvest file became endpoints / normalizers / sources modules, and a ~400-line `fetchSource` conditional became a source-handler registry. The public facade is unchanged.
- Print stylesheet stopped fighting the cascade: the slide PDF export dropped its avoidable `!important` overrides by fixing the source of the conflict (scoped `#slide-NN` resets emitted after author styles) instead of forcing the win. The only remaining `!important` is the `prefers-reduced-motion` a11y reset.
- Centralized paths and hashing: install, state, and recap now share `.agents` path constants and agree on full SHA-256 for manifest checksums.
- Default effort lowered from xhigh to high for install-time Claude settings and the Anthropic auto-default; existing higher settings are preserved.
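For readers unfamiliar with the pattern named in the harvest refactor, a source-handler registry replaces one long conditional with a lookup table of per-source functions. The sketch below illustrates the general technique only. The `Item` shape, `registerSource`, and the stub handlers are hypothetical, and the real `fetchSource` signature is not shown in these notes.

```typescript
// Hypothetical item shape and handler signature.
interface Item {
  source: string;
  title: string;
}
type SourceHandler = (query: string) => Promise<Item[]>;

// The registry: one entry per source instead of one branch per source.
const registry = new Map<string, SourceHandler>();

function registerSource(name: string, handler: SourceHandler): void {
  registry.set(name, handler);
}

// The facade keeps a single entry point; dispatch is a map lookup.
async function fetchSource(name: string, query: string): Promise<Item[]> {
  const handler = registry.get(name);
  if (!handler) throw new Error(`unknown source: ${name}`);
  return handler(query);
}

// Stub handlers stand in for real network calls in this sketch.
registerSource("reddit", async (q) => [
  { source: "reddit", title: `stub result for ${q}` },
]);
registerSource("bluesky", async (q) => [
  { source: "bluesky", title: `stub result for ${q}` },
]);
```

Adding a source then means registering one more handler rather than editing a shared conditional, which is also why the public facade can stay unchanged across the split.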
Installation
# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.sh | bash
# Windows (PowerShell)
irm https://raw.githubusercontent.com/first-fluke/oh-my-agent/main/cli/install.ps1 | iex
oh-my-agent is built for teams who treat a skill library as a measured asset, not a junk drawer. Next up: feeding oma skills opt accepted edits back through the eval fixtures so the library self-tunes on every release.