wshobson/agents: 5 Hidden Uses of the 37K-Star Multi-Harness Agentic Plugin Marketplace

Most AI coding assistants lock you into a single ecosystem. Claude Code has its skills, Codex has its rules, Cursor has its own format, and Gemini CLI expects something else again. Maintaining agent configurations across several platforms means rewriting the same logic once per platform.

wshobson/agents is a 37,106-star open-source multi-harness agentic plugin marketplace that lets you write once and deploy everywhere. It ships 84 plugins, 192 specialized agents, 156 skills, and 102 slash commands across Claude Code, OpenAI Codex CLI, Cursor, OpenCode, Gemini CLI, and GitHub Copilot — all from a single Markdown source.

In 2026, the AI coding tool landscape has fractured. Each platform uses different formats for agents, skills, and commands. This marketplace solves that fragmentation problem with a source-of-truth architecture: you maintain one plugins/ directory and per-harness adapters generate native artifacts for each platform automatically.

Hidden Use #1: Multi-Harness Plugin Marketplace from a Single Source

What most people do: Maintain separate agent configurations for each AI coding tool — CLAUDE.md for Claude Code, AGENTS.md for Codex, .cursor/rules for Cursor, and so on. Each platform gets its own hand-tuned agent definitions, skills, and commands.

The hidden trick: wshobson/agents uses a single plugins/ directory as the source of truth. Each plugin contains agents, skills, and commands in portable Markdown. Adapters then generate harness-native artifacts for all six supported platforms.

# Clone the marketplace
gh repo clone wshobson/agents ~/agents && cd ~/agents

# Install for Claude Code (native; reads plugins/ directly)
# Run the two slash commands below inside a Claude Code session, not in the shell
/plugin marketplace add wshobson/agents
/plugin install python-development

# Generate for all six harnesses from one source
make generate-all

# Validate structural integrity across all outputs
make validate

The result: One plugins/python-development/ directory produces native Claude Code plugins, Codex CLI skills, Cursor rules, OpenCode agents, Gemini CLI extensions, and Copilot instructions — zero duplication.
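To make the source-of-truth idea concrete, here is a minimal Python sketch of one portable definition being rendered by per-harness adapter functions. The `Skill` dataclass, the adapter names, and every output path are hypothetical illustrations. They are not the repository's actual adapter code or file layout.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Skill:
    """A portable skill: a name, a description, and a Markdown body."""
    name: str
    description: str
    body: str


# Each adapter returns (relative output path, file contents).
# Paths and file formats here are illustrative assumptions only.
def to_claude(skill: Skill) -> Tuple[str, str]:
    frontmatter = f"---\nname: {skill.name}\ndescription: {skill.description}\n---\n"
    return f"claude/skills/{skill.name}/SKILL.md", frontmatter + skill.body


def to_cursor(skill: Skill) -> Tuple[str, str]:
    header = f"---\ndescription: {skill.description}\n---\n"
    return f"cursor/rules/{skill.name}.mdc", header + skill.body


def to_copilot(skill: Skill) -> Tuple[str, str]:
    return f"copilot/instructions/{skill.name}.md", f"# {skill.name}\n\n{skill.body}"


ADAPTERS: Dict[str, Callable[[Skill], Tuple[str, str]]] = {
    "claude": to_claude,
    "cursor": to_cursor,
    "copilot": to_copilot,
}


def generate_all(skill: Skill) -> Dict[str, Tuple[str, str]]:
    """Render one source skill into every registered harness format."""
    return {harness: adapt(skill) for harness, adapt in ADAPTERS.items()}


skill = Skill("python-testing", "Write pytest suites", "Prefer fixtures over setup code.\n")
outputs = generate_all(skill)
for harness, (path, _content) in outputs.items():
    print(harness, "->", path)
```

The point of the registry is that supporting another harness means adding one function, while the source definition stays untouched.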

Data sources: wshobson/agents on GitHub (37,106 stars, 4,003 forks, MIT License, Python, last pushed 2026-06-22).

Hidden Use #2: Three-Layer Plugin Quality Evaluation (PluginEval)

What most people do: Install agent skills and plugins based on star count or description, with no objective quality signal. A skill might look good in its README but fail in production.

The hidden trick: The marketplace includes PluginEval — a three-layer evaluation framework that scores plugins across 10 quality dimensions, from static structural analysis to LLM-based semantic judging to Monte Carlo statistical simulation.

# Install PluginEval dependencies
cd plugins/plugin-eval
uv sync --extra llm

# Quick static analysis (< 2 seconds, free)
uv run plugin-eval score path/to/skill --depth quick

# Standard evaluation: static + LLM judge (~30s, 4 LLM calls)
uv run plugin-eval score path/to/skill --depth standard

# Deep evaluation: all 3 layers including Monte Carlo (~2 min)
uv run plugin-eval score path/to/skill --depth deep

# CI gate: fail if quality below threshold
uv run plugin-eval score path/to/skill --threshold 70

# Full certification with badge assignment
uv run plugin-eval certify path/to/skill

The result: You get a calibrated quality score with confidence intervals, anti-pattern detection, and a badge tier (Bronze through Platinum). The CI gate prevents low-quality plugins from reaching production.
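For intuition about how a score with a confidence interval can drive a CI gate, here is a small Python sketch. It uses a generic percentile bootstrap over repeated evaluation scores, which is an assumption chosen for illustration and not PluginEval's documented algorithm. The function names and the sample scores are hypothetical.

```python
import random
import statistics
from typing import List, Tuple


def bootstrap_ci(scores: List[float], resamples: int = 2000,
                 confidence: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean of repeated eval scores."""
    rng = random.Random(seed)  # fixed seed keeps the gate reproducible in CI
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * resamples)]
    upper = means[min(resamples - 1, int((1.0 - tail) * resamples))]
    return lower, upper


def passes_gate(scores: List[float], threshold: float) -> bool:
    """Conservative gate: the interval's lower bound must clear the threshold."""
    lower, _upper = bootstrap_ci(scores)
    return lower >= threshold


# Hypothetical scores (0-100) from repeated evaluation runs of one skill.
runs = [78.0, 82.5, 74.0, 80.0, 79.5, 76.0, 83.0, 77.5]
low, high = bootstrap_ci(runs)
print(f"mean={statistics.fmean(runs):.1f}  95% CI=({low:.1f}, {high:.1f})")
print("gate at 70:", passes_gate(runs, 70.0))
```

Gating on the lower bound rather than the mean is a design choice: a skill with a high but noisy score fails until more runs narrow the interval.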

Data sources: PluginEval framework documented in wshobson/agents repository, three-layer architecture (Static <2s/free, LLM Judge ~30s, Monte Carlo ~2min).

Hidden Use #3: Tiered Model Strategy for Cost-Optimized Agents

What most people do: Run every agent task on the most expensive model (Opus or GPT-4o) regardless of complexity. A documentation generation task gets the same model as a security review.

The hidden trick: wshobson/agents implements a 5-tier model routing strategy within each agent definition. Tasks are routed to the cheapest model that meets quality requirements.

# Model tiers assigned across the agent definitions in plugins/:
Tier  | Model   | Use Case
------|---------|----------------------------------------------
0     | Fable   | Longest-horizon autonomous work (premium)
1     | Opus    | Architecture, security, code review
2     | inherit | User-chosen (backend, frontend, AI/ML)
3     | Sonnet  | Docs, testing, debugging, API references
4     | Haiku   | Fast operational tasks, SEO, deployment

# Install a plugin that uses tiered routing
/plugin install code-review

# The agent automatically routes:
# - Security review → Opus (Tier 1)
# - Test generation → Sonnet (Tier 3)
# - Formatting checks → Haiku (Tier 4)
# Estimated cost savings: ~60-80% vs. running everything on Opus (depends on your task mix)

The result: Production-quality outputs for critical tasks while routine work runs on faster, cheaper models. The tiering is defined per-agent in the plugin's frontmatter, so you control the cost-quality tradeoff.
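The routing idea can be sketched in a few lines of Python. The tier table mirrors the one above, but the task-category mapping, the `resolve_model` helper, and the default tier are assumptions made for illustration. In the marketplace itself the model is declared per agent in frontmatter rather than looked up at runtime like this.

```python
from typing import Dict

# Tier table as listed in the article.
TIER_MODELS: Dict[int, str] = {
    0: "fable",
    1: "opus",
    2: "inherit",
    3: "sonnet",
    4: "haiku",
}

# Hypothetical mapping from task category to tier.
TASK_TIERS: Dict[str, int] = {
    "architecture": 1, "security-review": 1, "code-review": 1,
    "backend": 2, "frontend": 2,
    "docs": 3, "testing": 3, "debugging": 3,
    "seo": 4, "deployment": 4,
}


def resolve_model(task: str, session_model: str, default_tier: int = 3) -> str:
    """Pick the model for a task; tier 2 ("inherit") defers to the session model."""
    tier = TASK_TIERS.get(task, default_tier)
    model = TIER_MODELS[tier]
    return session_model if model == "inherit" else model


for task in ("security-review", "frontend", "testing", "deployment"):
    print(task, "->", resolve_model(task, session_model="sonnet"))
```

Unknown task categories fall back to the middle of the range here, so a missing mapping never silently lands on the most expensive tier.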

Data sources: wshobson/agents README documents 5-tier model strategy (Fable/Opus/inherit/Sonnet/Haiku) with specific use cases per tier.

Hidden Use #4: Cross-Harness Graceful Degradation

What most people do: Assume features that work in one platform will work in all others. You write a Claude Code agent with TodoWrite and Task tools, then discover Codex and Cursor don't support those tools.

The hidden trick: Each harness adapter in wshobson/agents handles incompatibilities mechanically. The system knows exactly which capabilities degrade on which platform and transforms them automatically.

# From tools/adapters/capability_matrix.py
# Source pattern → per-harness degradation:

# "tools: Read, Grep" (agent allowlist)
#   Codex → dropped, sandbox_mode = "read-only" heuristic
#   Cursor → dropped (not honored)
#   OpenCode → converted to permission: deny block
#   Gemini → passed through

# "model: opus" (agent)
#   Codex → mapped to gpt-5.5
#   Cursor → rewritten to inherit
#   OpenCode → rewritten to anthropic/claude-opus-4-8
#   Gemini → mapped to gemini-2.5-pro

# Skill body > 8 KB
#   Codex → split into references/details.md
#   Others → passed through (no limit)

# Generate all harness artifacts with automatic degradation
make generate-all

# Check what changed for a specific harness
make generate HARNESS=codex

# Validate all outputs are structurally valid
make validate

The result: You write one agent definition and the adapters handle the translation. References to Claude Code's TodoWrite are left as-is on platforms that have no equivalent tool. Model aliases map to each platform's native naming, and skill bodies over 8 KB are split automatically for Codex.
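Two of the degradation rules listed above, model alias mapping and the 8 KB skill split for Codex, can be sketched as follows. The model names and the `references/details.md` target come from the list above. The function names, the `SKILL.md` file name, the pointer sentence, and the choice to cut on a line boundary are assumptions for illustration, not the repository's adapter code.

```python
from typing import Dict

# Model mappings as listed in the article.
MODEL_MAP: Dict[str, Dict[str, str]] = {
    "opus": {
        "codex": "gpt-5.5",
        "cursor": "inherit",
        "opencode": "anthropic/claude-opus-4-8",
        "gemini": "gemini-2.5-pro",
    },
}

CODEX_SKILL_LIMIT = 8 * 1024  # bytes; the article's 8 KB threshold
POINTER = "\nSee references/details.md for the rest.\n"  # hypothetical wording


def map_model(alias: str, harness: str) -> str:
    """Translate a source model alias; unknown pairs pass through unchanged."""
    return MODEL_MAP.get(alias, {}).get(harness, alias)


def split_skill_body(body: str, harness: str) -> Dict[str, str]:
    """Return {relative path: content}; only Codex splits oversized bodies."""
    encoded = body.encode("utf-8")
    if harness != "codex" or len(encoded) <= CODEX_SKILL_LIMIT:
        return {"SKILL.md": body}
    # Reserve room for the pointer, then cut on a line boundary so the
    # Markdown in the first file is not broken mid-line.
    budget = CODEX_SKILL_LIMIT - len(POINTER.encode("utf-8"))
    head = encoded[:budget].decode("utf-8", errors="ignore")
    cut = head.rfind("\n") + 1 or len(head)
    return {
        "SKILL.md": body[:cut] + POINTER,
        "references/details.md": body[cut:],
    }


long_body = "line\n" * 3000  # about 15 KB
parts = split_skill_body(long_body, "codex")
print(map_model("opus", "codex"), sorted(parts))
```

Nothing is discarded in the split: concatenating the first file (minus the pointer) with the details file reproduces the original body.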

Data sources: wshobson/agents docs/harnesses.md documents the full capability matrix and graceful degradation rules across all 6 harnesses.

Hidden Use #5: Orchestrator Workflows for Multi-Agent Coordination

What most people do: Install individual plugins and manually chain them — run the security scanner, then the test generator, then the documentation builder in sequence.

The hidden trick: The marketplace includes 16 orchestrator workflows that coordinate multiple agents with defined handoffs: full-stack development, security audits, ML pipelines, and incident response.

# List available orchestrators
ls plugins/ | grep orchestrator

# Example: security-orchestrator
# 1. security-analyzer scans codebase
# 2. vulnerability-prioritizer ranks findings
# 3. fix-generator creates patches
# 4. test-validator verifies fixes don't break tests

# Install the orchestration
/plugin install security-orchestrator

# Run it (the orchestrator handles agent handoffs)
/security-orchestrator --target ./src --depth thorough

# From the orchestrator definition:
# Each orchestrator specifies:
# - agents: [list of agent IDs to coordinate]
# - handoff_strategy: sequential | parallel | conditional
# - max_retries: 3
# - fallback_model: sonnet (if primary fails)

# The marketplace ships these orchestrators:
# - full-stack-development (12 agents)
# - security-audit (8 agents)
# - ml-pipeline (6 agents)
# - incident-response (10 agents)
# - code-review-suite (5 agents)
# ... and 11 more

The result: Instead of manually chaining 5-10 agents, you run one orchestrator that handles the sequence, retries failed steps, and produces a unified report. The 16 orchestrators cover the most common production workflows.

Data sources: wshobson/agents README documents 16 orchestrators for multi-agent coordination workflows.

Summary

Multi-Harness from Single Source: Write plugins once, deploy to 6 AI coding platforms automatically
PluginEval Quality Framework: Three-layer evaluation (static + LLM judge + Monte Carlo) with CI gates
Tiered Model Strategy: Route tasks to appropriate model tiers, for an estimated 60-80% saving on API costs
Cross-Harness Graceful Degradation: Automatic capability translation between platforms
16 Orchestrator Workflows: Pre-built multi-agent coordination for security, ML, and dev pipelines

What's your most creative use of multi-agent orchestration? Have you built a plugin marketplace for your team? Share your approach in the comments — I'd love to hear how you're solving the cross-platform agent problem in 2026.