If you've been building with AI coding agents over the past year, you've probably discovered Agent Skills. The premise is elegant: package your workflows as portable, version-controlled skill folders that any agent can pick up and use. Write once, use everywhere.
I've been all-in on it. I open-sourced ram-agent-skills, a curated collection of 15 skills blending the Google Conductor workflow with Matt Pocock's agentic patterns: skills like conductor, grill-me, to-prd, and tdd that I use every day.
But after months of real-world usage, I hit a wall. And I think everyone building serious skills will hit it too.
The problem: skills can't express their human-in-the-loop needs
Here's what I mean. Take the conductor skill. It's a structured development workflow — it generates a spec, gets approval, builds an implementation plan, gets approval again, implements phase by phase, and asks for manual verification before each commit.
Every one of those approval steps requires a human. And today, the only way a skill can ask for human input is to write something like this in its instructions:
"Ask the user for the project name, tech stack, and goal before proceeding."
The agent then improvises a question. The tool renders it however it likes. The human answers in freeform text. The agent parses whatever comes back.
That's four failure points in a workflow that's supposed to be rigorous.
The result:
- The same skill feels completely different in Claude Code vs Cursor vs Copilot
- Required inputs can be skipped with no validation
- There's no way to express how to ask — a confirmation dialog vs a multi-select vs a ranked priority list
- Every tool that implements the spec reinvents human input gathering from scratch
A skill today cannot communicate its UI needs for gathering human input in a generic, tool-agnostic way.
That's the gap.
What the fix looks like
I've been working on a proposal to add a human-interactions field to the Agent Skills spec. The core idea is simple:
A skill adds a single 2-token signal to its SKILL.md frontmatter:
---
name: conductor
description: Structured development track management...
human-interactions: true
---
And a new references/INTERACTIONS.md file holds the full schema — loaded only when the skill is activated, not at startup. This is deliberate. Context bloat killed MCP's usability. We're not repeating that mistake.
The schema lets skill authors declare what to collect and when. The tool decides how to render it.
- id: project-setup
  trigger: before-start
  title: "Set up your conductor track"
  description: "A few details before conductor scaffolds your project"
  on-skip: abort
  fields:
    - id: track-type
      type: single-select
      label: "What kind of work is this?"
      required: true
      options: [Feature, Bug fix, Refactor, Spike]
    - id: scope-notes
      type: textarea
      label: "Describe the goal"
      required: true
      placeholder: "What problem does this solve?"
The same schema — rendered as a rich form in Claude.ai, sequential prompts in a terminal, a chat card in Copilot. Skill authors write intent. Tools handle presentation.
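To make the "sequential prompts in a terminal" rendering concrete, here is a minimal Python sketch of what a tool might do with the project-setup schema. It is not part of the proposal: the function names `prompt_field` and `render_in_terminal`, the numbered-menu format, and the re-ask-until-valid behaviour are all assumptions made for illustration, and the schema is written as the dict a YAML parser would produce.

```python
from typing import Callable

# The project-setup interaction from above, as a parsed dict.
PROJECT_SETUP = {
    "id": "project-setup",
    "trigger": "before-start",
    "title": "Set up your conductor track",
    "on-skip": "abort",
    "fields": [
        {
            "id": "track-type",
            "type": "single-select",
            "label": "What kind of work is this?",
            "required": True,
            "options": ["Feature", "Bug fix", "Refactor", "Spike"],
        },
        {
            "id": "scope-notes",
            "type": "textarea",
            "label": "Describe the goal",
            "required": True,
        },
    ],
}


def prompt_field(field: dict, ask: Callable[[str], str]) -> str:
    """Ask for one field, re-asking until the answer is acceptable."""
    options = field.get("options", [])
    required = field.get("required", False)
    while True:
        if field["type"] == "single-select":
            # Hypothetical terminal rendering: a numbered menu.
            menu = ", ".join(f"{i}) {opt}" for i, opt in enumerate(options, 1))
            raw = ask(f"{field['label']} [{menu}]: ").strip()
            if raw.isdecimal() and 1 <= int(raw) <= len(options):
                return options[int(raw) - 1]
        else:
            # textarea and any other type are treated as free text here.
            raw = ask(f"{field['label']}: ").strip()
            if raw:
                return raw
        if not raw and not required:
            return ""  # optional field left blank
        # Otherwise the answer was missing or invalid: ask again.


def render_in_terminal(interaction: dict, ask: Callable[[str], str] = input) -> dict:
    """Collect every field of one interaction as sequential prompts."""
    print(interaction["title"])
    return {f["id"]: prompt_field(f, ask) for f in interaction["fields"]}
```

Passing `ask` as a parameter keeps the renderer testable. A chat client or a rich form would replace these two functions and leave the schema untouched, which is the point of separating intent from presentation.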
Four trigger types covering 80% of cases
The proposal defines four triggers, each mapping to a real pattern in workflow-heavy skills:
before-start — fires once before execution begins. The place for required context the agent can't infer on its own. Stack, goal, priorities. Collected before the skill body even loads.
on-phase — fires at a named phase boundary. In conductor, this is "spec done — approve before planning" and "plan done — approve before implementing." The phase name is freeform, so it maps to whatever workflow structure your skill uses.
on-demand — agent-initiated. The skill body tells the agent when to invoke it. Used for mid-execution clarification when something ambiguous comes up.
on-confirmation — blocks on explicit human approval before a destructive or irreversible action. With on-skip: abort (the default), skipping halts the skill cleanly — it never silently continues.
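The on-skip behaviour is the part a tool most needs to get right, so here is a small Python sketch of one way to enforce it. The names `SkillAborted`, `handle_skip`, and `is_approved` are hypothetical, and the post only describes the `abort` policy; how any other policy value should behave is an assumption in this sketch.

```python
from typing import Optional


class SkillAborted(Exception):
    """Raised when a skipped interaction must halt the skill."""


# A confirmation checkpoint, as the dict a YAML parser would produce.
COMMIT_CONFIRM = {
    "id": "commit-confirm",
    "trigger": "on-confirmation",
    "on-skip": "abort",
    "fields": [{"id": "verified", "type": "confirm", "required": True}],
}


def handle_skip(interaction: dict) -> None:
    """Apply the on-skip policy when the human dismisses the interaction."""
    policy = interaction.get("on-skip", "abort")  # abort is the stated default
    if policy == "abort":
        raise SkillAborted(f"'{interaction['id']}' was skipped; halting the skill")
    # Assumption: any other policy lets the caller carry on without approval.


def is_approved(interaction: dict, answers: Optional[dict]) -> bool:
    """Return True only if every confirm field was explicitly answered True.

    `answers` is None when the human skipped the interaction entirely.
    """
    if answers is None:
        handle_skip(interaction)
        return False
    confirms = [f for f in interaction["fields"] if f["type"] == "confirm"]
    return all(answers.get(f["id"]) is True for f in confirms)
```

A caller would run the destructive action only when `is_approved` returns True, and would let `SkillAborted` propagate so the skill stops instead of silently continuing.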
Dynamic options: skills that know your codebase
Static option lists are fine for generic skills. But real workflow skills need to meet the user where they are.
The proposal supports dynamic option sources alongside static arrays:
- id: tech-stack
  type: multi-select
  label: "Detected tech stack — confirm or adjust"
  options:
    source: file
    path: package.json
    extract: "frameworks and runtimes from dependencies"
    fallback: [React, Vue, Node.js, Python, Go, Other]
Two sources are proposed for v1:
- file: read a file already in the repo, agent extracts the relevant options using the extract hint. No code execution, no new attack surface.
- agent: agent infers options from context it already has. Used for things like "which modules in src/ are relevant to this change?"
Script-based dynamic options (running a bundled script to generate options) are explicitly deferred pending a proper sandboxed execution security model. Naming the deferral is important — it keeps the proposal clean without closing the door.
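As a rough illustration of the `file` source and its `fallback`, here is a Python sketch of how a tool might resolve an `options` value. In the proposal the agent does the extraction guided by the `extract` hint; the `KNOWN_FRAMEWORKS` table and `extract_from_package_json` below are a hypothetical, deterministic stand-in for that step, and `resolve_options` is an illustrative name, not part of the spec.

```python
import json
from pathlib import Path
from typing import Callable, Union

# Hypothetical lookup table standing in for the agent's reading of the
# "frameworks and runtimes from dependencies" hint.
KNOWN_FRAMEWORKS = {"react": "React", "vue": "Vue", "express": "Express", "next": "Next.js"}


def extract_from_package_json(text: str) -> list:
    """Map dependency names in a package.json to display labels."""
    manifest = json.loads(text)
    if not isinstance(manifest, dict):
        return []
    deps = {**manifest.get("dependencies", {}), **manifest.get("devDependencies", {})}
    return [label for name, label in KNOWN_FRAMEWORKS.items() if name in deps]


def resolve_options(
    options: Union[list, dict],
    repo_root: Path,
    extract: Callable[[str], list] = extract_from_package_json,
) -> list:
    """Turn a static array or a dynamic source into a concrete option list."""
    if isinstance(options, list):
        return options  # static array: use as written
    fallback = options.get("fallback", [])
    if options.get("source") == "file":
        try:
            text = (repo_root / options["path"]).read_text(encoding="utf-8")
            found = extract(text)
        except (OSError, ValueError):
            # Missing, unreadable, or malformed file: nothing detected.
            found = []
        return found or fallback
    # The agent source needs the model's context, which this sketch does not
    # have, so it only returns the fallback.
    return fallback
```

Nothing here executes repository code: the file is read and parsed as data, which matches the "no code execution" constraint described above.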
What this means for skill authors
If you're building skills today, this changes how you think about the human-in-the-loop parts of your workflows. Instead of writing:
"Ask the user to confirm before committing."
You write a structured checkpoint that every tool can render correctly:
- id: commit-confirm
  trigger: on-confirmation
  title: "Manual verification — ready to commit?"
  on-skip: abort
  fields:
    - id: verified
      type: confirm
      label: "Verification passed. Commit and continue?"
      required: true
The skill becomes a real contract — not just for the agent, but for the human too.
The reference implementation
To make this concrete, I've updated the conductor skill in ram-agent-skills to use human-interactions, and added a new human-interaction-demo skill that's a fully annotated walkthrough of every feature in the spec — every trigger type, every field type, both static and dynamic options, with comments explaining each choice.
The branch is here: ram-agent-skills/tree/human-interactions-rfc
Join the discussion
This is a draft RFC, not a finished spec. The goal is to get the community's eyes on it before it hardens.
The RFC is open for discussion at agentskills/agentskills#413. A few open questions worth weighing in on:
- Should on-phase phase names stay freeform, or should the spec define a base vocabulary for common patterns?
- Should collected values be exposed to the agent by structured reference (project-setup.track-type) or injected as natural language?
- Are there trigger types or field types missing from the initial set that you'd reach for immediately?
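To make the second question easier to weigh, here is a small Python sketch contrasting the two options. Both functions are hypothetical: `lookup` assumes references take the form `interaction-id.field-id` with no dots inside either id, and the sentence template in `as_natural_language` is invented for illustration.

```python
def lookup(collected: dict, reference: str):
    """Resolve a structured reference such as 'project-setup.track-type'."""
    interaction_id, _, field_id = reference.partition(".")
    return collected[interaction_id][field_id]


def as_natural_language(collected: dict) -> str:
    """Flatten collected answers into sentences for the agent's context."""
    lines = []
    for interaction_id, answers in collected.items():
        for field_id, value in answers.items():
            lines.append(f"For {interaction_id}, the user answered {field_id}: {value}.")
    return "\n".join(lines)


collected = {"project-setup": {"track-type": "Feature", "scope-notes": "Add SSO"}}
```

With structured references the skill body can name exactly one value, as in `lookup(collected, "project-setup.track-type")`, while the natural-language form hands the agent everything at once and leaves it to pick out what matters.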
The Agent Skills ecosystem is growing fast — 16k stars, implementations across Claude Code, Cursor, Copilot, Windsurf, and more. Getting human-in-the-loop right at the spec level matters. Come tell us what we're missing.
Ramakrishnan Meenakshi Sundaram is a VP Engineering at ANZ Bank and a contributor to the Agent Skills ecosystem. His skill collection is at github.com/ramki982/ram-agent-skills.