Most AI agent advice still sounds like prompt advice.
Add more context.
Write clearer instructions.
Give the model examples.
Use a better system prompt.
That helps, but it misses the part that breaks in real work.
The problem is rarely that the agent does not know a command. More often, the agent does not know your workflow.
It does not know when to inspect first.
It does not know which defaults are safe.
It does not know what "done" means.
It does not know when to stop instead of guessing.
It does not know which checks matter before something leaves the local machine.
That is where Terminal Skills become useful.
A Terminal Skill is not just a command shortcut. It is a small reusable operating procedure for an agent.
It teaches:
- when to use a workflow
- what inputs are acceptable
- which commands or scripts are preferred
- what output should exist
- how to verify the result
- when to stop and ask for help
That last part is the difference between a useful agent workflow and a confident mess.
Here is the pattern I use when writing one.
Start with the task, not the tool
The easiest mistake is to begin with a tool name.
Bad starting point:
Make an FFmpeg skill.
Better starting point:
Make a skill that turns a raw video file into an X-ready MP4, then verifies the upload is likely to work.
Those are different scopes.
The first one is a tool wrapper.
The second one is a workflow.
Agents do not need every possible FFmpeg flag. They need a stable path through a common problem.
The same applies to any terminal workflow:
- not "make a git skill"
- but "review a dirty worktree without touching unrelated user changes"
- not "make a deploy skill"
- but "deploy a Next.js app to Vercel and verify the live URL"
- not "make a search skill"
- but "inspect a repo and find the smallest safe file set for this task"
The more specific the workflow, the more useful the skill.
A useful skill has a contract
I like thinking about a Terminal Skill as a contract between the human, the agent, and the machine.
The contract says:
If this kind of task appears,
and these inputs exist,
follow this workflow,
produce this output,
run these checks,
and stop under these conditions.
That sounds simple, but it removes a lot of randomness.
Without a contract, the agent improvises.
With a contract, the agent has a default operating path.
That does not make the agent less intelligent. It makes the work less dependent on fresh reasoning every time.
The basic structure
For most Terminal Skills, I would start with a folder like this:
my-skill/
  SKILL.md
  scripts/
    run.sh
  examples/
    input-example.txt
  README.md
Not every skill needs a script.
Some skills are mostly procedural. Some are wrappers around existing CLIs. Some are just strong instructions plus verification commands.
But the SKILL.md is the important part.
It should tell the agent how to work, not just what the tool does.
Here is a practical template.
# Skill Name
## Use When
Use this skill when the user asks to:
- ...
- ...
## Do Not Use When
Do not use this skill when:
- ...
- ...
## Inputs
Expected inputs:
- source file or directory
- target format
- optional config
## Workflow
1. Inspect the input.
2. Choose the smallest safe action.
3. Run the script or command.
4. Verify the output.
5. Report the result with file paths and any warnings.
## Commands
```bash
./scripts/run.sh input-file
```
## Verification
Check:
- output file exists
- output format is correct
- command exited successfully
- logs contain no obvious errors
## Stop Conditions
Stop and ask the user if:
- required input is missing
- output validation fails
- the task would publish, delete, charge, email, or deploy something
- the command could overwrite user data
That is already more useful than a loose prompt.
It gives the agent a map.
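To make the Workflow steps in the template concrete, here is a minimal sketch of what a `scripts/run.sh` could contain. Everything in it is an assumption for illustration: the function name `run_skill`, the exit codes, and the placeholder action (uppercasing a text file) are hypothetical stand-ins for whatever your skill really does.

```bash
#!/usr/bin/env bash
# Hypothetical skeleton for scripts/run.sh.
# The "action" is a stand-in (uppercase a text file) so the control flow is visible.

run_skill() {
  local input="$1" output="$2"

  # 1. Inspect the input; a missing or empty input is a stop condition.
  if [ ! -s "$input" ]; then
    echo "STOP: input missing or empty: $input" >&2
    return 2
  fi

  # 2. Smallest safe action: never overwrite an existing file.
  if [ -e "$output" ]; then
    echo "STOP: refusing to overwrite: $output" >&2
    return 3
  fi

  # 3. Run the command (placeholder transform).
  if ! tr '[:lower:]' '[:upper:]' < "$input" > "$output"; then
    echo "FAIL: command exited with an error" >&2
    return 1
  fi

  # 4. Verify the output instead of assuming success.
  if [ ! -s "$output" ]; then
    echo "FAIL: output missing or empty: $output" >&2
    return 1
  fi

  # 5. Report the result with the file path.
  echo "OK: wrote $output"
}
```

Calling `run_skill notes.txt notes.upper.txt` either prints an `OK:` line or exits non-zero with a `STOP:` or `FAIL:` message. Distinct exit codes for stop conditions are one possible design choice: they let the agent tell "ask the user" apart from "the command broke".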
The most important section is "Stop Conditions"
Most people under-specify this part.
They document the happy path and skip the failure boundaries.
But agents need stop conditions badly.
A good skill should say things like:
- stop if the repo has unrelated user changes
- stop if the API token is missing
- stop if the public page cannot be verified
- stop if the output file has no video stream
- stop if the command would delete or overwrite source files
- stop if the user approved a draft but did not approve publishing
This is where agent workflows become safer.
For example, a social publishing skill should not only say:
Open the composer and publish the post.
It should say:
Before posting, verify the composer contains the exact approved text.
After posting, verify the final permalink shows the full text and attached media.
If media is missing, do not report success.
That is a real operating rule.
It captures the part of the workflow that normally lives in someone's head.
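Some of these stop conditions can be turned into small preflight checks. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the approved text and the text about to be published have each been saved to a file.

```bash
# Hypothetical preflight helpers. Each returns non-zero when the agent should stop.

# Stop if a required environment variable (for example an API token) is missing.
# printenv only sees exported variables, which is also what a child CLI would see.
require_env() {
  if [ -z "$(printenv "$1")" ]; then
    echo "STOP: $1 is not set" >&2
    return 1
  fi
}

# Stop unless the text about to be published is byte-for-byte the approved text.
# cmp -s is silent and exits non-zero if the files differ or one is missing.
require_exact_match() {
  local approved="$1" candidate="$2"
  if ! cmp -s "$approved" "$candidate"; then
    echo "STOP: candidate text differs from approved text" >&2
    return 1
  fi
}
```

A skill could then instruct the agent to run something like `require_env API_TOKEN && require_exact_match approved.txt composer.txt` before any publish step, and to stop and ask the user on the first failure. `API_TOKEN` and the file names here are placeholders.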
Verification should be concrete
"Check that it worked" is not enough.
A good Terminal Skill names the actual check.
For a video skill:
ffprobe -v error -show_streams -show_format output.mp4
For a code skill:
npm test
git diff --check
For a content publishing skill:
Open the final live URL.
Confirm the title, body, tags, canonical URL, and media are visible.
Do not trust the editor preview as final verification.
For a data export skill:
Check row count, headers, encoding, and sample records before sending the file.
Agents are very good at saying "done" too early.
Verification commands make "done" harder to fake.
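Two of the checks above could be wrapped as functions so the skill can name them directly. This is a sketch under assumptions: the function names are hypothetical, the CSV check assumes LF line endings, a trailing newline, and no newlines inside quoted fields, and the video check assumes `ffprobe` is installed.

```bash
# Hypothetical verification helpers.

# True only if ffprobe reports at least one video stream.
# Empty output (no stream, unreadable file, or ffprobe missing) counts as failure.
has_video_stream() {
  [ -n "$(ffprobe -v error -select_streams v -show_entries stream=codec_type \
          -of csv=p=0 "$1" 2>/dev/null)" ]
}

# Check a data export: exact header line and a minimum number of data rows.
verify_csv() {
  local file="$1" expected_header="$2" min_rows="$3"
  local header rows

  [ -s "$file" ] || { echo "FAIL: missing or empty: $file" >&2; return 1; }

  header=$(head -n 1 "$file")
  if [ "$header" != "$expected_header" ]; then
    echo "FAIL: unexpected header: $header" >&2
    return 1
  fi

  # Data rows = total lines minus the header line.
  rows=$(( $(wc -l < "$file") - 1 ))
  if [ "$rows" -lt "$min_rows" ]; then
    echo "FAIL: expected at least $min_rows rows, found $rows" >&2
    return 1
  fi

  echo "OK: $rows rows"
}
```

Both functions fail closed: if the check cannot be performed, the result is a failure, not a silent pass.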
Keep the skill narrow
The best skills are boringly specific.
Bad:
content-automation
Better:
devto-draft-from-markdown
x-safe-video-export
reddit-comment-visibility-check
vercel-preview-deploy-and-verify
Narrow skills have clearer triggers and fewer hidden assumptions.
They are also easier to improve.
If a video export fails, fix the video skill.
If a DEV.to draft misses a canonical URL, fix the DEV.to skill.
If a Reddit comment is visible to the owner but not public, fix the visibility check.
One giant "content automation" skill would hide all of those failures inside one vague blob.
Small skills make the workflow inspectable.
Use scripts for mechanics, instructions for judgment
I do not think every skill should become a huge script.
Scripts are good for mechanical repeatability:
- convert this file
- validate this JSON
- resize this image
- call this API
- generate this report
Instructions are better for judgment:
- when to use the script
- which candidates to reject
- how to handle approval
- what counts as verification
- when not to continue
The skill should combine both.
For example:
SKILL.md explains the workflow.
scripts/export.sh performs the conversion.
Verification commands prove the output.
Stop conditions prevent the agent from bluffing.
That is the useful shape.
Not "the agent has a tool."
"The agent has a way of working."
Example: a tiny repo-inspection skill
Here is a small example that does not need a script.
# Repo Inspection
## Use When
Use this before editing an unfamiliar codebase.
## Workflow
1. Print the current directory.
2. Check git status.
3. List top-level files.
4. Identify package/framework files.
5. Search for relevant code with ripgrep.
6. Read the smallest useful files before editing.
## Preferred Commands
```bash
pwd
git status --short
rg --files | head -80
rg -n "keyword|component|route" .
```
## Stop Conditions
Stop before editing if:
- the user has unrelated changes in the target file
- the task requires a destructive git command
- the repo structure is unclear after inspection
## Verification
Before reporting done:
- show changed files
- run the smallest relevant test or check
- explain any test that could not be run
This is not glamorous.
But it prevents a lot of common agent mistakes.
It teaches the agent the shape of careful work.
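The first stop condition in that skill, unrelated changes in the target file, can be checked mechanically. A minimal sketch, assuming `git` is available and the agent runs it from inside the repository; `path_is_clean` is a hypothetical name:

```bash
# Hypothetical helper: succeeds only when git confirms nothing is pending for a path.
# If git itself fails (for example, not inside a repository), treat that as "not clean".
path_is_clean() {
  local status
  # --porcelain gives stable, script-friendly output; empty output means no changes.
  status=$(git status --porcelain -- "$1" 2>/dev/null) || return 1
  [ -z "$status" ]
}
```

An agent following the skill could run `path_is_clean src/app.js || echo "STOP: pending changes in target file"` before editing. The path is a placeholder. Note that untracked files count as pending changes here, while ignored files do not.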
What makes a skill good?
I usually judge a skill by five questions.
1. Does it have a clear trigger?
The agent should know when to use it.
If the trigger is vague, the skill will either be ignored or overused.
2. Does it reduce repeat reasoning?
A good skill saves the agent from rediscovering the same workflow every time.
If the workflow is only used once, it may not need a skill yet.
3. Does it define done?
The skill should say what output must exist and how to verify it.
If "done" is subjective, the agent will guess.
4. Does it include stop conditions?
This is the safety layer.
The skill should stop the agent from continuing confidently when a required input, an external approval, or a verification step is missing.
5. Is it small enough to maintain?
If the skill becomes a giant manual for everything, it stops being useful.
Small, composable skills are easier to trust.
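Some of these questions can be checked automatically. The sketch below is a hypothetical lint, not a standard tool: it assumes the section headings from the template earlier in this post, and the 200-line default limit is an arbitrary number chosen for illustration. It cannot judge whether a skill reduces repeat reasoning; that still needs a human.

```bash
# Hypothetical lint for a SKILL.md file.
# Usage: lint_skill path/to/SKILL.md [max_lines]
lint_skill() {
  local file="$1" max_lines="${2:-200}" failed=0 section lines

  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }

  # Trigger, workflow, definition of done, and stop conditions must all be present.
  for section in "Use When" "Workflow" "Verification" "Stop Conditions"; do
    # -x matches the whole line, -F treats the heading as a literal string.
    if ! grep -qxF "## $section" "$file"; then
      echo "missing section: $section" >&2
      failed=1
    fi
  done

  # "Small enough to maintain", approximated by a line count.
  lines=$(( $(wc -l < "$file") ))
  if [ "$lines" -gt "$max_lines" ]; then
    echo "too long: $lines lines (limit $max_lines)" >&2
    failed=1
  fi

  return "$failed"
}
```

Running `lint_skill my-skill/SKILL.md` prints each problem it finds and exits non-zero if there is at least one.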
The bigger point
AI agents are getting better at tool use.
That does not mean every workflow should be improvised in chat.
The more capable the agent becomes, the more important operating procedures become.
Prompts are good for intent.
Tools are good for capability.
Skills are good for repeatable work.
That is the layer I think more developers should build.
Not because it is flashy.
Because boring, reusable workflows are what turn agents from demos into something you can actually depend on.
I am collecting more examples of this pattern at Terminal Skills.
If you are building your own agent workflows, start with one annoying task you repeat every week.
Write down the trigger, workflow, verification, and stop conditions.
That is your first skill.