Over the past six months I shipped about 10 Agent Skills — SSO login for an internal game platform, a design-to-code pipeline, a marketing-campaign manager, and a handful of others.
One Skill on its own is simple: a SKILL.md that tells the Agent the rules, plus a script that does the actual work. But once you have five or ten of them, the hard part stops being "can I write this script?" It becomes: can I stop re-solving the same boring engineering problems on every single Skill?
Directory layout, HTTP boilerplate, SKILL.md validation, syncing the build to where the Agent can run it — by the third or fourth Skill, these chores eat real time. So I packaged the answers into one small tool: skill-kits.
Here's what changed:
| Task | Before: by hand | After: skill-kits |
|---|---|---|
| Create a new Skill | Redo every project decision | pnpm new <name> (one line) |
| HTTP / error handling / output | Copy-paste into each Skill | import from the runtime, inlined |
| Sync changes to the Agent | cp -r by hand | pnpm dev (watch + auto-sync) |
| SKILL.md quality | Eyeball it in review | pnpm lint, run automatically on build |
It's not a framework and it won't write your business logic. Think of it as the build toolkit for Agent Skills — roughly what Vite is to a frontend app, plus a small standard library. You write TypeScript; it hands the Agent a single zero-dependency .mjs file.
The 30-second tour
npx skill-kits init my-skills # scaffold a pnpm monorepo
cd my-skills && pnpm install
pnpm new daily-report # add a Skill
pnpm dev daily-report --out ~/.agent/skills # watch + sync to the Agent
pnpm build daily-report # lint + bundle + zip
pnpm test daily-report # run unit tests
That's the whole loop. The rest of this post is why each piece exists.
Write TypeScript, ship zero-dependency JS
The Agent only ever runs one thing:
node scripts/main.mjs
But authoring in plain JS means no types, no autocomplete, no safety net. I tried the obvious workarounds and both hurt:
- Run TS with bun: great locally, but the Agent's environment may not have bun, so you're sniffing for runtimes.
- Fall back to npx tsc: works, but now SKILL.md has to declare how to execute itself, and npx adds cold-start latency on every call.
So the build step does the boring-but-correct thing: esbuild bundles your src/main.ts (plus any shared code) into a single ESM file with zero runtime dependencies. The Agent just needs Node.
source (TypeScript)           output (zero-dep ESM)
┌──────────────────┐          ┌──────────────────────────┐
│ src/main.ts      │          │ dist/<skill-name>/       │
│ src/commands/    │  build   │ ├── SKILL.md             │
│ references/      │ ───────► │ ├── scripts/main.mjs     │
│ assets/          │ esbuild  │ ├── references/          │
│ SKILL.md         │          │ └── assets/              │
└──────────────────┘          └──────────────────────────┘
                              Agent runs: node scripts/main.mjs
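To make the build step concrete, here is a rough sketch of the bundling options it boils down to. The option names are real esbuild build options, but the `bundleOptions` helper, the `node18` target, and the exact paths are my assumptions for illustration, not skill-kits' actual build code.

```typescript
// Hypothetical helper: the real skill-kits build may choose different values.
interface BundleOptions {
  entryPoints: string[];
  bundle: boolean;
  platform: "node";
  format: "esm";
  target: string;
  outfile: string;
}

function bundleOptions(skillName: string): BundleOptions {
  return {
    entryPoints: ["src/main.ts"],
    bundle: true, // inline every import, including npm packages
    platform: "node", // keep Node built-ins such as node:fs external
    format: "esm",
    target: "node18", // assumed minimum Node version
    outfile: `dist/${skillName}/scripts/main.mjs`,
  };
}

console.log(bundleOptions("daily-report").outfile);
// prints "dist/daily-report/scripts/main.mjs"
```

Passing an object like this to esbuild's `build()` is what yields the single file the Agent runs.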
A runtime, so you stop copying boilerplate
A Skill is really two layers: a structured prompt (SKILL.md) that defines the rules, and a tool (the script) that executes. The prompt is always bespoke — but the script layer repeats constantly: command routing, arg parsing, the stdout/stderr protocol, HTTP, error codes, long-poll heartbeats.
I collapsed those into skill-kits/runtime. Everything below is inlined into your bundle at build time.
Command routing instead of a switch pile
My campaign-manager Skill had 7 subcommands. The hand-written main.ts was ~250 lines of parseArgs + switch + usage + validation, with the logic for a single command smeared across four places. Adding one command meant editing all four — and forgetting one.
The router makes it declarative:
import { createRouter, writeResult } from "skill-kits/runtime";

const router = createRouter({
  name: "daily-report",
  description: "...",
  commonArgs: {
    // injected into every subcommand, fully typed
    domain: { type: "string", required: true, desc: "platform domain" },
    token: { type: "string", required: true, desc: "SSO token" },
  },
});

router.command({
  name: "fetch",
  description: "fetch yesterday's data",
  args: {
    date: { type: "string", required: true, desc: "YYYY-MM-DD" },
    env: { type: "string", choices: ["boe", "online"] as const, desc: "env" },
    filter: { type: "json", desc: "complex filter, parsed from JSON" },
  },
  async handler({ date, env, filter, domain, token }) {
    // env is typed "boe" | "online"; filter is already JSON.parse-d
    writeResult({ ok: true, items: [] });
  },
});

router.run(process.argv.slice(2));
Args support string / number / boolean / list / json. A missing required arg throws automatically; choices both validates and narrows the type. --help is generated for free. You stop thinking about parsing and only think about what this command needs and what it does.
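If you're curious what the runtime checks amount to, here is a minimal sketch of a declarative arg parser. It is a simplified stand-in, not the library's implementation: `ArgSpec` and `parseArgs` are hypothetical names, only the `--key value` form is handled, and the `list` type is left out.

```typescript
type ArgSpec = {
  type: "string" | "number" | "boolean" | "json";
  required?: boolean;
  choices?: readonly string[];
};

function parseArgs(
  specs: Record<string, ArgSpec>,
  argv: string[],
): Record<string, unknown> {
  // Collect raw "--key value" pairs; a flag without a value counts as "true".
  const raw: Record<string, string> = {};
  for (let i = 0; i < argv.length; i++) {
    const token = argv[i];
    if (!token.startsWith("--")) continue;
    const next = argv[i + 1];
    const hasValue = next !== undefined && !next.startsWith("--");
    raw[token.slice(2)] = hasValue ? next : "true";
    if (hasValue) i++;
  }

  const parsed: Record<string, unknown> = {};
  for (const [name, spec] of Object.entries(specs)) {
    const value = raw[name];
    if (value === undefined) {
      if (spec.required) throw new Error(`--${name} is required`);
      continue;
    }
    if (spec.choices && !spec.choices.includes(value)) {
      throw new Error(`--${name} must be one of: ${spec.choices.join(", ")}`);
    }
    if (spec.type === "number") {
      const n = Number(value);
      if (Number.isNaN(n)) throw new Error(`--${name} must be a number`);
      parsed[name] = n;
    } else if (spec.type === "boolean") {
      parsed[name] = value !== "false";
    } else if (spec.type === "json") {
      parsed[name] = JSON.parse(value);
    } else {
      parsed[name] = value;
    }
  }
  return parsed;
}

const args = parseArgs(
  {
    date: { type: "string", required: true },
    env: { type: "string", choices: ["boe", "online"] },
    filter: { type: "json" },
  },
  ["--date", "2024-05-01", "--env", "boe", "--filter", '{"minScore":3}'],
);
console.log(args);
```

The type narrowing from `choices` is a compile-time layer on top of this; the sketch covers only the runtime validation.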
One output protocol the Agent can trust
When I started, I used console.log for everything. The reliable pattern turned out to be stricter: structured JSON on stdout, progress text on stderr, and a non-zero exit code on failure — far more dependable than asking the LLM to parse an error message out of free text.
import { writeResult, writeError, notify } from "skill-kits/runtime";
writeResult({ ok: true, data }); // stdout: single-line JSON for the Agent
notify("fetching data..."); // stderr: progress, won't pollute stdout
writeError(err); // stderr: structured error + exitCode = 1
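For readers who want to see the contract spelled out, here is a small sketch of the three channels. The helpers `emitResult`, `emitProgress` and `emitError` are stand-ins I made up to mirror the protocol; they are not the skill-kits source, and the error payload shape is an assumption.

```typescript
type Channel = (line: string) => void;

interface Io {
  stdout: Channel;
  stderr: Channel;
  setExitCode: (code: number) => void;
}

// Default wiring for a real Node process.
const nodeIo: Io = {
  stdout: (line) => { process.stdout.write(line + "\n"); },
  stderr: (line) => { process.stderr.write(line + "\n"); },
  setExitCode: (code) => { process.exitCode = code; },
};

function emitResult(payload: object, io: Io = nodeIo): void {
  // JSON.stringify without an indent argument yields a single line.
  io.stdout(JSON.stringify(payload));
}

function emitProgress(message: string, io: Io = nodeIo): void {
  io.stderr(message);
}

function emitError(err: Error, io: Io = nodeIo): void {
  io.stderr(JSON.stringify({ ok: false, error: err.message }));
  io.setExitCode(1);
}

// Capture everything in memory to show which channel each call uses.
const seen = { out: [] as string[], err: [] as string[], exit: 0 };
const memoryIo: Io = {
  stdout: (line) => { seen.out.push(line); },
  stderr: (line) => { seen.err.push(line); },
  setExitCode: (code) => { seen.exit = code; },
};

emitProgress("fetching data...", memoryIo);
emitResult({ ok: true, items: [] }, memoryIo);
emitError(new Error("upstream unavailable"), memoryIo);
console.log(seen);
```

Only the result ever touches stdout, so the Agent can parse that stream as JSON without filtering out progress chatter.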
HTTP: a thin wrapper, not a mega-client
I almost built a full-featured HttpClient with auth, retries, baseURL, the works. Then I noticed every Skill handled HTTP differently — Cookie auth here, Bearer there, incompatible error codes everywhere. A big abstraction would've been wrong. So the runtime only removes fetch boilerplate and never throws — network errors and non-2xx both surface via res.ok:
import { httpGet, HttpError } from "skill-kits/runtime";
const url = "https://api.example.com/me";
const res = await httpGet<UserInfo>(url, {
  headers: { authorization: `Bearer ${token}` },
  query: { fields: "id,name" },
  timeoutMs: 10_000,
});
// httpGet never throws; the caller decides whether a failure is fatal
if (!res.ok) throw new HttpError(res.status, url, res.statusText);
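The "never throws" idea is easy to sketch on top of the built-in `fetch`. This is my own approximation, not the runtime's code: `safeGet`, the injectable `fetchImpl`, and reporting status `0` for network failures are all assumptions.

```typescript
interface HttpResult<T> {
  ok: boolean;
  status: number; // 0 when no response was received
  statusText: string;
  data?: T;
}

interface GetOptions {
  headers?: Record<string, string>;
  query?: Record<string, string>;
  timeoutMs?: number;
  fetchImpl?: typeof fetch; // injectable so tests avoid the network
}

async function safeGet<T>(url: string, opts: GetOptions = {}): Promise<HttpResult<T>> {
  try {
    const target = new URL(url);
    for (const [key, value] of Object.entries(opts.query ?? {})) {
      target.searchParams.set(key, value);
    }
    const doFetch = opts.fetchImpl ?? fetch;
    const res = await doFetch(target.toString(), {
      headers: opts.headers,
      signal: AbortSignal.timeout(opts.timeoutMs ?? 10_000),
    });
    const data = res.ok ? ((await res.json()) as T) : undefined;
    return { ok: res.ok, status: res.status, statusText: res.statusText, data };
  } catch (err) {
    // Bad URL, network failure, timeout, or unparseable body: report, don't throw.
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, status: 0, statusText: reason };
  }
}

// Simulate a dead network to show that the failure arrives as a value.
const fakeFetch: typeof fetch = async () => {
  throw new TypeError("network down");
};

safeGet("https://api.example.com/me", { fetchImpl: fakeFetch }).then((res) => {
  console.log(res.ok, res.status, res.statusText); // prints: false 0 network down
});
```

Because every outcome comes back as a value, each Skill keeps control over which failures are fatal and which error class to raise.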
Error codes the LLM can branch on
Built-in error classes map to stable codes, so both you and the Agent can react by code instead of guessing from a message:
| Class | code | When |
|---|---|---|
| UserInputError | USER_INPUT_ERROR | Missing / malformed argument |
| AuthError | AUTH_ERROR | Token expired / no permission |
| HttpError | HTTP_ERROR | Upstream HTTP non-2xx |
| BusinessApiError | BIZ_<code> | HTTP 200 but business code ≠ 0 |
throw new UserInputError("activityId is required", { field: "activityId" });
// stderr → {"ok":false,"code":"USER_INPUT_ERROR","error":"activityId is required",...}
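Here is roughly how a class-to-code mapping like this can be built. Treat it as a sketch under assumptions: the constructor signatures, the `details` field, the `toErrorLine` helper and the `UNKNOWN_ERROR` fallback are mine, and the real classes may look different.

```typescript
class SkillError extends Error {
  readonly code: string;
  readonly details: Record<string, unknown>;

  constructor(message: string, code: string, details: Record<string, unknown> = {}) {
    super(message);
    this.name = new.target.name;
    this.code = code;
    this.details = details;
  }
}

class UserInputError extends SkillError {
  constructor(message: string, details?: Record<string, unknown>) {
    super(message, "USER_INPUT_ERROR", details);
  }
}

class BusinessApiError extends SkillError {
  constructor(message: string, businessCode: number) {
    // The upstream business code becomes part of the stable error code.
    super(message, `BIZ_${businessCode}`, { businessCode });
  }
}

// Serialize any thrown value into the single JSON line written to stderr.
function toErrorLine(err: unknown): string {
  if (err instanceof SkillError) {
    return JSON.stringify({
      ok: false,
      code: err.code,
      error: err.message,
      details: err.details,
    });
  }
  const message = err instanceof Error ? err.message : String(err);
  return JSON.stringify({ ok: false, code: "UNKNOWN_ERROR", error: message });
}

console.log(toErrorLine(new UserInputError("activityId is required", { field: "activityId" })));
console.log(toErrorLine(new BusinessApiError("quota exceeded", 1003)));
```

Because the code lives on the error object rather than in the message, the wording can change without breaking whatever branches on it.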
A heartbeat so the Agent doesn't think you died
For long callbacks (code generation, SSO redirects), the real risk isn't timing out — it's looking timed out. Sleep 60s with no output and the Agent may kill the process. sleepWithHeartbeat writes to stderr every few seconds so it knows you're alive:
import { sleepWithHeartbeat } from "skill-kits/runtime";
await sleepWithHeartbeat(60_000, {
  message: (rem) => `waiting for code generation... ${rem}s left`,
});
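The mechanism is simple enough to sketch in a few lines. This is an approximation rather than the runtime's code: `heartbeatSleep`, the 5-second default interval, and the injectable `write` option are assumptions.

```typescript
interface HeartbeatOptions {
  message: (remainingSeconds: number) => string;
  intervalMs?: number; // assumed default: 5 seconds
  write?: (line: string) => void; // defaults to stderr
}

async function heartbeatSleep(totalMs: number, opts: HeartbeatOptions): Promise<void> {
  const intervalMs = opts.intervalMs ?? 5_000;
  const write =
    opts.write ?? ((line: string) => { process.stderr.write(line + "\n"); });
  const deadline = Date.now() + totalMs;

  while (true) {
    const remainingMs = deadline - Date.now();
    if (remainingMs <= 0) return;
    // Announce liveness first, then sleep one interval (or less near the end).
    write(opts.message(Math.ceil(remainingMs / 1000)));
    await new Promise<void>((resolve) => {
      setTimeout(resolve, Math.min(intervalMs, remainingMs));
    });
  }
}

// Short demo: about 60 ms of waiting with a heartbeat every 20 ms.
heartbeatSleep(60, {
  intervalMs: 20,
  message: (rem) => `waiting for code generation... ${rem}s left`,
}).then(() => console.log("done waiting"));
```

Writing the first heartbeat before the first sleep matters: the Agent sees output immediately instead of after a full interval of silence.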
Lint your SKILL.md
A SKILL.md can't be standardized like code — but plenty of failure modes can be caught locally before they waste an Agent run: name not matching the directory, a body so long it blows the context window, broken relative references, a description too vague to ever trigger.
pnpm build runs these by default:
| Rule | Level | Catches |
|---|---|---|
| name-matches-dir | error | name must equal the parent directory |
| body-line-limit | error | body > 500 lines |
| ref-relative | error | references must be relative paths |
| description-length | warn | description too short → under-triggers |
| description-trigger | info | missing "when to use" hints |
The last two matter more than they look. A "correct" SKILL.md isn't the same as a useful one — if the description doesn't tell the LLM when to reach for the Skill, it simply won't.
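To give a feel for how cheap these checks are, here is a sketch of four of them. It assumes SKILL.md opens with a front-matter block of plain `key: value` lines between two `---` lines; `lintSkillMd`, the 40-character threshold, and the "when" heuristic are illustrative guesses, not the tool's real rules.

```typescript
type Level = "error" | "warn" | "info";

interface Finding {
  rule: string;
  level: Level;
  message: string;
}

function lintSkillMd(dirName: string, source: string): Finding[] {
  const findings: Finding[] = [];
  // Split the front matter (between two "---" lines) from the body.
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  const header = match?.[1] ?? "";
  const body = match?.[2] ?? source;

  const field = (key: string): string => {
    const line = header.split(/\r?\n/).find((l) => l.startsWith(`${key}:`));
    return line ? line.slice(key.length + 1).trim() : "";
  };
  const name = field("name");
  const description = field("description");

  if (name !== dirName) {
    findings.push({
      rule: "name-matches-dir",
      level: "error",
      message: `name "${name}" does not match directory "${dirName}"`,
    });
  }
  if (body.split(/\r?\n/).length > 500) {
    findings.push({ rule: "body-line-limit", level: "error", message: "body exceeds 500 lines" });
  }
  if (description.length < 40) {
    // 40 characters is an illustrative threshold.
    findings.push({ rule: "description-length", level: "warn", message: "description is very short" });
  }
  if (!/\bwhen\b/i.test(description)) {
    findings.push({ rule: "description-trigger", level: "info", message: 'no "when to use" hint' });
  }
  return findings;
}

const goodSkill =
  "---\nname: daily-report\ndescription: Build the daily report. Use when the user asks for yesterday's metrics.\n---\n# Daily report\n";
const badSkill = "---\nname: daily-report\ndescription: Reports.\n---\nbody";

console.log(lintSkillMd("daily-report", goodSkill)); // prints []
console.log(lintSkillMd("reports", badSkill).map((f) => f.rule));
// prints [ 'name-matches-dir', 'description-length', 'description-trigger' ]
```

A real linter would use a proper YAML parser; the line-based lookup here only keeps the sketch dependency-free.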
Dev mode: stop running cp -r
To test a Skill you have to get it into the Agent's local skills directory. That used to mean cp -r dist/xxx ~/.agent/skills after every change. dev mode does it for you:
pnpm dev daily-report --out ~/.agent/skills
It watches src/ and rebuilds on .ts changes via esbuild, and separately watches SKILL.md / references/ / assets/, syncing them straight to --out. Edit locally; the Agent picks up the latest on its next call.
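The split between "rebuild" and "sync" is the core of dev mode, and it can be sketched with Node's own `fs.watch` and `fs.cpSync`. Everything named here (`classifyChange`, `startDev`, the `rebuild` callback) is hypothetical, and the sketch skips debouncing and deleted files.

```typescript
import { cpSync, watch } from "node:fs";
import { join } from "node:path";

type Action = "rebuild" | "sync" | "ignore";

// Decide what a changed path (relative to the Skill directory) should trigger.
function classifyChange(relativePath: string): Action {
  const path = relativePath.replace(/\\/g, "/"); // normalize Windows separators
  if (path.startsWith("src/") && path.endsWith(".ts")) return "rebuild";
  if (
    path === "SKILL.md" ||
    path.startsWith("references/") ||
    path.startsWith("assets/")
  ) {
    return "sync";
  }
  return "ignore";
}

function startDev(skillDir: string, outDir: string, rebuild: () => void): void {
  // Recursive watching is platform-dependent; on Linux it needs a recent Node release.
  watch(skillDir, { recursive: true }, (_event, filename) => {
    if (!filename) return;
    const action = classifyChange(filename);
    if (action === "rebuild") rebuild();
    if (action === "sync") {
      // Static files need no build step, so copy them straight to the Agent.
      cpSync(join(skillDir, filename), join(outDir, filename), { recursive: true });
    }
  });
}

console.log(classifyChange("src/commands/fetch.ts")); // prints "rebuild"
console.log(classifyChange("references/api.md")); // prints "sync"
console.log(classifyChange("package.json")); // prints "ignore"
```

Keeping the classification in a pure function makes the routing testable without touching the file system.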
Test it like a real project
Skills run unattended inside an Agent, so a broken command is expensive — you often don't find out until a run fails. That's a good reason to unit-test them. Tests follow the src/**/*.test.ts convention and run via pnpm test (built on node:test + tsx, zero config). The usual goal is to assert a command's exit behavior — the JSON it writes to stdout and its exit code.
skill-kits/testing ships two helpers. captureOutput grabs what writeResult / writeError / notify wrote plus the exit code; mockFetch swaps out the global fetch so no real network is hit:
import { test } from "node:test";
import assert from "node:assert/strict";
import { mockFetch, captureOutput } from "skill-kits/testing";
import { createActivity } from "./create-activity.js";

const ctx = { domain: "https://example.com", token: "t" }; // resolved commonArgs

// success path: fake the HTTP, assert the stdout JSON + exit code
test("create returns ok with backend data", async () => {
  const mock = mockFetch([
    { match: /\/activity\/create/, json: { code: 0, data: { activity_id: 9001 } } },
  ]);
  try {
    const { json, exitCode } = await captureOutput(() =>
      createActivity(ctx, { act_name: "test" }),
    );
    assert.equal(exitCode, 0);
    assert.equal((json as { activity_id: number }).activity_id, 9001);
  } finally {
    mock.restore();
  }
});
Pure functions need neither helper — just import and assert. For the error path, commands throw a SkillError (the router maps it to exit 1 + stderr JSON), so reach for assert.rejects. An unmatched mockFetch request throws on purpose, so a missing mock never passes silently.
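An error-path test might look like the sketch below. To keep it self-contained, `UserInputError` and `createActivity` are local stand-ins rather than imports from the real project; only `node:test` and `assert.rejects` are the actual Node APIs.

```typescript
import { test } from "node:test";
import assert from "node:assert/strict";

// Local stand-in for the runtime's error class.
class UserInputError extends Error {
  readonly code = "USER_INPUT_ERROR";
}

// Local stand-in for a command: it validates input before doing any work.
async function createActivity(args: { act_name?: string }): Promise<{ ok: true }> {
  if (!args.act_name) throw new UserInputError("act_name is required");
  return { ok: true };
}

test("create rejects when act_name is missing", async () => {
  await assert.rejects(
    () => createActivity({}),
    // The validator must return true for the rejection to count as expected.
    (err: unknown) => err instanceof UserInputError && err.code === "USER_INPUT_ERROR",
  );
});
```

Asserting on the error code rather than the message keeps the test stable when the wording changes.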
The full loop, from new Skill to shippable artifact:
pnpm new daily-report
# ... write code + tests ...
pnpm test daily-report # run unit tests
pnpm build daily-report # lint → bundle → zip
Try it
skill-kits doesn't write your scripts or invent your abstractions. It draws one guardrail around the cross-cutting chores — entry points, build output, runtime, validation, dev sync, tests — so that when you create your 5th or 10th Skill, your head is still on the business logic.
npx skill-kits init my-skills
GitHub: https://github.com/weijhfly/skill-kits
NPM: https://www.npmjs.com/package/skill-kits
If you're building Agent Skills too, I'd love to hear how you're handling the same problems — what does your Skill workflow look like right now?