weijhfly

I hand-wrote 10 Agent Skills. Then I built the toolkit I wish I'd had.

Over the past six months I shipped about 10 Agent Skills — SSO login for an internal game platform, a design-to-code pipeline, a marketing-campaign manager, and a handful of others.

One Skill on its own is simple: a SKILL.md that tells the Agent the rules, plus a script that does the actual work. But once you have five or ten of them, the hard part stops being "can I write this script?" It becomes: can I stop re-solving the same boring engineering problems on every single Skill?

Directory layout, HTTP boilerplate, SKILL.md validation, syncing the build to where the Agent can run it — by the third or fourth Skill, these chores eat real time. So I packaged the answers into one small tool: skill-kits.

Here's what changed:

| Task | Before: by hand | After: skill-kits |
| --- | --- | --- |
| Create a new Skill | Redo every project decision | pnpm new <name> (one line) |
| HTTP / error handling / output | Copy-paste into each Skill | import from the runtime, inlined |
| Sync changes to the Agent | cp -r by hand | pnpm dev (watch + auto-sync) |
| SKILL.md quality | Eyeball it in review | pnpm lint, run automatically on build |

It's not a framework and it won't write your business logic. Think of it as the build toolkit for Agent Skills — roughly what Vite is to a frontend app, plus a small standard library. You write TypeScript; it hands the Agent a single zero-dependency .mjs file.

The 30-second tour

npx skill-kits init my-skills    # scaffold a pnpm monorepo
cd my-skills && pnpm install

pnpm new daily-report                          # add a Skill
pnpm dev daily-report --out ~/.agent/skills    # watch + sync to the Agent
pnpm build daily-report                         # lint + bundle + zip
pnpm test daily-report                          # run unit tests

That's the whole loop. The rest of this post is why each piece exists.

Write TypeScript, ship zero-dependency JS

The Agent only ever runs one thing:

node scripts/main.mjs

But authoring in plain JS means no types, no autocomplete, no safety net. I tried the obvious workarounds and both hurt:

  • Run TS with bun — great locally, but the Agent's environment may not have bun, so you're sniffing for runtimes.
  • Fall back to npx tsx: it works, but now SKILL.md has to declare how to execute itself, and npx adds cold-start latency on every call.

So the build step does the boring-but-correct thing: esbuild bundles your src/main.ts (plus any shared code) into a single ESM file with zero runtime dependencies. The Agent just needs Node.

   source (TypeScript)              output (zero-dep ESM)
┌──────────────────┐           ┌──────────────────────────┐
│  src/main.ts     │           │  dist/<skill-name>/      │
│  src/commands/   │  build    │  ├── SKILL.md            │
│  references/     │ ───────►  │  ├── scripts/main.mjs    │
│  assets/         │  esbuild  │  ├── references/         │
│  SKILL.md        │           │  └── assets/             │
└──────────────────┘           └──────────────────────────┘
                                  Agent runs: node scripts/main.mjs
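
The bundling step in the diagram can be sketched as an options object. The option names (entryPoints, bundle, platform, format, outfile) are real esbuild build options; the helper bundleOptionsFor and the exact paths are assumptions for illustration, not skill-kits' actual configuration.

```typescript
// Hypothetical helper: the shape of an esbuild build call that turns one
// Skill's TypeScript entry point into a single ESM file. Paths follow the
// layout in the diagram above and are assumptions, not skill-kits' real config.
interface BundleOptions {
  entryPoints: string[];
  bundle: boolean;
  platform: "node";
  format: "esm";
  outfile: string;
}

function bundleOptionsFor(skillName: string): BundleOptions {
  return {
    entryPoints: ["src/main.ts"],
    bundle: true, // inline every import so nothing is left to install
    platform: "node", // Node built-ins such as node:fs stay external
    format: "esm", // emit an ES module, matching the .mjs extension
    outfile: `dist/${skillName}/scripts/main.mjs`,
  };
}

// With esbuild installed, this object would be passed to its build() function.
const options = bundleOptionsFor("daily-report");
```

Keeping the options in a plain function makes the output path easy to check without running a real build.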

A runtime, so you stop copying boilerplate

A Skill is really two layers: a structured prompt (SKILL.md) that defines the rules, and a tool (the script) that executes. The prompt is always bespoke — but the script layer repeats constantly: command routing, arg parsing, the stdout/stderr protocol, HTTP, error codes, long-poll heartbeats.

I collapsed those into skill-kits/runtime. Everything below is inlined into your bundle at build time.

Command routing instead of a switch pile

My campaign-manager Skill had 7 subcommands. The hand-written main.ts was ~250 lines of parseArgs + switch + usage + validation, with the logic for a single command smeared across four places. Adding one command meant editing all four — and forgetting one.

The router makes it declarative:

import { createRouter, writeResult } from "skill-kits/runtime";

const router = createRouter({
  name: "daily-report",
  description: "...",
  commonArgs: {
    // injected into every subcommand, fully typed
    domain: { type: "string", required: true, desc: "platform domain" },
    token: { type: "string", required: true, desc: "SSO token" },
  },
});

router.command({
  name: "fetch",
  description: "fetch yesterday's data",
  args: {
    date: { type: "string", required: true, desc: "YYYY-MM-DD" },
    env: { type: "string", choices: ["boe", "online"] as const, desc: "env" },
    filter: { type: "json", desc: "complex filter, parsed from JSON" },
  },
  async handler({ date, env, filter, domain, token }) {
    // env is typed "boe" | "online"; filter is already JSON.parse-d
    writeResult({ ok: true, items: [] });
  },
});

router.run(process.argv.slice(2));

Args support string / number / boolean / list / json. A missing required arg throws automatically; choices both validates and narrows the type. --help is generated for free. You stop thinking about parsing and only think about what this command needs and what it does.
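
To make the validation rules concrete, here is a minimal sketch of what a declarative arg spec buys you at runtime. ArgSpec and parseArgs are hypothetical stand-ins written for this example, not the skill-kits/runtime implementation, and the sketch covers only runtime checks, not the compile-time type narrowing.

```typescript
// Hypothetical, much-reduced argument spec; not the skill-kits/runtime types.
type ArgSpec =
  | { type: "string"; required?: boolean; choices?: readonly string[] }
  | { type: "number"; required?: boolean }
  | { type: "json"; required?: boolean };

function parseArgs(
  spec: Record<string, ArgSpec>,
  argv: string[],
): Record<string, unknown> {
  // Collect "--name value" pairs from argv.
  const raw: Record<string, string> = {};
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i] ?? "";
    const value = argv[i + 1];
    if (!flag.startsWith("--") || value === undefined) {
      throw new Error(`expected "--name value", got "${flag}"`);
    }
    raw[flag.slice(2)] = value;
  }

  // Validate and convert each declared arg according to its spec.
  const parsed: Record<string, unknown> = {};
  for (const [name, def] of Object.entries(spec)) {
    const value = raw[name];
    if (value === undefined) {
      if (def.required) throw new Error(`missing required arg --${name}`);
      continue;
    }
    if (def.type === "number") {
      const n = Number(value);
      if (Number.isNaN(n)) throw new Error(`--${name} must be a number`);
      parsed[name] = n;
    } else if (def.type === "json") {
      parsed[name] = JSON.parse(value); // throws on malformed JSON
    } else {
      if (def.choices && !def.choices.includes(value)) {
        throw new Error(`--${name} must be one of ${def.choices.join(", ")}`);
      }
      parsed[name] = value;
    }
  }
  return parsed;
}

const args = parseArgs(
  {
    date: { type: "string", required: true },
    env: { type: "string", choices: ["boe", "online"] },
    filter: { type: "json" },
  },
  ["--date", "2024-01-01", "--env", "boe", "--filter", '{"team":"a"}'],
);
```

Every rule lives next to the arg it belongs to, which is the property that removes the "edit four places" problem described earlier.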

One output protocol the Agent can trust

When I started, I used console.log for everything. The reliable pattern turned out to be stricter: structured JSON on stdout, progress text on stderr, and a non-zero exit code on failure — far more dependable than asking the LLM to parse an error message out of free text.

import { writeResult, writeError, notify } from "skill-kits/runtime";

writeResult({ ok: true, data });   // stdout: single-line JSON for the Agent
notify("fetching data...");        // stderr: progress, won't pollute stdout
writeError(err);                   // stderr: structured error + exitCode = 1
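
The three-channel protocol is small enough to sketch in full. The helpers below are hypothetical stand-ins that write into an in-memory sink instead of the real process streams, so the contract is visible and testable; the names and the payload shape are assumptions, not the runtime's source.

```typescript
// Hypothetical stand-ins for the runtime helpers, written against an injected
// sink so the protocol itself is easy to see and to test.
interface Sink {
  stdout: string[];
  stderr: string[];
  exitCode: number;
}

function createSink(): Sink {
  return { stdout: [], stderr: [], exitCode: 0 };
}

// Result: exactly one line of JSON on stdout.
function writeResultTo(sink: Sink, value: unknown): void {
  sink.stdout.push(JSON.stringify(value));
}

// Progress: free text on stderr, so stdout stays machine-readable.
function notifyTo(sink: Sink, message: string): void {
  sink.stderr.push(message);
}

// Failure: structured JSON on stderr plus a non-zero exit code.
function writeErrorTo(sink: Sink, err: unknown): void {
  const message = err instanceof Error ? err.message : String(err);
  sink.stderr.push(JSON.stringify({ ok: false, error: message }));
  sink.exitCode = 1;
}

const sink = createSink();
notifyTo(sink, "fetching data...");
writeResultTo(sink, { ok: true, items: [] });
```

The point of the split is that the Agent can parse stdout as JSON without first filtering out progress chatter.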

HTTP: a thin wrapper, not a mega-client

I almost built a full-featured HttpClient with auth, retries, baseURL, the works. Then I noticed every Skill handled HTTP differently — Cookie auth here, Bearer there, incompatible error codes everywhere. A big abstraction would've been wrong. So the runtime only removes fetch boilerplate and never throws — network errors and non-2xx both surface via res.ok:

import { httpGet, HttpError } from "skill-kits/runtime";

const url = "https://api.example.com/me";
const res = await httpGet<UserInfo>(url, {
  headers: { authorization: `Bearer ${token}` },
  query: { fields: "id,name" },
  timeoutMs: 10_000,
});
// httpGet never throws, so a failed response is turned into an error explicitly
if (!res.ok) throw new HttpError(res.status, url, res.statusText);
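
The "never throws" contract can be sketched as a thin wrapper around any fetch-compatible function. safeGet, FetchLike, and GetResult are hypothetical names for this example; the real httpGet may differ, and timeout handling is left out to keep the sketch short.

```typescript
// Hypothetical sketch; the real httpGet in skill-kits/runtime may differ.
interface FetchResponse {
  ok: boolean;
  status: number;
  statusText: string;
  json(): Promise<unknown>;
}

type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<FetchResponse>;

interface GetResult<T> {
  ok: boolean;
  status: number; // 0 when the request never reached the server
  statusText: string;
  data?: T;
}

async function safeGet<T>(
  fetchImpl: FetchLike,
  url: string,
  opts: { headers?: Record<string, string>; query?: Record<string, string> } = {},
): Promise<GetResult<T>> {
  // Append the query object as an encoded query string.
  const qs = Object.entries(opts.query ?? {})
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  const fullUrl = qs ? `${url}?${qs}` : url;
  try {
    const res = await fetchImpl(fullUrl, { headers: opts.headers ?? {} });
    if (!res.ok) {
      return { ok: false, status: res.status, statusText: res.statusText };
    }
    const data = (await res.json()) as T;
    return { ok: true, status: res.status, statusText: res.statusText, data };
  } catch (err) {
    // Network failure: report it through the same shape instead of throwing.
    const statusText = err instanceof Error ? err.message : String(err);
    return { ok: false, status: 0, statusText };
  }
}
```

Because both failure modes come back in one shape, each Skill decides for itself whether a non-2xx is fatal, which is the flexibility a larger client would have taken away.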

Error codes the LLM can branch on

Built-in error classes map to stable codes, so both you and the Agent can react by code instead of guessing from a message:

| Class | code | When |
| --- | --- | --- |
| UserInputError | USER_INPUT_ERROR | Missing / malformed argument |
| AuthError | AUTH_ERROR | Token expired / no permission |
| HttpError | HTTP_ERROR | Upstream HTTP non-2xx |
| BusinessApiError | BIZ_<code> | HTTP 200 but business code ≠ 0 |

throw new UserInputError("activityId is required", { field: "activityId" });
// stderr → {"ok":false,"code":"USER_INPUT_ERROR","error":"activityId is required",...}
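
A rough sketch of how a class-to-code mapping like this can work: each subclass fixes its own code, and one function turns any thrown value into the JSON payload. The class names follow the table, but the constructor signatures, the payload shape, and the UNKNOWN_ERROR fallback are assumptions made for this example.

```typescript
// Hypothetical re-creation of the pattern; not the skill-kits/runtime source.
class SkillError extends Error {
  readonly code: string;
  readonly details: Record<string, unknown>;

  constructor(code: string, message: string, details: Record<string, unknown> = {}) {
    super(message);
    this.code = code;
    this.details = details;
  }
}

class UserInputError extends SkillError {
  constructor(message: string, details?: Record<string, unknown>) {
    super("USER_INPUT_ERROR", message, details);
  }
}

class BusinessApiError extends SkillError {
  constructor(bizCode: number, message: string) {
    // The business code from the backend becomes part of the stable code.
    super(`BIZ_${bizCode}`, message);
  }
}

// Convert anything that was thrown into a payload the Agent can branch on.
function toErrorPayload(err: unknown): { ok: false; code: string; error: string } {
  if (err instanceof SkillError) {
    return { ok: false, code: err.code, error: err.message };
  }
  const message = err instanceof Error ? err.message : String(err);
  return { ok: false, code: "UNKNOWN_ERROR", error: message }; // assumed fallback code
}
```

The code is set by the class, never by the call site, so a typo in one command cannot produce a code the Agent has never seen.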

A heartbeat so the Agent doesn't think you died

For long callbacks (code generation, SSO redirects), the real risk isn't timing out — it's looking timed out. Sleep 60s with no output and the Agent may kill the process. sleepWithHeartbeat writes to stderr every few seconds so it knows you're alive:

import { sleepWithHeartbeat } from "skill-kits/runtime";

await sleepWithHeartbeat(60_000, {
  message: (rem) => `waiting for code generation... ${rem}s left`,
});
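
Under the hood, a heartbeat sleep is just one long wait chopped into short ones with a message between each. The sketch below is a hypothetical version (sleepWithBeats, its parameters, and the injectable sleep function are assumptions), not the runtime's implementation.

```typescript
// Hypothetical sketch of the heartbeat idea; the real sleepWithHeartbeat may
// use a different interval and different option names.
async function sleepWithBeats(
  totalMs: number,
  intervalMs: number,
  onBeat: (remainingSeconds: number) => void,
  // Injectable so tests can skip real waiting.
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<void> {
  let remaining = totalMs;
  while (remaining > 0) {
    // Emit a sign of life before each chunk of waiting.
    onBeat(Math.ceil(remaining / 1000));
    const step = Math.min(intervalMs, remaining);
    await sleep(step);
    remaining -= step;
  }
}
```

In a real Skill, onBeat would write its message to stderr, so the heartbeat never mixes with the JSON result on stdout.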

Lint your SKILL.md

A SKILL.md can't be standardized like code — but plenty of failure modes can be caught locally before they waste an Agent run: name not matching the directory, a body so long it blows the context window, broken relative references, a description too vague to ever trigger.

pnpm build runs these by default:

| Rule | Level | Catches |
| --- | --- | --- |
| name-matches-dir | error | name must equal the parent directory |
| body-line-limit | error | body > 500 lines |
| ref-relative | error | references must be relative paths |
| description-length | warn | description too short → under-triggers |
| description-trigger | info | missing "when to use" hints |

The last two matter more than they look. A "correct" SKILL.md isn't the same as a useful one — if the description doesn't tell the LLM when to reach for the Skill, it simply won't.
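
Most of these rules are simple checks over a parsed SKILL.md. Here is a minimal sketch of four of them, assuming the file has already been split into its name, description, body, and referenced paths. SkillDoc, lintSkill, and the 40-character description threshold are assumptions for illustration; only the 500-line limit comes from the table.

```typescript
// Hypothetical sketch of four of the rules above; not the skill-kits linter.
interface SkillDoc {
  dirName: string; // name of the directory that contains SKILL.md
  name: string; // declared Skill name
  description: string; // declared Skill description
  body: string; // the instructions that follow the metadata
  refs: string[]; // paths referenced from the body
}

interface Finding {
  rule: string;
  level: "error" | "warn";
  message: string;
}

function lintSkill(doc: SkillDoc, minDescriptionLength = 40): Finding[] {
  const findings: Finding[] = [];
  if (doc.name !== doc.dirName) {
    findings.push({
      rule: "name-matches-dir",
      level: "error",
      message: `name "${doc.name}" does not match directory "${doc.dirName}"`,
    });
  }
  const lineCount = doc.body.split("\n").length;
  if (lineCount > 500) {
    findings.push({
      rule: "body-line-limit",
      level: "error",
      message: `body has ${lineCount} lines (limit 500)`,
    });
  }
  for (const ref of doc.refs) {
    // Absolute paths and URLs will not resolve inside the packaged Skill.
    if (ref.startsWith("/") || /^[a-z][a-z0-9+.-]*:\/\//i.test(ref)) {
      findings.push({
        rule: "ref-relative",
        level: "error",
        message: `reference "${ref}" must be a relative path`,
      });
    }
  }
  if (doc.description.trim().length < minDescriptionLength) {
    findings.push({
      rule: "description-length",
      level: "warn",
      message: "description is probably too short to trigger reliably",
    });
  }
  return findings;
}
```

Returning findings instead of throwing lets a build step fail on errors while only printing warnings.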

Dev mode: stop running cp -r

To test a Skill you have to get it into the Agent's local skills directory. That used to mean cp -r dist/xxx ~/.agent/skills after every change. dev mode does it for you:

pnpm dev daily-report --out ~/.agent/skills

It watches src/ and rebuilds on .ts changes via esbuild, and separately watches SKILL.md / references/ / assets/, syncing them straight to --out. Edit locally; the Agent picks up the latest on its next call.
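
The two watch paths boil down to one decision per changed file: rebuild or copy. A minimal sketch of that decision, assuming paths are relative to the Skill's root and use forward slashes; planDevAction and DevAction are hypothetical names, and the real watcher may behave differently.

```typescript
// Hypothetical routing of a file-change event to the dev-mode action.
type DevAction =
  | { kind: "rebuild" }
  | { kind: "copy"; from: string; to: string }
  | { kind: "ignore" };

function planDevAction(changedPath: string, outDir: string): DevAction {
  // TypeScript sources go back through the bundler.
  if (changedPath.startsWith("src/") && changedPath.endsWith(".ts")) {
    return { kind: "rebuild" };
  }
  // Prompt and static files are copied to the output directory as they are.
  const isStatic =
    changedPath === "SKILL.md" ||
    changedPath.startsWith("references/") ||
    changedPath.startsWith("assets/");
  if (isStatic) {
    return { kind: "copy", from: changedPath, to: `${outDir}/${changedPath}` };
  }
  return { kind: "ignore" };
}
```

Static files skip the bundler entirely, which is why a SKILL.md edit can reach the Agent without waiting for a rebuild.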

Test it like a real project

Skills run unattended inside an Agent, so a broken command is expensive — you often don't find out until a run fails. That's a good reason to unit-test them. Tests follow the src/**/*.test.ts convention and run via pnpm test (built on node:test + tsx, zero config). The usual goal is to assert a command's exit behavior — the JSON it writes to stdout and its exit code.

skill-kits/testing ships two helpers. captureOutput grabs what writeResult / writeError / notify wrote plus the exit code; mockFetch swaps out the global fetch so no real network is hit:

import { test } from "node:test";
import assert from "node:assert/strict";
import { mockFetch, captureOutput } from "skill-kits/testing";
import { createActivity } from "./create-activity.js";

const ctx = { domain: "https://example.com", token: "t" }; // resolved commonArgs

// success path: fake the HTTP, assert the stdout JSON + exit code
test("create returns ok with backend data", async () => {
  const mock = mockFetch([
    { match: /\/activity\/create/, json: { code: 0, data: { activity_id: 9001 } } },
  ]);
  try {
    const { json, exitCode } = await captureOutput(() =>
      createActivity(ctx, { act_name: "test" }),
    );
    assert.equal(exitCode, 0);
    assert.equal((json as { activity_id: number }).activity_id, 9001);
  } finally {
    mock.restore();
  }
});

Pure functions need neither helper — just import and assert. For the error path, commands throw a SkillError (the router maps it to exit 1 + stderr JSON), so reach for assert.rejects. An unmatched mockFetch request throws on purpose, so a missing mock never passes silently.
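
The "unmatched request throws" rule is worth seeing in miniature. This is a hypothetical route-matching fake, not the skill-kits/testing source: it returns a fetch-like function rather than replacing the global fetch, and MockRoute and createMockFetch are names invented for the example.

```typescript
// Hypothetical sketch of a route-matching fake fetch.
interface MockRoute {
  match: RegExp; // avoid the g flag: RegExp.test is stateful with it
  json: unknown;
  status?: number;
}

interface MockResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

function createMockFetch(routes: MockRoute[]) {
  const calls: string[] = [];
  const fetchImpl = async (url: string): Promise<MockResponse> => {
    calls.push(url);
    const route = routes.find((r) => r.match.test(url));
    // Fail loudly so a forgotten mock can never pass silently.
    if (!route) throw new Error(`mock fetch: no route matches ${url}`);
    const status = route.status ?? 200;
    return {
      ok: status >= 200 && status < 300,
      status,
      json: async () => route.json,
    };
  };
  return { fetchImpl, calls };
}
```

Recording every requested URL in calls also lets a test assert that a command hit the endpoint it was supposed to.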

The full loop, from new Skill to shippable artifact:

pnpm new daily-report
# ... write code + tests ...
pnpm test daily-report     # run unit tests
pnpm build daily-report    # lint → bundle → zip

Try it

skill-kits doesn't write your scripts or invent your abstractions. It draws one guardrail around the cross-cutting chores — entry points, build output, runtime, validation, dev sync, tests — so that when you create your 5th or 10th Skill, your head is still on the business logic.

npx skill-kits init my-skills

GitHub: https://github.com/weijhfly/skill-kits
NPM: https://www.npmjs.com/package/skill-kits

If you're building Agent Skills too, I'd love to hear how you're handling the same problems — what does your Skill workflow look like right now?