Over the past six months, I've built about 10 Agent Skills — the kind of scripts that LLM-powered coding agents invoke to get real work done. By the third one, I noticed something embarrassing: I'd copy-pasted three different HTTP wrappers across three different skills.
Cookie auth in one. Bearer token in another. Network error handling in a third. Three copies of http.ts, each slightly different. If the auth method ever changed, I'd have to fix the same logic in three places.
And HTTP was just the start. Command routing, argument parsing, error handling, SKILL.md validation, build output syncing — by Skill #5, these non-business concerns were eating up real time.
So I built skill-kits — a toolchain that turns those recurring engineering problems into a reusable foundation. Here's what it actually solves:
| Problem | Before (manual) | After (skill-kits) |
|---|---|---|
| Creating a new Skill | Reinvent the directory structure every time | `pnpm new <name>`: one command |
| HTTP / error handling utils | Copy-paste across skills | Import from runtime, zero-dependency inlined |
| Syncing changes to the agent | `cp -r` / manual copy | `pnpm dev`: watch + auto-sync |
| SKILL.md quality | Eyeball review | `pnpm build`: automated lint checks |
1. Three copies of http.ts across three skills
Here's what "just copy-paste the HTTP wrapper" actually looks like after a few skills:
// skill1/scripts/http.ts — Cookie auth + error detail extraction
async function request<T>(
method: "GET" | "POST",
domain: string,
path: string,
token: string,
options?: { params?: Record<string, string>; body?: unknown },
) {
const url = new URL(`${domain}/gms_api${path}`);
if (options?.params) {
Object.entries(options.params).forEach(([k, v]) => {
url.searchParams.set(k, v);
});
}
const res = await fetch(url.toString(), {
method,
headers: {
"Content-Type": "application/json",
Cookie: `x-token=${token}`,
},
body: options?.body ? JSON.stringify(options.body) : undefined,
});
if (!res.ok) {
const detail = await res.text().catch(() => "");
throw new Error(`HTTP ${res.status}: ${res.statusText}\n${detail}`);
}
return res.json() as Promise<ApiResponse<T>>;
}
// skill2/scripts/http.ts — Bearer Token + network fallback + response parsing tolerance
async function postJson<T>(url: string, body: unknown, token?: string) {
let response: Response;
try {
response = await fetch(url, {
method: "POST",
headers: {
"content-type": "application/json",
...(token ? { authorization: `Bearer ${token}` } : {}),
},
body: JSON.stringify(body),
});
} catch (error) {
return {
httpOk: false,
status: 0,
statusText: `NetworkError: ${error instanceof Error ? error.message : String(error)}`,
data: null,
};
}
const raw = await response.text();
let data: unknown = null;
try {
data = JSON.parse(raw);
} catch {
data = { raw };
}
return { httpOk: response.ok, status: response.status, data: data as T };
}
// skill3/scripts/http.ts — yet another variant (omitted)
Three skills, three http.ts files. If auth ever switches from Cookie to Bearer, that's three places to update. Not great.
So I added a thin fetch wrapper to skill-kits' runtime. Not a full-blown HTTP client — just enough to stop writing the same boilerplate:
import { httpGet, HttpError } from "skill-kits/runtime";
const url = "https://api.example.com/me";
const res = await httpGet<UserInfo>(url, {
headers: { authorization: `Bearer ${token}` },
query: { fields: "id,name" },
timeoutMs: 10_000,
});
if (!res.ok) {
throw new HttpError(res.status, url, res.statusText);
}
The business-specific wrapping stays in each skill. The boring parts are handled once.
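To make "thin wrapper" concrete, here is a minimal sketch of what such a helper could look like. It is not skill-kits' actual implementation: `buildUrl`, `httpGetSketch`, the `HttpResult` shape, and the 30-second default timeout are all hypothetical. Only the `headers` / `query` / `timeoutMs` option names mirror the call above.

```typescript
// Hypothetical sketch of a thin fetch wrapper; not the real skill-kits code.
interface GetOptions {
  headers?: Record<string, string>;
  query?: Record<string, string>;
  timeoutMs?: number;
}

interface HttpResult<T> {
  ok: boolean;
  status: number;
  statusText: string;
  data: T | null;
}

// Append query parameters to a base URL without hand-concatenating strings.
function buildUrl(base: string, query: Record<string, string> = {}): string {
  const url = new URL(base);
  for (const [key, value] of Object.entries(query)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

async function httpGetSketch<T>(
  base: string,
  options: GetOptions = {},
): Promise<HttpResult<T>> {
  const controller = new AbortController();
  // Abort the request once the timeout elapses (the default is an assumption).
  const timer = setTimeout(() => controller.abort(), options.timeoutMs ?? 30_000);
  try {
    const res = await fetch(buildUrl(base, options.query), {
      method: "GET",
      headers: options.headers,
      signal: controller.signal,
    });
    const raw = await res.text();
    let data: T | null = null;
    try {
      data = JSON.parse(raw) as T;
    } catch {
      // Non-JSON body: leave data as null instead of throwing.
    }
    return { ok: res.ok, status: res.status, statusText: res.statusText, data };
  } finally {
    clearTimeout(timer);
  }
}
```

The point of the split is that URL building, timeouts, and body parsing live in one place, while each skill still decides what a non-2xx status means for its own API.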
Beyond HTTP, error handling was another mess. Every skill threw new Error("something") and debugging meant guessing from raw messages. I consolidated common errors into a unified code system:
| Class | Code | Typical scenario |
|---|---|---|
| `UserInputError` | `USER_INPUT_ERROR` | Missing / malformed arguments |
| `AuthError` | `AUTH_ERROR` | Expired token / insufficient permissions |
| `HttpError` | `HTTP_ERROR` | Upstream non-2xx response |
| `BusinessApiError` | `BIZ_<code>` | HTTP 200 but business code ≠ 0 |
A few real examples:
import { SkillError, UserInputError, BusinessApiError } from "skill-kits/runtime";
// UserInputError: argument validation failure
throw new UserInputError("activityId is required", { field: "activityId" });
// stderr → {"ok":false,"code":"USER_INPUT_ERROR","error":"activityId is required","details":{"field":"activityId"}}
// Custom BusinessApiError
throw new BusinessApiError(-10000, "token expired", {
hintMap: { [-10000]: "Please re-login", [-14]: "Record not found" },
});
// stderr → {"ok":false,"code":"BIZ_-10000","error":"[code=-10000] token expired (Please re-login)"}
// Custom error: extend SkillError with any code
class RateLimitError extends SkillError {
constructor(retryAfterSec?: number) {
super("RATE_LIMIT", "Too many requests", { retryAfterSec });
}
}
throw new RateLimitError(30);
// stderr → {"ok":false,"code":"RATE_LIMIT","error":"Too many requests","details":{"retryAfterSec":30}}
Now both humans and LLMs can branch on error codes instead of parsing free-text messages.
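For readers curious how a code-carrying error hierarchy can produce that stderr JSON, here is one possible sketch. The class names with a `Sketch` suffix, the `toPayload` method, and the `nextAction` helper are hypothetical and written for illustration; the real skill-kits classes may be structured differently.

```typescript
// Hypothetical sketch of a code-carrying error hierarchy; not the real skill-kits source.
interface ErrorPayload {
  ok: false;
  code: string;
  error: string;
  details?: Record<string, unknown>;
}

class SkillErrorSketch extends Error {
  readonly code: string;
  readonly details?: Record<string, unknown>;

  constructor(code: string, message: string, details?: Record<string, unknown>) {
    super(message);
    this.name = new.target.name;
    this.code = code;
    this.details = details;
  }

  // Same field order as the stderr JSON shown above: ok, code, error, details.
  toPayload(): ErrorPayload {
    return {
      ok: false,
      code: this.code,
      error: this.message,
      ...(this.details ? { details: this.details } : {}),
    };
  }
}

class BusinessApiErrorSketch extends SkillErrorSketch {
  constructor(
    bizCode: number,
    message: string,
    options: { hintMap?: Record<number, string> } = {},
  ) {
    const hint = options.hintMap?.[bizCode];
    super(`BIZ_${bizCode}`, `[code=${bizCode}] ${message}${hint ? ` (${hint})` : ""}`);
  }
}

// A caller can branch on the stable code instead of parsing the message text.
function nextAction(payload: { code: string }): "relogin" | "fix-arguments" | "report" {
  if (payload.code === "AUTH_ERROR" || payload.code === "BIZ_-10000") return "relogin";
  if (payload.code === "USER_INPUT_ERROR") return "fix-arguments";
  return "report";
}
```

Keeping the code in a dedicated field means the human-readable message can be reworded freely without breaking whatever branches on it.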
2. Seven subcommands, 250 lines of parseArgs + switch
When I built a marketing campaign management Skill with 7 subcommands, main.ts turned into a monster of parseArgs + switch + usage + validation:
// ❌ Manual: parseArgs + switch, ~250 lines
function parseArgs(): CliOptions {
const args = process.argv.slice(2);
// args[0] is the subcommand; the flag loop below starts at index 1
const opts: Partial<CliOptions> = { command: args[0] };
for (let i = 1; i < args.length; i++) {
switch (args[i]) {
case "--domain":
opts.domain = args[++i];
break;
case "--app-id":
opts.appId = args[++i];
break;
case "--token":
opts.token = args[++i];
break;
case "--activity-id":
opts.activityId = args[++i];
break;
case "--body":
opts.body = args[++i];
break;
// ... a dozen more cases
}
}
if (!opts.domain) {
console.error("❌ Missing --domain");
process.exit(1);
}
if (!opts.appId) {
console.error("❌ Missing --app-id");
process.exit(1);
}
if (!opts.token) {
console.error("❌ Missing --token");
process.exit(1);
}
return opts as CliOptions;
}
async function main() {
const opts = parseArgs();
switch (opts.command) {
case "get_activity":
if (!opts.activityId) {
console.error("❌ Requires --activity-id");
process.exit(1);
}
await getActivity(opts.domain, opts.appId, opts.token, opts.activityId);
break;
case "create_activity":
if (!opts.body) {
console.error("❌ Requires --body");
process.exit(1);
}
await createActivity(opts.domain, opts.appId, opts.token, opts.body);
break;
// ... remaining cases
}
}
The real problem isn't the line count — it's that parsing, validation, usage text, and error handling are scattered across different places. Adding a command means touching several spots, and it's easy to miss one.
I replaced all of that with createRouter:
import { createRouter, writeResult } from "skill-kits/runtime";
// ✅ Declarative: args, validation, routing — all in one place
const router = createRouter({
name: "redbrick-activity",
description: "...",
commonArgs: {
domain: { type: "string", required: true, desc: "Platform domain" },
appId: { type: "string", required: true, desc: "App ID" },
token: { type: "string", required: true, desc: "SSO token" },
},
});
router.command({
name: "get-activity",
description: "Query activity details",
args: {
activityId: { type: "string", required: true, desc: "Activity ID" },
},
async handler({ domain, appId, token, activityId }) {
writeResult(await getActivity(domain, appId, token, activityId));
},
});
router.command({
name: "create-activity",
description: "Create an activity",
args: {
body: { type: "json", required: true, desc: "Activity fields as JSON" },
},
async handler({ domain, appId, token, body }) {
writeResult(await createActivity(domain, appId, token, body));
},
});
router.run(process.argv.slice(2));
After this abstraction, I stopped thinking about "how arguments get parsed" and started thinking about "what arguments does this command need, and what do I do with them."
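The core idea behind a declarative router, a spec object that drives both parsing and validation, can be sketched in a few lines. This is an illustration only: `ArgSpec`, `toFlag`, and `parseArgsFromSpec` are hypothetical names, and the camelCase-to-kebab-case flag mapping is an assumption rather than documented `createRouter` behavior.

```typescript
// Hypothetical sketch of spec-driven argument parsing; not the real createRouter.
type ArgSpec = { type: "string" | "json"; required?: boolean; desc: string };

// Assumed naming convention: "activityId" maps to "--activity-id".
function toFlag(name: string): string {
  return "--" + name.replace(/[A-Z]/g, (c) => "-" + c.toLowerCase());
}

function parseArgsFromSpec(
  specs: Record<string, ArgSpec>,
  argv: string[],
): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const [name, spec] of Object.entries(specs)) {
    const index = argv.indexOf(toFlag(name));
    const raw = index >= 0 ? argv[index + 1] : undefined;
    if (raw === undefined) {
      // Validation lives next to the declaration, so no command can forget it.
      if (spec.required) throw new Error(`Missing ${toFlag(name)} (${spec.desc})`);
      continue;
    }
    // "json" arguments are parsed once here, so handlers receive a real object.
    result[name] = spec.type === "json" ? JSON.parse(raw) : raw;
  }
  return result;
}
```

With this shape, adding an argument is a one-line change to the spec; the usage text and the "missing flag" error can both be derived from the same `desc` field.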
3. stdout and stderr mixed together — the agent can't tell them apart
I used to dump everything with console.log and reserve console.error only for actual errors. The problem? The LLM had to guess which stdout line was the JSON result and which was a progress log. One wrong guess, one wasted invocation.
The better approach: stdout carries only the JSON result, and stderr carries progress messages. On failure, a non-zero exit code gives the agent an unambiguous "failed" status, which is far more reliable than making the LLM infer failure from stderr text.
skill-kits provides three simple output helpers:
writeResult(payload); // stdout: single-line JSON for the agent
writeError(errorOrMessage, { code?, extra? }); // stderr: structured error + exitCode=1
notify("Fetching data..."); // stderr: progress log
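A rough sketch of how helpers like these might be built is shown below. It is a simplification under stated assumptions: the `Sketch`-suffixed functions and the injectable `Sinks` interface are hypothetical (added here so the behavior can be checked without touching the real process streams), and `writeErrorSketch` takes a plain `code` string instead of the options object shown above.

```typescript
// Hypothetical sketch of the three output helpers; the real ones may differ.
interface Sinks {
  stdout: (text: string) => void;
  stderr: (text: string) => void;
  setExitCode: (code: number) => void;
}

// Default sinks write to the real process streams.
const processSinks: Sinks = {
  stdout: (text) => { process.stdout.write(text); },
  stderr: (text) => { process.stderr.write(text); },
  setExitCode: (code) => { process.exitCode = code; },
};

// stdout: exactly one line of JSON, nothing else.
function writeResultSketch(payload: unknown, sinks: Sinks = processSinks): void {
  sinks.stdout(JSON.stringify(payload) + "\n");
}

// stderr: structured error, plus a non-zero exit code for the agent.
function writeErrorSketch(message: string, code = "UNKNOWN", sinks: Sinks = processSinks): void {
  sinks.stderr(JSON.stringify({ ok: false, code, error: message }) + "\n");
  sinks.setExitCode(1);
}

// stderr: free-form progress text that never pollutes the result channel.
function notifySketch(message: string, sinks: Sinks = processSinks): void {
  sinks.stderr(message + "\n");
}
```

Setting `process.exitCode` rather than calling `process.exit()` lets pending writes flush before the process ends, which matters when the agent reads the streams through a pipe.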
4. The agent thought the process was dead — it was just waiting 60 seconds
For D2C code generation, SSO login callbacks, and similar long-running tasks, the real problem isn't "actually timed out" — it's "looks like it timed out."
A naive 60-second sleep:
await new Promise((resolve) => setTimeout(resolve, 60_000));
Zero output for 60 seconds. The agent assumes the process hung and kills it.
Enter sleepWithHeartbeat:
import { sleepWithHeartbeat } from "skill-kits/runtime";
await sleepWithHeartbeat(60_000, {
message: (rem) => `Waiting for code generation... ${rem}s remaining`,
});
It prints a heartbeat to stderr every 5 seconds, so the agent knows the process is still alive.
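The mechanism is simple enough to sketch. The version below is an illustration, not the library's source: `remainingAtTicks`, `sleepWithHeartbeatSketch`, and the `intervalMs` / `log` options are hypothetical, and printing the first heartbeat immediately is an assumption.

```typescript
// Hypothetical sketch of a heartbeat sleep; only `message` mirrors the call above.
interface HeartbeatOptions {
  message: (remainingSec: number) => string;
  intervalMs?: number;            // assumed default: 5 seconds
  log?: (line: string) => void;   // assumed default: write to stderr
}

// Remaining whole seconds at each heartbeat tick (the first tick fires at t = 0).
function remainingAtTicks(totalMs: number, intervalMs: number): number[] {
  if (intervalMs <= 0) throw new RangeError("intervalMs must be positive");
  const remaining: number[] = [];
  for (let elapsed = 0; elapsed < totalMs; elapsed += intervalMs) {
    remaining.push(Math.ceil((totalMs - elapsed) / 1000));
  }
  return remaining;
}

async function sleepWithHeartbeatSketch(
  totalMs: number,
  options: HeartbeatOptions,
): Promise<void> {
  const intervalMs = options.intervalMs ?? 5_000;
  const log = options.log ?? ((line: string) => { process.stderr.write(line + "\n"); });
  let elapsed = 0;
  for (const remainingSec of remainingAtTicks(totalMs, intervalMs)) {
    log(options.message(remainingSec));
    // The last step may be shorter so the total never exceeds totalMs.
    const step = Math.min(intervalMs, totalMs - elapsed);
    await new Promise<void>((resolve) => setTimeout(resolve, step));
    elapsed += step;
  }
}
```

Writing the heartbeat to stderr keeps stdout reserved for the final JSON result, consistent with the output convention from the previous section.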
5. A "correct" SKILL.md isn't necessarily a good one
I learned this one the hard way. Two problems kept coming up:
- Wrong triggers: the `description` was too vague, so the LLM didn't know when to invoke the skill. I had to keep tweaking keywords.
- Too complete to edit: AI-generated `SKILL.md` files were thorough, but when I wanted to make changes myself, I didn't know where to start.
SKILL.md can't be fully standardized like code, but some things should absolutely be caught locally:
- `name` doesn't match the directory name
- Body is too long, blowing up the context window
- `references` points to a wrong path
- `description` is too short to convey trigger scenarios
skill-kits ships a set of lint rules that run automatically on pnpm build:
- `name-matches-dir`: `name` must equal the parent directory name
- `body-line-limit`: body over 500 lines → hard error
- `body-line-soft`: over 400 lines → warning, suggest splitting into `references/`
- `description-length`: description too short → warning
- `description-trigger` / `description-negative`: checks for "when to trigger" and "when NOT to trigger" hints
You can customize thresholds via .skillkitrc.json:
{
"lint": {
"triggerHints": ["when", "trigger", "use when"],
"negativeHints": ["don't", "do not", "never"],
"descriptionMinChars": 80,
"bodyLinesWarn": 400,
"bodyLinesFail": 500
}
}
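To show how thresholds like these can drive the checks, here is a small sketch of four of the rules. It assumes `name`, `description`, and the body have already been extracted from `SKILL.md`; the `SkillDoc`, `Finding`, and `lintSkill` names are hypothetical, and treating `name-matches-dir` as an error (rather than a warning) is an assumption.

```typescript
// Hypothetical sketch of a few SKILL.md lint checks; rule names follow the list above.
interface LintConfig {
  triggerHints: string[];
  descriptionMinChars: number;
  bodyLinesWarn: number;
  bodyLinesFail: number;
}

interface SkillDoc {
  dirName: string;      // parent directory name
  name: string;         // `name` declared in SKILL.md
  description: string;  // `description` declared in SKILL.md
  body: string;         // the remaining document text
}

interface Finding {
  rule: string;
  level: "error" | "warning";
}

function lintSkill(doc: SkillDoc, cfg: LintConfig): Finding[] {
  const findings: Finding[] = [];

  if (doc.name !== doc.dirName) {
    findings.push({ rule: "name-matches-dir", level: "error" });
  }

  // The hard limit wins over the soft limit so a file is reported only once.
  const lineCount = doc.body.split("\n").length;
  if (lineCount > cfg.bodyLinesFail) {
    findings.push({ rule: "body-line-limit", level: "error" });
  } else if (lineCount > cfg.bodyLinesWarn) {
    findings.push({ rule: "body-line-soft", level: "warning" });
  }

  if (doc.description.length < cfg.descriptionMinChars) {
    findings.push({ rule: "description-length", level: "warning" });
  }

  // Case-insensitive search for any configured "when to trigger" hint.
  const lower = doc.description.toLowerCase();
  if (!cfg.triggerHints.some((hint) => lower.includes(hint.toLowerCase()))) {
    findings.push({ rule: "description-trigger", level: "warning" });
  }

  return findings;
}
```

A `description-negative` check would follow the same pattern as the trigger check, using `negativeHints` instead.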
There's also a second layer of reuse beyond the runtime: shared business modules. Internal API clients, domain constants, signing utilities — the kind of helpers that pop up across multiple skills. skill-kits init generates a workspace with packages/shared:
import { signRequest, BIZ_DOMAINS } from "@skills/shared";
At build time, esbuild inlines only the used parts. The output is still a single file, zero dependencies.
6. Change one line → build → cp -r → try again
When developing a Skill, you need to sync it to the agent's local skills directory to test. I used to do this manually:
cp -r dist/xxx ~/.agent/skills
After the tenth time, I added a dev mode:
pnpm dev daily-report --out ~/.agent/skills
It does two things in parallel:
- esbuild watches `src/`: `.ts` changes trigger a rebuild
- It watches `SKILL.md` / `references/` / `assets/`: resource changes sync directly to `--out`
Now I edit locally, and the agent picks up the latest version on its next invocation.
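The resource-sync half of that dev loop can be approximated with Node's built-in `fs` module. The sketch below is not how skill-kits implements it: `syncResources` and `watchResources` are hypothetical names, and a real dev mode would likely add debouncing and run esbuild's own watch mode alongside.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Resource paths mirrored verbatim into the output directory.
const RESOURCES = ["SKILL.md", "references", "assets"];

// Copy every resource that exists in skillDir into outDir; return what was copied.
function syncResources(skillDir: string, outDir: string): string[] {
  const copied: string[] = [];
  for (const rel of RESOURCES) {
    const from = path.join(skillDir, rel);
    if (!fs.existsSync(from)) continue;
    fs.cpSync(from, path.join(outDir, rel), { recursive: true });
    copied.push(rel);
  }
  return copied;
}

// Hypothetical wiring: sync once, then re-sync on any change event in skillDir.
// Note: this watches only the top level of skillDir; nested changes would need
// a recursive watch, whose platform support varies across Node versions.
function watchResources(skillDir: string, outDir: string): fs.FSWatcher {
  syncResources(skillDir, outDir);
  return fs.watch(skillDir, () => {
    syncResources(skillDir, outDir);
  });
}
```

Because resources are plain files, copying them directly avoids a full rebuild for a one-line `SKILL.md` edit.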
7. You don't know a Skill is broken until it fails in production
Skills run unattended inside agents. A broken command costs more than a broken CLI — you often don't find out until a run fails silently. Writing tests is worth it.
skill-kits keeps testing simple: test files go in src/**/*.test.ts, run with pnpm test, powered by node:test + tsx. Zero configuration.
The main thing you're testing: the exit behavior of a command — what JSON it writes to stdout, whether the exit code is 0 or 1, what error appears on stderr on failure.
Two helpers from skill-kits/testing:
- `captureOutput(fn)`: captures `writeResult` / `writeError` / `notify` output, plus `process.exitCode`
- `mockFetch(routes)`: replaces global `fetch`, no real network calls
A typical happy-path test:
import { test } from "node:test";
import assert from "node:assert/strict";
import { mockFetch, captureOutput } from "skill-kits/testing";
import { createActivity } from "./commands/create-activity.js";
const ctx = { domain: "https://example.com", token: "t" };
test("create-activity returns backend data with ok", async () => {
const mock = mockFetch([
{ match: /\/activity\/create/, json: { code: 0, data: { activity_id: 9001 } } },
]);
try {
const { json, exitCode } = await captureOutput(() =>
createActivity(ctx, { act_name: "test" }),
);
assert.equal(exitCode, 0);
assert.equal((json as { activity_id: number }).activity_id, 9001);
} finally {
mock.restore();
}
});
For pure functions, you don't need either helper — just import and assert. For error paths, commands throw a SkillError (the router maps it to exit code 1 + stderr JSON), and you catch it with assert.rejects.
One detail worth mentioning: mockFetch intentionally throws on unmatched requests. A missing mock will never silently pass — no "tests pass but production fails" surprises.
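The "fail on unmatched requests" behavior is easy to picture with a small sketch. This is an illustration of the technique, not the code in `skill-kits/testing`: `MockRoute`, `findRoute`, and `mockFetchSketch` are hypothetical names, and the optional `status` field is an assumption.

```typescript
// Hypothetical sketch of a strict fetch mock; not the real skill-kits/testing code.
interface MockRoute {
  match: RegExp;
  json: unknown;
  status?: number; // assumed optional field, defaults to 200
}

// Return the first matching route, or throw so a missing mock fails loudly.
function findRoute(routes: MockRoute[], url: string): MockRoute {
  const route = routes.find((r) => r.match.test(url));
  if (!route) throw new Error(`mockFetch: no route matches ${url}`);
  return route;
}

function mockFetchSketch(routes: MockRoute[]): { restore: () => void } {
  const original = globalThis.fetch;
  globalThis.fetch = async (input: RequestInfo | URL): Promise<Response> => {
    const url =
      typeof input === "string" ? input : input instanceof URL ? input.href : input.url;
    // An unmatched URL rejects the returned promise instead of hitting the network.
    const route = findRoute(routes, url);
    return new Response(JSON.stringify(route.json), {
      status: route.status ?? 200,
      headers: { "content-type": "application/json" },
    });
  };
  // Always restore in a `finally` block so one test cannot leak into the next.
  return { restore: () => { globalThis.fetch = original; } };
}
```

One caveat with this design: route patterns should not use the `g` flag, because `RegExp.prototype.test` is stateful on global regexes and could skip a match on repeated calls.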
The daily loop is three commands:
pnpm new daily-report
# ... write code + tests ...
pnpm test daily-report # run tests
pnpm build daily-report # lint → bundle → zip
Wrapping up
skill-kits doesn't write your scripts or abstract your business logic. What it does is simple: it wraps entry points, build output, runtime utilities, and definition validation into a guardrail layer — so when you're building your 5th or 10th Skill, your brain is still on business logic, not boilerplate.
If you're building Agent Skills too, give it a try:
npx skill-kits init my-skills