weijhfly

Posted on Jun 19

I Built 10 AI Agent Skills and Shrank 300 Lines of Boilerplate to 30

#ai #agents #agentskills #typescript

Over the past six months, I've built about 10 Agent Skills — the kind of scripts that LLM-powered coding agents invoke to get real work done. By the third one, I noticed something embarrassing: I'd copy-pasted three different HTTP wrappers across three different skills.

Cookie auth in one. Bearer token in another. Network error handling in a third. Three copies of http.ts, each slightly different. If the auth method ever changed, I'd have to fix the same logic in three places.

And HTTP was just the start. Command routing, argument parsing, error handling, SKILL.md validation, build output syncing — by Skill #5, these non-business concerns were eating up real time.

So I built skill-kits — a toolchain that turns those recurring engineering problems into a reusable foundation. Here's what it actually solves:

| Problem | Before (manual) | After (skill-kits) |
| --- | --- | --- |
| Creating a new Skill | Reinvent the directory structure every time | `pnpm new <name>` (one command) |
| HTTP / error handling utils | Copy-paste across skills | Import from runtime, zero-dependency inlined |
| Syncing changes to the agent | `cp -r` / manual copy | `pnpm dev` (watch + auto-sync) |
| `SKILL.md` quality | Eyeball review | `pnpm build` (automated lint checks) |

1. Three copies of `http.ts` across three skills

Here's what "just copy-paste the HTTP wrapper" actually looks like after a few skills:

// skill1/scripts/http.ts — Cookie auth + error detail extraction
async function request<T>(
  method: "GET" | "POST",
  domain: string,
  path: string,
  token: string,
  options?: { params?: Record<string, string>; body?: unknown },
) {
  const url = new URL(`${domain}/gms_api${path}`);
  if (options?.params) {
    Object.entries(options.params).forEach(([k, v]) => {
      url.searchParams.set(k, v);
    });
  }

  const res = await fetch(url.toString(), {
    method,
    headers: {
      "Content-Type": "application/json",
      Cookie: `x-token=${token}`,
    },
    body: options?.body ? JSON.stringify(options.body) : undefined,
  });

  if (!res.ok) {
    const detail = await res.text().catch(() => "");
    throw new Error(`HTTP ${res.status}: ${res.statusText}\n${detail}`);
  }

  return res.json() as Promise<ApiResponse<T>>;
}

// skill2/scripts/http.ts — Bearer Token + network fallback + response parsing tolerance
async function postJson<T>(url: string, body: unknown, token?: string) {
  let response: Response;

  try {
    response = await fetch(url, {
      method: "POST",
      headers: {
        "content-type": "application/json",
        ...(token ? { authorization: `Bearer ${token}` } : {}),
      },
      body: JSON.stringify(body),
    });
  } catch (error) {
    return {
      httpOk: false,
      status: 0,
      statusText: `NetworkError: ${error instanceof Error ? error.message : String(error)}`,
      data: null,
    };
  }

  const raw = await response.text();
  let data: unknown = null;
  try {
    data = JSON.parse(raw);
  } catch {
    data = { raw };
  }

  return { httpOk: response.ok, status: response.status, data: data as T };
}

// skill3/scripts/http.ts — yet another variant (omitted)

Three skills, three http.ts files. If auth ever switches from Cookie to Bearer, that's three places to update. Not great.

So I added a thin fetch wrapper to skill-kits' runtime. Not a full-blown HTTP client — just enough to stop writing the same boilerplate:

import { httpGet, HttpError } from "skill-kits/runtime";

// Keep the URL in a variable so the error below can report it.
const url = "https://api.example.com/me";

const res = await httpGet<UserInfo>(url, {
  headers: { authorization: `Bearer ${token}` },
  query: { fields: "id,name" },
  timeoutMs: 10_000,
});

if (!res.ok) {
  throw new HttpError(res.status, url, res.statusText);
}

The business-specific wrapping stays in each skill. The boring parts are handled once.
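
To make "thin fetch wrapper" concrete, here is a minimal sketch of what such a helper might look like. It is not the skill-kits implementation: `buildUrl`, `getJson`, and the option and result shapes are hypothetical names that only mirror the call above. It assumes Node 18+, where `fetch` and `AbortSignal.timeout` are available globally.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the skill-kits API.
interface GetOptions {
  headers?: Record<string, string>;
  query?: Record<string, string>;
  timeoutMs?: number;
}

interface GetResult<T> {
  ok: boolean;
  status: number;
  statusText: string;
  data: T | null;
}

// Append query parameters to a base URL using the standard URL API.
function buildUrl(base: string, query: Record<string, string> = {}): string {
  const url = new URL(base);
  for (const [key, value] of Object.entries(query)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// GET a JSON resource; network failures and timeouts become status 0 instead of throwing.
async function getJson<T>(base: string, options: GetOptions = {}): Promise<GetResult<T>> {
  const url = buildUrl(base, options.query);
  try {
    const res = await fetch(url, {
      headers: options.headers,
      signal: AbortSignal.timeout(options.timeoutMs ?? 10_000),
    });
    const raw = await res.text();
    let data: T | null = null;
    try {
      data = JSON.parse(raw) as T;
    } catch {
      // Non-JSON body: leave data as null rather than failing the call.
    }
    return { ok: res.ok, status: res.status, statusText: res.statusText, data };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { ok: false, status: 0, statusText: `NetworkError: ${message}`, data: null };
  }
}
```

The design choice worth noting is that auth stays out of the helper entirely: the caller passes a Cookie or Bearer header, so switching auth methods touches the skill and not the shared code.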

Beyond HTTP, error handling was another mess. Every skill threw new Error("something") and debugging meant guessing from raw messages. I consolidated common errors into a unified code system:

| Class | Code | Typical scenario |
| --- | --- | --- |
| `UserInputError` | `USER_INPUT_ERROR` | Missing / malformed arguments |
| `AuthError` | `AUTH_ERROR` | Expired token / insufficient permissions |
| `HttpError` | `HTTP_ERROR` | Upstream non-2xx response |
| `BusinessApiError` | `BIZ_<code>` | HTTP 200 but business code ≠ 0 |

A few real examples:

import { SkillError, UserInputError, BusinessApiError } from "skill-kits/runtime";

// UserInputError: argument validation failure
throw new UserInputError("activityId is required", { field: "activityId" });
// stderr → {"ok":false,"code":"USER_INPUT_ERROR","error":"activityId is required","details":{"field":"activityId"}}

// Custom BusinessApiError
throw new BusinessApiError(-10000, "token expired", {
  hintMap: { [-10000]: "Please re-login", [-14]: "Record not found" },
});
// stderr → {"ok":false,"code":"BIZ_-10000","error":"[code=-10000] token expired (Please re-login)"}

// Custom error: extend SkillError with any code
class RateLimitError extends SkillError {
  constructor(retryAfterSec?: number) {
    super("RATE_LIMIT", "Too many requests", { retryAfterSec });
  }
}
throw new RateLimitError(30);
// stderr → {"ok":false,"code":"RATE_LIMIT","error":"Too many requests","details":{"retryAfterSec":30}}

Now both humans and LLMs can branch on error codes instead of parsing free-text messages.
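
As an illustration of branching on codes, the sketch below shows how a caller might consume those stderr lines. The payload shape is taken from the examples above; `parseErrorLine`, `nextStep`, and the `NextStep` labels are hypothetical and not part of skill-kits.

```typescript
// Hypothetical caller-side helper; the payload shape follows the stderr examples above.
interface ErrorPayload {
  ok: false;
  code: string;
  error: string;
  details?: Record<string, unknown>;
}

type NextStep = "fix-arguments" | "re-authenticate" | "retry-later" | "report";

// Parse one stderr line; returns null when the line is not a structured error.
function parseErrorLine(line: string): ErrorPayload | null {
  try {
    const parsed: unknown = JSON.parse(line);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as ErrorPayload).ok === false &&
      typeof (parsed as ErrorPayload).code === "string"
    ) {
      return parsed as ErrorPayload;
    }
  } catch {
    // Progress logs and other plain-text lines land here.
  }
  return null;
}

// Decide what to do next from the code alone, never from the message text.
function nextStep(payload: ErrorPayload): NextStep {
  switch (payload.code) {
    case "USER_INPUT_ERROR":
      return "fix-arguments";
    case "AUTH_ERROR":
      return "re-authenticate";
    case "RATE_LIMIT":
      return "retry-later";
    default:
      return "report";
  }
}
```

Because progress lines also go to stderr, the parser treats anything that is not a structured error object as "not an error" rather than failing.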

2. Seven subcommands, 250 lines of `parseArgs` + `switch`

When I built a marketing campaign management Skill with 7 subcommands, main.ts turned into a monster of parseArgs + switch + usage + validation:

// ❌ Manual: parseArgs + switch, ~250 lines
function parseArgs(): CliOptions {
  const args = process.argv.slice(2);
  // args[0] is the subcommand; flags start at index 1.
  const opts: Partial<CliOptions> = { command: args[0] };

  for (let i = 1; i < args.length; i++) {
    switch (args[i]) {
      case "--domain":
        opts.domain = args[++i];
        break;
      case "--app-id":
        opts.appId = args[++i];
        break;
      case "--token":
        opts.token = args[++i];
        break;
      case "--activity-id":
        opts.activityId = args[++i];
        break;
      case "--body":
        opts.body = args[++i];
        break;
      // ... a dozen more cases
    }
  }

  if (!opts.domain) {
    console.error("❌ Missing --domain");
    process.exit(1);
  }
  if (!opts.appId) {
    console.error("❌ Missing --app-id");
    process.exit(1);
  }
  if (!opts.token) {
    console.error("❌ Missing --token");
    process.exit(1);
  }

  return opts as CliOptions;
}

async function main() {
  const opts = parseArgs();

  switch (opts.command) {
    case "get_activity":
      if (!opts.activityId) {
        console.error("❌ Requires --activity-id");
        process.exit(1);
      }
      await getActivity(opts.domain, opts.appId, opts.token, opts.activityId);
      break;
    case "create_activity":
      if (!opts.body) {
        console.error("❌ Requires --body");
        process.exit(1);
      }
      await createActivity(opts.domain, opts.appId, opts.token, opts.body);
      break;
    // ... remaining cases
  }
}

The real problem isn't the line count — it's that parsing, validation, usage text, and error handling are scattered across different places. Adding a command means touching several spots, and it's easy to miss one.

I replaced all of that with createRouter:

import { createRouter, writeResult } from "skill-kits/runtime";

// ✅ Declarative: args, validation, routing — all in one place
const router = createRouter({
  name: "redbrick-activity",
  description: "...",
  commonArgs: {
    domain: { type: "string", required: true, desc: "Platform domain" },
    appId: { type: "string", required: true, desc: "App ID" },
    token: { type: "string", required: true, desc: "SSO token" },
  },
});

router.command({
  name: "get-activity",
  description: "Query activity details",
  args: {
    activityId: { type: "string", required: true, desc: "Activity ID" },
  },
  async handler({ domain, appId, token, activityId }) {
    writeResult(await getActivity(domain, appId, token, activityId));
  },
});

router.command({
  name: "create-activity",
  description: "Create an activity",
  args: {
    body: { type: "json", required: true, desc: "Activity fields as JSON" },
  },
  async handler({ domain, appId, token, body }) {
    writeResult(await createActivity(domain, appId, token, body));
  },
});

router.run(process.argv.slice(2));

After this abstraction, I stopped thinking about "how arguments get parsed" and started thinking about "what arguments does this command need, and what do I do with them."
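
For a sense of what sits behind a declarative spec like this, here is a minimal sketch of spec-driven parsing. It is not how `createRouter` is implemented: `parseDeclared` and `toFlag` are hypothetical, and the camelCase-to-kebab-case flag mapping (`appId` to `--app-id`) is an assumption inferred from the manual version above.

```typescript
// Hypothetical sketch of declarative argument parsing; not the skill-kits implementation.
type ArgType = "string" | "json";

interface ArgSpec {
  type: ArgType;
  required?: boolean;
  desc: string;
}

// Assumption: camelCase keys map to kebab-case flags, e.g. appId -> --app-id.
function toFlag(key: string): string {
  return "--" + key.replace(/[A-Z]/g, (ch) => "-" + ch.toLowerCase());
}

// Parse `--flag value` pairs against a spec and collect every problem in one pass.
// Simplification: whatever token follows a flag is taken as its value.
function parseDeclared(
  spec: Record<string, ArgSpec>,
  argv: string[],
): { values: Record<string, unknown>; errors: string[] } {
  const values: Record<string, unknown> = {};
  const errors: string[] = [];

  for (const [key, arg] of Object.entries(spec)) {
    const flag = toFlag(key);
    const index = argv.indexOf(flag);
    const raw = index >= 0 ? argv[index + 1] : undefined;

    if (raw === undefined) {
      if (arg.required) errors.push(`Missing ${flag} (${arg.desc})`);
      continue;
    }
    if (arg.type === "json") {
      try {
        values[key] = JSON.parse(raw);
      } catch {
        errors.push(`${flag} must be valid JSON`);
      }
    } else {
      values[key] = raw;
    }
  }
  return { values, errors };
}
```

Since the spec is the single source of truth, the same object could also drive usage text, which is what removes the "touch several spots" problem described earlier.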

3. stdout and stderr mixed together — the agent can't tell them apart

I used to dump everything with console.log and reserve console.error only for actual errors. The problem? The LLM had to guess which stdout line was the JSON result and which was a progress log. One wrong guess, one wasted invocation.

The better approach: stdout for JSON results, stderr for progress messages. On failure, a non-zero exit code tells the agent the run failed, which is far more reliable than making the LLM parse error text from stderr.

skill-kits provides three simple output helpers:

writeResult(payload);                          // stdout: single-line JSON for the agent
writeError(errorOrMessage, { code?, extra? }); // stderr: structured error + exitCode=1
notify("Fetching data...");                    // stderr: progress log
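
To illustrate the stream split, here is a minimal sketch of how helpers like these could be written. The real skill-kits versions may differ; the `Sketch` suffixes, `formatResult`, `formatError`, and the `UNKNOWN_ERROR` default are all hypothetical. It assumes a Node.js runtime, where `process` is a global.

```typescript
// Hypothetical sketch of the three helpers; the real skill-kits versions may differ.

// Serialize a result as exactly one line so the agent can parse stdout as a whole.
function formatResult(payload: unknown): string {
  return JSON.stringify(payload) + "\n";
}

// Serialize an error in the same shape as the stderr examples earlier in the post.
function formatError(error: unknown, code = "UNKNOWN_ERROR"): string {
  const message = error instanceof Error ? error.message : String(error);
  return JSON.stringify({ ok: false, code, error: message }) + "\n";
}

function writeResultSketch(payload: unknown): void {
  process.stdout.write(formatResult(payload)); // stdout carries only the result
}

function writeErrorSketch(error: unknown, code?: string): void {
  process.stderr.write(formatError(error, code));
  process.exitCode = 1; // lets pending output flush, unlike process.exit(1)
}

function notifySketch(message: string): void {
  process.stderr.write(message + "\n"); // progress never touches stdout
}
```

Setting `process.exitCode` instead of calling `process.exit(1)` lets Node finish writing buffered output before the process ends.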

4. The agent thought the process was dead — it was just waiting 60 seconds

For D2C code generation, SSO login callbacks, and similar long-running tasks, the real problem isn't "actually timed out" — it's "looks like it timed out."

A naive 60-second sleep:

await new Promise((resolve) => setTimeout(resolve, 60_000));

Zero output for 60 seconds. The agent assumes the process hung and kills it.

Enter sleepWithHeartbeat:

import { sleepWithHeartbeat } from "skill-kits/runtime";

await sleepWithHeartbeat(60_000, {
  message: (rem) => `Waiting for code generation... ${rem}s remaining`,
});

It prints a heartbeat to stderr every 5 seconds, so the agent knows the process is still alive.
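
The idea can be sketched in a few lines. This is not the skill-kits implementation: the 5-second default comes from the description above, while `heartbeatTicks`, the `intervalMs` and `log` options, and the `Sketch` suffix are hypothetical additions made for illustration and testability.

```typescript
// Hypothetical sketch; the 5-second default follows the post, everything else is assumed.
interface HeartbeatOptions {
  message: (remainingSec: number) => string;
  intervalMs?: number;
  log?: (line: string) => void; // defaults to stderr so stdout stays clean
}

// Remaining seconds at each heartbeat, e.g. 12s total at 5s intervals -> [12, 7, 2].
function heartbeatTicks(totalMs: number, intervalMs: number): number[] {
  if (intervalMs <= 0) throw new RangeError("intervalMs must be positive");
  const ticks: number[] = [];
  for (let left = totalMs; left > 0; left -= intervalMs) {
    ticks.push(Math.ceil(left / 1000));
  }
  return ticks;
}

// Sleep for totalMs, emitting one message before each interval-sized wait.
async function sleepWithHeartbeatSketch(
  totalMs: number,
  options: HeartbeatOptions,
): Promise<void> {
  const intervalMs = options.intervalMs ?? 5_000;
  const log = options.log ?? ((line: string) => console.error(line));
  let left = totalMs;
  for (const remainingSec of heartbeatTicks(totalMs, intervalMs)) {
    log(options.message(remainingSec));
    const step = Math.min(intervalMs, left); // the final wait may be shorter
    await new Promise<void>((resolve) => setTimeout(resolve, step));
    left -= step;
  }
}
```

Splitting the schedule into a pure function keeps the timing logic checkable without actually waiting.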

5. A "correct" `SKILL.md` isn't necessarily a good one

I learned this one the hard way. Two problems kept coming up:

- Wrong triggers: the description was too vague, so the LLM didn't know when to invoke the skill. I had to keep tweaking keywords.
- Too complete to edit: AI-generated `SKILL.md` files were thorough, but when I wanted to make changes myself, I didn't know where to start.

SKILL.md can't be fully standardized like code, but some things should absolutely be caught locally:

- `name` doesn't match the directory name
- Body is too long, blowing up the context window
- `references` points to a wrong path
- `description` is too short to convey trigger scenarios

skill-kits ships a set of lint rules that run automatically on pnpm build:

- `name-matches-dir`: `name` must equal the parent directory name
- `body-line-limit`: body over 500 lines → hard error
- `body-line-soft`: over 400 lines → warning, suggest splitting into `references/`
- `description-length`: description too short → warning
- `description-trigger` / `description-negative`: checks for "when to trigger" and "when NOT to trigger" hints

You can customize thresholds via .skillkitrc.json:

{
  "lint": {
    "triggerHints": ["when", "trigger", "use when"],
    "negativeHints": ["don't", "do not", "never"],
    "descriptionMinChars": 80,
    "bodyLinesWarn": 400,
    "bodyLinesFail": 500
  }
}
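
To show how rules like these can be checked mechanically, here is a minimal sketch covering the name, description-length, and body-length rules. It is not the skill-kits linter: `lintSkillMd` and the `Finding` shape are hypothetical, and it assumes `SKILL.md` opens with a `---` front matter block containing single-line `name` and `description` entries.

```typescript
// Hypothetical sketch of three of the rules above; not the skill-kits linter.
interface LintConfig {
  descriptionMinChars: number;
  bodyLinesWarn: number;
  bodyLinesFail: number;
}

interface Finding {
  rule: string;
  level: "error" | "warning";
}

function lintSkillMd(text: string, dirName: string, config: LintConfig): Finding[] {
  const findings: Finding[] = [];
  // Assumption: the file starts with front matter delimited by `---` lines.
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(text);
  const frontMatter = match ? match[1] : "";
  const body = match ? match[2] : text;

  // Read a simple single-line `key: value` entry; real YAML parsing is out of scope.
  const field = (key: string): string => {
    const line = frontMatter.split("\n").find((l) => l.startsWith(`${key}:`));
    return line ? line.slice(key.length + 1).trim() : "";
  };

  if (field("name") !== dirName) {
    findings.push({ rule: "name-matches-dir", level: "error" });
  }
  if (field("description").length < config.descriptionMinChars) {
    findings.push({ rule: "description-length", level: "warning" });
  }

  const bodyLines = body.split("\n").length;
  if (bodyLines > config.bodyLinesFail) {
    findings.push({ rule: "body-line-limit", level: "error" });
  } else if (bodyLines > config.bodyLinesWarn) {
    findings.push({ rule: "body-line-soft", level: "warning" });
  }
  return findings;
}
```

A build step could then fail when any finding has level `error` and merely print the warnings.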

There's also a second layer of reuse beyond the runtime: shared business modules. Internal API clients, domain constants, signing utilities — the kind of helpers that pop up across multiple skills. skill-kits init generates a workspace with packages/shared:

import { signRequest, BIZ_DOMAINS } from "@skills/shared";

At build time, esbuild inlines only the used parts. The output is still a single file, zero dependencies.

6. Change one line → `build` → `cp -r` → try again

When developing a Skill, you need to sync it to the agent's local skills directory to test. I used to do this manually:

cp -r dist/xxx ~/.agent/skills

After the tenth time, I added a dev mode:

pnpm dev daily-report --out ~/.agent/skills

It does two things in parallel:

- esbuild watches `src/`: `.ts` changes trigger a rebuild
- It watches `SKILL.md` / `references/` / `assets/`: resource changes sync directly to `--out`

Now I edit locally, and the agent picks up the latest version on its next invocation.
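
The resource-sync half of such a dev mode can be approximated with Node's built-in file watcher. The sketch below is an assumption about the general technique, not the skill-kits code: `destinationFor` and `watchResources` are hypothetical names, and the esbuild half is left out.

```typescript
import { cpSync, watch } from "node:fs";
import { join, relative } from "node:path";

// Hypothetical sketch of the resource-sync half of a dev mode; not the skill-kits code.

// Map a changed file under the skill directory to its location under the output directory.
function destinationFor(skillDir: string, outDir: string, changedPath: string): string {
  return join(outDir, relative(skillDir, changedPath));
}

// Watch a skill directory and copy each changed resource to the agent's skills directory.
// Note: `recursive: true` needs Node 20+ on Linux; macOS and Windows supported it earlier.
function watchResources(skillDir: string, outDir: string): () => void {
  const watcher = watch(skillDir, { recursive: true }, (_event, filename) => {
    if (!filename) return;
    const source = join(skillDir, filename.toString());
    try {
      cpSync(source, destinationFor(skillDir, outDir, source), { recursive: true });
    } catch {
      // The file may have been deleted or renamed between the event and the copy.
    }
  });
  // Return a stop function so the caller can shut the watcher down cleanly.
  return () => watcher.close();
}
```

Copying only the changed path, instead of re-running `cp -r` on the whole directory, keeps each sync cheap enough to run on every save.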

7. You don't know a Skill is broken until it fails in production

Skills run unattended inside agents. A broken command costs more than a broken CLI — you often don't find out until a run fails silently. Writing tests is worth it.

skill-kits keeps testing simple: test files go in src/**/*.test.ts, run with pnpm test, powered by node:test + tsx. Zero configuration.

The main thing you're testing: the exit behavior of a command — what JSON it writes to stdout, whether the exit code is 0 or 1, what error appears on stderr on failure.

Two helpers from skill-kits/testing:

- `captureOutput(fn)`: captures `writeResult` / `writeError` / `notify` output, plus `process.exitCode`
- `mockFetch(routes)`: replaces global `fetch`, no real network calls

A typical happy-path test:

import { test } from "node:test";
import assert from "node:assert/strict";
import { mockFetch, captureOutput } from "skill-kits/testing";
import { createActivity } from "./commands/create-activity.js";

const ctx = { domain: "https://example.com", token: "t" };

test("create-activity returns backend data with ok", async () => {
  const mock = mockFetch([
    { match: /\/activity\/create/, json: { code: 0, data: { activity_id: 9001 } } },
  ]);
  try {
    const { json, exitCode } = await captureOutput(() =>
      createActivity(ctx, { act_name: "test" }),
    );
    assert.equal(exitCode, 0);
    assert.equal((json as { activity_id: number }).activity_id, 9001);
  } finally {
    mock.restore();
  }
});

For pure functions, you don't need either helper — just import and assert. For error paths, commands throw a SkillError (the router maps it to exit code 1 + stderr JSON), and you catch it with assert.rejects.

One detail worth mentioning: mockFetch intentionally throws on unmatched requests. A missing mock will never silently pass — no "tests pass but production fails" surprises.
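
The fail-on-unmatched behavior is easy to picture with a small sketch. This is not the `skill-kits/testing` code: `findRoute`, `mockFetchSketch`, and the optional `status` field are hypothetical, and only the `match` / `json` route shape and the `restore()` handle come from the test above. It assumes Node 18+, where `fetch` and `Response` are globals.

```typescript
// Hypothetical sketch of a fetch mock that fails loudly; not the skill-kits/testing code.
interface MockRoute {
  match: RegExp;
  json: unknown;
  status?: number;
}

// Find the first route whose pattern matches; an unmatched URL is an error, never a default.
function findRoute(routes: MockRoute[], url: string): MockRoute {
  const route = routes.find((r) => r.match.test(url));
  if (!route) {
    throw new Error(`mockFetch: no route matches ${url}`);
  }
  return route;
}

// Replace the global fetch and return a handle that restores the original.
function mockFetchSketch(routes: MockRoute[]): { restore: () => void } {
  const original = globalThis.fetch;
  globalThis.fetch = async (input: RequestInfo | URL): Promise<Response> => {
    const url =
      typeof input === "string" ? input : input instanceof URL ? input.href : input.url;
    const route = findRoute(routes, url); // throws, so the fetch promise rejects
    return new Response(JSON.stringify(route.json), {
      status: route.status ?? 200,
      headers: { "content-type": "application/json" },
    });
  };
  return {
    restore: () => {
      globalThis.fetch = original;
    },
  };
}
```

Calling `restore()` in a `finally` block, as the test above does, keeps one test's routes from leaking into the next.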

The daily loop is three commands:

pnpm new daily-report
# ... write code + tests ...
pnpm test daily-report     # run tests
pnpm build daily-report    # lint → bundle → zip

Wrapping up

skill-kits doesn't write your scripts or abstract your business logic. What it does is simple: it wraps entry points, build output, runtime utilities, and definition validation into a guardrail layer — so when you're building your 5th or 10th Skill, your brain is still on business logic, not boilerplate.

If you're building Agent Skills too, give it a try:

npx skill-kits init my-skills

GitHub: https://github.com/weijhfly/skill-kits

DEV Community
