Mark Huang

Posted on Jun 20 • Originally published at markhuang.ai

I Might Be Wrong About Agentool

[Image: A developer comparing a lightweight CI automation machine with a much larger pile of maintenance work.] The lightweight machine was real. So was the maintenance pile I had not priced in.

I built agentool because I thought I had found a clean optimization.

After Claude Code's internals became a public learning object, I wanted something Claude-Code-ish that could sit closer to the Vercel AI SDK world. File operations, shell execution, search, web fetching, memory, agents, output validation, context compaction: enough building blocks to run useful automation without pulling in a heavy full agent runtime every time.

The goal was practical. I wanted my GitHub Actions workflows to run lighter and faster. If I could keep the dependency surface small, maybe I could automate more things, run them more often, and spend more time on interesting work instead of babysitting the pipeline.

That was the assumption.

Now I am less sure.

Answer Snapshot

In 2026, my current answer is this: I might have optimized the wrong layer. Agentool's 23-tool Vercel AI SDK surface is still useful, especially for strict output validation, but the broader agent loop may be better delegated to maintained SDKs.

| Original bet | What changed | Current move |
| --- | --- | --- |
| Lighter CI dependencies would save meaningful time | Runtime matters less than maintenance and review quality | Use heavier SDKs when they own the agent loop better |
| Build Claude-Code-ish behavior around Vercel AI SDK tools | Agent platforms keep adding features I would need to chase | Keep agentool narrower instead of turning it into a platform |
| Skills and prompts could enforce workflow shape | Long workflows need hard schema boundaries | Keep `output_validator` and explicit handoff contracts |

The optimization I was chasing

My mental model was simple: lighter dependencies mean faster CI jobs, faster CI jobs mean cheaper automation, and cheaper automation means I can afford to automate more of my work.

That logic is not wrong. It is just incomplete.

When a workflow runs once, runtime cost is easy to see. A job takes five minutes or twelve minutes. A package install is lightweight or heavy. A container is quick to start or slow to pull. Those numbers feel concrete, so they become tempting targets.

Maintenance cost is harder to see. It arrives later, one feature at a time.

Then the feature requests stopped feeling hypothetical. I wanted a cleaner way to fan work out across agents. I wanted runs I could pause and inspect without reading a wall of logs. I wanted permission rules that did not turn every workflow into a bespoke prompt contract. None of that is impossible with agentool, but every missing piece pulled me back into library work instead of the automation I was trying to finish.

I could approximate each of those features on top of agentool. But every approximation becomes code I own.

That was where the math stopped working. A few faster minutes in CI did not offset the hours I spent trying to keep up with products that already had teams working on this layer.
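A rough way to put numbers on that trade-off is a break-even count: how many runs it takes before the minutes saved in CI repay the hours spent on maintenance. The sketch below is illustrative arithmetic only. The 7-minute saving is simply the gap between a five-minute and a twelve-minute job, and the 10 maintenance hours are a made-up figure, not a measurement. It also treats a CI minute as worth as much as a minute of human time, which flatters the lightweight side.

```python
def break_even_runs(minutes_saved_per_run: float, maintenance_hours: float) -> float:
    """Number of runs needed before saved CI minutes equal the maintenance time."""
    if minutes_saved_per_run <= 0:
        raise ValueError("minutes_saved_per_run must be positive")
    return maintenance_hours * 60 / minutes_saved_per_run


# Hypothetical inputs: 7 minutes saved per run, 10 hours of library upkeep.
runs = break_even_runs(minutes_saved_per_run=7, maintenance_hours=10)
print(round(runs))  # 86
```

Every additional feature approximated in-house adds to the maintenance side while the per-run saving stays fixed, so the break-even point keeps moving further out.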

The part I still believe in

[Image: A structured output validator gate separating malformed workflow data from clean handoff objects.] The validator is where a messy handoff stops before it leaks into the next stage.

The strongest part of agentool, for me, is still `output_validator`.

Long automated workflows are fragile at the boundaries. Stage one produces something. Stage two assumes that something has a specific shape. Stage three assumes stage two preserved the contract. If any earlier step returns almost-correct JSON, the failure may not show up until much later, and by then the debugging cost is much higher.

Skills can describe a workflow. They can say what the assistant should do, which files to inspect, which checks to run, and what output format to use. But a skill does not guarantee that a model's output satisfies a complex schema.

A validator does.

This is the part I do not want to leave to prompt discipline. If the next stage expects a nested JSON object with exact fields, discriminated variants, arrays, enum values, and recovery instructions, the previous stage should prove it has that shape before anything else touches it.
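As a minimal sketch of what such a boundary check can look like, here is a hand-written validator in plain Python. It is not agentool's `output_validator` API. The field names (`kind`, `status`, `files`, `reason`, `recovery`) and the enum values are hypothetical, chosen only to show exact fields, a discriminated variant, an array, an enum, and a recovery instruction in one place.

```python
from typing import Any

ALLOWED_STATUS = {"ok", "needs_review"}  # hypothetical enum values


class HandoffError(ValueError):
    """Raised when a stage's output does not match the handoff contract."""


def validate_handoff(payload: Any) -> dict:
    """Return the payload unchanged if it matches the contract, else raise."""
    if not isinstance(payload, dict):
        raise HandoffError("payload must be a JSON object")

    kind = payload.get("kind")  # discriminator: selects the variant
    if kind == "patch":
        required = {"kind": str, "status": str, "files": list}
    elif kind == "failure":
        required = {"kind": str, "reason": str, "recovery": str}
    else:
        raise HandoffError(f"unknown kind: {kind!r}")

    # Exact fields: nothing missing, nothing extra.
    if set(payload) != set(required):
        raise HandoffError(f"fields must be exactly {sorted(required)}")
    for name, expected in required.items():
        if not isinstance(payload[name], expected):
            raise HandoffError(f"{name} must be {expected.__name__}")

    if kind == "patch":
        if payload["status"] not in ALLOWED_STATUS:
            raise HandoffError(f"status must be one of {sorted(ALLOWED_STATUS)}")
        if not all(isinstance(item, str) for item in payload["files"]):
            raise HandoffError("files must be a list of strings")
    return payload
```

Calling `validate_handoff` on a stage's parsed JSON either returns the same object or raises `HandoffError` with a specific message, so an almost-correct payload stops at the boundary instead of three stages later. In a real workflow a schema library would likely replace the hand-rolled checks; the point here is only where the check sits.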

So I am not walking away from agentool. The validator pattern still earns its keep, and I still want those hard boundaries in my workflows.

The uncomfortable question

The question I kept asking myself was: is the rest worth it?

More specifically: does it actually matter how fast my CI job runs?

Sometimes, yes. But not as much as I had assumed.

If a workflow already runs asynchronously, opens a pull request, and waits for review, a few saved setup minutes are not what I notice. I notice the bad draft I now have to review, the silent failure three steps back, or the moment I realize changing one stage means rebuilding another slice of an agent runtime.

That was the uncomfortable part. I had optimized job weight, but the bill showed up as feature ownership.

Why the SDKs started looking reasonable

[Image: A developer moving workflow blocks from a custom toolkit into maintained agent SDK workstations.] The split I trust more now: my repo owns configuration and contracts; the SDK owns the moving agent machinery.

The Claude Agent SDK and the Codex SDK change the equation for me.

They are heavier. That part is real. But they also come with the agent loop, context management, tool behavior, and product-level features that I otherwise keep trying to rebuild from the outside. Codex's docs explicitly position the SDK for CI/CD pipelines and internal tools. Claude's Agent SDK gives access to the same general class of coding-agent behavior that powers Claude Code, programmable from TypeScript or Python.

I do not want to spend my nights rebuilding that layer.

My current setup also makes the split more practical. Claude's SDK can connect through my personal proxy. Codex's SDK can connect through my ChatGPT subscription. Instead of forcing one lightweight library to become every agent runtime I want, I can let the maintained systems do the agent work and focus my own code on configuration, workflow boundaries, and validation.

That sounds less elegant in one way. There are more moving parts. More auth. More configuration. More vendor-specific behavior.

But it also sounds more honest. The question was less whether I could write another wrapper around tools and more whether I wanted to maintain a competing agent platform by accident.

The new split I am moving toward

I am starting to think about the boundary like this:

| Concern | Keep close | Delegate |
| --- | --- | --- |
| Strict structured outputs | Validators, schemas, repair loops, handoff contracts | Model self-discipline |
| Agent loop and coding workflow | Configuration, acceptance gates, repo-specific rules | Claude Agent SDK or Codex SDK |
| CI speed | Cache, isolate, avoid needless installs | Do not let it dominate architecture |
| New agent features | Adopt when they change outcomes | Do not reimplement every platform feature |
| Custom Vercel AI SDK experiments | agentool can still be useful here | Full coding-agent behavior |

That table is not final doctrine. It is where my thinking is today.

That is the boundary I want to keep. Agentool can stay a lightweight tool collection with strict output validation. It does not have to chase every feature in richer coding-agent products.

The risk in the new direction

The new direction has its own traps.

Vendor gravity is the obvious one. If I move too much into Claude-specific or Codex-specific workflows, my automation gets harder to port, harder to test locally, and more sensitive to product changes. The part I can control is the adapter boundary. Prompts, schemas, environment setup, and acceptance checks should stay in my repo instead of disappearing into a vendor-specific black box.
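One possible shape for that adapter boundary, sketched in plain Python: the task (prompt plus acceptance check) is defined in the repo, and the backend is anything with a `run` method. All names here (`Task`, `AgentBackend`, `EchoBackend`, `run_task`) are hypothetical. No real SDK call appears; a real adapter would wrap the vendor SDK behind the same one-method interface.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Task:
    """Repo-owned definition of the work: the prompt and what counts as done."""
    prompt: str
    accept: Callable[[str], bool]


class AgentBackend(Protocol):
    """The only surface the workflow depends on, regardless of vendor."""
    def run(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend for local tests; it just echoes the prompt back."""
    def run(self, prompt: str) -> str:
        return f"echo: {prompt}"


def run_task(task: Task, backend: AgentBackend) -> str:
    """Run the task on any backend and apply the repo's acceptance check."""
    output = backend.run(task.prompt)
    if not task.accept(output):
        raise RuntimeError("backend output failed the repo's acceptance check")
    return output
```

Because `run_task` only knows about the protocol, swapping vendors means writing one new adapter class, and the stub backend keeps the workflow testable locally without any vendor credentials.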

Auth is the boring trap, which usually means it is the one that will break first. A workflow that depends on personal subscription auth or a proxy can fail in ways a plain API-key job does not. I need explicit config checks, clear failure messages, secret sync where appropriate, and no silent fallback that pretends the workflow ran correctly.
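A preflight check along these lines could run as the first step of the job. This is a sketch under assumptions: the variable names `AGENT_PROXY_URL` and `AGENT_AUTH_TOKEN` are hypothetical placeholders, not settings that either SDK is known to read.

```python
import os
from typing import Mapping

# Hypothetical variable names; substitute whatever the workflow actually needs.
REQUIRED_VARS = {
    "AGENT_PROXY_URL": "base URL of the proxy the agent connects through",
    "AGENT_AUTH_TOKEN": "credential for that proxy or subscription",
}


class ConfigError(RuntimeError):
    """Raised before any agent work starts if configuration is incomplete."""


def check_config(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required settings, or fail loudly naming every missing one."""
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        details = "; ".join(f"{name} ({REQUIRED_VARS[name]})" for name in missing)
        raise ConfigError(f"missing required configuration: {details}")
    return {name: env[name] for name in REQUIRED_VARS}
```

There is deliberately no default value and no fallback path: if a secret is absent or blank, the job stops with a message that names it, rather than continuing in a degraded mode that looks like success.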

The easiest mistake would be giving up the part that made agentool useful. If I delegate everything to agent SDKs and stop enforcing structured handoffs, I will recreate the same long-workflow fragility in a heavier package. The validators need to stay at the boundaries. Let the SDKs drive, but do not let them hand sloppy data to the next stage.
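One way to keep that boundary hard while an SDK-driven stage does the work is a validate-and-retry loop around the stage. The sketch below assumes the stage is any callable that takes a prompt string and returns a JSON string, and that the validator raises `ValueError` on a bad shape. `run_with_repair` is a hypothetical helper, not part of agentool or of either SDK.

```python
import json
from typing import Callable


def run_with_repair(
    stage: Callable[[str], str],
    validate: Callable[[object], dict],
    prompt: str,
    max_attempts: int = 3,
) -> dict:
    """Call `stage` until its JSON output passes `validate`, or give up loudly."""
    feedback = ""
    for _ in range(max_attempts):
        raw = stage(prompt + feedback)
        try:
            # json.JSONDecodeError subclasses ValueError, so both failure
            # modes (unparseable JSON, wrong shape) land in the same handler.
            return validate(json.loads(raw))
        except ValueError as err:
            # Feed the concrete error back so the next attempt can repair it.
            feedback = (
                f"\n\nPrevious output was rejected: {err}. "
                "Return corrected JSON only."
            )
    raise RuntimeError(f"stage output still invalid after {max_attempts} attempts")
```

The loop is bounded on purpose. After the last attempt it raises instead of returning the least-bad output, so the next stage only ever sees data that passed the validator.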

What clicked

What clicked for me was narrower than "agentool is wrong."

I think I had been using agentool to solve the wrong layer of the problem.

I wanted lighter CI because I wanted more automation. But the limiting factor for more automation is not always the job runtime. Sometimes it is how much platform behavior I have to maintain myself. Sometimes the right answer is to pay the dependency cost and stop rebuilding the moving parts that a maintained SDK already owns.

So I have started moving my automation workflows toward Claude Agent SDK and Codex SDK. I would rather spend my time designing the workflow, defining the handoff contracts, and deciding what "done" means than scanning the horizon for the next agent feature I need to recreate.

I do not love more configuration. I like it more than accidentally owning platform maintenance.

The rule I am using now

The rule is boring, which is probably a good sign: I build the parts that enforce my workflow, and I delegate the parts that keep up with the agent platform.

For me, that means validators, schemas, acceptance gates, repo rules, and workflow intent stay close. Agent loops, tool orchestration, context behavior, and fast-moving platform features can move to the SDKs that already exist.

I might be wrong about this too. Fine.

I am not taking "lightweight tools are bad" from this. I am taking something narrower: an optimization target can expire. When the cost model changes, staying loyal to the old target becomes over-engineering.

I built agentool to save time. If keeping it at the center starts costing more time than it saves, the honest move is to change the center.
