Assili Salim

Posted on Jun 28

Reusable Agent Skills Need Pre-Call Runtime Checks

#ai #api #javascript #llm

OpenAI’s recent Codex research includes one detail that matters for developers building agents:

26.6% of users use skills to share instructions for complex workflows, and more than 10% manage three or more concurrent Codex agents at some point each week.

That suggests agent usage is moving from one-off prompts toward reusable workflows.

That is good.

It also means failures can become reusable.

The problem

A bad prompt can waste one model call.

A bad agent skill can waste many runs.

A skill might encode:

how to retry
how to call tools
how to inspect files
how to recover from errors
how much context to add
when to continue
when to stop

If those rules are loose, every run inherits the looseness.

This is the part developers need to treat carefully.

Reusable agent behavior needs reusable runtime boundaries.

A naive agent skill

Imagine a coding-agent skill for fixing failing tests.

The instruction might be:

const skill = {
  name: "fix-failing-tests",
  instructions: `Inspect the failing test. Find the relevant files. Apply a fix. Run the tests again. Repeat until the tests pass.`,
};

This sounds fine.

But “repeat until the tests pass” is dangerous without runtime limits.

What if the test failure is environmental?

What if the agent keeps editing unrelated files?

What if the prompt barely changes across attempts?

What if each retry adds more context?

What if the fallback model has unknown pricing?

The skill is useful.

The runtime is under-specified.

Add a pre-call decision

Before every provider call, the runtime should decide whether the call is still allowed.

type BeforeCallInput = {
  runId: string;
  workflowId?: string;
  model: string;
  prompt: string;
  stepCount: number;
  retryCount: number;
  budgetRemaining: number;
  previousPrompts: string[];
  progressState: {
    testsImproved?: boolean;
    errorsChanged?: boolean;
    filesChanged?: boolean;
  };
};

type GuardDecision =
  | { allowed: true }
  | {
      allowed: false;
      reason:
        | "unknown_model_pricing"
        | "budget_exceeded"
        | "workflow_budget_exceeded"
        | "max_steps_exceeded"
        | "retry_storm_detected"
        | "similar_prompt_loop"
        | "no_progress";
      error: Error;
    };

Then use it before the provider call:

const decision = guard.beforeCall({
  runId,
  workflowId,
  model,
  prompt,
  stepCount,
  retryCount,
  budgetRemaining,
  previousPrompts,
  progressState,
});

if (!decision.allowed) {
  return {
    status: "stopped",
    reason: decision.reason,
    error: decision.error,
  };
}

const response = await provider.call({
  model,
  prompt,
});

The exact API does not matter.

The placement matters.

The check happens before the provider call.
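To make the placement concrete, here is a minimal sketch of a run loop that uses a synchronous stand-in provider. `SimpleGuard`, `fakeProvider`, `runSkill`, the flat 0.02 cost per call, and the 5-step limit are all hypothetical. The only point is that the guard runs at the top of each iteration, so a denied call never reaches the provider.

```typescript
// Hypothetical sketch: SimpleGuard, fakeProvider, the 5-step limit, and the
// flat 0.02 per-call cost are illustrative assumptions, not a real API.

type Decision =
  | { allowed: true }
  | { allowed: false; reason: "budget_exceeded" | "max_steps_exceeded"; error: Error };

interface SimpleGuard {
  beforeCall(input: { stepCount: number; budgetRemaining: number; estimatedCost: number }): Decision;
}

const MAX_STEPS = 5;
const COST_PER_CALL = 0.02;

const guard: SimpleGuard = {
  beforeCall({ stepCount, budgetRemaining, estimatedCost }) {
    if (stepCount >= MAX_STEPS) {
      return { allowed: false, reason: "max_steps_exceeded", error: new Error("Maximum agent steps exceeded") };
    }
    if (estimatedCost > budgetRemaining) {
      return { allowed: false, reason: "budget_exceeded", error: new Error("Agent run budget exceeded") };
    }
    return { allowed: true };
  },
};

// Synchronous stand-in for a provider; a real provider call would be awaited.
function fakeProvider(prompt: string): { text: string; cost: number } {
  return { text: `echo: ${prompt}`, cost: COST_PER_CALL };
}

type RunResult =
  | { status: "done"; steps: number }
  | { status: "stopped"; reason: string; steps: number };

function runSkill(budget: number, isDone: (step: number) => boolean): RunResult {
  let budgetRemaining = budget;
  for (let stepCount = 0; ; stepCount++) {
    // The guard runs BEFORE the call, so a denied call never executes.
    const decision = guard.beforeCall({ stepCount, budgetRemaining, estimatedCost: COST_PER_CALL });
    if (!decision.allowed) {
      return { status: "stopped", reason: decision.reason, steps: stepCount };
    }
    const response = fakeProvider(`step ${stepCount}`);
    budgetRemaining -= response.cost; // charge the actual cost after the call
    if (isDone(stepCount)) {
      return { status: "done", steps: stepCount + 1 };
    }
  }
}
```

With a real provider, the call would be `await`ed inside an async loop. The guard check stays in the same position.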

What should the guard check?

Known model pricing

If the runtime cannot price the model, it cannot enforce a budget.

if (!pricingCatalog.has(model)) {
  return {
    allowed: false,
    reason: "unknown_model_pricing",
    error: new Error(`Unknown pricing for model: ${model}`),
  };
}

Do not guess.

Fail closed.
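One way to make "cannot price, cannot budget" explicit is a lookup that returns `null` for unknown models instead of defaulting to zero. This sketch assumes a hypothetical `pricingCatalog` keyed by model name, with prices per million tokens. The model names and numbers are placeholders, not real provider rates. The estimate charges the full `maxOutputTokens`, so it prices the worst case.

```typescript
// Hypothetical pricing catalog. Model names and prices are placeholders,
// not real provider rates; load real values from your own configuration.
type Pricing = { inputPerMTok: number; outputPerMTok: number }; // USD per 1M tokens

const pricingCatalog = new Map<string, Pricing>([
  ["example-small", { inputPerMTok: 0.5, outputPerMTok: 1.5 }],
  ["example-large", { inputPerMTok: 5, outputPerMTok: 15 }],
]);

// Returns null for unknown models so the caller must fail closed instead of
// silently treating the cost as zero. Output is priced at its maximum.
function estimateCallCost(model: string, inputTokens: number, maxOutputTokens: number): number | null {
  const price = pricingCatalog.get(model);
  if (price === undefined) return null;
  return (inputTokens * price.inputPerMTok + maxOutputTokens * price.outputPerMTok) / 1_000_000;
}
```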

Budget remaining

Agent workflows should have task-level budgets.

if (estimatedNextCallCost > budgetRemaining) {
  return {
    allowed: false,
    reason: "budget_exceeded",
    error: new Error("Agent run budget exceeded"),
  };
}

A small bug fix and a long migration should not share the same budget.
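Here is a minimal sketch of task-level budgets, assuming the budget is chosen by task type. `TaskKind`, `taskBudgetsUsd`, `RunBudget`, and the dollar amounts are all hypothetical. The `remaining` value is what would feed `budgetRemaining` in the guard input.

```typescript
// Hypothetical budget table keyed by task type; amounts are placeholders.
type TaskKind = "small-bug-fix" | "test-repair" | "migration";

const taskBudgetsUsd: Record<TaskKind, number> = {
  "small-bug-fix": 0.5,
  "test-repair": 2,
  migration: 20,
};

class RunBudget {
  private spent = 0;
  readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Value to pass as budgetRemaining before the next call.
  get remaining(): number {
    return this.limitUsd - this.spent;
  }

  // Record the actual cost once a call completes.
  record(costUsd: number): void {
    this.spent += costUsd;
  }
}

function budgetForTask(kind: TaskKind): RunBudget {
  return new RunBudget(taskBudgetsUsd[kind]);
}
```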

Max steps

Agents need step limits.

if (stepCount >= maxSteps) {
  return {
    allowed: false,
    reason: "max_steps_exceeded",
    error: new Error("Maximum agent steps exceeded"),
  };
}

This is basic production hygiene.

If a workflow cannot complete inside a reasonable number of steps, it should stop.

Retry storms

Retries are useful.

Blind retries are expensive.

if (retryCount >= maxRetries && recentErrorsAreSimilar(errors)) {
  return {
    allowed: false,
    reason: "retry_storm_detected",
    error: new Error("Retry storm detected"),
  };
}

The goal is not to ban retries.

The goal is to stop repeated failure.
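The `recentErrorsAreSimilar()` helper above is left abstract. Here is one possible sketch, assuming errors arrive as message strings. It strips volatile details such as numbers, hex ids, and quoted values, then treats the last few errors as a storm if they reduce to a single signature. `errorSignature` and the 3-error window are assumptions.

```typescript
// Hypothetical helper: normalize an error message so volatile details
// (hex ids, numbers, quoted values) don't hide that it is the same failure.
function errorSignature(message: string): string {
  return message
    .toLowerCase()
    .replace(/0x[0-9a-f]+/g, "<hex>") // before digits, so 0x1f isn't split up
    .replace(/\d+/g, "<n>")
    .replace(/(["'`]).*?\1/g, "<str>")
    .trim();
}

// Assumed policy: a storm means the last `windowSize` errors share one signature.
function recentErrorsAreSimilar(errors: string[], windowSize = 3): boolean {
  if (errors.length < windowSize) return false;
  const signatures = new Set(errors.slice(-windowSize).map(errorSignature));
  return signatures.size === 1;
}
```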

Prompt loops

A prompt loop happens when the agent keeps asking nearly the same thing.

if (similarToRecentPrompt(prompt, previousPrompts)) {
  return {
    allowed: false,
    reason: "similar_prompt_loop",
    error: new Error("Similar prompt loop detected"),
  };
}

Even a simple similarity check can catch obvious loops.
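For example, `similarToRecentPrompt()` could compare word sets using Jaccard similarity: shared words divided by total distinct words. This is one simple technique, not the only one. The 0.9 threshold and the 3-prompt lookback are illustrative assumptions that would need tuning against real runs.

```typescript
// Hypothetical implementation of similarToRecentPrompt() using Jaccard
// similarity over lowercase word sets. Threshold and lookback are assumptions.
function wordSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((word) => {
    if (b.has(word)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

function similarToRecentPrompt(
  prompt: string,
  previousPrompts: string[],
  threshold = 0.9,
  lookback = 3,
): boolean {
  const current = wordSet(prompt);
  // Only compare against the most recent prompts.
  return previousPrompts
    .slice(-lookback)
    .some((previous) => jaccard(current, wordSet(previous)) >= threshold);
}
```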

No progress

A run can be active and still not improve.

Track progress signals:

did tests improve?
did the error change?
did files change meaningfully?
did a checklist item complete?
did the agent reduce uncertainty?

If several steps pass without progress, stop.

if (!madeProgress(progressState, recentSteps)) {
  return {
    allowed: false,
    reason: "no_progress",
    error: new Error("Agent run is not making progress"),
  };
}

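The `madeProgress()` helper is also left abstract. Here is a minimal sketch, assuming each step records the same optional flags as `progressState` and that `recentSteps` holds the states of earlier steps. The run counts as progressing if any of the last few steps, including the current one, set a flag. The window of 3 and the choice to count any flag are assumptions. A stricter policy might ignore `filesChanged` when the errors stay the same.

```typescript
// Hypothetical progress check. Flag names mirror progressState above;
// the window size and "any flag counts" rule are illustrative assumptions.
type ProgressState = {
  testsImproved?: boolean;
  errorsChanged?: boolean;
  filesChanged?: boolean;
};

function stepMadeProgress(step: ProgressState): boolean {
  return Boolean(step.testsImproved || step.errorsChanged || step.filesChanged);
}

function madeProgress(
  progressState: ProgressState,
  recentSteps: ProgressState[],
  windowSize = 3,
): boolean {
  // Too few steps to judge yet: let the run continue.
  if (recentSteps.length + 1 < windowSize) return true;
  // Guard windowSize 1: slice(-0) would return the whole array.
  const earlier = windowSize > 1 ? recentSteps.slice(-(windowSize - 1)) : [];
  return earlier.concat([progressState]).some(stepMadeProgress);
}
```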
Why concurrency makes this more important

OpenAI’s Codex research says more than 10% of users manage three or more concurrent agents at some point each week.

That changes the risk.

One agent wasting a few calls is visible.

Several agents each wasting a few calls can look normal.

The local loop becomes a global budget problem.

For parallel workflows, add shared budget checks:

if (estimatedNextCallCost > workflowBudgetRemaining) {
  return {
    allowed: false,
    reason: "workflow_budget_exceeded",
    error: new Error("Workflow budget exceeded"),
  };
}

Each agent needs its own limit.

The workflow needs a shared limit.

Both matter.
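Here is a minimal sketch of the two limits together, assuming a single `WorkflowBudget` instance is shared by every agent in a workflow running in the same Node.js process. The class, its `reserve` method, and the limits are hypothetical. Because the check and the reservation run synchronously with no `await` between them, two agents cannot both pass the check before either records its spend. Agents in separate processes would need a shared store instead.

```typescript
// Hypothetical shared budget: each agent has its own limit, and all agents
// draw from one workflow limit. Names and limits are illustrative.
type BudgetDecision =
  | { allowed: true }
  | { allowed: false; reason: "budget_exceeded" | "workflow_budget_exceeded" };

class WorkflowBudget {
  private workflowSpent = 0;
  private agentSpent = new Map<string, number>();
  private readonly workflowLimit: number;
  private readonly perAgentLimit: number;

  constructor(workflowLimit: number, perAgentLimit: number) {
    this.workflowLimit = workflowLimit;
    this.perAgentLimit = perAgentLimit;
  }

  // Reserve the estimated cost before the provider call. Deny if either the
  // agent's own limit or the shared workflow limit would be exceeded.
  reserve(agentId: string, estimatedCost: number): BudgetDecision {
    const spentByAgent = this.agentSpent.get(agentId) ?? 0;
    if (spentByAgent + estimatedCost > this.perAgentLimit) {
      return { allowed: false, reason: "budget_exceeded" };
    }
    if (this.workflowSpent + estimatedCost > this.workflowLimit) {
      return { allowed: false, reason: "workflow_budget_exceeded" };
    }
    this.agentSpent.set(agentId, spentByAgent + estimatedCost);
    this.workflowSpent += estimatedCost;
    return { allowed: true };
  }
}
```

This sketch omits reconciling a reservation against the actual cost once the call returns.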

Where AI CostGuard fits

AI CostGuard is the local-first TypeScript / Node.js runtime safety layer I’m building for this problem.

It focuses on pre-call protection for AI-agent projects:

retry storms
prompt loops
max-step explosions
runaway execution
unknown model pricing
budget overruns
uncontrolled provider calls

It is not a billing ledger.

It is not a hard security boundary.

It does not replace provider dashboards.

The goal is to stop obviously risky calls before they execute.

Takeaway

Reusable agent skills are a good abstraction.

But they should not only package instructions.

They should also inherit runtime policy.

Before every provider call, ask:

Should this call still happen?

That one question can stop many expensive agent failures before they turn into billed API calls.
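To make "inherit runtime policy" concrete, a skill could carry its limits next to its instructions. This sketch is hypothetical, not an AI CostGuard API: `RuntimePolicy`, `defaultPolicy`, `resolvePolicy`, and every value are assumptions. Skill-specific limits override the defaults, so "repeat until the tests pass" is bounded by `maxSteps` and `budgetUsd` on every run.

```typescript
// Hypothetical shape for a skill that ships its runtime policy.
// Field names and default values are illustrative assumptions.
type RuntimePolicy = {
  maxSteps: number;
  maxRetries: number;
  budgetUsd: number;
  stopOnSimilarPrompts: boolean;
};

type Skill = {
  name: string;
  instructions: string;
  policy?: Partial<RuntimePolicy>;
};

const defaultPolicy: RuntimePolicy = {
  maxSteps: 20,
  maxRetries: 3,
  budgetUsd: 1,
  stopOnSimilarPrompts: true,
};

// Skill-specific limits override the defaults; unset fields fall back.
function resolvePolicy(skill: Skill): RuntimePolicy {
  return { ...defaultPolicy, ...skill.policy };
}

const fixFailingTests: Skill = {
  name: "fix-failing-tests",
  instructions:
    "Inspect the failing test. Find the relevant files. Apply a fix. " +
    "Run the tests again. Repeat until the tests pass.",
  policy: { maxSteps: 8, budgetUsd: 2 },
};
```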

Tags: ai, agents, typescript, devtools
https://github.com/salimassili62-afk/ai-costguard
