The extraction that worked 80% of the time
I built an extraction agent that pulled structured data from free-text input. Name, age, role, contact info. The LLM was given a JSON schema in the system prompt and told to respond strictly in that format. Manual testing looked good. I shipped it.
In production, roughly one in five responses came back malformed. Sometimes trailing commas that caused JSON.parse to throw. Sometimes a string where an integer was expected, which passed JSON parsing but failed schema validation downstream. Sometimes the model would wrap the JSON in a markdown code block with triple backticks, which the parse step did not handle. The extraction would throw, the pipeline would fail, and the error would surface to the user as a generic "something went wrong" message.
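To make the three failure modes concrete, here is a small sketch that feeds one illustrative output of each kind to JSON.parse. The sample strings and the tryParse helper are made up for illustration; they are not captured production responses.

```typescript
// Build the fence marker programmatically so this snippet can sit inside a fenced block.
const FENCE = "`".repeat(3);

// Illustrative outputs, one per failure mode (assumed examples, not real traffic).
const trailingCommaOutput = '{"name": "Alice", "age": 30,}';
const fencedOutput = `${FENCE}json\n{"name": "Alice", "age": 30}\n${FENCE}`;
const wrongTypeOutput = '{"name": "Alice", "age": "30"}';

type ParseOutcome =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

// Returns the parsed value, or the error message if JSON.parse throws.
function tryParse(raw: string): ParseOutcome {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

const outcomes = {
  trailingComma: tryParse(trailingCommaOutput), // throws: JSON forbids trailing commas
  fenced: tryParse(fencedOutput),               // throws: backticks are not JSON
  wrongType: tryParse(wrongTypeOutput),         // parses, but age is a string
};

console.log(outcomes);
```

The first two fail at the parse step. The third only fails once something checks the types, which is why parsing alone is not enough.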
After a week of traffic it was clear that none of these were rare edge cases. They were consistent patterns. The model was not failing at random; it had specific failure modes that repeated. The trailing-comma issue showed up most often on long responses. The markdown fence issue showed up most often when the system prompt was close to the context window limit.
The fix felt obvious in retrospect: attempt to repair the response (strip fences, fix trailing commas, handle the common model output quirks), validate the repaired result, and if it still fails, retry the LLM call with the validation errors included as context so the model can see what it got wrong. The model usually gets it right on the second attempt when it can read its own error. But wiring that loop from scratch takes more code than it should, and I kept copying it across projects with small variations that made the copies diverge. agentcast is that loop, packaged.
Shape of the fix
import { AgentCast } from "@mukundakatta/agentcast";
import { z } from "zod";

const PersonSchema = z.object({
  name: z.string(),
  age: z.number(),
  role: z.string(),
});

const cast = new AgentCast({
  llm: async (prompt) => myLLMClient.complete(prompt),
  validate: (json) => {
    const result = PersonSchema.safeParse(json);
    if (result.success) return { ok: true, data: result.data };
    return { ok: false, errors: result.error.issues.map(i => i.message) };
  },
  maxRetries: 3
});

const result = await cast.cast(
  "Extract the person details from: Alice, 30, senior engineer"
);
// result is { name: "Alice", age: 30, role: "senior engineer" }
On each attempt, agentcast tries JSON repair first. That means stripping markdown code fences, fixing trailing commas, handling single-quoted strings, and normalizing a few other patterns that models produce regularly. Then it passes the parsed object to your validate function. If validation passes, it returns the result. If it fails, it formats the validation errors into a follow-up prompt and calls your llm function again, with the errors visible to the model. This repeats up to maxRetries times. If all attempts fail, it throws with the full attempt history attached, so you can log exactly what happened without losing any context.
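For readers who want to see the control flow, here is a minimal hand-rolled sketch of a repair-validate-retry loop. It is not agentcast's source code. The names castSketch, quickRepair, and LoopOptions are hypothetical, the repair step is deliberately tiny, and treating maxRetries as "one initial attempt plus that many retries" is an assumption.

```typescript
type ValidateResult<T> = { ok: true; data: T } | { ok: false; errors: string[] };

interface LoopOptions<T> {
  llm: (prompt: string) => Promise<string>;
  validate: (json: unknown) => ValidateResult<T>;
  maxRetries: number;
}

// Tiny repair step: unwrap a surrounding markdown fence, then drop trailing commas.
// The comma regex is naive and would also touch ",}" inside a string value.
function quickRepair(raw: string): string {
  const fence = "`".repeat(3);
  const text = raw.trim();
  const fenced = new RegExp("^" + fence + "\\w*\\s*([\\s\\S]*?)" + fence + "$").exec(text);
  const body = (fenced?.[1] ?? text).trim();
  return body.replace(/,\s*([}\]])/g, "$1");
}

async function castSketch<T>(basePrompt: string, opts: LoopOptions<T>): Promise<T> {
  const history: { raw: string; errors: string[] }[] = [];
  let prompt = basePrompt;

  // Assumption: one initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    const raw = await opts.llm(prompt);
    let errors: string[];
    let parsed: unknown;
    let parsedOk = true;

    try {
      parsed = JSON.parse(quickRepair(raw));
    } catch (err) {
      parsedOk = false;
      parsed = undefined;
    }

    if (parsedOk) {
      const result = opts.validate(parsed);
      if (result.ok) return result.data;
      errors = result.errors;
    } else {
      errors = ["Response was not valid JSON, even after repair."];
    }

    history.push({ raw, errors });
    // Show the model its previous output and what was wrong with it.
    prompt = [
      basePrompt,
      "Your previous response was:",
      raw,
      "It had these validation errors:",
      errors.map((e) => `- ${e}`).join("\n"),
      "Respond with valid JSON only.",
    ].join("\n\n");
  }

  // Attach the attempt history so callers can log what happened.
  throw Object.assign(new Error(`No valid response after ${history.length} attempts`), { history });
}
```

Even in this stripped-down form the loop has four moving parts (repair, parse, validate, prompt rebuild), which is the code that tends to drift when it is copied between projects.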
The llm function you pass receives the full prompt on each attempt. On retries, the prompt includes the previous response and the validation errors. You control the base prompt in your own code. agentcast handles the retry framing.
What it does NOT do
agentcast does not know your schema. It does not generate the prompt that tells the model to produce structured output. That is your job. If your base prompt does not ask for JSON in a specific format, agentcast will not fix that. The library only handles what happens after the model responds: repair, validate, retry. The quality of your system prompt is still the main variable.
The repair step is heuristic. It handles the common patterns that models produce. Severely malformed output, for example a response that is mostly prose with a JSON fragment buried in the middle, may not be repairable. The library makes an honest attempt and then moves to retry if repair fails.
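As an illustration of what "heuristic" means here, the following sketch handles two of the patterns mentioned (markdown fences and trailing commas) and shows where it gives up. The function names are hypothetical and this is not the library's repair code. Single-quoted strings and other quirks are left out to keep it short.

```typescript
// Return the contents of the first markdown fence if there is one, else the text itself.
function stripFences(text: string): string {
  const fence = "`".repeat(3);
  const match = new RegExp(fence + "\\w*\\s*([\\s\\S]*?)" + fence).exec(text);
  return (match?.[1] ?? text).trim();
}

// Remove commas that directly precede } or ], leaving string contents untouched.
function dropTrailingCommas(text: string): string {
  let out = "";
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (inString) {
      out += ch;
      if (ch === "\\") {
        out += text.charAt(++i); // copy the escaped character verbatim
      } else if (ch === '"') {
        inString = false;
      }
      continue;
    }
    if (ch === '"') {
      inString = true;
      out += ch;
      continue;
    }
    if (ch === ",") {
      // Look past whitespace to see whether a closing bracket follows.
      let j = i + 1;
      while (j < text.length && /\s/.test(text.charAt(j))) j++;
      const next = text.charAt(j);
      if (next === "}" || next === "]") continue; // skip this comma
    }
    out += ch;
  }
  return out;
}

type RepairResult = { ok: true; value: unknown } | { ok: false };

function repairJson(raw: string): RepairResult {
  try {
    return { ok: true, value: JSON.parse(dropTrailingCommas(stripFences(raw))) };
  } catch {
    return { ok: false }; // not repairable by these heuristics; a retry is the next step
  }
}
```

A response such as `Sure! The person is {"name": "Alice"} as requested.` has no fence to unwrap, so this sketch returns `{ ok: false }` for it. That is the "JSON fragment buried in prose" case where repair gives way to retry.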
agentcast also does not use any schema library itself. It does not import Zod, Ajv, or anything else. Your validate function is the schema layer. You decide what library to use, and the interface is a simple function call.
Inside the lib
The validator interface is intentionally minimal. Your validate function receives a plain JavaScript object (the parsed JSON) and returns either { ok: true, data: T } or { ok: false, errors: string[] }. Zod fits through .safeParse with a small mapping, as in the example above. Ajv works if you wrap its boolean result and errors array in that shape. A hand-written function that checks specific fields works too.
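Here is what the hand-written option could look like for the person schema used earlier. The ValidateResult type is written out from the shape described above; the names Person and validatePerson are illustrative.

```typescript
type ValidateResult<T> = { ok: true; data: T } | { ok: false; errors: string[] };

interface Person {
  name: string;
  age: number;
  role: string;
}

// Hand-written validator: no schema library, just field checks with readable messages.
function validatePerson(json: unknown): ValidateResult<Person> {
  if (typeof json !== "object" || json === null || Array.isArray(json)) {
    return { ok: false, errors: ["expected a JSON object at the top level"] };
  }

  const { name, age, role } = json as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof name !== "string") errors.push(`name: expected string, got ${typeof name}`);
  if (typeof age !== "number") errors.push(`age: expected number, got ${typeof age}`);
  if (typeof role !== "string") errors.push(`role: expected string, got ${typeof role}`);

  if (typeof name === "string" && typeof age === "number" && typeof role === "string") {
    return { ok: true, data: { name, age, role } };
  }
  return { ok: false, errors };
}

console.log(validatePerson({ name: "Alice", age: "30", role: "senior engineer" }));
```

Naming the field in each message costs one extra template string and gives the model something specific to correct on a retry.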
The retry prompt is built by agentcast and appended to the conversation. It looks like: "Your previous response had these validation errors: [errors]. Please fix them and respond with valid JSON only, no markdown fences." The errors come directly from your validate return value. This is why human-readable error messages from your validator improve retry success rates. Zod's .issues array maps naturally to this. Generic "invalid input" messages do not give the model enough to correct from.
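A rough sketch of how such a retry prompt could be assembled, using the wording quoted above. The function names and the exact layout are assumptions, not the library's internals. IssueLike mirrors only the two fields of a Zod issue used here (path and message), so the snippet runs without Zod installed.

```typescript
// Structural stand-in for a Zod issue: only the fields this sketch needs.
interface IssueLike {
  path: (string | number)[];
  message: string;
}

// Prefix the message with the field path so the model knows which field to fix.
function formatIssue(issue: IssueLike): string {
  return issue.path.length > 0 ? `${issue.path.join(".")}: ${issue.message}` : issue.message;
}

// Assemble a follow-up prompt from the base prompt, the previous output, and the errors.
function buildRetryPrompt(basePrompt: string, previousResponse: string, errors: string[]): string {
  return [
    basePrompt,
    "Your previous response was:",
    previousResponse,
    "Your previous response had these validation errors:",
    errors.map((e) => `- ${e}`).join("\n"),
    "Please fix them and respond with valid JSON only, no markdown fences.",
  ].join("\n\n");
}

const issues: IssueLike[] = [{ path: ["age"], message: "Expected number, received string" }];
const retryPrompt = buildRetryPrompt(
  "Extract the person details from: Alice, 30, senior engineer",
  '{"name": "Alice", "age": "30", "role": "senior engineer"}',
  issues.map(formatIssue),
);
console.log(retryPrompt);
```

Mapping issues to message alone, as the earlier example does, drops the field name. Including the path is one small way to make the errors more useful to the model.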
The attempt history is stored internally and returned in the thrown error if all retries are exhausted. Each attempt entry includes the raw response string from the LLM, the parsed JSON object if parsing succeeded, the validation errors, and the latency for that attempt. This gives you everything needed to debug a failure or build a monitoring integration.
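One way to turn that history into a log line or metric is sketched below. The field names (raw, parsed, errors, latencyMs) are guesses based on the description above, and how the history is exposed on the thrown error is not specified here, so the function simply takes the array.

```typescript
// Hypothetical shape for one attempt entry, following the fields described in the text.
interface AttemptRecord {
  raw: string;        // raw response string from the LLM
  parsed?: unknown;   // parsed JSON, present only if parsing succeeded
  errors: string[];   // validation errors for this attempt
  latencyMs: number;  // time spent on this attempt
}

interface AttemptSummary {
  attempts: number;
  parseFailures: number;
  totalLatencyMs: number;
  lastErrors: string[];
}

// Condense an attempt history into a few numbers suitable for logging or monitoring.
function summarizeAttempts(attempts: AttemptRecord[]): AttemptSummary {
  return {
    attempts: attempts.length,
    parseFailures: attempts.filter((a) => a.parsed === undefined).length,
    totalLatencyMs: attempts.reduce((sum, a) => sum + a.latencyMs, 0),
    lastErrors: attempts[attempts.length - 1]?.errors ?? [],
  };
}

const exampleHistory: AttemptRecord[] = [
  { raw: "not json at all", errors: ["Response was not valid JSON"], latencyMs: 820 },
  { raw: '{"name": "Alice", "age": "30"}', parsed: { name: "Alice", age: "30" }, errors: ["age: expected number"], latencyMs: 640 },
];
console.log(summarizeAttempts(exampleHistory));
```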
One configuration option worth knowing: repairOnly. If you set this to true, agentcast repairs and validates but does not retry the LLM. It throws immediately on validation failure after repair. This is useful in contexts where retry latency is not acceptable, but you still want the repair step to handle the common fence and trailing-comma issues.
When useful
- Your LLM produces slightly malformed JSON that a repair step could fix without a full retry
- You want automatic retry with validation error context on schema failure
- You need a validator-agnostic loop that works with Zod, Ajv, or a custom check
- You want the repair-validate-retry logic centralized instead of repeated across different agent tool implementations
When not useful
- Your LLM reliably produces valid JSON using provider-native structured output enforcement (OpenAI and Anthropic both offer this; prefer it when the model supports it)
- You need streaming responses where you validate as tokens arrive
- Your expected output is not JSON
- Retry latency is not acceptable in your use case
Install
npm install @mukundakatta/agentcast
# or
yarn add @mukundakatta/agentcast
Requires Node 18+. Zero runtime dependencies. Bring your own LLM client and validator.
Siblings
| Library | What it does | Registry |
|---|---|---|
| agentcast-rs | Rust port of the same repair-validate-retry concept | crates.io |
| @mukundakatta/agentvet | Validates tool call signatures before execution | npm |
| @mukundakatta/agentsnap | Snapshot tests for agent tool call sequences | npm |
| llm-json-repair | Heuristic JSON repair for LLM output (standalone) | crates.io |
| @mukundakatta/agentguard | Egress allowlist for agent HTTP calls | npm |
What is next
The most useful addition would be a streaming mode where validation runs incrementally as each chunk arrives. For long responses where the model reliably starts with the JSON object, you could cut the stream short and trigger a retry as soon as the partial output already looks wrong, which would noticeably reduce latency on the retry path.
Support for non-JSON structured output formats (data matching a TypeScript interface but serialized as YAML, for instance) has come up. The interface would look similar, but the repair step would need format-specific handling. That is likely a separate package rather than an option flag inside agentcast.