Mukunda Rao Katta

agentcast-rs: Repair, Validate, and Retry LLM JSON Output in Rust

The bug that only happened in production

The structured output pipeline worked in development for six weeks. The tests passed. The staging environment passed. The model reliably returned valid JSON.

Then it shipped.

In production the prompts were longer. Users sent more context. The model started wrapping its JSON responses in explanatory prose.

Sure! Here is the data you requested:

{"status": "ok", "result": 42, "items": [...]}

Let me know if you need anything else.

The parser called serde_json::from_str on the full response string. The call returned an error, and the code that assumed success panicked. The entire pipeline crashed. Every request that hit the longer-prompt path returned a 500.

The fix seemed obvious in retrospect: strip the prose, extract the JSON, retry if extraction fails. But that logic did not exist anywhere. Each team that hit this problem wrote a one-off fix. Some teams wrapped serde_json::from_str in error handling and stopped there. Some teams added a regex to strip markdown fences. None of them had a retry path that called the model again with the error as context.

agentcast-rs is the composable fix.


The shape of the fix

[dependencies]
agentcast-rs = "0.1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"

Define your expected output structure:

use serde::{Deserialize, Serialize};
use agentcast_rs::{AgentCast, CastError};

#[derive(Debug, Deserialize, Serialize)]
struct AnalysisResult {
    summary: String,
    confidence: f32,
    flags: Vec<String>,
}

Run the repair-validate-retry loop:

// The raw LLM response (may have prose, fences, or be valid JSON):
let raw = r#"Here is my analysis:

```json
{"summary": "looks good", "confidence": 0.91, "flags": []}
```
"#;

let result: AnalysisResult = AgentCast::new(raw)
    .max_retries(2)
    .cast()?;

println!("{}", result.summary); // "looks good"

When the response has prose or fences, agentcast-rs strips them and attempts to parse the JSON. If that succeeds, done. If the JSON is malformed, it passes through llm-json-repair to fix common issues (trailing commas, unquoted keys, truncated arrays). If that succeeds, done. If the repaired JSON still fails to deserialize into your type, it hands back a CastError with the validation failure.
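To make the "strip the prose, extract the JSON" step concrete, here is a minimal std-only sketch. It is not the crate's actual implementation; `extract_json_span` is a hypothetical helper that returns the first balanced `{...}` or `[...]` span and skips braces that appear inside JSON string literals. It assumes the surrounding prose contains no stray `{` or `[` before the payload, and it does not check that bracket types match.

```rust
/// Hypothetical helper (not part of agentcast-rs): returns the first
/// balanced `{...}` or `[...]` span in `raw`, if there is one.
/// String literals are tracked so braces inside JSON strings are ignored.
fn extract_json_span(raw: &str) -> Option<&str> {
    // Everything before the first opener is treated as prose or fence text.
    let start = raw.find(|c: char| c == '{' || c == '[')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;

    for (offset, ch) in raw[start..].char_indices() {
        if in_string {
            if escaped {
                escaped = false; // the character after a backslash is literal
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' | '[' => depth += 1,
            '}' | ']' => {
                // depth is at least 1 here because the span starts on an opener
                depth -= 1;
                if depth == 0 {
                    let end = start + offset + ch.len_utf8();
                    return Some(&raw[start..end]);
                }
            }
            _ => {}
        }
    }
    None // never balanced: the response was probably truncated
}
```

A span that comes back as `Some` is only a candidate. It still has to deserialize into your type, which is where the repair and validation steps take over.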

For the retry path with a real LLM:

use agentcast_rs::{AgentCast, RetryContext};

let result: AnalysisResult = AgentCast::new(raw)
    .max_retries(2)
    .with_llm_retry(|ctx: RetryContext| async move {
        // ctx.attempt: which retry this is (1-indexed)
        // ctx.last_error: the validation error from the previous attempt
        // ctx.last_raw: the raw response that failed

        let prompt = format!(
            "Your previous response could not be parsed as JSON: {}\n\
             Please return only valid JSON matching this structure: {}\n\
             Previous response:\n{}",
            ctx.last_error,
            include_str!("schema.json"),
            ctx.last_raw,
        );

        // Call your LLM here and return the new raw response:
        call_anthropic(&prompt).await
    })
    .cast()
    .await?;

The closure receives context from the failed attempt. It is responsible for making the LLM call and returning a new raw string. agentcast-rs runs the same parse-repair-validate loop on whatever the closure returns.
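The control flow described here can be sketched without the crate. The version below is a simplified, synchronous stand-in, not agentcast-rs internals: `cast_with_retry` and this `RetryContext` are hypothetical names, the `parse` argument stands in for the whole parse-repair-validate step, and errors are plain strings.

```rust
/// Hypothetical stand-in for the context handed to the retry closure.
#[derive(Debug, Clone)]
struct RetryContext {
    attempt: u32,       // 1-indexed retry number
    last_error: String, // why the previous attempt failed
    last_raw: String,   // the raw response that failed
}

/// Runs `parse` on `raw`. On failure, asks `retry` for a fresh response
/// and tries again, up to `max_retries` times. Synchronous for brevity;
/// the crate's interface shown above is async.
fn cast_with_retry<T, P, R>(
    raw: &str,
    max_retries: u32,
    parse: P,
    mut retry: R,
) -> Result<T, String>
where
    P: Fn(&str) -> Result<T, String>,
    R: FnMut(&RetryContext) -> Result<String, String>,
{
    let mut current = raw.to_owned();
    let mut attempt = 0;
    loop {
        match parse(&current) {
            Ok(value) => return Ok(value),
            // Out of retries: surface the last failure to the caller.
            Err(err) if attempt >= max_retries => return Err(err),
            Err(err) => {
                attempt += 1;
                let ctx = RetryContext {
                    attempt,
                    last_error: err,
                    last_raw: current,
                };
                // The caller decides how to obtain a new raw response.
                current = retry(&ctx)?;
            }
        }
    }
}
```

The point of the shape is that every response, whether the original or one returned by the closure, goes through the same `parse` step, and the closure always sees why the previous attempt failed.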


What it does NOT do

  • It does not make LLM calls on your behalf. The retry closure is caller-provided. You bring your own client, your own API key, your own prompt format.
  • It does not validate against JSON Schema. Validation is type-driven via serde::Deserialize. If the response is missing a required field, or a value does not fit the field's type, deserialization fails.
  • It does not detect semantic incorrectness. If the model returns valid JSON that has the wrong values, agentcast-rs will not catch that. It only catches structural failures.
  • It does not cache successful parses or avoid re-parsing on retry. Each attempt is independent.
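Because semantic checks are out of scope, one option is a small validation pass that runs after the cast succeeds. The sketch below is illustrative only: `check_semantics` is a hypothetical function, the struct is redeclared without the serde derives so the block stands alone, and the rule that confidence lies in 0.0..=1.0 is an assumption about this example type, not something the crate enforces.

```rust
// Redeclared here without serde derives so the sketch is self-contained.
#[derive(Debug)]
struct AnalysisResult {
    summary: String,
    confidence: f32,
    flags: Vec<String>,
}

/// Hypothetical post-cast check: a structurally valid result can still
/// carry values that make no sense for the application.
fn check_semantics(result: &AnalysisResult) -> Result<(), String> {
    if result.summary.trim().is_empty() {
        return Err("summary is empty".to_owned());
    }
    // Assumed rule for this example: confidence is a probability.
    // NaN also fails this check because it is not inside the range.
    if !(0.0..=1.0).contains(&result.confidence) {
        return Err(format!(
            "confidence {} is outside 0.0..=1.0",
            result.confidence
        ));
    }
    if result.flags.iter().any(|flag| flag.trim().is_empty()) {
        return Err("flags contains an empty entry".to_owned());
    }
    Ok(())
}
```

An error from a check like this could be fed into the same retry prompt as a parse error, so the model gets told which value was wrong.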

Inside the lib: BYO-LLM via async closure

The design decision that I think is worth explaining is the retry interface.

An earlier version of the crate had a Provider trait with a method async fn complete(prompt: &str) -> Result<String>. You implemented the trait for your LLM client and passed it in. This seemed clean.

It was not.

Trait objects with async methods in Rust require boxing and lifetime gymnastics. Every caller had to write a trait impl. Adding provider-specific options (temperature, model, system prompt) required either more trait methods or a separate config type that got passed through the trait somehow. The trait surface grew every time someone needed a new option.

The closure approach is simpler. The caller writes a plain async closure that takes a RetryContext and returns a Result<String>. The closure captures whatever it needs from the surrounding scope: the HTTP client, the API key, the model name, the system prompt, a specific temperature setting. No trait, no boxing (the closure is Fn + Send), no config type.

The tradeoff is that the closure is not statically typed as "an Anthropic client" or "an OpenAI client." You can call any LLM, or a mock, or a test fixture that returns a hardcoded string. That flexibility matters. During development you want to test the retry logic without making real API calls. With a closure, you can pass a |_ctx| async { Ok(valid_json_string.to_owned()) } and confirm the happy path. No mock trait, no test double, no extra dependency.
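The capture argument is easier to see in code. This is a synchronous sketch under stated assumptions, not the crate's API: `FakeClient`, `retry_once`, and this `RetryContext` are hypothetical, and `complete` only formats a string where a real client would make a network call.

```rust
/// Hypothetical stand-in for the context handed to the retry closure.
struct RetryContext {
    attempt: u32,
    last_error: String,
    last_raw: String,
}

/// Hypothetical client; stands in for whatever LLM client you already use.
struct FakeClient {
    model: String,
}

impl FakeClient {
    fn complete(&self, prompt: &str, temperature: f32) -> Result<String, String> {
        // A real client would call the provider's API here.
        Ok(format!("[{} t={}] {}", self.model, temperature, prompt))
    }
}

/// Accepts any closure with the right shape. There is no trait to
/// implement and no config type: options travel inside the closure.
fn retry_once<F>(ctx: &RetryContext, retry: F) -> Result<String, String>
where
    F: Fn(&RetryContext) -> Result<String, String>,
{
    retry(ctx)
}
```

A caller with a `client` and a `temperature` in scope can pass `|c: &RetryContext| client.complete(&format!("Fix: {}", c.last_error), temperature)`, and a test can pass `|_c: &RetryContext| Ok(fixture.to_owned())` to the same function without touching the client at all.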


When this is useful

Use agentcast-rs when:

  • You call an LLM and parse its response as JSON. Any LLM, any provider.
  • Your prompts vary in length and you have seen the model occasionally add prose around its JSON output.
  • You want the retry to include the parse error as context for the model, so the model has a chance to self-correct rather than just retrying blindly.
  • You want the repair and retry logic in one place rather than scattered across pipeline stages.
  • You are building an agent where a structured extraction step is on the critical path and silent failures are not acceptable.

When NOT to use it

Skip agentcast-rs when:

  • Your LLM client already has structured output support that returns schema-validated JSON from the API (for example Anthropic structured outputs, or OpenAI structured outputs via response_format). In that case the API enforces the structure and you do not need a repair loop.
  • Your expected type is complex enough that repair and retry are unlikely to help. If the model consistently misunderstands the schema, the issue is in the prompt, not in the parse layer.
  • You are in a hot loop calling the model thousands of times per second. The repair and retry overhead, while small, adds per-response allocation. Profile first.

Install

[dependencies]
agentcast-rs = "0.1"

GitHub: MukundaKatta/agentcast-rs

Requires Rust stable. Dependencies: serde, serde_json, llm-json-repair.


Siblings

| Lib | Boundary | Repo |
| --- | --- | --- |
| agentcast (Python) | Same repair-validate-retry semantics for Python | MukundaKatta/agentcast |
| llm-json-repair | The repair pass that agentcast-rs delegates to internally | MukundaKatta/llm-json-repair |
| agentvet-rs | Validates tool args before they are passed to the LLM, upstream of this | MukundaKatta/agentvet-rs |
| agentsnap-rs | Snapshot the structured result so you catch shape changes in CI | MukundaKatta/agentsnap-rs |

The pattern: agentvet-rs validates input args, the LLM call runs, agentcast-rs validates and repairs the output, and agentsnap-rs snapshots the final deserialized struct so you catch schema drift across model versions.


What's next

  • JSON Schema generation. Generate a schema from the Deserialize type and include it in the retry prompt automatically, so the model gets a machine-readable description of what went wrong and what structure is expected.
  • Streaming repair. For streaming responses, detect when the partial JSON is malformed early enough to retry without waiting for the full stream to complete.
  • Metrics hook. A callback that fires on repair, on retry, and on final success/failure so you can track how often each path is hit in production.

v0.1.0 shipped 2026-05-10. The repair-validate-retry loop is stable. If you find a malformed response pattern that slips through the repair step, open an issue with the raw string.


Part of the Hermes Agent Challenge sprint. The full agent-stack series is at MukundaKatta on GitHub.
