DEV Community

mtdevworks
I Built an API for LLM JSON Validation in Rust — Here’s What I Learned

I kept hitting the same wall: LLM outputs breaking production. Trailing commas, unquoted keys, JSON wrapped in markdown—every deploy felt like a game of whack-a-mole. Prompt engineering helped a bit, but I didn’t want to spend the next year tuning “please return valid JSON” for every new feature. So I built JSON Guardian: an API that validates, repairs, and enforces JSON from LLM outputs. This post is about the technical choices, the hard parts, and what I’d tell myself on day one.


Why an API instead of “just fix the prompts”?

Prompts can improve things, but they don’t fix the underlying issue: LLMs are trained on messy data and they’re not deterministic. You can ask for JSON and still get prose, markdown, or invalid syntax. I wanted a single place that sits between the model and my app—something that always returns either valid, schema-conforming data or a clear error. That’s easier to reason about than scattering retries and regex across the codebase.

I also wanted it to work from any stack: Node, Python, n8n, whatever. So I built an API, not a library. You get a key, you send HTTP requests, you get back structured success or failure. No language lock-in, no “install this SDK first.”
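To make "you send HTTP requests" concrete, here is a minimal client sketch. The endpoint path (`/v1/validate`), the `X-API-Key` header, and the body shape are my illustrative assumptions, not the documented API, so check the real docs before copying.

```python
import json
import urllib.request

# Hypothetical validate-style endpoint; path and header name are assumptions.
API_URL = "https://api.jsonguardian.com/v1/validate"

def build_request(api_key: str, raw_output: str, schema: dict) -> urllib.request.Request:
    """Package an LLM response plus a JSON Schema into one HTTP request."""
    body = json.dumps({"input": raw_output, "schema": schema}).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )

req = build_request(
    "demo-key",
    '{"name": "Ada", "age": 36}',
    {"type": "object", "required": ["name"]},
)
# Send with urllib.request.urlopen(req) and parse the JSON response body.
```

The point is that any stack that can make an HTTP POST can use the service; no SDK required.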


Why Rust?

Performance.

This layer runs on every LLM response. If it adds 50–100ms, users notice. I was aiming for sub-10ms p99 so that validation feels free compared to the model call. Rust made that realistic: no GC pauses, predictable latency, and the ability to tune hot paths without fighting a runtime.

Memory safety and no runtime surprises.

No null-pointer or type surprises at runtime. The compiler catches a lot of bugs before they hit production. For a service that parses untrusted LLM output, that’s a big deal.

Trust and reliability.

Rust’s ecosystem gave me what I needed without having to build a JSON parser or a web server from scratch. I could focus on the validation and repair logic instead of fighting the runtime. For a closed-source service, that means we can keep the implementation details in-house while still being able to talk about why we chose Rust: speed, safety, and predictable behaviour.


Architecture in a nutshell

  • API layer: HTTP, async, built for the “one request → validate/repair → response” model. Kept simple so latency stays predictable.
  • Validation: JSON Schema Draft 7. Same standard everyone knows; easy to document and reuse.
  • Storage: A database for API keys and usage tracking. The service stays stateless per request; state lives in the backend.
  • Endpoints: validate, repair, enforce, extract, partial, and batch. One concern per endpoint so callers can compose what they need.
| Endpoint | Purpose | Best for |
| --- | --- | --- |
| Validate | Check JSON against a schema | Strict validation only |
| Repair | Fix common syntax errors | Malformed JSON cleanup |
| Enforce | Repair + schema + type coercion | Fixing and conforming LLM output |
| Extract | Strip JSON from prose/markdown | Raw model responses |
| Partial | Complete streaming JSON | Real-time UI updates |
| Batch | Bulk operations on many items | High-volume processing |
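To make the Validate row concrete, here is a toy, pure-Python subset of what Draft 7 validation does, checking only `type`, `required`, and `properties`. The real service implements the full draft; this sketch also ignores Python's bool-is-an-int quirk.

```python
# Illustrative subset of JSON Schema Draft 7 validation; not the service's code.
TYPE_MAP = {"object": dict, "array": list, "string": str,
            "integer": int, "number": (int, float), "boolean": bool}

def check(instance, schema):
    """Return a list of human-readable validation errors (empty list = valid)."""
    errors = []
    expected = schema.get("type")
    if expected in TYPE_MAP and not isinstance(instance, TYPE_MAP[expected]):
        errors.append(f"expected {expected}")
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"missing required key: {key}")
        # Recurse into declared properties that are present.
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(f"{key}: {e}" for e in check(instance[key], sub))
    return errors
```

Returning a list of errors rather than a boolean mirrors the "valid data or a clear error" contract described above.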

I kept deployment simple: single region, no Kubernetes. Latency stays low because the service is small and the runtime is predictable.


The hard parts

Parsing malformed JSON.

You can’t validate what you can’t parse. So the repair step had to come first: fix trailing commas, single quotes, unquoted keys, and then feed the result into the schema validator. Getting the repair logic right without over-correcting (or breaking valid edge cases) took most of the early iteration. I had to draw a line: fix obvious syntax, don’t guess at semantics.
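A minimal sketch of that policy in Python (illustrative only, not the actual Rust implementation): apply a small fixed set of syntax fixes, then let the parser be the judge.

```python
import json
import re

def repair(text: str) -> str:
    """Apply a fixed set of syntax repairs; semantics are never guessed at."""
    # 1. Unwrap a markdown code fence if the model added one.
    m = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if m:
        text = m.group(1)
    # 2. Normalize single-quoted strings to double quotes
    #    (naive: assumes no apostrophes inside values).
    text = re.sub(r"'([^']*)'", r'"\1"', text)
    # 3. Quote bare object keys: {status: ...} -> {"status": ...}
    text = re.sub(r'([{,]\s*)([A-Za-z_]\w*)(\s*:)', r'\1"\2"\3', text)
    # 4. Drop trailing commas before a closing brace/bracket.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return text.strip()

messy = "```json\n{status: 'ok', 'count': 3,}\n```"
print(json.loads(repair(messy)))  # parses cleanly after repair
```

If `json.loads` still fails after the repair pass, that is the signal to return an error instead of guessing further.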

What to repair vs reject.

Too strict and you reject fixable output; too loose and you “fix” things into wrong data. I ended up with a small, well-defined set of repairs (trailing commas, quote normalization, etc.) and left the rest to validation. If repair can’t produce valid JSON, we return an error with a clear message instead of guessing.

Type coercion.

The enforce endpoint can coerce types, e.g. "twenty-five" → 25 for integer fields. Useful for LLM output; dangerous if overdone. I limited it to a few patterns (numbers, booleans, missing required fields filled with defaults) and made the response include a changes_made list so callers see what was altered.
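A sketch of how limited coercion plus a changes_made audit trail might look. The function shape is my assumption, and this toy handles only digit strings, booleans, and defaults, not word-to-number cases like "twenty-five".

```python
# Illustrative enforce-style coercion; not the service's implementation.
def enforce(data: dict, schema: dict):
    """Coerce a few safe patterns and record every change for the caller."""
    changes = []
    for key, sub in schema.get("properties", {}).items():
        want = sub.get("type")
        if key not in data:
            # Fill a missing required field only when the schema has a default.
            if "default" in sub and key in schema.get("required", []):
                data[key] = sub["default"]
                changes.append(f"{key}: filled missing required with default")
            continue
        val = data[key]
        if want == "integer" and isinstance(val, str) and val.strip().lstrip("-").isdigit():
            data[key] = int(val)
            changes.append(f"{key}: string -> integer")
        elif want == "boolean" and isinstance(val, str) and val.lower() in ("true", "false"):
            data[key] = val.lower() == "true"
            changes.append(f"{key}: string -> boolean")
    return data, changes

out, changes_made = enforce(
    {"age": "25", "active": "true"},
    {"type": "object",
     "required": ["age", "name"],
     "properties": {"age": {"type": "integer"},
                    "active": {"type": "boolean"},
                    "name": {"type": "string", "default": "unknown"}}},
)
```

The `changes_made` list is the important part: every alteration is visible, so a caller can decide whether a coercion was acceptable.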

Partial/streaming JSON.

Completing half-written JSON (e.g. for real-time UI during streaming) is a different problem from “is this string valid?” I added a dedicated partial endpoint that closes strings and brackets in a minimal way. It’s best-effort: great for display, but I don’t use it as the source of truth until the stream is done.
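One way to sketch that best-effort completion (an assumed approach, not the service's code): scan the prefix while tracking open strings and brackets, then close whatever is dangling.

```python
import json

def complete(prefix: str) -> str:
    """Best-effort completion of a truncated JSON prefix, for display only."""
    stack = []          # open brackets, innermost last
    in_string = False
    escaped = False
    for ch in prefix:
        if escaped:
            escaped = False
        elif ch == "\\" and in_string:
            escaped = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch in "{[":
                stack.append(ch)
            elif ch in "}]":
                stack.pop()
    out = prefix
    if in_string:           # close a half-written string
        out += '"'
    stripped = out.rstrip()
    if stripped.endswith(","):  # drop a dangling comma at the cut point
        out = stripped[:-1]
    for opener in reversed(stack):  # close brackets innermost-first
        out += "}" if opener == "{" else "]"
    return out

print(complete('{"items": [1, 2,'))  # -> {"items": [1, 2]}
```

A prefix cut right after a key's colon would still fail to parse, which is exactly why this stays best-effort display data rather than the source of truth.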


Launch: direct API + RapidAPI

Direct API: https://api.jsonguardian.com

For speed and control. Sub-10ms when you call us directly; no proxy in the middle.

RapidAPI: https://rapidapi.com/mtdevworks2025/api/json-guardian

For distribution and discovery. Same API, same behaviour; we accept RapidAPI headers and track those calls separately. Good for reaching people who already browse the marketplace.

| Choice | Speed | Best for | Discovery |
| --- | --- | --- | --- |
| Direct API | Sub-10ms p99 | Maximum control and lowest latency | Manual signup at jsonguardian.com |
| RapidAPI | Same, via proxy | Marketplace reach and existing users | Browse the RapidAPI marketplace |

Pricing:

The free tier includes 10k requests/month (no card required), then Starter (100k) and Pro (1M) above it. Batch requests count each item as one request. I kept the free tier generous so people can try it in real workflows without worrying about the first mistake.

Early feedback so far: people care about speed (“is it really under 10ms?”) and docs (“can I see the exact request/response?”). So I doubled down on the public API docs and the OpenAPI spec. The dashboard at jsonguardian.com handles signup and usage so you can see your own numbers.


What I’d do again (and what I’d change)

Ship fast, then iterate.

I could have spent months polishing the repair heuristics. Instead I shipped a small set of repairs and added more as real payloads showed up. That was the right call. You learn more from real usage than from hypothetical edge cases.

Documentation is part of the product.

The API is useless if people can’t figure out the body shape and error format. PUBLIC_README, OpenAPI, and a few copy-paste examples (Node, Python, curl) took real time, but they made the API feel like it “just works.” I’d do that from day one next time.

First 10 users > perfect product.

I’m optimising for the first 10–20 teams that actually use it. Their feedback (what broke, what they expected, what they’d pay for) is worth more than another week of tuning type coercion. So: ship, share in communities, and listen.

What I’d change:

I’d add integration examples (e.g. n8n, LangChain) earlier. A lot of interest comes from “I use X, how do I plug this in?” A short “use with n8n” or “use with OpenAI function calling” post would have saved back-and-forth. I’m doing that now.


Try it

If you’re tired of LLM JSON breaking your app, you can try JSON Guardian with no credit card:

Free tier: 10,000 requests/month. If you build something with it, I’d love to hear what worked and what didn’t—drop a comment or reach out.


JSON Guardian: jsonguardian.com · RapidAPI · Free tier: 10k requests/month
