Saw a case study from BN Digital on building an AI regulatory monitoring system and wanted to share the architectural takeaways, because they generalize beyond compliance to basically any LLM-in-production system.
The core problem
LLMs are great at producing fluent text. Fluent text is terrible as a programmatic interface. If your downstream system is a human reviewer with a checklist, a database, or another service, free-form summaries are the wrong output shape.
Three design choices worth stealing
1. Structured output schema instead of summaries
Typed fields with constrained values. Same input → same output shape, every time. Diff-able across runs. Validates with a normal schema validator. Doesn't require a second LLM call to "parse" the first one.
{
  "regulation_id": "...",
  "jurisdiction": "...",
  "change_type": "amendment | new | repeal",
  "affected_entities": [...],
  "effective_date": "...",
  "source_citation": "..."
}
Compare to "Here's a 4-paragraph summary of what changed" — same information, useless downstream.
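To make the "validates with a normal schema validator" point concrete, here's a minimal sketch in plain Python. Field names come from the schema above; the validator logic itself is illustrative (in production you'd more likely reach for JSON Schema or pydantic), and none of it is from the case study:

```python
# Minimal hand-rolled validator for the structured output shape above.
# Illustrative only -- a real system would use jsonschema or pydantic.

REQUIRED_FIELDS = {
    "regulation_id": str,
    "jurisdiction": str,
    "change_type": str,
    "affected_entities": list,
    "effective_date": str,
    "source_citation": str,
}
CHANGE_TYPES = {"amendment", "new", "repeal"}

def validate(output: dict) -> list[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], expected_type):
            errors.append(f"wrong type for {field}")
    if output.get("change_type") not in CHANGE_TYPES:
        errors.append("change_type must be one of amendment|new|repeal")
    return errors
```

The payoff is that a malformed model response fails loudly at the boundary instead of silently corrupting whatever consumes it.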
2. Source filtering before the LLM step
Most "hallucination" in domain-specific work is a garbage-in problem. The model isn't inventing — it's pattern-matching to irrelevant context you gave it. Classical retrieval/filtering before the generation step cuts the surface area dramatically.
3. Human-in-the-loop as part of the type system
High-stakes outputs get a requires_review: true flag set by the model itself, with rules on what triggers it. Reviewer queue is a first-class part of the pipeline, not a thing bolted on after a compliance officer complains.
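One way to make the reviewer queue part of the type system is to layer deterministic trigger rules on top of the model's own requires_review flag, so review can't be skipped by a model error. A sketch, with rules and field names of my own invention (mirroring the schema above, not taken from the case study):

```python
# Sketch: rule-based review triggers layered over the model's self-flag.
# The specific rules here are hypothetical examples.

HIGH_STAKES_CHANGE_TYPES = {"repeal", "new"}

def needs_human_review(output: dict) -> bool:
    """True if the output must go to the reviewer queue."""
    if output.get("requires_review"):          # trust the model's self-flag...
        return True
    if output.get("change_type") in HIGH_STAKES_CHANGE_TYPES:
        return True                            # ...but also enforce hard rules
    if not output.get("source_citation"):      # uncited claims always reviewed
        return True
    return False

def route(output: dict, review_queue: list[dict]) -> None:
    """Enqueue for human review when any trigger fires."""
    if needs_human_review(output):
        review_queue.append(output)
```

Because the rules run outside the model, the queue fills even when the model forgets to set its own flag.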
Why this matters beyond RegTech
The same pattern applies to any LLM system where outputs feed into other systems or workflows: medical decision support, financial analysis, legal drafting, infra automation. If your LLM output isn't typed, you're building a demo, not a system.
Full case study: https://bndigital.co/en-gb/cases/ai-regulatory-monitoring-system?utm_source=devto&utm_medium=backlink&utm_campaign=ai-reg
Curious if anyone here is using JSON schema enforcement (OpenAI structured outputs, Anthropic tool use, Outlines, etc.) in production and what's been brittle.