
andrii oliinyk


Structured Outputs vs Free-Form Summaries: Notes from an AI Regulatory Monitoring Build

Saw a case study from BN Digital on building an AI regulatory monitoring system and wanted to share the architectural takeaways, because they generalize beyond compliance to basically any LLM-in-production system.

The core problem

LLMs are great at producing fluent text. Fluent text is terrible as a programmatic interface. If your downstream system is a human reviewer with a checklist, a database, or another service, free-form summaries are the wrong output shape.

Three design choices worth stealing

1. Structured output schema instead of summaries

Typed fields with constrained values. Same input → same output shape, every time. Diff-able across runs. Validates with a normal schema validator. Doesn't require a second LLM call to "parse" the first one.

{
  "regulation_id": "...",
  "jurisdiction": "...",
  "change_type": "amendment | new | repeal",
  "affected_entities": [...],
  "effective_date": "...",
  "source_citation": "..."
}

Compare to "Here's a 4-paragraph summary of what changed": same information, useless downstream.
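
To make the "validates with a normal validator" point concrete, here is a minimal stdlib-only sketch of parsing model output into a typed record. The field names come from the schema above; everything else is my assumption, not something from the case study: the class and function names (`RegulatoryChange`, `parse_change`), the ISO 8601 date format, and the choice to reject unknown fields.

```python
import json
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ChangeType(str, Enum):
    AMENDMENT = "amendment"
    NEW = "new"
    REPEAL = "repeal"


@dataclass(frozen=True)
class RegulatoryChange:
    regulation_id: str
    jurisdiction: str
    change_type: ChangeType
    affected_entities: tuple[str, ...]
    effective_date: date  # assumption: the model emits ISO 8601 dates
    source_citation: str


REQUIRED_FIELDS = (
    "regulation_id", "jurisdiction", "change_type",
    "affected_entities", "effective_date", "source_citation",
)
STRING_FIELDS = tuple(f for f in REQUIRED_FIELDS if f != "affected_entities")


def parse_change(raw: str) -> RegulatoryChange:
    """Turn raw model output into a typed record; raise ValueError on any mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    extra = sorted(set(data) - set(REQUIRED_FIELDS))
    if extra:
        raise ValueError(f"unexpected fields: {extra}")

    for name in STRING_FIELDS:
        if not isinstance(data[name], str) or not data[name]:
            raise ValueError(f"{name} must be a non-empty string")
    entities = data["affected_entities"]
    if not isinstance(entities, list) or not all(isinstance(e, str) for e in entities):
        raise ValueError("affected_entities must be a list of strings")

    return RegulatoryChange(
        regulation_id=data["regulation_id"],
        jurisdiction=data["jurisdiction"],
        change_type=ChangeType(data["change_type"]),  # ValueError on unknown value
        affected_entities=tuple(entities),
        effective_date=date.fromisoformat(data["effective_date"]),  # ValueError on bad date
        source_citation=data["source_citation"],
    )
```

A failed parse is a plain `ValueError` you can log, retry on, or route to a human. In a real build you would probably reach for a schema library instead of hand-rolled checks, but the shape of the contract is the same.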

2. Source filtering before the LLM step

Most "hallucination" in domain-specific work is a garbage-in problem. The model isn't inventing; it's pattern-matching to irrelevant context you gave it. Classical retrieval/filtering before the generation step cuts the surface area dramatically.
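
A rough sketch of what "filter before you generate" can look like, assuming the simplest possible classical filter: a jurisdiction match plus keyword hits. `SourceDoc`, `filter_sources`, and the `min_hits` threshold are hypothetical names for illustration. The case study doesn't say which retrieval method it used, and a production system would more likely use BM25 or a proper search index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDoc:
    doc_id: str
    jurisdiction: str
    text: str


def filter_sources(docs, jurisdiction, keywords, min_hits=1):
    """Keep documents from the target jurisdiction that mention enough keywords.

    Only the documents returned here should ever reach the prompt.
    """
    wanted = [k.lower() for k in keywords]
    kept = []
    for doc in docs:
        if doc.jurisdiction != jurisdiction:
            continue  # wrong jurisdiction: never shown to the model
        body = doc.text.lower()
        hits = sum(1 for k in wanted if k in body)
        if hits >= min_hits:
            kept.append(doc)
    return kept


docs = [
    SourceDoc("a", "UK", "Amendment to capital requirements for banks."),
    SourceDoc("b", "US", "Amendment to capital requirements for banks."),
    SourceDoc("c", "UK", "Press release about a staff away day."),
]
relevant = filter_sources(docs, "UK", ["capital requirements", "amendment"])
print([d.doc_id for d in relevant])  # prints ['a']
```

The filter is deterministic and cheap, so you can unit-test it and diff its output across runs, which you can't easily do with whatever the model decides to attend to.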

3. Human-in-the-loop as part of the type system

High-stakes outputs get a requires_review: true flag set by the model itself, with rules on what triggers it. Reviewer queue is a first-class part of the pipeline, not a thing bolted on after a compliance officer complains.
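
Here's a minimal sketch of routing on that flag. The `requires_review` field is from the design above. The function names and the extra deterministic triggers (repeals, missing citations, a missing or non-boolean flag) are my own hypothetical examples, since the case study's actual trigger rules aren't listed here.

```python
from collections import deque


def review_reasons(record: dict) -> list[str]:
    """Return every reason this record needs a human; an empty list means auto-accept."""
    reasons = []
    flag = record.get("requires_review")
    if flag is True:
        reasons.append("flagged_by_model")
    elif flag is not False:
        # Missing or non-boolean flag: fail safe and send it to a reviewer.
        reasons.append("flag_missing_or_invalid")
    # Hypothetical backstop rules, applied regardless of what the model said.
    if record.get("change_type") == "repeal":
        reasons.append("repeal")
    if not record.get("source_citation"):
        reasons.append("no_source_citation")
    return reasons


def route(records):
    """Split records into a review queue (with reasons attached) and an accepted list."""
    review_queue = deque()
    accepted = []
    for record in records:
        reasons = review_reasons(record)
        if reasons:
            review_queue.append((record, reasons))
        else:
            accepted.append(record)
    return review_queue, accepted
```

Attaching the reasons to each queued item means the reviewer sees why something landed in front of them, and you can count reasons over time to see which triggers are noisy.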

Why this matters beyond RegTech

The same pattern applies to any LLM system where outputs feed into other systems or workflows: medical decision support, financial analysis, legal drafting, infra automation. If your LLM output isn't typed, you're building a demo, not a system.

Full case study: https://bndigital.co/en-gb/cases/ai-regulatory-monitoring-system?utm_source=devto&utm_medium=backlink&utm_campaign=ai-reg

Curious if anyone here is using JSON schema enforcement (OpenAI structured outputs, Anthropic tool use, Outlines, etc.) in production and what's been brittle.
