GitHub: https://github.com/Waterbottles792/docapi
Large Language Models have made document understanding incredibly accessible.
Give an LLM an invoice, receipt, résumé, or contract, and it can usually tell you what's inside.
The problem begins when you need reliability.
Production systems don't need "usually."
They need predictable outputs, validation, and error handling.
That observation led me to build docapi.
The Problem
Most document extraction pipelines follow a simple pattern:
Document
↓
LLM
↓
JSON
This works well until the model:
- Returns invalid JSON
- Hallucinates values
- Misinterprets dates
- Omits required fields
- Produces inconsistent output formats
For AI agents, which consume the output without a human checking it first, these failures are difficult to recover from.
I wanted to build something that treated reliability as the primary goal.
The Idea
Instead of prompting an LLM and hoping for the best, docapi works like this:
Document
│
▼
Text Extraction
│
▼
LLM Understanding
│
▼
Schema Validation
│
▼
Grounding Verification
│
▼
Deterministic Normalization
│
▼
Confidence Scoring
│
▼
Schema-Validated JSON
If the system cannot confidently produce valid output, it returns a structured error instead of silently returning incorrect data.
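To make the "structured error instead of bad data" idea concrete, here is a minimal sketch of a validation step. It is not docapi's actual code: the schema, the `ExtractionError` type, and the `validate_output` function are hypothetical names chosen for illustration, and a real schema would likely be richer (for example, accepting integers for `total`).

```python
import json
from dataclasses import dataclass, field

# Hypothetical schema for illustration: field name -> expected Python type.
INVOICE_SCHEMA = {"invoice_number": str, "total": float, "issue_date": str}


@dataclass
class ExtractionError:
    """Structured error returned instead of questionable data."""
    code: str
    details: list = field(default_factory=list)


def validate_output(raw: str, schema: dict):
    """Return the parsed dict if it matches the schema, else an ExtractionError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return ExtractionError("invalid_json", [str(exc)])
    if not isinstance(data, dict):
        return ExtractionError("invalid_shape", ["top-level value must be an object"])

    # Collect every problem so the caller sees the full picture in one response.
    problems = []
    for name, expected in schema.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    if problems:
        return ExtractionError("schema_violation", problems)
    return data
```

The caller always gets either a dict that matches the schema or an error object with a machine-readable `code`, so an agent can branch on the result type instead of guessing whether the data is trustworthy.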
Features
The current version includes:
- REST API
- MCP server for AI agents
- Local inference with Ollama
- Cloud inference with Claude
- Schema validation
- Grounding checks to reduce hallucinations
- Deterministic date normalization
- Long-document chunking
- Confidence scoring
- Automated evaluation harness
- More than 80 automated tests
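As an illustration of what a grounding check can look like, the sketch below flags extracted values that never appear in the source text. This is one simple strategy (normalized substring matching) and an assumption on my part, not a description of how docapi implements grounding; `ungrounded_fields` is a hypothetical helper name.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so minor layout differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def ungrounded_fields(extracted: dict, source_text: str) -> list:
    """Return names of fields whose value does not appear in the source text."""
    haystack = _normalize(source_text)
    missing = []
    for name, value in extracted.items():
        if value is None:
            continue  # absent values are a schema concern, not a grounding one
        if _normalize(str(value)) not in haystack:
            missing.append(name)
    return missing


source = "Invoice No: INV-1042\nBilled to:  Acme   Corp\nTotal due: 120.50 EUR"
extracted = {
    "invoice_number": "INV-1042",
    "customer": "Acme Corp",
    "total": "120.50",
    "po_number": "PO-77",  # not present in the source, so it gets flagged
}
flagged = ungrounded_fields(extracted, source)
```

Exact matching like this only works for values copied verbatim from the document. Fields that are reformatted after extraction, such as normalized dates, would need to be compared in their raw form.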
Why Deterministic Code Matters
One example I encountered was date parsing.
The language model occasionally read the date:
26-05-2025
as falling in the year 2605.
That's not an AI problem.
It's a software engineering problem.
Instead of trying to improve the prompt, docapi normalizes dates deterministically after extraction.
The same philosophy applies throughout the project.
Whenever a problem can be solved reliably with code, it shouldn't be delegated to the model.
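A minimal sketch of deterministic date normalization, using only the Python standard library. The list of candidate formats, the day-first preference, and the plausible-year bounds are assumptions made for this example and are not taken from docapi itself.

```python
from datetime import datetime

# Assumed candidate formats, tried in order. Day-first is listed before
# month-first, which is a locale assumption for this sketch.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d-%m-%Y", "%d/%m/%Y", "%d.%m.%Y", "%m/%d/%Y")


def normalize_date(raw: str, min_year: int = 1900, max_year: int = 2100):
    """Return an ISO 8601 date string, or None if no format yields a plausible date."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format does not fit, try the next one
        if min_year <= parsed.year <= max_year:
            return parsed.date().isoformat()
    return None
```

With these assumptions, `normalize_date("26-05-2025")` parses as 26 May 2025, while a value such as `"2605-05-20"` is rejected by the year bounds and returns `None`, which the caller can turn into a structured error. Ambiguous inputs like `03/04/2025` are resolved by format order, so that order should match the documents you expect.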
Building for AI Agents
Another goal was making the system easy for agents to use.
Besides a REST API, docapi also exposes an MCP server, allowing AI assistants to call document extraction as a tool without additional integration code.
The extraction pipeline remains identical regardless of whether the caller is a Python application, an HTTP client, or an AI agent.
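One way to keep the pipeline identical across callers is to put all logic in a single function and make each entry point a thin adapter. The sketch below shows that shape in plain Python. Every name in it is hypothetical, the pipeline body is a stand-in, and it deliberately avoids any real web or MCP framework.

```python
import json


def run_pipeline(document_text: str) -> dict:
    """Stand-in for the shared pipeline; a real one would call the model and validators."""
    if not document_text.strip():
        return {"ok": False, "error": {"code": "empty_document"}}
    return {"ok": True, "data": {"char_count": len(document_text)}}


def handle_http_request(body: bytes) -> tuple:
    """Hypothetical HTTP adapter: decode the JSON body, delegate, pick a status code."""
    payload = json.loads(body)
    result = run_pipeline(payload.get("text", ""))
    return (200 if result["ok"] else 422), json.dumps(result)


def handle_tool_call(arguments: dict) -> dict:
    """Hypothetical agent-tool adapter: same pipeline, result returned as a plain dict."""
    return run_pipeline(arguments.get("text", ""))
```

Because the adapters only translate inputs and outputs, a fix or a new validation rule in `run_pipeline` reaches every caller at once, and the test suite only has to cover the extraction logic in one place.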
What I Learned
Building this project changed the way I think about AI engineering.
The model is only one part of the system.
The surrounding engineering matters just as much:
- Validation
- Error handling
- Evaluation
- Grounding
- Deterministic processing
- Observability
- Testing
Those pieces are what make AI systems reliable enough for production.
What's Next?
I'm continuing to expand docapi with:
- OCR support for scanned documents
- Additional model providers
- Larger evaluation datasets
- A managed hosted version
The goal remains the same:
Build AI systems that are not only intelligent but also predictable, measurable, and reliable.
If you've built similar AI infrastructure or have ideas for improving document extraction reliability, I'd be interested to hear your thoughts.