It started with a simple task: pay an F24.
For those outside Italy, an F24 is a unified tax payment form — a grid of tax codes, reference years, and amounts that you need to submit to pay everything from income tax to municipal levies. Every Italian with a tax obligation has dealt with one.
I opened my bank app, selected "Pay F24", and found the option to scan the document with my phone camera. Great, I thought. I took a clear photo in good lighting. The app processed it for a few seconds.
It filled in exactly zero fields.
I spent the next fifteen minutes manually transcribing tax codes, reference years, and amounts from a paper form on a mobile keyboard. In 2026.
That was the moment I decided to build ChiaroDocIT.
The actual problem is bigger than one form
After that experience I started paying attention. The F24 issue wasn't an isolated quirk — it was a symptom of a much wider problem.
Italy has some of the most complex bureaucratic paperwork in Europe. The average person regularly receives documents they genuinely don't understand: cartelle esattoriali (tax bills from the revenue agency) with legal deadlines buried in "legalese", CU certificates they need for their tax return but can't interpret, INPS letters about their pension or benefits that require action but don't clearly say what.
Most people either ignore these documents until it's too late, pay a professional €50–100 to explain a letter they already have in hand, or spend hours on the phone with government helplines.
The information is right there on the paper. Nobody had made it accessible.
What ChiaroDocIT does
ChiaroDocIT is an API that takes any Italian bureaucratic document — as text, a photo, or a PDF — and returns structured JSON with everything you actually need to know:
- A summary in plain Italian (no legal jargon)
- Every deadline, with days remaining calculated from today
- All amounts, broken down by type
- An urgency level: low, medium, high, or critical
- Concrete next steps — not "consult a professional" but "pay within 60 days using F24 form, tax code 9001, via your bank's home banking or any post office"
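As a rough illustration of the "days remaining calculated from today" item above, here is a minimal Python sketch. The `Deadline` class and the output keys (`description`, `due`, `days_remaining`) are hypothetical names chosen for this example, not the API's actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Deadline:
    description: str
    due: date

    def days_remaining(self, today: date) -> int:
        """Whole days from `today` to the due date; negative once overdue."""
        return (self.due - today).days


def with_days_remaining(deadlines: list[Deadline], today: date) -> list[dict]:
    """Serialize deadlines, soonest first, adding the computed countdown."""
    ordered = sorted(deadlines, key=lambda d: d.due)
    return [
        {
            "description": d.description,
            "due": d.due.isoformat(),
            "days_remaining": d.days_remaining(today),
        }
        for d in ordered
    ]
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the countdown easy to test.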
It supports 13 document types: invoices, tax bills, payslips, F24 forms, CU certificates, INPS letters, Agenzia delle Entrate communications, company registry extracts, contracts, tax returns, legal notices, and more.
OCR is included: you can send a smartphone photo of a handwritten document or a low-quality scan, and the API handles it.
The architecture that made it work
The first version used a single large prompt covering all 13 document types. It worked, but it had a persistent failure mode: the model would correctly identify an F24 in its text summary, then write "other" in the document type field of the JSON output.
The problem turned out to be attention dilution. When you give a model a 4,000-token prompt covering 13 document types with full schemas, it has lost the thread by the time it fills in the type field.
The solution was a two-step pipeline. Step one is a lightweight classification call — the model just identifies the document type with a confidence score. Step two uses a focused prompt built specifically for that document type, with the relevant regulations, field structure, and extraction instructions for that one type only.
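The two-step flow can be sketched roughly as follows. This is an illustration under stated assumptions, not the production code: the model calls are passed in as plain functions, and the prompt texts, the `document_type` key, and the `analyze` name are all hypothetical.

```python
from typing import Callable

# Hypothetical per-type prompts; real ones would carry regulations and schemas.
FOCUSED_PROMPTS = {
    "f24": "Extract tax codes, reference years and amounts from this F24.",
    "cartella_esattoriale": "Extract amounts and the payment deadline.",
}
GENERIC_PROMPT = "Summarize this document and extract deadlines and amounts."

# A model call is modeled as (system_prompt, document_text) -> parsed JSON dict.
ModelCall = Callable[[str, str], dict]


def analyze(text: str, classify: ModelCall, extract: ModelCall) -> dict:
    # Step 1: lightweight call that only identifies the document type.
    verdict = classify("Return the document type and a confidence score.", text)
    doc_type = verdict["document_type"]

    # Step 2: extraction with a prompt that covers this one type only.
    prompt = FOCUSED_PROMPTS.get(doc_type, GENERIC_PROMPT)
    result = extract(prompt, text)
    result["classification"] = verdict
    return result
```

Injecting the two calls as arguments makes it possible to exercise the routing logic with stub functions, without touching a real model.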
The accuracy improvement was not marginal. It's the difference between "usually works" and "reliably works."
There's also a safety net: if the extraction step returns "other" but the classification step had correctly identified the document, the code forces the classification result. This catches the edge case where a model ignores its own instructions during extraction.
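In code, that override can be very small. A minimal sketch, assuming both steps report the type as a plain string and that the result carries it under a hypothetical `document_type` key:

```python
def reconcile_type(classified: str, extracted: str) -> str:
    """Prefer the classifier's answer when extraction falls back to "other"."""
    if extracted == "other" and classified != "other":
        return classified
    return extracted


def apply_safety_net(result: dict, classified: str) -> dict:
    """Return a copy of the extraction result with the reconciled type."""
    fixed = dict(result)
    fixed["document_type"] = reconcile_type(
        classified, result.get("document_type", "other")
    )
    return fixed
```

The override is deliberately one-directional: a specific type returned by extraction is left alone, and only the "other" fallback is replaced.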
Why the two-tier model approach matters
For routing LLM calls I use OpenRouter, which makes it easy to switch between models without changing infrastructure. This turned out to be important for a specific reason.
There are two tiers of users: free tier and paid tier. They get different model quality, and the models behave differently enough that I maintain separate system prompts for each tier.
Free-tier models have a reasoning mode — they produce extensive intermediate thinking before answering. This is useful for complex reasoning tasks but catastrophic for structured JSON output: you get thousands of tokens of internal monologue before the actual answer. The free-tier prompts explicitly suppress this behavior.
Paid-tier prompts are richer, include more regulatory detail, and take advantage of longer context windows to embed relevant sections of Italian tax law directly into the system prompt. When the model knows that under D.Lgs. 110/2024 you can request up to 84 monthly installments without documentation for debts under €120,000, it can surface that in the output instead of just saying "consider a payment plan."
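To make the difference between the two prompt families concrete, here is a hedged sketch of how a per-tier system prompt might be assembled. The function name, the tier labels, and the wording of the directive are assumptions made for this example; the real prompts are not shown in this post.

```python
# Hypothetical directive appended for free-tier models to curb visible reasoning.
NO_REASONING = (
    "Answer with the JSON object only. Do not show intermediate reasoning."
)


def build_system_prompt(tier: str, base: str, regulatory_notes: list[str]) -> str:
    """Assemble a system prompt for the given tier ("free" or "paid")."""
    if tier == "free":
        # Short prompt, with an explicit instruction against thinking aloud.
        return f"{base}\n\n{NO_REASONING}"
    if tier == "paid":
        # Richer prompt: embed the relevant regulatory excerpts.
        notes = "\n".join(f"- {note}" for note in regulatory_notes)
        return f"{base}\n\nRelevant regulations:\n{notes}"
    raise ValueError(f"unknown tier: {tier!r}")
```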
The ZDR (Zero Data Retention) distinction also matters here. Free-tier models may log requests, and the documentation makes this explicit. The Ultra and Mega plans are routed through endpoints that contractually don't store or train on the data; the Pro plan, although paid, still uses standard endpoints. For documents containing tax codes, salary information, and IBANs, this is not a minor consideration.
Embedding Italian law into prompts
The most interesting design challenge was making the API genuinely useful, not just structurally correct.
A JSON with the right fields but generic advice is not much better than just reading the document yourself. The value comes from the API knowing, for example:
- That a cartella esattoriale has a 60-day payment window from notification before enforcement actions begin
- That the Rottamazione-quinquies deadline is April 30, 2026, and some debts may qualify for penalty reduction
- That a traffic fine (multa) can be contested either at the Prefettura OR the Giudice di Pace, but not both — and the choice has different risk profiles
- That a CU certificate from an employer has three different deadlines depending on the type of income it certifies
This information is updated per document type and reflects current Italian regulations. It's not static — the prompts need maintenance as laws change.
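One way to keep that maintenance honest is to store each regulatory note with the date it was last checked, and to flag the ones that have gone too long without review. A minimal sketch, assuming a hypothetical `RegulatoryNote` record and an arbitrary 90-day review window; the review dates below are placeholders, not real audit data.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RegulatoryNote:
    doc_type: str
    text: str
    last_reviewed: date


# Note texts come from the examples above; the dates are placeholders.
NOTES = [
    RegulatoryNote(
        "cartella_esattoriale",
        "60-day payment window from notification before enforcement begins.",
        date(2026, 1, 15),
    ),
    RegulatoryNote(
        "multa",
        "Contest at the Prefettura or the Giudice di Pace, not both.",
        date(2025, 9, 1),
    ),
]


def stale_notes(
    notes: list[RegulatoryNote], today: date, max_age_days: int = 90
) -> list[RegulatoryNote]:
    """Return the notes whose last review is older than `max_age_days`."""
    cutoff = today - timedelta(days=max_age_days)
    return [n for n in notes if n.last_reviewed < cutoff]
```

Running `stale_notes` on a schedule gives a list of prompts to recheck against the current rules before they drift out of date.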
The output for the F24 that started all this
Going back to where this started: here's what the API returns for that F24 that my bank app couldn't read.
- The summary explains in plain language what taxes are being paid, for which years, and the total amount.
- The amounts array breaks down each tax code individually, with its description.
- The metadata section lists every tax code found with its corresponding amount.
- The red flags field surfaces anything unusual. In this case it flagged debts dated 2006, which may be worth checking with a tax professional for prescrizione (the statute of limitations).
- The cosa_fare ("what to do") field gives concrete instructions: which channel to use, what to do if you're a VAT holder (the telematic channel is mandatory), and how long to keep the receipt.
Everything a bank app should have extracted from that photo. In under four seconds.
Plans and privacy
The API is available exclusively on RapidAPI.
Four plans: Basic (free, 4 documents/month), Pro ($9/month, 100 documents), Ultra ($29/month, 500 documents), and Mega ($79/month, 2,500 documents).
Privacy model: nothing is written to disk at any point. Documents are processed in RAM and discarded after the response. Basic and Pro plans use standard model endpoints — do not submit sensitive documents on these plans. Ultra and Mega plans use Zero Data Retention (ZDR) endpoints: documents are contractually never stored or used for AI training.
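For integrators, the plan table and the privacy split can be captured in a small lookup. This is only a sketch: the figures are the ones listed above, while the `Plan` class and the `accepts_sensitive_documents` helper are hypothetical names, not part of the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    monthly_price_usd: int
    documents_per_month: int
    zero_data_retention: bool


PLANS = {
    "basic": Plan(0, 4, False),
    "pro": Plan(9, 100, False),
    "ultra": Plan(29, 500, True),
    "mega": Plan(79, 2500, True),
}


def accepts_sensitive_documents(plan_name: str) -> bool:
    """Only plans routed through ZDR endpoints should see sensitive data."""
    return PLANS[plan_name.lower()].zero_data_retention
```

A client application could call `accepts_sensitive_documents` before uploading anything that contains tax codes or IBANs, and warn the user on Basic or Pro.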
What's next
The F24 photo-to-payment use case is the one I'm most interested in developing further. The structured JSON output already contains everything needed to pre-populate a bank form: tax codes, reference years, amounts, section identifiers. The gap between "API returns JSON" and "bank app pre-fills the form" is a matter of integration work, not of missing technology.
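To show how small that integration step could be, here is a hedged sketch that maps extracted entries onto form rows. The entry keys (`section`, `tax_code`, `reference_year`, `amount`) and the `F24Row` class are assumptions made for illustration; the actual response schema may differ.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class F24Row:
    section: str
    tax_code: str
    reference_year: int
    amount: Decimal


def to_form_rows(entries: list[dict]) -> list[F24Row]:
    """Convert extracted entries into typed rows a payment form could consume."""
    return [
        F24Row(
            section=e["section"],
            tax_code=e["tax_code"],
            reference_year=int(e["reference_year"]),
            # Go through str() so float inputs do not carry binary noise.
            amount=Decimal(str(e["amount"])),
        )
        for e in entries
    ]


def total_due(rows: list[F24Row]) -> Decimal:
    """Sum of all row amounts, for a final check against the printed total."""
    return sum((r.amount for r in rows), Decimal("0"))
```

Comparing `total_due` with the total printed on the form is a cheap sanity check before anything is submitted for payment.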
If you're building fintech infrastructure for the Italian market and find this interesting, I'd genuinely like to talk.
Drop a comment if you've hit similar friction points with document processing in other countries — curious how universal this problem is.