Intro:
In the world of intelligent automation, extracting structured data from invoices using Large Language Models (LLMs) is a game-changer. But as powerful as these models are, how do we ensure their outputs are consistently accurate—especially when manual validation isn’t scalable?
In this post, part of the PP-AI Builder series, we explore a novel approach: using a second LLM to evaluate the output of the first.
By pairing extraction with automated evaluation, we move beyond just getting data—we build confidence in it.
This dual-prompt strategy not only enhances trust in AI-driven workflows but also opens doors to scalable, self-validating automation within the Power Platform ecosystem.
Problem:
While LLMs offer a powerful way to extract structured data from semi-structured documents like invoices, their outputs can vary depending on prompt design, document formatting, and model behavior. In many real-world scenarios, users validate a few samples manually—an approach that doesn’t scale and lacks consistency.
This raises a critical question:
How do we ensure the reliability and consistency of LLM-generated outputs without relying solely on manual review?

Related post: Supercharge Custom Data Entity Extraction using Bring your Prompt with AI Builder (Bala Madhusoodhanan, Apr 14)
In invoice automation, even small errors in fields like invoice number, date, or total amount can lead to downstream issues in financial systems. Therefore, a scalable and intelligent validation mechanism is essential.
To address this, we introduce a dual-prompt architecture using two LLMs:
LLM #1 – Extraction Prompt:
This model receives the raw invoice text and extracts key fields such as invoice number, date, vendor name, and total amount.
LLM #2 – Evaluation Prompt:
This second model takes the original invoice text and the extracted output from LLM #1, then evaluates the correctness and consistency of the extraction. It acts as a “reviewer,” providing feedback or a confidence score.
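The dual-prompt flow can be sketched with two prompt builders, one per LLM. This is a minimal illustration, not the AI Builder API: the prompt wording and function names are assumptions, and the actual model calls are left out.

```python
import json

def build_extraction_prompt(invoice_text: str) -> str:
    # LLM #1: ask for the key invoice fields as JSON
    return (
        "Extract InvoiceNumber, InvoiceDate, VendorName and TotalAmount "
        "from the invoice below and return them as JSON.\n\n"
        f"Invoice:\n{invoice_text}"
    )

def build_evaluation_prompt(invoice_text: str, extracted: dict) -> str:
    # LLM #2: act as a reviewer of LLM #1's output
    return (
        "You are a reviewer. Compare the extracted JSON against the "
        "original invoice text and report correctness, consistency and "
        "a confidence score (0-100) as JSON.\n\n"
        f"Invoice:\n{invoice_text}\n\n"
        f"Extracted:\n{json.dumps(extracted)}"
    )
```

Each prompt would be sent to its own model endpoint; the evaluator sees both the raw text and the first model's output, which is what makes the review possible.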
🎯 Objective
Validate the accuracy of structured data extracted by the LLM from a PDF invoice, based on the presence or absence of the "RE-INVOICING INFORMATION FORM" section.
📥 Inputs Provided
PDF Document: ReInvoicePDF
Expected Output JSON: Json
Validation Steps
1) Run the LLM Extraction
Submit the PDF to the LLM for processing.
Capture the actual output JSON returned by the model.
2) Compare Metadata
Check if the TypeOfInvoice and ReInvoicingFormPresent flags match the expected values.
3) Validate Re-Invoicing Fields (if form is present)
Compare each field in HeaderFields and DetailedFieldsFromReInvoicingDescription with the expected output.
Ensure all required fields are present and correctly named.
Confirm values match or are reasonably close (e.g., formatting differences).
4) Validate Standard Invoice Fields
Confirm presence of StandardInvoiceFields regardless of form detection.
Validate: InvoiceDate, InvoiceRef, DebitNoteNumber, DueDate, InvoiceTotalAmount
Check each LineItem object for completeness and accuracy.
5) Check for Nulls and Missing Data
Ensure missing fields are represented as null or omitted appropriately.
Flag any unexpected omissions or misclassifications.
6) Log Discrepancies
Record any mismatches between actual and expected output.
Categorize issues as: Field missing, Incorrect value, Incorrect format, Unexpected extra data
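Steps 5 and 6 can be sketched as a small comparison helper that labels each mismatch with one of the categories above. The matching rules here (whitespace and thousands-separator normalisation) are simplifying assumptions for illustration:

```python
def categorize_discrepancies(expected: dict, actual: dict) -> list:
    """Compare flat field dicts and label each mismatch."""
    issues = []
    for field, exp in expected.items():
        if field not in actual or actual[field] is None:
            issues.append({"Field": field, "Issue": "Field missing"})
        elif str(actual[field]).strip() == str(exp).strip():
            continue  # exact match after trivial whitespace cleanup
        elif (str(actual[field]).replace(",", "").replace(" ", "")
              == str(exp).replace(",", "").replace(" ", "")):
            # same value, different presentation (e.g. "1,000.00" vs "1000.00")
            issues.append({"Field": field, "Issue": "Incorrect format"})
        else:
            issues.append({"Field": field, "Issue": "Incorrect value"})
    for field in actual:
        if field not in expected:
            issues.append({"Field": field, "Issue": "Unexpected extra data"})
    return issues
```

In practice the evaluation LLM produces this categorisation itself; a deterministic helper like this is useful for spot-checking the evaluator.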
✅ Pass Criteria
All required fields are present and correctly extracted.
Metadata flags match expected values.
Field values are accurate and formatted correctly.
No critical fields are missing or misclassified.
Line items are complete and correctly structured.
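The per-section accuracy figures reported in the result JSON can be derived with a helper like the one below. The key names follow the summary output; the exact-equality matching rule is a simplifying assumption:

```python
def section_summary(expected_fields: dict, actual_fields: dict) -> dict:
    """Build a TotalFields/MatchedFields/MissingFields/Accuracy block."""
    total = len(expected_fields)
    matched = sum(1 for k, v in expected_fields.items()
                  if actual_fields.get(k) == v)
    missing = sum(1 for k in expected_fields if k not in actual_fields)
    return {
        "TotalFields": total,
        "MatchedFields": matched,
        "MissingFields": missing,
        "Accuracy": f"{round(100 * matched / total)}%" if total else "n/a",
    }
```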
Result:
An example output JSON from the evaluation is shown below:
```json
{
  "ValidationSummary": {
    "PDFFile": "ReInvoicePDF",
    "ReInvoicingFormDetected": true,
    "ExpectedReInvoicingFormPresent": true,
    "MetadataMatch": true,
    "HeaderFields": {
      "TotalFields": 9,
      "MatchedFields": 9,
      "MissingFields": 0,
      "Accuracy": "100%"
    },
    "DetailedFields": {
      "TotalFields": 7,
      "MatchedFields": 7,
      "MissingFields": 0,
      "Accuracy": "100%"
    },
    "StandardInvoiceFields": {
      "TotalFields": 6,
      "MatchedFields": 6,
      "MissingFields": 0,
      "Accuracy": "100%"
    },
    "LineItems": {
      "ExpectedCount": 2,
      "ActualCount": 2,
      "FullyMatchedItems": 2,
      "PartiallyMatchedItems": 0,
      "Accuracy": "100%"
    },
    "OverallAccuracy": "100%",
    "Discrepancies": []
  }
}
```
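Once the evaluator returns a summary in this shape, the pass criteria can be applied mechanically. The sketch below assumes all three field sections are present (when the re-invoicing form is absent, the form-specific sections would need to be skipped):

```python
def passes(summary: dict) -> bool:
    """Apply the pass criteria to a ValidationSummary dict."""
    v = summary["ValidationSummary"]
    sections = ("HeaderFields", "DetailedFields", "StandardInvoiceFields")
    return (
        v["MetadataMatch"]                                   # flags match
        and all(v[s]["MissingFields"] == 0 for s in sections)  # no gaps
        and v["LineItems"]["ActualCount"] == v["LineItems"]["ExpectedCount"]
        and not v["Discrepancies"]                           # nothing logged
    )
```

A gate like this is what lets the flow route clean extractions straight through while sending failures to human review.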
This approach transforms the automation pipeline from a one-way extraction to a feedback-driven loop, enabling:
- Scalable validation without human intervention
- Continuous improvement of prompt design
- Greater trust in AI-powered document processing