
Mohit

Posted on • Originally published at flux8labs.com

How We Automated Purchase Orders From Gmail to Tally Using GPT-4 (98% Extraction Accuracy)

At 9:14am on a Tuesday, the system flagged an incoming purchase order from a large enterprise buyer as a duplicate.

The PO had arrived in two separate emails over 48 hours — sent by different procurement contacts, both for the same batch of stainless steel flanges, same quantities, same delivery window.

Under the old system, a staff member would have read both, entered both into Tally, and allocated raw material stock twice. The first sign of the error would have been an inventory shortfall two weeks later.

The client is a Jaipur-based precision manufacturer serving enterprise buyers in India and overseas. At roughly ₹60Cr annual revenue, their team handled a steady flow of purchase orders across a demanding customer base.

Every one of those orders arrived as a PDF in a shared Gmail inbox.

Every one of those PDFs was read and entered into Tally by hand.

This is the build log for the system built to replace that process.


The Problem: PDFs in Gmail, Nobody Watching

The procurement workflow before the build:

  • Emails arrive in a shared Gmail inbox.
  • A staff member opens each attachment.
  • Reads part numbers, quantities, delivery deadlines, and supplier codes.
  • Manually enters everything into Tally.

On a slow day this took around 90 minutes.

On heavier order days, it could stretch to 3–4 hours.

The inbox had no workflow state:

  • No processed flag
  • No queue
  • No audit trail outside Tally

If the same PO arrived twice, the team would know only by accident.

Duplicate Orders

There was no detection mechanism.

Two contacts at the same enterprise customer could send the same PO independently, and neither Gmail nor Tally would flag it.

No Operational Visibility

Knowing which orders were due the following week required opening Tally and manually cross-referencing entries.

There was:

  • No dashboard
  • No queue view
  • No workload overview

Manual Raw Material Calculations

Once a PO was entered, another manual calculation followed:

  • Pulling specifications from the PDF
  • Checking stock levels
  • Estimating requirements

This introduced a second opportunity for human error.


What Was Built: A Four-Stage Pipeline

Each stage solved a different operational problem.

1. Gmail Push Notifications

Instead of polling Gmail every few minutes, the system registers a Gmail watch on a Google Cloud Pub/Sub topic, and a push subscription on that topic calls a webhook whenever a new email arrives.

Benefits:

  • Near real-time processing
  • Lower infrastructure overhead
  • Faster visibility for operations teams

A PO is processed before a staff member would have opened the email.
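A minimal sketch of the receiving side, assuming a standard Pub/Sub push subscription: each push delivers an envelope whose `message.data` field is a base64-encoded JSON object carrying Gmail's `emailAddress` and `historyId`. The function name `decode_gmail_notification` is hypothetical; in a FastAPI backend it would be called from the webhook route handler.

```python
import base64
import json


def decode_gmail_notification(envelope: dict) -> dict:
    """Decode a Pub/Sub push envelope into Gmail's notification payload."""
    message = envelope.get("message")
    if not message or "data" not in message:
        raise ValueError("not a Pub/Sub push envelope")
    data = message["data"]
    # Restore any stripped padding; urlsafe_b64decode accepts both alphabets.
    padded = data + "=" * (-len(data) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded).decode("utf-8"))
    return {
        "email_address": payload["emailAddress"],
        "history_id": int(payload["historyId"]),
    }
```

The notification carries no message content, only a history marker. The handler would then ask the Gmail API (`users.history.list`) for what changed since the last stored `historyId` and fetch the new attachments from there.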

2. GPT-4 Structured Extraction

The system sends the purchase order to GPT-4 using a strict JSON schema.

Required fields include:

  • PO number
  • Supplier details
  • Delivery date
  • Line items
  • Quantities

The model returns structured JSON directly.

This removes the need for:

  • Regex maintenance
  • Template mapping
  • Field-position assumptions

For scanned PDFs, the vision endpoint is used.

For text PDFs, extracted text is sent directly.

Both paths produce the same JSON output.
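To illustrate what a strict schema might look like, here is a sketch covering the required fields listed above. The key names (`po_number`, `supplier`, `delivery_date`, `line_items`, `part_number`, `quantity`) are assumptions, since the post names the fields but not their keys. The model call itself is omitted; the schema dict is what would be supplied as the required response format, and `parse_extraction` is a hypothetical guard on the reply.

```python
import json

# Hypothetical key names; the post lists the fields but not the exact schema.
PO_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["po_number", "supplier", "delivery_date", "line_items"],
    "properties": {
        "po_number": {"type": "string"},
        "supplier": {"type": "string"},
        "delivery_date": {"type": "string", "description": "YYYY-MM-DD"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "additionalProperties": False,
                "required": ["part_number", "quantity"],
                "properties": {
                    "part_number": {"type": "string"},
                    "quantity": {"type": "number"},
                },
            },
        },
    },
}


def parse_extraction(raw_json: str) -> dict:
    """Parse the model's reply and confirm the required keys are present."""
    data = json.loads(raw_json)
    missing = [key for key in PO_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"extraction missing fields: {missing}")
    item_required = PO_SCHEMA["properties"]["line_items"]["items"]["required"]
    for index, item in enumerate(data["line_items"]):
        for key in item_required:
            if key not in item:
                raise ValueError(f"line item {index} missing {key}")
    return data
```

Because the text path and the vision path both target the same schema, everything downstream can stay unaware of which one produced the record.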

3. Tally Integration

The extracted JSON is converted into a Tally-compatible XML purchase voucher and sent to Tally Prime's local HTTP server.

Once accepted:

  • The voucher is created automatically.
  • Inventory calculations run immediately.
  • The order appears exactly as if a user entered it manually.

4. Duplicate Detection

Every incoming PO is stored in PostgreSQL.

A normalized fingerprint is generated using:

  • Line items
  • Quantities
  • Delivery windows

This allows the system to detect duplicate orders before they reach Tally.

The duplicate mentioned earlier was caught using this mechanism.
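One way such a fingerprint could be computed is sketched below. The normalization rules (ignore case and whitespace in part numbers, ignore line order, treat `50` and `50.0` as equal) and the function name `po_fingerprint` are assumptions for illustration, not the production logic.

```python
import hashlib
import json


def po_fingerprint(line_items: list[dict], delivery_start: str, delivery_end: str) -> str:
    """Hash order contents so the same PO from two senders yields the same key."""
    normalized = sorted(
        (
            "".join(str(item["part_number"]).split()).upper(),  # drop whitespace, ignore case
            round(float(item["quantity"]), 3),                   # 50 and 50.0 compare equal
        )
        for item in line_items
    )
    canonical = json.dumps(
        {"items": normalized, "window": [delivery_start, delivery_end]},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The sender and the email metadata are deliberately left out of the hash, which is what lets two different procurement contacts trigger the same match. Storing the digest in an indexed PostgreSQL column would make the lookup a single query before anything is sent to Tally.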


The Technology Stack

Backend

  • FastAPI
  • PostgreSQL
  • Gmail API
  • Google Pub/Sub

Frontend

  • React
  • Framer Motion

AI Layer

  • GPT-4 Structured Outputs
  • GPT-4 Vision

ERP Layer

  • Tally Prime via TDL XML imports

Working with the Tally API

The official documentation leaves a lot to be desired.

The most reliable approach is:

  1. Enable Tally's HTTP server.
  2. Generate TDL XML vouchers.
  3. POST them to Tally.
  4. Parse the acknowledgement response.
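Steps 3 and 4 can be sketched with the standard library alone. This assumes Tally is listening on `http://localhost:9000` (a commonly used default, so check your own configuration) and that the acknowledgement contains `CREATED` and `ERRORS` counters plus optional `LINEERROR` messages. The exact response shape can vary between Tally versions, so treat the tag names as assumptions to verify.

```python
import urllib.request
import xml.etree.ElementTree as ET

TALLY_URL = "http://localhost:9000"  # assumption: adjust to your configured port


def parse_import_ack(response_xml: str) -> dict:
    """Pull the import counters and any error lines out of Tally's reply."""
    root = ET.fromstring(response_xml)

    def counter(tag: str) -> int:
        # Search at any depth, since the counters may be nested in an envelope.
        node = next(root.iter(tag), None)
        text = (node.text or "").strip() if node is not None else ""
        return int(text) if text.isdigit() else 0

    messages = [node.text.strip() for node in root.iter("LINEERROR") if node.text]
    return {"created": counter("CREATED"), "errors": counter("ERRORS"), "messages": messages}


def post_voucher(voucher_xml: str, url: str = TALLY_URL, timeout: float = 10.0) -> dict:
    """POST one voucher envelope to Tally and return the parsed acknowledgement."""
    request = urllib.request.Request(
        url,
        data=voucher_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_import_ack(response.read().decode("utf-8", errors="replace"))
```

Treating anything other than `created == 1` and `errors == 0` as a failure keeps a rejected voucher from being silently marked as processed.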

Enable Tally HTTP Server

Gateway of Tally
→ F12
→ Advanced Configuration
→ Enable ODBC/HTTP Server

Minimal Purchase Voucher Example

<ENVELOPE>
  <HEADER>
    <TALLYREQUEST>Import Data</TALLYREQUEST>
  </HEADER>
  <BODY>
    <IMPORTDATA>
      <REQUESTDESC>
        <REPORTNAME>Vouchers</REPORTNAME>
        <STATICVARIABLES>
          <SVCURRENTCOMPANY>[Company Name]</SVCURRENTCOMPANY>
        </STATICVARIABLES>
      </REQUESTDESC>
      <REQUESTDATA>
        <TALLYMESSAGE xmlns:UDF="TallyUDF">
          <VOUCHER VCHTYPE="Purchase" ACTION="Create">
            <DATE>[YYYYMMDD]</DATE>
            <VOUCHERTYPENAME>Purchase</VOUCHERTYPENAME>
            <PARTYLEDGERNAME>[Supplier Name]</PARTYLEDGERNAME>
            <ALLLEDGERENTRIES.LIST>
              <LEDGERNAME>[Ledger]</LEDGERNAME>
              <AMOUNT>[Amount]</AMOUNT>
            </ALLLEDGERENTRIES.LIST>
          </VOUCHER>
        </TALLYMESSAGE>
      </REQUESTDATA>
    </IMPORTDATA>
  </BODY>
</ENVELOPE>

The first working voucher took three days.

Everything after that was extension work.
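As an example of that extension work, the template above can be filled programmatically instead of by string substitution. The sketch below mirrors the minimal envelope exactly and nothing more; the function name and parameters are hypothetical, and a production voucher will typically need more entries than this template shows.

```python
import xml.etree.ElementTree as ET


def build_purchase_voucher(company: str, date_yyyymmdd: str, supplier: str,
                           ledger: str, amount: str) -> str:
    """Fill the minimal envelope; ElementTree escapes characters such as '&'."""
    envelope = ET.Element("ENVELOPE")
    header = ET.SubElement(envelope, "HEADER")
    ET.SubElement(header, "TALLYREQUEST").text = "Import Data"

    body = ET.SubElement(envelope, "BODY")
    import_data = ET.SubElement(body, "IMPORTDATA")
    request_desc = ET.SubElement(import_data, "REQUESTDESC")
    ET.SubElement(request_desc, "REPORTNAME").text = "Vouchers"
    static_vars = ET.SubElement(request_desc, "STATICVARIABLES")
    ET.SubElement(static_vars, "SVCURRENTCOMPANY").text = company

    request_data = ET.SubElement(import_data, "REQUESTDATA")
    message = ET.SubElement(request_data, "TALLYMESSAGE", {"xmlns:UDF": "TallyUDF"})
    voucher = ET.SubElement(message, "VOUCHER", {"VCHTYPE": "Purchase", "ACTION": "Create"})
    ET.SubElement(voucher, "DATE").text = date_yyyymmdd
    ET.SubElement(voucher, "VOUCHERTYPENAME").text = "Purchase"
    ET.SubElement(voucher, "PARTYLEDGERNAME").text = supplier

    entry = ET.SubElement(voucher, "ALLLEDGERENTRIES.LIST")
    ET.SubElement(entry, "LEDGERNAME").text = ledger
    ET.SubElement(entry, "AMOUNT").text = amount
    return ET.tostring(envelope, encoding="unicode")
```

Building the tree rather than formatting a string means a supplier name such as "A & B Steels" is escaped automatically, which avoids one common cause of rejected imports.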


The Admin Dashboard

The dashboard became the team's primary operational interface.

Orders Panel

Shows:

  • Status
  • Supplier
  • PO Number
  • Delivery Deadline
  • Part Count

Duplicate orders are highlighted and blocked from entering Tally until reviewed.

Tally Panel

Provides, without opening Tally:

  • Revenue trends
  • Receivables
  • Payment status

Operations Panel

Handles:

  • Dispatch notes
  • QA checklists
  • Production milestones

All driven from the same purchase-order data.


What Failed First

The original extraction system used:

  • pdf-parse
  • OCR
  • Regex pipelines

The assumption:

Purchase order formats would remain stable.

They didn't.

Enterprise customers used:

  • Multiple templates
  • Different table structures
  • Scanned PDFs
  • International formats

The result:

  • ~60% success rate
  • Frequent maintenance
  • Silent extraction failures

One part-number transposition eventually led to production of an incorrect batch.

That was the turning point.

The system was rebuilt around GPT-4 structured outputs.

When validated against historical purchase orders, extraction accuracy rose from roughly 60% with the old pipeline to over 98% with the rebuilt one.


Three Lessons From the Project

1. PDF Extraction Has a Clear ROI

For teams processing purchase orders every week:

  • AI costs remain small.
  • Manual labor costs do not.

The business case becomes obvious once the numbers are compared.

2. Tally Integration Is Easier Than Its Reputation

The difficult part is finding a working example.

Once you successfully create one XML voucher, the rest is straightforward engineering.

3. Push Beats Polling

The biggest benefit wasn't lower latency.

It was trust.

A dashboard updated within seconds becomes an operational tool.

A dashboard updated every few minutes becomes a report.

That difference determines whether teams actually adopt the system.


Could This Work For Your Business?

If your team receives PDFs in Gmail and manually enters them into:

  • Tally
  • ERP systems
  • Inventory software

the workflow can likely be automated.

The core pattern is repeatable:

Gmail
→ AI Extraction
→ Validation
→ ERP Integration
→ Operations Dashboard

This project went from discovery call to production deployment in six weeks.


FAQ

Does this work with Tally ERP 9?

Yes.

The XML import approach works with both:

  • Tally Prime
  • Tally ERP 9

The main requirement is enabling Tally's HTTP server.

Can it process scanned PDFs?

Yes.

Scanned PDFs use GPT-4 Vision.

Text PDFs use direct extraction.

Both produce the same structured output.
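The routing decision between the two paths can be as simple as checking whether the PDF has a usable text layer. The sketch below is one possible heuristic; the function name and the 200-character threshold are assumptions, not the values used in production.

```python
def choose_extraction_path(extracted_text: str, min_chars: int = 200) -> str:
    """Return "text" when the PDF has a usable text layer, else "vision"."""
    # Scanned PDFs usually yield little or no text, so count visible characters.
    visible = "".join(extracted_text.split())
    return "text" if len(visible) >= min_chars else "vision"
```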

What if the AI extracts incorrect data?

The system includes:

  • Confidence scoring
  • Validation rules
  • Human review queues

Suspicious records are blocked before reaching Tally.

This prevents errors from entering downstream operations.
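A sketch of how validation rules and a review queue might fit together is shown below. The specific rules, the key names, and the 0.9 confidence threshold are illustrative assumptions; how the confidence score itself is produced is not covered here.

```python
from datetime import date


def validate_po(po: dict, today: date) -> list[str]:
    """Return a list of human-readable problems; an empty list means the PO passes."""
    problems = []
    if not str(po.get("po_number") or "").strip():
        problems.append("missing PO number")
    try:
        if date.fromisoformat(str(po.get("delivery_date") or "")) < today:
            problems.append("delivery date is in the past")
    except ValueError:
        problems.append("delivery date is not a valid ISO date")
    items = po.get("line_items") or []
    if not items:
        problems.append("no line items")
    for index, item in enumerate(items, start=1):
        quantity = item.get("quantity")
        if not isinstance(quantity, (int, float)) or quantity <= 0:
            problems.append(f"line {index}: quantity must be positive")
    return problems


def route(po: dict, confidence: float, today: date, threshold: float = 0.9) -> str:
    """Send a record to Tally only if it passes every rule and clears the threshold."""
    if validate_po(po, today) or confidence < threshold:
        return "review"
    return "tally"
```

Returning the list of problems, rather than a bare pass or fail, gives the reviewer in the queue a reason for each blocked record.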
