Archit Mittal

Posted on Apr 20

I Built a Claude Code Sub-Agent That Processes 2,000 Invoices a Day — Here's the Exact Setup

#agents #automation #claude #tutorial

A small Gurgaon-based CA firm came to me last month with a recurring nightmare: 2,000+ vendor invoices landing in a shared Drive folder every day. Three junior accountants spending six hours each manually keying them into Tally. That's ~₹1.8L/month in salary cost for pure data entry.

They asked if an AI agent could handle it.

Short answer: yes. And not just "handle" — Claude Code sub-agents turned out to be the cleanest way I've seen to structure this kind of multi-step workflow. Here's exactly how I built it, plus the code you can fork today.

Why Claude Code sub-agents

If you haven't touched sub-agents yet: they're specialized Claude instances you can spawn from a main agent to handle a scoped task. Each has its own context window, its own system prompt, and its own toolset. You define them as markdown files in .claude/agents/ — no framework, no DAG, no visual editor to learn.

The killer feature for this workflow is context isolation. A 50-page invoice PDF doesn't bloat the main agent's context. Extraction happens in a sub-agent, and only the structured JSON result comes back.

Three other reasons I reached for sub-agents here:

Per-agent model selection. Validation runs on Haiku for pennies. Extraction runs on Sonnet for accuracy.
Tight tool scopes. The writer agent can touch the Tally bridge; the extractor cannot. Blast radius stays small.
Independent prompt versioning. Each .md file lives in git. When accuracy dips, git diff tells you why.

The architecture

For this case I used three sub-agents orchestrated by a tiny Python loop:

pdf-extractor — pulls vendor name, GSTIN, line items, totals
tally-validator — checks GST math, flags anomalies, matches against vendor master
tally-writer — pushes validated records into Tally via its ODBC bridge

Main orchestrator handles retries, quarantining, and Postgres logging.

Sub-agent #1: the PDF extractor

File: .claude/agents/pdf-extractor.md

---
name: pdf-extractor
description: "Extract structured invoice data from Indian GST invoice PDFs. Returns JSON with vendor, GSTIN, line items, and totals."
tools: Read, Bash
model: sonnet
---

You extract structured data from Indian GST invoices.

For each PDF the user names:
1. Read the PDF contents.
2. Identify: vendor name, GSTIN, invoice number, invoice date, line items (with HSN codes), CGST, SGST, IGST, total.
3. Return a single JSON object. No prose. No markdown fences.

If GSTIN is missing or fails the 15-character format check, return {"error": "invalid_gstin"}.

That's the whole prompt — under 15 lines. The magic is in the description field: the main agent uses it to decide when to delegate.

Sub-agent #2: the validator

---
name: tally-validator
description: Validate extracted invoice JSON against GST rules and vendor master. Returns pass/fail with reasons.
tools: Read, Bash
model: haiku
---

Input: invoice JSON from the pdf-extractor agent.

Check:
- GSTIN passes the Modulus-36 checksum.
- CGST + SGST equals IGST-equivalent for intra-state invoices; IGST-only for inter-state.
- Line item totals sum to invoice total within a ±₹1 tolerance.
- Vendor GSTIN exists in /data/vendor_master.csv.

Return {"valid": true} or {"valid": false, "reasons": [...]}.

I put this one on Haiku. It's deterministic validation — no reasoning required, just arithmetic and lookups. Cost per invoice dropped from ₹0.40 to ₹0.06 when I moved validation off Sonnet.

Sub-agent #3: the writer

---
name: tally-writer
description: Post a validated invoice JSON into Tally via the ODBC bridge. Returns the Tally voucher number on success.
tools: Bash
model: haiku
---

Input: validated invoice JSON.

Call /usr/local/bin/tally-push with the JSON on stdin. If exit code 0, return {"posted": true, "voucher": <stdout>}. Else return {"posted": false, "error": <stderr>}.

Do not retry. Do not modify the JSON. Do not summarise.

Boring on purpose. This agent should feel like a shell command.

The Python orchestrator

import subprocess
import json
from pathlib import Path

INBOX = Path("/srv/invoices/inbox")
DONE = Path("/srv/invoices/processed")
FAILED = Path("/srv/invoices/failed")

def run_claude(prompt: str) -> dict:
    result = subprocess.run(
        ["claude", "-p", prompt, "--output-format", "json"],
        capture_output=True, text=True, timeout=120,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr)
    return json.loads(json.loads(result.stdout)["result"])

def process(pdf_path: Path) -> dict:
    prompt = (
        f"Use the pdf-extractor sub-agent on {pdf_path}, "
        f"then pass the result to tally-validator. "
        f"If valid, use tally-writer to post it. "
        f"Return the final status JSON only."
    )
    return run_claude(prompt)

for pdf in INBOX.glob("*.pdf"):
    try:
        result = process(pdf)
        dest = DONE if result.get("posted") else FAILED
        pdf.rename(dest / pdf.name)
    except Exception as e:
        (FAILED / f"{pdf.stem}.error").write_text(str(e))
        pdf.rename(FAILED / pdf.name)

Drop it on cron every 15 minutes:

*/15 * * * * cd /srv/invoices && python3 run.py >> /var/log/invoices.log 2>&1

Done.

Numbers from week one

2,047 invoices/day average throughput
94.3% auto-posted without human touch
5.7% kicked to an exception queue (mostly scanned PDFs with poor OCR)
Compute cost: ~₹840/day across all three sub-agents
Replaced roughly ₹1.8L/month of manual effort
Payback period: about 2 weeks

The 5.7% exception rate is the number I'd watch. Two weeks in, a quick tweak to the extractor's prompt around handwritten delivery challans dropped it to 3.1%.

Three things I'd tell past-me

1. Version your sub-agent prompts from day one. Commit the .claude/agents/ folder to the project repo. When accuracy dips after a model update, git diff tells you which prompt tweak broke it.

2. Log the JSON, not the prose. My first version captured the full agent transcript. Useless for debugging at 2 AM. Now I log only the structured JSON each sub-agent returns, plus the prompt and timestamp.

3. Don't over-orchestrate. My initial design had five sub-agents. Three was plenty. If two stages always run together and never branch, they should be one agent. Sub-agents are a scoping tool, not a UML diagram.

Try it this week

If you want to kick the tires on sub-agents, here's a 90-second starter:

mkdir -p .claude/agents
cat > .claude/agents/summarizer.md <<'EOF'
---
name: summarizer
description: Summarize a file in 3 bullet points. Use when the user asks for a quick file summary.
tools: Read
---
Read the file the user mentions. Return exactly 3 bullets. No preamble.
EOF

claude -p "Use the summarizer sub-agent on README.md"

That's it. From there, copy the invoice pattern above — one agent per scoped task, tight tool lists, clear descriptions.

The real unlock isn't Claude being clever. It's the boring discipline of splitting work into agents small enough that each one can be reasoned about in isolation. Everything else is plumbing.

I'm Archit Mittal — I automate chaos for businesses. Follow me for daily automation content.

Top comments (3)

Mykola Kondratiuk • Apr 21

the org impact is the part that gets skipped in these writeups. 2000 invoices/day probably freed up 3 people - curious if any of them got redeployed or if the CA firm just stopped needing to hire.

Some comments may only be visible to logged-in visitors. Sign in to view all comments.