How Docusign is Bringing Contract Table Extraction to Production with NVIDIA Nemotron Parse

#aie #ai #agents #nvidia

AI Engineer World's Fair Coverage

By Hiral Shah, Senior Director, Product Management, Docusign

A major recurring theme among the engineering teams at this week’s AI Engineer World’s Fair in San Francisco is the push to move specialized AI models out of research and directly into high-volume production.

At Docusign, that optimization challenge happens at massive scale: we handle millions of transactions daily and have nearly 1.9 million customers in over 180 countries. Organizations have historically lost significant value every year to the friction, delays, and missed obligations that come from treating these agreements as static documents rather than live sources of business data.

Much of that trapped value sits inside tables: the pricing schedules, SLA obligations, and contractor rate cards that define enterprise relationships but are often the hardest part of a contract to extract accurately.

To solve this, we integrated NVIDIA Nemotron Parse, a vision-language model purpose-built for document understanding, directly into our document processing pipeline.

Docusign and NVIDIA took the AI Engineer World’s Fair stage this week to give attendees a look at how the architecture works under the hood. Here’s what that looks like:

Why Contract Tables Break General-Purpose AI

Contracts routinely contain merged cells, multi-page structures, mixed formatting, and nested layouts that general-purpose vision language models (VLMs) and broad AI models weren't designed to handle. The result is inaccurate extractions that require manual correction, slowing down the workflows they are intended to accelerate.

Our teams watch this operational friction play out across real enterprise scenarios every day:

System Downtime: When a critical system goes down, operations teams need to know immediately which SLA notification requirements apply and to whom.
Resource Tracking: When business stakeholders ask legal what hourly rate was agreed to in a contractor engagement, the answer is often buried deep inside a rate card table.
Vendor Renewals: When procurement teams manage a complex vendor renewal, pricing structures scattered across multiple exhibits require significant manual review to piece together.

The Production Pipeline: From Layout To Structured Data

Docusign's document understanding pipeline processes agreements at scale, handling layout detection and Optical Character Recognition (OCR) across millions of documents. Adding reliable table extraction required a dedicated model layer that can handle the structural complexity those earlier stages couldn't fully resolve.

At the core of this integration is NVIDIA Nemotron Parse, a compact vision-language model that combines layout detection, OCR, and document semantics to interpret and reconstruct complex tables accurately.

For production deployment, the model infrastructure centers on two core requirements:

Serving with vLLM: Nemotron Parse is served via vLLM and integrated directly into Docusign's existing layout and OCR pipeline.
Data Governance & Locality: Sensitive agreement data stays entirely within Docusign's secure environment. Keeping documents local is a hard requirement when handling confidential business terms, while giving our engineering teams the flexibility to run and optimize the model for our specific use case.

Moving Beyond Synthetic Benchmarks

To properly validate this integration, we skipped clean, synthetic benchmarks, which fail to capture the formatting variations, inconsistent structures, and mixed-language content that enterprise contracts actually contain. Instead, we tested the architecture against real, complex enterprise contracts.

The accuracy and reliability of this production deployment gave NVIDIA the confidence to deploy Docusign IAM to manage its own enterprise agreements.

What’s Next On The Roadmap

The work doesn't stop here. Our engineering teams are continuing to improve model accuracy on more complex and varied table structures. We are also actively exploring deeper integrations with agentic workflows through the NVIDIA Agent Toolkit, and a public API for direct integration with downstream developer systems is coming soon.

Table extraction powered by Nemotron Parse is currently accepting beta customers for extractions in Agreement Manager, with full general availability on the horizon.

If you're building document intelligence pipelines or moving VLMs into production, how are your teams tackling structural layout variations? Let's swap notes in the comments!