
Sarah Guthals, PhD for Tensorlake

Originally published at tensorlake.ai

LangChain + Tensorlake: Unlocking Document Understanding for Agents

LangChain empowers developers to build sophisticated LLM applications by making reasoning, memory, and tool-use first-class components of an agent's decision-making. But as these applications evolve, a critical bottleneck emerges: the need to interact with unstructured, real-world data. This challenge is amplified in workflows driven by high-stakes documents (e.g. contracts, claims, reports, disclosures, and forms) where information arrives in messy formats like PDFs, scans, and handwritten submissions. Accurately parsing these documents often demands domain-specific logic and reliable infrastructure, far beyond what generic solutions can provide.

This is the problem Tensorlake was designed to solve.

Developer-first parsing for high-stakes workflows

Tensorlake is a robust, layout- and schema-aware document ingestion engine that transforms complex documents into structured, indexable data for AI agents and search indexes. Unlike generic document ingestion engines, Tensorlake focuses on high-stakes, critical workflows where missing data simply isn't an option. That's why Tensorlake offers resilient parsing through an ensemble of specialized models, with out-of-the-box features such as:

  • Document Layout Understanding: Multi-modal parsing that reduces hallucinations and provides bounding boxes for source citations.
  • Form and Table Understanding: Models that handle digital and handwritten forms, complex tables (wide and long, with merged cells and headers), and even forms that contain tables, converting every fragment on every page into a faithful HTML representation.
  • Strikethrough Detection: With 99% accuracy and consistency, Tensorlake models outperform other engines, especially on heavily red-lined documents (e.g. legal contract iterations).
  • Signature Detection: A model that not only detects digital, handwritten, and image-based signatures, but goes beyond a boolean of whether a signature exists on a page by extracting contextually relevant information.

Beyond its parsing capabilities, Tensorlake is also dedicated to an elegant developer experience. Every document is converted into both markdown chunks and structured data in a single API call, improving search in RAG pipelines and knowledge graphs by letting you combine semantic or hybrid search with metadata filters. The result: building knowledge bases from complex, unstructured documents takes roughly half the processing time and cost of other providers.
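To make that concrete, here is a minimal sketch of how parsed output could feed a LangChain retrieval step. It assumes you already have markdown chunks with page-level metadata from a Tensorlake parse (the chunks list below is a hypothetical stand-in, not Tensorlake's actual response shape) and uses LangChain's in-memory vector store purely for illustration:

# Hypothetical example: indexing Tensorlake-style markdown chunks for filtered semantic search.
# The `chunks` list is a stand-in; Tensorlake's real response shape may differ.
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

chunks = [
    {"markdown": "## Purchase Price\nThe purchase price is $150,000...", "page": 2},
    {"markdown": "## Signatures\nBuyer: Nova Ellison. Seller: Juno Vega.", "page": 5},
]

# Wrap each chunk as a Document, keeping page numbers as filterable metadata
docs = [Document(page_content=c["markdown"], metadata={"page": c["page"]}) for c in chunks]

store = InMemoryVectorStore(embedding=OpenAIEmbeddings())
store.add_documents(docs)

# Semantic search restricted by a metadata filter (a callable, in the in-memory store)
results = store.similarity_search(
    "Who signed the agreement?",
    k=2,
    filter=lambda doc: doc.metadata["page"] >= 4,
)
for doc in results:
    print(doc.metadata["page"], doc.page_content[:60])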

In short, Tensorlake offers programmable parsing pipelines that support custom schemas, field validation, and multi-pass extraction across pages and formats. And with high-stakes workflow features like Contextual Signature Detection and Strikethrough Detection, Tensorlake is the solution to ensuring all relevant data is extracted and ready to integrate within your workflow.
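To give a sense of what a custom schema might capture, here is a small, hypothetical sketch written as a Pydantic model for the fields a real estate workflow could extract; the field names are illustrative and Tensorlake's actual schema format may differ:

# Hypothetical schema sketch: fields a real-estate extraction pipeline might target.
# Field names are illustrative; Tensorlake's actual schema definition may differ.
from typing import Optional
from pydantic import BaseModel

class SignatureInfo(BaseModel):
    signer_name: str
    page: int
    signed_date: Optional[str] = None  # e.g. "September 10, 2025"

class PurchaseAgreement(BaseModel):
    buyer: str
    seller: str
    property_address: str
    purchase_price: float
    signatures: list[SignatureInfo]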

Tensorlake powers agents that understand documents

This robust and reliable engine unlocks new capability for LangChain users, becoming a foundational component for document understanding in agent-driven workflows. Imagine an agent evaluating the status of a loan package, processing a property offer, or validating disclosures before approval. Each of these tasks requires reliable access to accurate, structured data. Tensorlake's APIs let agents offload the work of understanding multi-modal document layouts, identifying entities, tables, and signatures, and returning standardized data that LangGraph agents can reason about.

Architectural diagram of a LangGraph agent calling the Tensorlake tool to parse documents with signatures.

When it comes to orchestrating long-running, modular AI workflows with LangChain, Tensorlake is the factual grounding layer for document-heavy agents.

Quick start: Tensorlake + LangGraph in action

Curious how it all works? Here's a lightning-fast overview of pairing Tensorlake with a LangGraph agent, using the langchain-tensorlake tool to analyze signatures in a document.

  1. Install the package:
pip install langchain-tensorlake

And set up your environment variables for Tensorlake and OpenAI:

export TENSORLAKE_API_KEY="your_api_key"

# In this example, we use OpenAI's GPT-4o-mini model.
export OPENAI_API_KEY="your_openai_api_key"
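If you'd rather keep keys in a .env file than export them in your shell, a small optional sketch using the python-dotenv package (an extra dependency, not something langchain-tensorlake requires) looks like this:

# Optional: load API keys from a .env file instead of exporting them in the shell.
# Requires python-dotenv (pip install python-dotenv); not required by langchain-tensorlake.
import os
from dotenv import load_dotenv

load_dotenv()  # reads TENSORLAKE_API_KEY and OPENAI_API_KEY from .env

# Fail fast if either key is missing
for key in ("TENSORLAKE_API_KEY", "OPENAI_API_KEY"):
    if not os.environ.get(key):
        raise RuntimeError(f"Missing environment variable: {key}")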
  2. Build a LangGraph Agent and attach the Tensorlake Tool:
# 1. Import the langchain-tensorlake tool and other necessary libraries
from langchain_tensorlake import DocumentParserOptions, document_markdown_tool
from langgraph.prebuilt import create_react_agent
import asyncio
import os

# 2. Define the document path
path = "path/to/your/document.pdf"

# 3. Define the question to be asked and create the agent
question = f"What contextual information can you extract about the signatures in my document found at {path}?"

# 4. Create an async function for the agent to run
async def main():
    # 5. Create the agent with the Tensorlake tool
    agent = create_react_agent(
            model="openai:gpt-4o-mini",
            tools=[document_markdown_tool],
            prompt=(
                """
                I have a document that needs to be parsed. Please parse this document and answer the question about it.
                """
            ),
            name="real-estate-agent",
        )

    # 6. Run the agent
    result = await agent.ainvoke({"messages": [{"role": "user", "content": question}]})

    # 7. Print the result
    print(result["messages"][-1].content)

# 8. Run the async function
if __name__ == "__main__":
    asyncio.run(main())
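If you want to see the intermediate tool calls as well as the final answer (useful when debugging whether the agent actually invoked the Tensorlake tool), you can iterate over the returned messages inside main(). A short sketch, assuming the same result object from step 6 above:

# Optional debugging sketch: print every message the agent produced, not just the last one.
# Assumes the same `result` returned by agent.ainvoke(...) in the example above.
for message in result["messages"]:
    role = type(message).__name__  # e.g. HumanMessage, AIMessage, ToolMessage
    tool_calls = getattr(message, "tool_calls", None)
    if tool_calls:
        print(f"{role}: requested tools {[tc['name'] for tc in tool_calls]}")
    else:
        print(f"{role}: {str(message.content)[:120]}")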
  3. Run the agent and see the results:

You can use this sample real estate purchase agreement to test the agent. The agent will parse the document, detect signatures, and provide contextual information about them.

% python detect-signature.py
The signatures in the document are as follows:
The document contains multiple detected signatures, which indicate that it is a legal agreement primarily for a residential real estate purchase. Here’s a breakdown of the contextual information regarding these signatures:

1. **Parties Involved**:
   - **Buyer**: Nova Ellison
   - **Seller**: Juno Vega

2. **Document Details**:
   - The agreement is referred to as a "Residential Real Estate Purchase Agreement".
   - It was made effective on **September 20, 2025**.
   - The deal involves the purchase of a property located at **789 Solution Ln, San Francisco, CA 99999** for a purchase price of **$150,000**.

3. **Signature Locations**:
   - Signatures are detected multiple times throughout the document on various pages, usually found near sections regarding agreements, terms, and obligations that require consent from the parties involved.
   - Notable locations include:
     - Signature confirmed on **Page 1** and **Page 2**, often next to clauses concerning the acceptance of terms and conditions.
     - Buyer's and seller's initials are also present on these pages, indicating agreement to various sections of the contract.

4. **Execution Section**:
   - The final pages contain dedicated signature lines for both the Buyer, Seller, and an agent (Aster Polaris from Polaris Group LLC), confirming that all parties have accepted and executed the terms of the agreement.
   - The signatures are dated on **September 10, 2025**.

5. **Agent Signature**:
   - An agent's signature is also present, which indicates that a licensed real estate agent facilitated this transaction.

Overall, the signatures provide critical legal acknowledgment from all parties involved in the agreement, solidifying their acceptance of the terms and commitments outlined in the document.

For the full tutorial with context and custom logic, check out the Real Estate Agent with LangGraph CLI tutorial in the Tensorlake Docs.

While this quick start and tutorial cover just one use case, the same design pattern applies to insurance onboarding, legal intake, KYC processing, and many other verticals.
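Pointing the same agent at a KYC intake flow, for instance, mostly means changing the question and the prompt. Here is a hedged sketch that reuses the tool from the quick start; the document path, prompt, and question are illustrative placeholders:

# Hypothetical adaptation of the quick-start agent for a KYC intake workflow.
# The document path, prompt, and question are illustrative placeholders.
import asyncio
from langchain_tensorlake import document_markdown_tool
from langgraph.prebuilt import create_react_agent

path = "path/to/kyc-application.pdf"
question = (
    f"List the applicant's full name, date of birth, and ID numbers found in {path}, "
    "and note whether the declaration page is signed."
)

async def main():
    agent = create_react_agent(
        model="openai:gpt-4o-mini",
        tools=[document_markdown_tool],
        prompt="You are a KYC intake assistant. Parse the document and answer precisely.",
        name="kyc-intake-agent",
    )
    result = await agent.ainvoke({"messages": [{"role": "user", "content": question}]})
    print(result["messages"][-1].content)

if __name__ == "__main__":
    asyncio.run(main())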

Your agents deserve better data — Tensorlake delivers

LangChain's ecosystem thrives when domain-specific tools like Tensorlake can offer reliable input to LLM-based reasoning pipelines. The value of a LangChain agent is directly tied to the quality and structure of the data it has access to. Tensorlake raises that bar for document workflows.

We're eager to see how developers extend this pattern into increasingly complex applications. Whether you're building an autonomous legal reviewer, a RAG assistant for financial disclosures, or a compliance bot for healthcare documents, the combination of Tensorlake and LangChain provides both flexibility and precision.

Explore the LangChain-Tensorlake Tool and learn how to add reliable document ingestion and parsing to your LangChain workflows today. Start with the signature detection tutorial or build your own workflow from the Tensorlake API docs.

Got feedback or want to show us what you built? Join the conversation in our Slack Community!
