DEV Community

Alain Airom

Quack into Action! Building Brilliant Agents with Docling-Agent & mellea

Using Docling-Agent to build powerful agentic operations on documents, such as writing, editing, and summarizing.

Introduction

For those who’ve followed my blog, Docling needs no grand introduction. My unwavering support for this tool stems from its unparalleled capacity to simplify document processing: it effortlessly parses diverse formats (with an advanced understanding of complex PDFs as well as other widely used document formats) and integrates seamlessly with the broader GenAI ecosystem. It’s truly a game-changer.

Recognizing the growing demand for intelligent automation, the Docling team recently introduced a powerful agent module. It provides the capabilities needed for ‘agentic’ document processing, where intelligent agents actively interact with and understand complex documents.

Although the package is still a work in progress, its potential immediately compelled me to dive in. I wanted to understand its mechanics and how it could change document processing in my projects, starting largely from the provided samples. What stood out most was its elegant integration with mellea's generative programming, a sophisticated approach I've already lauded in a previous post.

Features of Docling-Agent (excerpt from the GitHub repository)

  • Document writing: Generate well-structured reports from natural prompts and export to JSON/Markdown/HTML.
  • Targeted editing: Load an existing Docling JSON and apply focused edits with natural-language tasks.
  • Schema-guided extraction: Extract typed fields from PDFs/images using a simple schema and produce HTML reports. See examples on curriculum_vitae, papers, invoices, etc.
  • Model-agnostic: Plug in different backends via Mellea model_ids (e.g., OpenAI GPT OSS, IBM Granite).
  • Simple API surface: Use agent.run(...) with DoclingDocument in/out; save via save_as_* helpers.
  • Optional tools: Integrate external tools (e.g., MCP) when available.

As a reminder, here are mellea's features and characteristics as well.

mellea Features

mellea is a library for writing generative programs. Generative programming replaces flaky agents and brittle prompts with structured, maintainable, robust, and efficient AI workflows.

  • A standard library of opinionated prompting patterns.
  • Sampling strategies for inference-time scaling.
  • Clean integration between verifiers and samplers.
  • Batteries-included library of verifiers.
  • Support for efficient checking of specialized requirements using activated LoRAs.
  • Train your own verifiers on proprietary classifier data.
  • Compatible with many inference services and model families. Control cost and quality by easily lifting and shifting workloads between inference providers, model families, and model sizes.
  • Easily integrate the power of LLMs into legacy code-bases (mify).
  • Sketch applications by writing specifications and letting mellea fill in the details (generative slots).
  • Get started by decomposing your large unwieldy prompts into structured and maintainable mellea problems.
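The "sampling strategies" and "verifiers" bullets boil down to a simple loop: generate a candidate, check it against explicit requirements, and retry within a budget. The sketch below illustrates that idea in plain Python with a stubbed generator standing in for an LLM. It is not mellea's API (mellea has its own constructs for requirements and sampling strategies); `Requirement`, `sample_until_valid`, and `fake_generate` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


# A requirement pairs a human-readable description with a checker function.
@dataclass
class Requirement:
    description: str
    check: Callable[[str], bool]


def sample_until_valid(
    generate: Callable[[int], str],
    requirements: list[Requirement],
    budget: int = 3,
) -> tuple[str, list[str]]:
    """Call `generate` up to `budget` times; return the first candidate that
    passes every requirement, or the last candidate with its failures."""
    candidate, failures = "", []
    for attempt in range(budget):
        candidate = generate(attempt)
        failures = [r.description for r in requirements if not r.check(candidate)]
        if not failures:
            break
    return candidate, failures


# Stub "model": returns canned drafts instead of calling an LLM.
drafts = [
    "Polymers are used in packaging.",
    "# Polymers\n\n| Polymer | Property |\n|---|---|\n| PET | clear |\n\nBiodegradability: ...",
]


def fake_generate(attempt: int) -> str:
    return drafts[min(attempt, len(drafts) - 1)]


reqs = [
    Requirement("contains a markdown table", lambda s: "|---" in s),
    Requirement("mentions biodegradability", lambda s: "biodegradab" in s.lower()),
]

result, missing = sample_until_valid(fake_generate, reqs)
print(missing)  # [] because the second draft satisfies both requirements
```

The first draft fails both checks, so the loop retries; keeping the requirements as separate, named checks is what makes the failure list useful as feedback for a repair step.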

Examples provided and tests

As mentioned above, I used the provided samples (for now) and only adapted one to my usual environment configuration habits (folder layout, etc.). Naturally, the simplest path is to clone the repository and proceed directly from there. But where’s the fun in that? I, of course, opted for the more ‘bespoke’ route: crafting my own directory structure, meticulously creating subfolders, and then, with immense effort, downloading a few files. It was only a slight bit more overhead, I assure you, for that truly unique setup experience!

Implementation

In my configuration, these are the steps I took to prepare the environment.

uv venv myenv
source myenv/bin/activate

touch README.md
mkdir docling_agent
touch docling_agent/__init__.py
# assuming the pyproject.toml is downloaded and present
uv pip install -e .
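After `uv pip install -e .`, a quick way to confirm that the editable install and its dependencies resolve is to look the packages up with `importlib.util.find_spec`, which locates a module without importing it. A minimal sketch, assuming the top-level module names are the ones the samples below import (`docling_agent`, `docling_core`, `mellea`); `check_modules` is a hypothetical helper name.

```python
import importlib.util

# Top-level modules the samples import; adjust if your layout differs.
REQUIRED_MODULES = ["docling_agent", "docling_core", "mellea"]


def check_modules(names: list[str]) -> dict[str, bool]:
    """Return {module_name: importable?} without actually importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED_MODULES).items():
        print(f"{name:15} {'OK' if ok else 'MISSING'}")
```

Run it inside the activated `myenv`; a `MISSING` entry usually means the install ran in a different environment.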

1-Write report

  • Write a new document sample (original code)
import os
from datetime import datetime

from mellea.backends import model_ids

from docling_agent.agents import DoclingWritingAgent, logger


def simple_writing_report(task: str):

    model_id = model_ids.OPENAI_GPT_OSS_20B
    # model_id = model_ids.IBM_GRANITE_4_MICRO_3B

    # tools_config = MCPConfig()
    # tools = setup_mcp_tools(config=tools_config)
    tools = []

    agent = DoclingWritingAgent(model_id=model_id, tools=tools)
    document = agent.run(task=task)

    # Save the document
    os.makedirs("./scratch", exist_ok=True)
    fname = datetime.now().strftime("%Y_%m_%d_%H:%M:%S")

    document.save_as_json(filename=f"./scratch/{fname}.json")
    document.save_as_markdown(filename=f"./scratch/{fname}.md", text_width=72)
    document.save_as_html(filename=f"./scratch/{fname}.html")

    logger.info(f"report written to `./scratch/{fname}.html`")

def advanced_writing_report(task: str):

    reasoning_model_id = model_ids.OPENAI_GPT_OSS_20B
    writing_model_id = model_ids.IBM_GRANITE_4_MICRO_3B

    # tools_config = MCPConfig()
    # tools = setup_mcp_tools(config=tools_config)
    tools = []

    # Initialize the agent with a base model id
    agent = DoclingWritingAgent(model_id=reasoning_model_id, tools=tools)
    # Configure specialized models for reasoning and writing
    agent.reasoning_model_id = reasoning_model_id
    agent.writing_model_id = writing_model_id

    document = agent.run(task=task)

    # Save the document
    os.makedirs("./scratch", exist_ok=True)
    fname = datetime.now().strftime("%Y_%m_%d_%H:%M:%S")

    document.save_as_json(filename=f"./scratch/{fname}.json")
    document.save_as_markdown(filename=f"./scratch/{fname}.md", text_width=72)
    document.save_as_html(filename=f"./scratch/{fname}.html")

    logger.info(f"report written to `./scratch/{fname}.html`")    

def main():

    task = (
        "Write me a document on polymers in food-packaging. Please make sure "
        "that you have a table listing all the most common polymers and their "
        "properties, a section on biodegradability and common practices to improve "
        "strength and durability."
    )

    # simple_writing_report(task=task)

    advanced_writing_report(task=task)



if __name__ == "__main__":
    main()
  • After running the code 🏃

  • And an excerpt of the output 📄
# Polymers in Food Packaging

In recent years, polymers have emerged as essential components in food
packaging due to their unique properties that protect food products from
spoilage, maintain freshness, extend shelf life, and ensure safety. The
primary role of these polymers is to act as an effective barrier against
moisture, oxygen, light, and other contaminants that can compromise the
quality and integrity of the packaged food. By incorporating polymers
into packaging materials such as films, coatings, or laminates,
manufacturers can significantly reduce product degradation caused by
oxidation, microbial growth, or physical damage during storage and
transportation. Furthermore, advanced polymer-based systems are designed
to release controlled amounts of antimicrobial agents or oxygen
scavengers, thereby preventing the proliferation of harmful
microorganisms and preserving the safety of consumable products for
extended periods. The versatility of polymers allows them to be tailored
to specific packaging requirements, such as flexibility, rigidity,
transparency, or barrier properties against gases like carbon dioxide or
nitrogen. Overall, the integration of polymers into food packaging
represents a significant advancement in ensuring the quality and safety
of packaged foods while minimizing waste through extended shelf life.

## Introduction

The introduction elucidates the critical importance of polymer selection
in food packaging due to its impact on food safety, sustainability, and
consumer health. It highlights how polymers serve as barriers against
moisture, oxygen, and contaminants, thereby ensuring product integrity.
The document is structured to first discuss the fundamental properties
of various polymers, including their mechanical strength, thermal
stability, and chemical resistance. Subsequently, it delves into
specific applications in food packaging, examining how different polymer
types can be tailored for use in varying environmental conditions.
Furthermore, the introduction sets the stage for a comparative analysis
of traditional versus emerging biodegradable polymers, emphasizing the
need for sustainable materials that align with global efforts to reduce
plastic pollution and enhance eco-friendliness in consumer products.
...
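Because the task prompt spells out concrete asks (a table of common polymers, a section on biodegradability, practices to improve strength and durability), the Markdown export can be spot-checked automatically. A heuristic sketch using regular expressions; `missing_requirements` and its patterns are my own assumptions, and passing them says nothing about factual accuracy. Note that the excerpt above is only the beginning of the report, so checking it alone would flag items that appear later in the full file.

```python
import re


def missing_requirements(markdown: str) -> list[str]:
    """Heuristically check a generated Markdown report against the polymer
    task's explicit asks; return the ones that look unmet."""
    checks = {
        # A Markdown table has a separator row such as |---|---| or | :--- |
        "table of polymers": re.search(
            r"^[ \t]*\|?[ \t]*:?-{3,}:?[ \t]*\|", markdown, re.MULTILINE
        ) is not None,
        # A heading (any level) that mentions biodegradability
        "biodegradability section": re.search(
            r"^#+[ \t].*biodegrad", markdown, re.IGNORECASE | re.MULTILINE
        ) is not None,
        # Any mention of strength or durability
        "strength/durability practices": re.search(
            r"strength|durab", markdown, re.IGNORECASE
        ) is not None,
    }
    return [name for name, ok in checks.items() if not ok]


excerpt = "# Polymers in Food Packaging\n\n## Introduction\n\nPolymers offer mechanical strength..."
print(missing_requirements(excerpt))  # ['table of polymers', 'biodegradability section']
```

In practice you would read the saved `.md` file and pass its contents in; requiring a pipe in the separator row keeps a plain `---` horizontal rule from counting as a table.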
  • Adapted code that writes to an ./output folder using timestamped file names 👇
import os
from datetime import datetime

from mellea.backends import model_ids

from docling_agent.agents import DoclingWritingAgent, logger


# Define the output directory constant
OUTPUT_DIR = "./output"

def simple_writing_report(task: str):
    """
    Generates a document using a single model and saves it to the
    output folder under a timestamped file name.
    """
    model_id = model_ids.OPENAI_GPT_OSS_20B

    tools = []

    agent = DoclingWritingAgent(model_id=model_id, tools=tools)
    document = agent.run(task=task)

    os.makedirs(OUTPUT_DIR, exist_ok=True)


    fname = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
    filepath_base = f"{OUTPUT_DIR}/{fname}"

    document.save_as_json(filename=f"{filepath_base}.json")
    document.save_as_markdown(filename=f"{filepath_base}.md", text_width=72)
    document.save_as_html(filename=f"{filepath_base}.html")

    logger.info(f"report written to `{filepath_base}.html`")

def advanced_writing_report(task: str):
    """
    Generates a document using specialized models for reasoning and writing,
    and saves it to the output folder under a timestamped file name.
    """
    reasoning_model_id = model_ids.OPENAI_GPT_OSS_20B
    writing_model_id = model_ids.IBM_GRANITE_4_MICRO_3B

    tools = []

    # Initialize the agent with a base model id
    agent = DoclingWritingAgent(model_id=reasoning_model_id, tools=tools)
    # Configure specialized models for reasoning and writing
    agent.reasoning_model_id = reasoning_model_id
    agent.writing_model_id = writing_model_id

    document = agent.run(task=task)

    os.makedirs(OUTPUT_DIR, exist_ok=True)

    fname = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
    filepath_base = f"{OUTPUT_DIR}/{fname}"

    document.save_as_json(filename=f"{filepath_base}.json")
    document.save_as_markdown(filename=f"{filepath_base}.md", text_width=72)
    document.save_as_html(filename=f"{filepath_base}.html")

    logger.info(f"report written to `{filepath_base}.html`")


def main():

    task = (
        "Write me a document on polymers in food-packaging. Please make sure "
        "that you have a table listing all the most common polymers and their "
        "properties, a section on biodegradability and common practices to improve "
        "strength and durability."
    )


    advanced_writing_report(task=task)



if __name__ == "__main__":
    main()
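Both functions in the adapted code repeat the same save block. One way to factor it out is a helper that accepts anything exposing the three `save_as_*` methods the samples call on a `DoclingDocument`. A minimal sketch: `save_all` and `_RecordingDoc` are hypothetical names, and the stand-in object only records file names so the helper can be tried without running a model. It keeps the underscore-based timestamp, since the original sample's `%H:%M:%S` puts colons in file names, which Windows does not allow.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional, Protocol


class SavableDocument(Protocol):
    """The save helpers the samples call on a DoclingDocument."""

    def save_as_json(self, filename) -> None: ...
    def save_as_markdown(self, filename, text_width: int) -> None: ...
    def save_as_html(self, filename) -> None: ...


def save_all(
    document: SavableDocument,
    output_dir: str = "./output",
    now: Optional[datetime] = None,
) -> Path:
    """Write JSON, Markdown and HTML exports under one timestamped base name.

    Returns the base path without an extension.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    base = out / (now or datetime.now()).strftime("%Y_%m_%d_%H_%M_%S")
    document.save_as_json(filename=base.with_suffix(".json"))
    document.save_as_markdown(filename=base.with_suffix(".md"), text_width=72)
    document.save_as_html(filename=base.with_suffix(".html"))
    return base


# Stand-in object that only records file names, so the helper can be tried
# without running an agent.
class _RecordingDoc:
    def __init__(self) -> None:
        self.written: list[str] = []

    def save_as_json(self, filename) -> None:
        self.written.append(Path(filename).name)

    def save_as_markdown(self, filename, text_width: int) -> None:
        self.written.append(Path(filename).name)

    def save_as_html(self, filename) -> None:
        self.written.append(Path(filename).name)


demo = _RecordingDoc()
with tempfile.TemporaryDirectory() as tmp:
    demo_base = save_all(demo, tmp, now=datetime(2025, 11, 24, 10, 25, 29))
print(demo.written)
```

In the adapted script, each function could then end with `base = save_all(document, OUTPUT_DIR)` followed by the existing `logger.info` call.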

2-Edit report

  • Editing an existing document by using natural-language tasks to update a Docling JSON.
from pathlib import Path

from mellea.backends import model_ids

from docling_core.types.doc.document import (
    DoclingDocument,
)

from docling_agent.agents import DoclingEditingAgent, logger


def new_path(ipath: Path, ending: str) -> Path:
    return Path(str(ipath).replace(".json", ending))


def run_task(
    ipath: Path,
    opath: Path,
    task: str,
    model_id=model_ids.OPENAI_GPT_OSS_20B,
    tools: list | None = None,  # None instead of a shared mutable default ([])
):
    document = DoclingDocument.load_from_json(ipath)

    agent = DoclingEditingAgent(model_id=model_id, tools=tools or [])

    document = agent.run(
        task=task,
        document=document,
    )
    document.save_as_html(filename=opath)

    logger.info(f"report written to `{opath}`")


def main():
    model_id = model_ids.OPENAI_GPT_OSS_20B

    # tools_config = MCPConfig()
    # tools = setup_mcp_tools(config=tools_config)

    # os.makedirs("./scratch", exist_ok=True)
    ipath = Path("./examples/example_02_edit_resources/20250815_125216.json")

    edits = [
        (
            "Put the polymer abbreviations in a separate column in the first table.",
            new_path(ipath, "_updated_table.html"),
        ),
        ("Make the title longer!", new_path(ipath, "_updated_title.html")),
        (
            "Ensure that the section-headers have the correct level!",
            new_path(ipath, "_updated_headings.html"),
        ),
        (
            "Expand the Introduction to three paragraphs.",
            new_path(ipath, "_updated_introduction.html"),
        ),
    ]

    # Each task starts from the original JSON and is written to its own HTML file
    for task, opath in edits:
        run_task(ipath=ipath, opath=opath, task=task, model_id=model_id)

if __name__ == "__main__":
    main()
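`new_path` in the editing sample derives each output name by replacing `".json"` in the path string, which also rewrites any directory name that happens to contain `.json`. A pathlib alternative changes only the final file name. A small sketch; `new_path_pathlib` is my own name, and the directory in the demo path is made up purely to show the edge case.

```python
from pathlib import Path


def new_path_str(ipath: Path, ending: str) -> Path:
    # Same logic as the sample's new_path: textual replacement anywhere in the path.
    return Path(str(ipath).replace(".json", ending))


def new_path_pathlib(ipath: Path, ending: str) -> Path:
    # Only the final file name changes; parent directories are left untouched.
    return ipath.with_name(ipath.stem + ending)


# Hypothetical path whose directory name happens to contain ".json".
p = Path("runs/backup.json.d/report.json")
print(new_path_str(p, "_updated.html").as_posix())      # runs/backup_updated.html.d/report_updated.html
print(new_path_pathlib(p, "_updated.html").as_posix())  # runs/backup.json.d/report_updated.html
```

For the sample's own input path both versions give the same result, so the swap is a safety improvement rather than a behavior change.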
  • Output excerpt 📋
...
2025-11-24 10:25:29,594 - INFO - docling_agent - linearized chat:

   turn  role       message
------  ---------  -------------------------------------------------------------
     0  system     You are an expert writer and doc ... ext/5":3, "#/text/6": 2}
                   }
                   ```


     1  <unknown>  <mellea.stdlib.instruction.Instruction object at 0x130c57c50>
     2  <unknown>

  ```json
                   {
                       "operation": "update_content",
                       "ref": "#/table
2025-11-24 10:25:29,594 - INFO - linearized chat:

   turn  role       message
------  ---------  -------------------------------------------------------------
     0  system     You are an expert writer and doc ... ext/5":3, "#/text/6": 2}
                   }
                   ```


     1  <unknown>  <mellea.stdlib.instruction.Instruction object at 0x130c57c50>
     2  <unknown>

  ```json
                   {
                       "operation": "update_content",
                       "ref": "#/table
2025-11-24 10:25:29,594 - INFO - docling_agent - _update_content_of_document_items
2025-11-24 10:25:29,594 - INFO - _update_content_of_document_items
2025-11-24 10:25:29,594 - INFO - docling_agent - _update_content_of_table
2025-11-24 10:25:29,59
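The log shows the model replying with a JSON operation (`"operation": "update_content"` plus a `"ref"` pointing at a document item) before `_update_content_of_document_items` runs. The sketch below imitates that flow on a toy dictionary document to make the idea concrete. It is not docling-agent's implementation: the log excerpt is truncated, so the `content` field name, the `#/texts/N` ref format, and `apply_operation` are all assumptions.

```python
import json

# Toy document: item references mapped to text. A real DoclingDocument is far
# richer; the "#/texts/N" refs here are illustrative only.
doc = {
    "#/texts/0": "Polymers in Food Packaging",
    "#/texts/1": "Introduction",
}


def apply_operation(document: dict[str, str], raw: str) -> dict[str, str]:
    """Parse one JSON edit operation and return an updated copy of the document."""
    op = json.loads(raw)
    if op.get("operation") != "update_content":
        raise ValueError(f"unsupported operation: {op.get('operation')!r}")
    ref = op["ref"]
    if ref not in document:
        raise KeyError(f"unknown ref: {ref}")
    updated = dict(document)  # leave the input untouched
    updated[ref] = op["content"]  # "content" is an assumed field name
    return updated


edit = (
    '{"operation": "update_content", "ref": "#/texts/0", '
    '"content": "Polymers in Food Packaging: Materials, Properties and Sustainability"}'
)
new_doc = apply_operation(doc, edit)
print(new_doc["#/texts/0"])
```

Validating the operation name and the ref before applying anything is what keeps a malformed LLM reply from silently corrupting the document.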

3-Extract structured data with a schema

  • This sample defines a simple schema and provides a list of files (PDFs/images). The agent produces an HTML report with the extracted fields.
from pathlib import Path
import json

from mellea.backends import model_ids

from docling_agent.agents import DoclingExtractingAgent, logger


def run_task(
    schema: dict,
    sources: list[Path],
    opath: Path,
    model_id=model_ids.OPENAI_GPT_OSS_20B,
    tools: list | None = None,
):
    agent = DoclingExtractingAgent(model_id=model_id, tools=tools or [])

    document = agent.run(
        task=json.dumps(schema),
        sources=sources,
    )
    document.save_as_html(filename=opath)

    logger.info(f"report written to `{opath}`")


def main():
    model_id = model_ids.OPENAI_GPT_OSS_20B

    schema_01 = {
        "name": "string",
        "birth year": "integer",
        "nationality": "string",
        "contact details": "string",
        "latest education": "string",
        "languages": "string",
        "skills": "string",
    }

    schema_02 = {
        "title": "string",
        "authors": "string"
    }

    schema_03 = {
        "invoice-number": "string",
        "total": "float",
        "currency": "string",
    }

    docdir = Path("./examples/example_03_extract")  # Adjust to your data root

    for schema, subdir in [
        (schema_01, "curriculum_vitae"),
        (schema_02, "papers"),
        (schema_03, "invoices"),
    ]:
        cdir = docdir / subdir

        sources: list[Path] = []
        # Collect PDFs and images (PNG/JPG/JPEG) recursively under each source directory
        for pattern in ("*.pdf", "*.png", "*.jpg", "*.jpeg"):
            sources.extend(p for p in cdir.rglob(pattern) if p.is_file())

        sources = sorted(sources)

        logger.info(f"documents [{len(sources)}]:\n\n\t" + ",\n\t".join(str(p) for p in sources))

        run_task(
            schema=schema,
            sources=sources,
            opath=docdir / f"{subdir}_extraction_report.html",
            model_id=model_id,
        )

if __name__ == "__main__":
    main()
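The extraction agent renders an HTML report. If you also capture the extracted values as Python dicts, a small type check against the same schema can flag fields the model missed or returned with the wrong type. A hedged sketch: the sample does not show how to pull values out of the returned document, so `validate_record` simply takes a dict, and `TYPE_MAP` is my assumption of how the schema's `"string"`, `"integer"`, and `"float"` type names should be read.

```python
from typing import Any

# How the schema's type names are read here (an assumption, not docling-agent's rules).
TYPE_MAP: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "float": (float, int),  # accept whole-number totals such as 120
}


def validate_record(schema: dict[str, str], record: dict[str, Any]) -> list[str]:
    """List problems found when checking `record` against `schema`; [] means it fits."""
    problems: list[str] = []
    for field, type_name in schema.items():
        value = record.get(field)
        if value is None:
            problems.append(f"missing: {field}")
            continue
        expected = TYPE_MAP[type_name]
        # bool is a subclass of int in Python, so rule it out explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{field}: expected {type_name}, got {type(value).__name__}")
    for field in sorted(record.keys() - schema.keys()):
        problems.append(f"unexpected: {field}")
    return problems


schema_03 = {"invoice-number": "string", "total": "float", "currency": "string"}
print(validate_record(schema_03, {"invoice-number": "INV-001", "total": "120.50", "currency": "EUR"}))
# ['total: expected float, got str']
```

A total returned as the string `"120.50"` is a typical LLM slip; catching it before the values reach downstream code is cheaper than debugging later.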

Conclusion

In conclusion, Docling, Docling-Agent, and mellea together form a formidable toolkit for anyone venturing into sophisticated AI agent development. Docling’s robust document parsing and understanding lay the foundation, turning raw documents into structured, actionable data. Docling-Agent builds on this by wrapping those capabilities into agentic workflows for writing, editing, and extracting. Finally, mellea’s generative programming ties it all together, not only accelerating development but also making agents that handle complex documents more structured and adaptable. This combination truly elevates the art of agent creation and paves the way for more efficient, insightful, and dynamic AI solutions.

Thanks for reading 😎
