Alain Airom
Introducing a Game-Changer: Docling and IBM Cloud Code Engine Join Forces

Combining Docling's document-processing capabilities with the serverless compute capabilities of IBM Cloud Code Engine

What is “Code Engine”?

Serverless computing has revolutionized how developers build and deploy applications by abstracting away the underlying infrastructure management. Instead of provisioning and maintaining servers, developers can focus purely on writing code, with the cloud provider automatically scaling resources up or down based on demand. This “pay-as-you-go” model makes serverless highly efficient and cost-effective for a wide range of workloads, from event-driven functions to web applications and batch jobs.

Stepping into this innovative landscape is the IBM Cloud Code Engine service, a fully managed, serverless platform designed to run your containerized workloads, batch jobs, and functions. Code Engine simplifies the deployment experience by allowing you to bring your existing container images or source code, and it handles everything from automatic scaling and load balancing to security and logging. Whether you’re building microservices, processing large datasets, or creating API endpoints, Code Engine provides a flexible and powerful environment that adapts to your application’s needs without the operational overhead of traditional server management.
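As a minimal sketch of that deployment flow with the IBM Cloud CLI (the project name is a placeholder, and each command is echoed rather than executed so the script can be reviewed as a dry run first):

```shell
# Minimal sketch of deploying a container image as a Code Engine application.
# PROJECT is a hypothetical name; 'run' echoes each command for review --
# drop the echo to execute for real.
set -e

PROJECT="docling-demo"                 # hypothetical project name
IMAGE="icr.io/codeengine/helloworld"   # IBM-provided sample image

run() { echo "$@"; }

run ibmcloud ce project create --name "$PROJECT"
run ibmcloud ce app create --name hello --image "$IMAGE" --min-scale 0 --max-scale 3
```

With `--min-scale 0`, the application scales to zero when idle, which is the pay-as-you-go behavior described above.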

Code Engine Serverless Fleets

For developers tackling demanding, compute-intensive workloads that require significant processing power, IBM Cloud Code Engine's *Serverless Fleets* offer an unparalleled solution. This feature provides the easiest and most efficient way to execute large-scale tasks, whether they demand powerful CPUs or specialized GPUs. Imagine running complex Monte Carlo simulations, intricate financial risk modeling, or elaborate chemical molecule calculations without the burden of provisioning or managing dedicated hardware. Furthermore, Code Engine's Serverless Fleets are perfectly suited for cutting-edge AI tasks, enabling rapid inferencing, AI training, and model fine-tuning — even supporting serverless GPUs for accelerated performance. This capability liberates data scientists and engineers to focus on their algorithms and models, while Code Engine automatically scales the underlying compute resources to meet the exact requirements of even the most demanding, fluctuating workloads.

What is Docling?

Docling is a powerful, open-source toolkit developed by IBM Research that addresses one of the most significant challenges in building modern AI applications: unlocking the knowledge trapped within unstructured documents. Unlike traditional tools that simply extract raw text, Docling uses a sophisticated pipeline powered by AI models to understand the document’s structure, layout, and content. It can intelligently parse a wide variety of formats, including PDFs, DOCX, and images, and convert them into a unified, richly structured representation like Markdown or JSON. This capability is absolutely crucial for Generative AI and Retrieval-Augmented Generation (RAG) systems. By transforming messy, unstructured data into a clean and machine-readable format, Docling provides the high-quality “fuel” that enables large language models (LLMs) to deliver more accurate, contextual, and reliable responses. This is the key to building powerful RAG pipelines that can ground AI responses on trusted enterprise data, rather than relying on a model’s general, and sometimes limited, pre-trained knowledge.
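As a quick taste of what that conversion looks like in practice, here is a hedged sketch using Docling's command-line interface. The file name is a placeholder, the flags are taken from docling's CLI help and worth double-checking against the project docs, and the commands are echoed rather than executed since they assume `pip install docling` has been run:

```shell
# Sketch: convert a PDF to Markdown and JSON with the docling CLI.
# 'report.pdf' is a placeholder; 'run' echoes the commands so this
# stays a reviewable dry run -- drop the echo to execute for real.
set -e

run() { echo "$@"; }

run pip install docling
run docling report.pdf --to md --to json --output ./converted
```

The structured Markdown/JSON output is exactly the kind of clean, machine-readable input a RAG pipeline wants.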

Code Engine and Docling Together

Imagine leveraging the unparalleled document understanding capabilities of Docling with the on-demand, GPU-powered processing of IBM Cloud Code Engine. This combination unlocks a new frontier for AI applications, particularly those focused on knowledge extraction and advanced analytics from vast quantities of unstructured data. Docling efficiently transforms complex documents into structured insights, creating a rich, machine-readable foundation. Now, couple this with Code Engine’s ability to instantly provision GPU-enabled “Serverless Fleets.” This means you can feed Docling’s refined output directly into GPU-accelerated AI models running on Code Engine for tasks like large-scale inferencing, fine-tuning, or even full AI training, all without the overhead of managing specialized hardware. The synergy is profound: Docling provides the high-quality data input, and Code Engine delivers the elastic, GPU-backed compute power to process that data at scale, accelerating time to insight and enabling the creation of highly intelligent, data-driven Generative AI and RAG applications that can truly understand and act upon the vast knowledge contained within your documents.

For those eager to dive deeper and experience this powerful synergy firsthand, a comprehensive tutorial is available on the IBM Code Engine public GitHub repository. This invaluable resource provides a clear, step-by-step guide to setting up and using these capabilities. I strongly advise anyone interested in leveraging the combined power of Docling and Code Engine for their AI workloads to explore the tutorial, get hands-on, and try it out themselves to truly grasp its potential.

  • Example of a “run” configuration 🏃

```shell
#!/bin/bash

set -e

# Short random suffix so each fleet gets a unique name
uuid=$(uuidgen | tr '[:upper:]' '[:lower:]' | awk -F- '{print $1}')

IMAGE="quay.io/docling-project/docling-serve-cpu"

# Print the full command for reference before running it
echo ibmcloud code-engine beta fleet create --name "fleet-${uuid}-1"
echo "  "--image $IMAGE
echo "  "--worker-profile mx3d-24x240
echo "  "--max-scale 8
echo "  "--tasks-from-local-file commands.jsonl
echo "  "--cpu 12
echo "  "--memory 120G
echo "  "--tasks-state-store fleet-task-store
echo "  "--mount-data-store /input=fleet-input-store:/docling
echo "  "--mount-data-store /output=fleet-output-store:/docling

ibmcloud code-engine beta fleet create --name "fleet-${uuid}-1" \
  --image $IMAGE \
  --worker-profile mx3d-24x240 \
  --max-scale 8 \
  --tasks-from-local-file commands.jsonl \
  --cpu 12 \
  --memory 120G \
  --tasks-state-store fleet-task-store \
  --mount-data-store /input=fleet-input-store:/docling \
  --mount-data-store /output=fleet-output-store:/docling
```
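The fleet reads its work items from `commands.jsonl`, one task per line. The exact line schema is defined in the tutorial; purely as an illustration, assuming one JSON object per input document with hypothetical `command`/`args` fields, such a file could be generated like this:

```shell
#!/bin/bash
# Generate a commands.jsonl with one task line per input document.
# The "command"/"args" field names are an assumption for illustration --
# check the Code Engine fleet tutorial for the exact task schema.
set -e

: > commands.jsonl   # truncate/create the task file
for f in doc-1.pdf doc-2.pdf doc-3.pdf; do
  printf '{"command": "docling", "args": ["/input/%s", "--output", "/output"]}\n' "$f" \
    >> commands.jsonl
done

wc -l commands.jsonl   # one line (task) per document
```

Each fleet worker then picks up tasks from this file, reading documents from the `/input` mount and writing converted output to `/output`.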
  • Example of a “run with GPU” configuration 🏭

```shell
#!/bin/bash

set -e

# Short random suffix so each fleet gets a unique name
uuid=$(uuidgen | tr '[:upper:]' '[:lower:]' | awk -F- '{print $1}')

# https://github.com/docling-project/docling-serve?tab=readme-ov-file#container-images
IMAGE="quay.io/docling-project/docling-serve"

# Print the full command for reference before running it
echo ibmcloud code-engine beta fleet create --name "fleet-${uuid}-1"
echo "  "--image $IMAGE
echo "  "--max-scale 1
echo "  "--tasks-from-local-file commands.jsonl
echo "  "--gpu l40s
echo "  "--tasks-state-store fleet-task-store
echo "  "--mount-data-store /input=fleet-input-store:/docling
echo "  "--mount-data-store /output=fleet-output-store:/docling

ibmcloud code-engine beta fleet create --name "fleet-${uuid}-1" \
  --image $IMAGE \
  --max-scale 1 \
  --tasks-from-local-file commands.jsonl \
  --gpu l40s \
  --tasks-state-store fleet-task-store \
  --mount-data-store /input=fleet-input-store:/docling \
  --mount-data-store /output=fleet-output-store:/docling
```

Conclusion

In conclusion, we’ve explored the transformative potential unlocked by combining serverless computing’s agility, embodied by IBM Cloud Code Engine, with the intelligent document processing prowess of Docling. From Code Engine’s “Serverless Fleets” offering on-demand, GPU-accelerated compute for even the most intensive AI tasks, to Docling’s ability to transform unstructured data into a high-quality fuel for Generative AI and RAG systems, the synergy is undeniable. This powerful pairing empowers developers and data scientists to build and deploy intelligent applications faster, more efficiently, and with greater accuracy, ultimately bridging the gap between raw data and actionable AI-driven insights. For those ready to experience this innovation, the detailed tutorial on the IBM Code Engine GitHub repository provides the perfect starting point to put these concepts into practice.
