DEV Community

Alain Airom
☕ Seamless AI Document Processing: Integrating Docling into Java with the Arconia-io Framework

Testing arconia-io and the Java framework it provides for Docling

Introduction

What/Who is arconia-io?

Arconia.io (https://arconia.io/) provides “frameworks and tools for building cloud native software systems in Java with a superior developer experience”.

I ran into their GitHub Docling repository a while ago and as usual was curious to test it on my own.

Tests and Implementation

I am not really fluent in Java, so it took me a long time to get the provided test application running with a few modifications.

But the fact that Arconia, and Docling itself, offer a Java implementation is a great opportunity for developers with strong Java skills who might be less Python-oriented.

The Java implementation offered by the Arconia framework and the dedicated Docling Java client provides a compelling strategic advantage. This first-class support directly addresses the needs of Java-centric development teams, offering a native, familiar entry point that eliminates the friction and complexity typically associated with integrating external, Python-based AI services. For developers whose core expertise lies in the Java ecosystem, this is a significant opportunity to seamlessly leverage powerful AI document conversion without needing to manage cross-language dependencies.

Essential Components

So let’s jump into the prerequisites for running Arconia’s Docling wrapper in Java. The components you need are:

  • Podman or Docker: essential for running the Docling AI service via Testcontainers.
  • JDK (Java Development Kit): required for compiling and running the Java application (e.g., JDK 21+; note that the build.gradle below requests a Java 25 toolchain, so Gradle needs a matching JDK).
  • JRE (Java Runtime Environment): The standard environment used to execute the compiled Java application.

🛠️ Phase 1: Project Setup and Code Implementation

  • Launch your Podman or Docker machine (I use Podman Desktop and its engine)
  • Prepare your Java application folder, mine is ⤵️
/Users/xxx/Devs/arconia-docling
  • Then set up the project almost exactly as in the repo:

  • You’ll need a “build.gradle” file

plugins {
 id 'java'
 id 'org.springframework.boot' version '3.5.7'
 id 'io.spring.dependency-management' version '1.1.7'
 id 'org.graalvm.buildtools.native' version '0.10.6'
}

group = 'io.arconia'
version = '0.0.1-SNAPSHOT'

java {
 toolchain {
  languageVersion = JavaLanguageVersion.of(25)
 }
}

repositories {
 mavenCentral()
}

dependencies {
 implementation "io.arconia:arconia-docling-spring-boot-starter"

 implementation 'org.springframework.boot:spring-boot-starter-actuator'
 implementation 'org.springframework.boot:spring-boot-starter-web'

 developmentOnly 'org.springframework.boot:spring-boot-devtools'
 testAndDevelopmentOnly 'io.arconia:arconia-dev-services-docling'

 testImplementation 'org.springframework.boot:spring-boot-starter-test'
 testRuntimeOnly 'org.junit.platform:junit-platform-launcher'
}

dependencyManagement {
 imports {
  mavenBom "io.arconia:arconia-bom:0.17.1"
 }
}

tasks.withType(JavaCompile).configureEach {
 options.compilerArgs.add("-parameters")
}

tasks.named('test') {
 useJUnitPlatform()
}

tasks.named('bootBuildImage') {
 builder = "paketobuildpacks/builder-noble-java-tiny"
}

springBoot {
 buildInfo {
  excludes = ['time']
 }
}
  • The code with almost no changes…
package io.arconia.demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import io.arconia.docling.client.DoclingClient;
import io.arconia.docling.client.convert.request.ConvertDocumentRequest;
import io.arconia.docling.client.convert.response.ConvertDocumentResponse;

@SpringBootApplication
public class DoclingApplication {

 public static void main(String[] args) {
  SpringApplication.run(DoclingApplication.class, args);
 }

}

@RestController
class DoclingController {

 private final DoclingClient doclingClient;

 public DoclingController(DoclingClient doclingClient) {
  this.doclingClient = doclingClient;
 }

 @GetMapping("/convert") // Handles GET requests
 public String convertDocument(@RequestParam("url") String url) {
  ConvertDocumentResponse response = doclingClient
   .convertSource(ConvertDocumentRequest.builder()
    .addHttpSources(url) // Uses the working URL method
    .build());
  return response.document().markdownContent();
 }

}
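
Once the application is running, the /convert endpoint defined above can be called from any HTTP client, not only from curl. As a minimal sketch, here is how a plain Java caller might do it with the JDK's built-in HttpClient. The class name ConvertClient, the method names, and the timeout values are assumptions made for illustration; none of them come from Arconia or Docling. The document URL is percent-encoded so that characters such as '?' or '&' inside it cannot break the outer query string.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical helper class for illustration; not part of Arconia or Docling.
final class ConvertClient {

    // Builds the /convert URI and percent-encodes the document URL
    // before placing it in the "url" query parameter.
    static URI buildConvertUri(String appBaseUrl, String documentUrl) {
        String encoded = URLEncoder.encode(documentUrl, StandardCharsets.UTF_8);
        return URI.create(appBaseUrl + "/convert?url=" + encoded);
    }

    // Sends the GET request and returns the response body (the Markdown text).
    static String convert(String appBaseUrl, String documentUrl) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpRequest request = HttpRequest.newBuilder(buildConvertUri(appBaseUrl, documentUrl))
                .timeout(Duration.ofMinutes(10)) // assumed value; conversions can take a while
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected HTTP status: " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String app = "http://localhost:8080";
        String doc = "http://localhost:8000/input/watsonxdata-developer.pdf";
        System.out.println(buildConvertUri(app, doc));
        // Only contact the server when asked to, e.g. by passing "run" as an argument.
        if (args.length > 0) {
            System.out.println(convert(app, doc));
        }
    }
}
```

Spring decodes the query parameter before it reaches the controller, so the controller still receives the original document URL.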

⚙️ Phase 2: Environment and Dependency Resolution

  • Configure/Set Docker Host
export DOCKER_HOST="unix://${PODMAN_SOCKET}"
# optional
export TESTCONTAINERS_RYUK_DISABLED=true
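
Note that ${PODMAN_SOCKET} is not defined anywhere above, so it has to be set to the socket path of your own Podman machine first (as far as I know, `podman machine inspect --format '{{.ConnectionInfo.PodmanSocket.Path}}'` prints it, but check this against your Podman version). Before starting the app, a small preflight check can confirm that DOCKER_HOST points at a socket that actually exists. The sketch below is only an illustration: the class DockerHostCheck and its method are hypothetical names, and it only understands the unix:// form used in this article.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical preflight helper for illustration; not part of Testcontainers or Arconia.
final class DockerHostCheck {

    // Extracts the socket path from a value like "unix:///path/to/podman.sock".
    // Returns empty when the value is missing, has another scheme, or has no path.
    static Optional<Path> unixSocketPath(String dockerHost) {
        String prefix = "unix://";
        if (dockerHost == null
                || !dockerHost.startsWith(prefix)
                || dockerHost.length() == prefix.length()) {
            return Optional.empty();
        }
        return Optional.of(Path.of(dockerHost.substring(prefix.length())));
    }

    public static void main(String[] args) {
        String dockerHost = System.getenv("DOCKER_HOST");
        Optional<Path> socket = unixSocketPath(dockerHost);
        if (socket.isEmpty()) {
            System.out.println("DOCKER_HOST is not a unix:// socket: " + dockerHost);
            return;
        }
        Path path = socket.get();
        System.out.println(path + (Files.exists(path) ? " exists" : " is missing"));
    }
}
```

If the socket is reported as missing, the container engine is probably not running or DOCKER_HOST points at the wrong path, and Testcontainers will not be able to start the Docling service.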

🏃‍♂️‍➡️ Phase 3: Run The Code

To run the application I used three terminal tabs in VS Code.

  • In the first tab I launch a Python server. The Python server (8000) acts as the source of the document (a simple HTTP file server). It hosts my example document to be converted, the watsonxdata-developer.pdf file. Start it from the directory that contains the input folder, since the URL used later points to /input/watsonxdata-developer.pdf.
# first tab - launch a web server so that your app can call it
python3 -m http.server 8000

  • Make sure the Docling container service is running, because it provides the core conversion logic: it is what actually converts the PDF to Markdown.

  • In the 2nd tab, the Spring Boot app (8080) runs as the client and orchestrator. It receives your curl request, tells the Docling container "Go get the file at http://localhost:8000/input/watsonxdata-developer.pdf and convert it," and then returns the result.
# After several tests I ended up running the first two commands as well,
# but the last command is the only one you actually need to run.
./gradlew clean
./gradlew build -x test

# the only required command
./gradlew bootRun --rerun-tasks
  • And finally the 3rd tab, which runs the conversion!
curl "http://localhost:8080/convert?url=http://localhost:8000/input/watsonxdata-developer.pdf"
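
One caveat about the value of the url parameter: the document is fetched by the Docling container, not by the Spring Boot app, and inside a container localhost refers to the container itself. The successful run shown further down uses host.docker.internal for exactly this reason. As a hedged sketch, a small helper could rewrite loopback hosts before the URL is handed to Docling. The class ContainerUrl and the method forContainer are hypothetical names, and the alias to use is an assumption about your engine (host.docker.internal is the Docker convention; Podman also documents host.containers.internal).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of Arconia or Docling.
final class ContainerUrl {

    // Replaces a loopback host (localhost or 127.0.0.1) with the given alias
    // so that the URL is reachable from inside a container.
    // Any other URL is returned unchanged.
    static String forContainer(String url, String hostAlias) {
        try {
            URI original = new URI(url);
            String host = original.getHost();
            boolean loopback = "localhost".equalsIgnoreCase(host) || "127.0.0.1".equals(host);
            if (!loopback) {
                return url;
            }
            return new URI(
                    original.getScheme(),
                    original.getUserInfo(),
                    hostAlias,
                    original.getPort(),
                    original.getPath(),
                    original.getQuery(),
                    original.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URL: " + url, e);
        }
    }

    public static void main(String[] args) {
        // prints http://host.docker.internal:8000/input/watsonxdata-developer.pdf
        System.out.println(forContainer(
                "http://localhost:8000/input/watsonxdata-developer.pdf",
                "host.docker.internal"));
    }
}
```

Whether such a rewrite belongs in the controller or is simply done by hand in the curl command is a design choice; for a quick test, typing the alias directly is enough.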

  • … and the long, long result using the magic of Docling (excerpt only, from a run that used host.docker.internal so the Docling container could reach the Python server on the host) 🎰


curl "http://localhost:8080/convert?url=http://host.docker.internal:8000/input/watsonxdata-developer.pdf"
## IBM watsonx.data developer edition
...
...
## Notices

This information was developed for products and services offered in the US. This material might be available from IBM in other languages. However, you may be required to own a copy of the product or product version in that language in order to access it.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing IBM Corporation North Castle Drive, MD-NC119 Armonk, NY 10504-1785 U
...
## Contents

| Chapter 1. Developer edition (New version)............................................................1                                                            |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|
...
...
## Creating ingestion job


.\cpdctl.exe wx-data ingestion create ` --source-data-files s3://bucketcos/shibil.csv ` --engine-id spark804 ` --driver-cores 0 ` --driver-memory "0G" ` --executor-cores 2 ` --executor-memory "4G" ` --num-executors 1 ` --target-table iceberg_data.cpdctl_test_schema.table1 --sync-status

## IBM®

🔣 Optional Phase: Troubleshooting

If the app cannot run, or you wait more than 10 or 15 minutes for the image to be pulled, look at the application log first:

2025-10-28T14:58:19.417+01:00  INFO 63844 --- [docling] [  restartedMain] DoclingContainerConnectionDetailsFactory : Docling UI: http://localhost:38633/ui
2025-10-28T14:58:19.606+01:00  INFO 63844 --- [docling] [  restartedMain] o.s.b.d.a.OptionalLiveReloadServer       : LiveReload server is running on port 35729
2025-10-28T14:58:19.610+01:00  INFO 63844 --- [docling] [  restartedMain] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 13 endpoints beneath base path '/actuator'
2025-10-28T14:58:19.640+01:00  INFO 63844 --- [docling] [  restartedMain] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port 8080 (http) with context path '/'
2025-10-28T14:58:19.651+01:00  INFO 63844 --- [docling] [  restartedMain] io.arconia.demo.DoclingApplication       : Started DoclingApplication in 5.671 seconds (process running for 5.841)
2025-10-28T14:59:17.568+01:00  INFO 63844 --- [docling] [nio-8080-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring DispatcherServlet 'dispatcherServlet'
2025-10-28T14:59:17.569+01:00  INFO 63844 --- [docling] [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : Initializing Servlet 'dispatcherServlet'
2025-10-28T14:59:17.569+01:00  INFO 63844 --- [docling] [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : Completed initialization in 0 ms
<===========--> 90% EXECUTING [20m 13s]
> :bootRun
Then stop the Podman machine, give it more memory, restart it, and check the containers and their logs:
podman machine stop
podman machine set --memory 6000
podman machine start
podman ps -a
###
CONTAINER ID  IMAGE                                          COMMAND               CREATED         STATUS                    PORTS                                                                     NAMES
c5c29daed898  docker.io/opensearchproject/opensearch:2.19.3  opensearch            8 days ago      Exited (0) 292 years ago  0.0.0.0:9200->9200/tcp, 0.0.0.0:9600->9600/tcp, 9300/tcp, 9650/tcp        opensearch-node
d944a7423eaa  docker.io/milvusdb/milvus:v2.5.14              milvus run standa...  4 days ago      Exited (134) 4 days ago   0.0.0.0:2379->2379/tcp, 0.0.0.0:9091->9091/tcp, 0.0.0.0:19530->19530/tcp  milvus-standalone
f641f290eb3e  ghcr.io/docling-project/docling-serve:v1.5.1   docling-serve run     14 minutes ago  Up 14 minutes             0.0.0.0:36335->5001/tcp, 8080/tcp   
###
podman logs xxxxxx # replace xxxxxx with the container ID; shows the container logs for debugging

The complete pipeline of the final working solution is:

Terminal 3 (cURL) → Spring Boot App (8080) → Docling Container (Podman/Docker) → Python Server (8000) → PDF Data → Docling Container → Markdown Data → Spring Boot App → Terminal 3 (Result)

Conclusion

This extensive process successfully culminated in the creation of a seamless, end-to-end document intelligence pipeline. The conclusive functionality achieved is the ability for a single Spring Boot application, utilizing the Arconia Java framework, to orchestrate the complex task of converting a raw PDF document into ready-to-use structured Markdown data. This was accomplished by successfully launching, communicating with, and managing the external, intelligent Docling service running within a container environment (Podman/Docker), proving that Java developers can fully control powerful, containerized AI-related services from their native ecosystem.
