Julia


HANCOM open-sources AI auto-tagging in OpenDataLoader PDF

HANCOM has open-sourced an AI auto-tagging feature in OpenDataLoader PDF that automatically writes accessibility tags directly into existing PDF documents, running on-premise with no per-page or per-document limits.
The capability ships inside OpenDataLoader PDF and is released globally as open source. Python, Node.js and Java libraries are distributed via GitHub, PyPI (opendataloader-pdf), npm (@opendataloader/pdf) and Maven Central (org.opendataloader:opendataloader-pdf-core), alongside a command-line tool. The release was announced on 30 April 2026.

How auto-tagging works
The AI analyzes a document’s structure and writes the results directly into the original PDF file. It distinguishes components such as titles, tables, lists and images, then records them in the PDF as tags that carry the accessibility structure. The tagged output is written back into the PDF itself in complete form, and this end-to-end stage is included in the free, open-source release.
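To make the idea of "components reflected as tags" concrete, here is a minimal sketch of how detected components could map onto a tag tree. It is an illustration only, not OpenDataLoader PDF's implementation or API: the role names (Document, H1, P, L, LI, Table, Figure) are standard structure types from the PDF specification, while the component dictionaries, the `ROLE_FOR_KIND` mapping and the helper functions are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Standard PDF structure types. Which detected component maps to which
# type is an assumption made for this illustration.
ROLE_FOR_KIND = {
    "title": "H1",
    "paragraph": "P",
    "list": "L",
    "table": "Table",
    "image": "Figure",
}


@dataclass
class Tag:
    role: str                      # structure type, e.g. "H1" or "Figure"
    text: str = ""                 # text content, if any
    alt: str = ""                  # alternative text, used here for images
    children: list[Tag] = field(default_factory=list)


def build_tag_tree(components: list[dict]) -> Tag:
    """Turn detected components (hypothetical dicts) into a tag tree."""
    root = Tag("Document")
    for comp in components:
        # Unknown component kinds fall back to a plain paragraph tag.
        role = ROLE_FOR_KIND.get(comp["kind"], "P")
        tag = Tag(role, text=comp.get("text", ""), alt=comp.get("alt", ""))
        # List components carry their entries as LI children.
        for item in comp.get("items", []):
            tag.children.append(Tag("LI", text=item))
        root.children.append(tag)
    return root


def outline(tag: Tag, depth: int = 0) -> list[str]:
    """Render the tree as indented lines, roughly what a screen reader walks."""
    lines = ["  " * depth + f"<{tag.role}> {tag.text or tag.alt}".rstrip()]
    for child in tag.children:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    tree = build_tag_tree([
        {"kind": "title", "text": "Annual Report"},
        {"kind": "list", "items": ["Revenue", "Costs"]},
        {"kind": "image", "alt": "Bar chart of revenue"},
    ])
    print("\n".join(outline(tree)))
```

In a real tagged PDF this tree lives in the file's structure tree and each tag points at the page content it describes; the sketch leaves that linkage out.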

Why PDF accessibility matters
PDF is one of the most widely used digital document formats worldwide, yet a large share of documents have circulated without accessibility tags. When tags are missing, screen readers cannot properly recognize document structure, making it difficult for people with visual impairments and other groups with limited access to information to understand the content.

Global regulatory backdrop
Demand is expanding quickly in step with regulatory changes across multiple jurisdictions. In the United States, the main obligations under ADA (Americans with Disabilities Act) Title II begin to apply in April 2026. In Europe, the EAA (European Accessibility Act) is taking effect in parallel. In Asia, Korea’s Act on the Prohibition of Discrimination Against Persons with Disabilities is aligning with the same trajectory. Together, these regimes are pushing enterprises and public institutions worldwide to remediate their PDF archives at scale.

How it compares to existing offerings
In the global market, free tiers for cloud-API offerings have typically been limited to dozens of pages per month, and full-scale adoption has incurred annual corporate license costs in the tens of thousands of dollars. Some desktop products insert watermarks in outputs during free trials, or restrict key features behind separate paid tiers.

OpenDataLoader PDF, by contrast, can be used without limits on the number of documents. Processing runs in an on-premise environment, so sensitive documents are not sent to external servers, an important property for organizations operating under data-residency regimes worldwide. Python, Node.js and Java libraries, as well as a command-line tool, are provided to integrate with existing workflows.
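As a rough sketch of what fitting a tagger into an existing on-premise workflow might look like, the function below walks a local archive and hands each PDF to a caller-supplied function. Everything here is an assumption for illustration: `tag_archive` and the `tag_one` callable are hypothetical names, and `tag_one` stands in for whichever entry point you actually use (a library call or the command-line tool via a subprocess). It does not reproduce any OpenDataLoader PDF API.

```python
from pathlib import Path
from typing import Callable


def tag_archive(
    src_dir: Path,
    dst_dir: Path,
    tag_one: Callable[[Path, Path], None],
) -> list[Path]:
    """Run tag_one(input_pdf, output_pdf) for every PDF under src_dir.

    tag_one is a hypothetical placeholder for the real tagging call.
    Returns the input files whose tagging raised an exception, so a
    long batch keeps going and failures can be retried afterwards.
    """
    failures: list[Path] = []
    # sorted() materializes the file list before any output is written.
    for pdf in sorted(src_dir.rglob("*.pdf")):
        # Mirror the source folder layout under dst_dir.
        out = dst_dir / pdf.relative_to(src_dir)
        out.parent.mkdir(parents=True, exist_ok=True)
        try:
            tag_one(pdf, out)
        except Exception:
            failures.append(pdf)
    return failures
```

Because all paths are local and the tagging step is whatever function is passed in, nothing in this loop sends documents off the machine.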

Standards alignment and collaboration
The open-source auto-tagging engine generates tag structures that reference PDF Association technical specifications and align with the PDF/UA (PDF Universal Accessibility) international standard. Full PDF/UA-compliant output is being developed for the upcoming commercial solution. HANCOM is enhancing its quality verification system in collaboration with Dual Lab, the team behind the open-source PDF accessibility validation tool veraPDF.

Free open-source core, paid PDF/UA-compliant commercial tier
HANCOM is pursuing this release as part of a document AI platform strategy that goes beyond document processing tools to encompass accessibility readiness and regulatory compliance. The split is explicit:

Free, open source: the AI auto-tagging core in OpenDataLoader PDF, with no document or page limits, available to developers and organizations worldwide.
Paid commercial solution (Q2 2026): a separate offering that outputs results compliant with the PDF/UA international standard, targeted at enterprises and public institutions that need to respond to audits and comply with regulations.

About HANCOM

HANCOM is a document software company headquartered in the Republic of Korea, contributing to the global document AI and PDF ecosystem through open-source releases, international standards participation, and partnerships with members of the PDF Association.

“HANCOM aims to open-source core features so anyone can start accessibility conversion without expense burdens. For corporations that need to convert large volumes of documents, we will provide free core tools alongside commercial solutions compliant with PDF/UA.”
Jung Ji-hwan, Chief Technology Officer, HANCOM
