Hancom has unveiled 'OpenDataLoader PDF v2.0,' which ranks No. 1 in benchmarks among open-source PDF data extraction tools. The benchmark test data and full reproduction code are published in the official GitHub repository, reflecting Hancom's commitment to transparency, a foundational principle of open source.
This version introduces a hybrid engine that combines AI-based extraction with direct extraction. Because the engine runs entirely on local infrastructure, businesses and developers can use high-performance PDF data extraction free of charge, even in fully air-gapped environments, with no documents sent to external servers.
In addition, this version includes four free AI add-ons for extracting complex document elements. 'Optical Character Recognition (OCR)' improves text recognition accuracy for image-based PDFs and scanned documents. 'Table Extraction' uses an ultra-lightweight AI model to accurately analyze complex table structures, including merged cells. 'Formula Extraction' recognizes complex mathematical and scientific equations in academic papers, running entirely in a local environment. 'Chart Analysis' interprets the contextual meaning of charts and presents the insights as human-readable descriptions.
The four add-ons are built to be compatible with third-party open-source AI tools such as Docling. Although Hancom has no official partnership with, or sponsorship from, any of these projects, the add-ons are technically compatible with them, so users can integrate the add-ons into their existing technology stacks. The flexible add-on architecture also allows additional AI models to be integrated in the future.
With this release, **the open-source license has been changed from MPL 2.0 (Mozilla Public License 2.0) to Apache 2.0 (Apache License 2.0).** By adopting one of the most permissive, commercially friendly open-source licenses available, Hancom has significantly reduced friction for external developers and global enterprises looking to build on the platform. Hancom expects this to foster a diverse ecosystem of business models built on OpenDataLoader PDF, including web apps and SaaS products.
Hancom is also expanding the ecosystem with autonomous AI agents in mind. Having completed LangChain integration in 2025, the company plans to add integrations with a range of AI frameworks and tools in 2026, including Langflow, LlamaIndex, and Gemini CLI. Hancom is also preparing Model Context Protocol (MCP) support for AI agents.
**In the second half of 2026, Hancom plans to release a commercial AI add-on that consolidates its proprietary document AI technologies.** The company also aims to make OpenDataLoader PDF the first open-source solution to automatically generate accessibility tags through AI-driven document structure analysis. With the European Accessibility Act (EAA) now in force and Korea's Act on the Prohibition of Discrimination Against Persons with Disabilities being strengthened, companies worldwide face growing pressure to comply with digital document accessibility standards. Against this backdrop, Hancom plans to expand the product into an AI-powered PDF accessibility solution that meets PDF/UA (ISO 14289), the international accessibility standard for PDF, and to build a new business model on this open-source foundation.
Jeong Ji-hwan, Chief Technology Officer (CTO) of Hancom, stated, "OpenDataLoader PDF v2.0 has evolved into an open PDF data platform that anyone can freely utilize and build upon, made possible through the AI hybrid engine and the transition to the Apache 2.0 license." He added, "Going forward, through our commercial AI add-ons and accessibility solutions, we will not only enable PDF documents worldwide to be leveraged by AI, but also shape a global ecosystem where documents are truly accessible to all."