<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Julia</title>
    <description>The latest articles on DEV Community by Julia (@katash).</description>
    <link>https://dev.to/katash</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3647888%2F1438cec9-6a18-460d-ae5f-d68ccd021403.jpg</url>
      <title>DEV Community: Julia</title>
      <link>https://dev.to/katash</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/katash"/>
    <language>en</language>
    <item>
      <title>New Release: PDF4WCAG 1.8 Accessibility Checker</title>
      <dc:creator>Julia</dc:creator>
      <pubDate>Fri, 17 Apr 2026 12:36:47 +0000</pubDate>
      <link>https://dev.to/katash/new-release-pdf4wcag-18-accessibility-checker-49h1</link>
      <guid>https://dev.to/katash/new-release-pdf4wcag-18-accessibility-checker-49h1</guid>
      <description>&lt;p&gt;&lt;a href="https://duallab.com/" rel="noopener noreferrer"&gt;Dual Lab&lt;/a&gt; team is ready to announce a new update 1.8 to &lt;a href="http://www.pdf4wcag.com/blog-news/new-release-pdf4wcag-1-8-accessibility-checker" rel="noopener noreferrer"&gt;PDF4WCAG&lt;/a&gt;, delivering further improvements in validation accuracy, user experience, and overall stability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Improved Accuracy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fixes in PDF/UA validation&lt;/strong&gt; to align with the latest technical discussions in the PDF Association's Technical Working Groups (TWGs) and with &lt;a href="https://verapdf.org/" rel="noopener noreferrer"&gt;veraPDF&lt;/a&gt; improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;relaxed the requirement that Math be an immediate child of the Formula structure element;&lt;/li&gt;
&lt;li&gt;improved glyph name calculation for &lt;strong&gt;Type1&lt;/strong&gt; and &lt;strong&gt;TrueType fonts&lt;/strong&gt;;&lt;/li&gt;
&lt;li&gt;adjusted validation of the &lt;strong&gt;PDF Table structure element&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Missing translations of error messages&lt;/strong&gt; have also been added to improve clarity across languages (Dutch, German, English).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enhanced User Experience&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Error preview filters&lt;/strong&gt; have been reworked for more convenient error inspection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fylpuceffg6ba8ps5b1tu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fylpuceffg6ba8ps5b1tu.png" alt=" " width="518" height="586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Export Validation Results:&lt;/strong&gt; users can export validation results as a PDF for client reporting, documentation, or internal audits. Just click Export results on the Summary page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakp51omrekhzyzi6v3il.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakp51omrekhzyzi6v3il.png" alt=" " width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmh868xil117viiohnln.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmh868xil117viiohnln.png" alt=" " width="800" height="1097"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One-Click Refresh:&lt;/strong&gt; users can re-upload a document and repeat its analysis in one click (Web version) or with the Refresh button (Desktop version).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub and collaboration:&lt;/strong&gt; PDF4WCAG now includes a direct link to its &lt;a href="https://github.com/duallab/PDF4WCAG-public/issues" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; within the feedback popup, inviting developers and users to contribute to the tool's roadmap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Command-line interface:&lt;/strong&gt; PDF4WCAG can be run from the command line in the console (paid subscription).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Commercial use of PDF4WCAG:&lt;/strong&gt; &lt;a href="http://www.pdf4wcag.com/licensing/" rel="noopener noreferrer"&gt;commercial use of the Desktop version&lt;/a&gt; and CLI automation are available with an annual subscription for just 299 EUR / 359 USD (excl. taxes).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Release 1.8 reflects our ongoing commitment to providing precise, standards-aligned accessibility validation and a smoother user experience for organizations working toward WCAG and PDF/UA compliance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Roadmap Update&lt;/strong&gt;&lt;br&gt;
We're excited to announce the start of beta testing for the &lt;strong&gt;PDF4WCAG Integration API&lt;/strong&gt;. If you're interested in participating as a beta tester, please send your request to &lt;a href="mailto:info@pdf4wcag.com"&gt;info@pdf4wcag.com&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>pdf</category>
      <category>ai</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
    <item>
      <title>License change (Apache 2.0): Brand image enhancement through tech openness</title>
      <dc:creator>Julia</dc:creator>
      <pubDate>Fri, 17 Apr 2026 12:25:43 +0000</pubDate>
      <link>https://dev.to/katash/license-change-apache-20-brand-image-enhancement-through-tech-openness-48e7</link>
      <guid>https://dev.to/katash/license-change-apache-20-brand-image-enhancement-through-tech-openness-48e7</guid>
      <description>&lt;p&gt;OpenDataLoader PDF has officially moved from MPL-2.0 to Apache License 2.0. This change removes adoption friction for enterprise integrations, provides explicit patent protection, and signals long-term commitment to transparency. Apache 2.0 is the most widely adopted permissive license among enterprise-grade open-source projects.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>development</category>
    </item>
    <item>
      <title>License change (Apache 2.0): Brand image enhancement through tech openness</title>
      <dc:creator>Julia</dc:creator>
      <pubDate>Thu, 09 Apr 2026 12:12:14 +0000</pubDate>
      <link>https://dev.to/katash/license-change-apache-20-brand-image-enhancement-through-tech-openness-10ke</link>
      <guid>https://dev.to/katash/license-change-apache-20-brand-image-enhancement-through-tech-openness-10ke</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/opendataloader-project/opendataloader-pdf" rel="noopener noreferrer"&gt;OpenDataLoader PDF&lt;/a&gt; has officially moved from &lt;strong&gt;MPL-2.0&lt;/strong&gt; to &lt;strong&gt;Apache License 2.0.&lt;/strong&gt; This change removes adoption friction for enterprise integrations, provides explicit patent protection, and signals long-term commitment to transparency. Apache 2.0 is the most widely adopted permissive license among enterprise-grade open-source projects.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbvcpwz4wm9atgqlz5mx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbvcpwz4wm9atgqlz5mx.png" alt=" " width="800" height="269"&gt;&lt;/a&gt;&lt;br&gt;
With over &lt;a href="https://github.com/opendataloader-project/opendataloader-pdf" rel="noopener noreferrer"&gt;13,000 GitHub stars&lt;/a&gt; and growing, OpenDataLoader PDF has become one of the most recognized open-source PDF processing tools in the developer community. The move to Apache 2.0 builds on this momentum, making it easier for the next 10,000 contributors and adopters to join.&lt;br&gt;
&lt;strong&gt;Apache License 2.0&lt;/strong&gt; has officially been adopted for the OpenDataLoader PDF converter as a strategic decision reflecting a long-term vision of transparency, innovation, and ecosystem growth.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;OpenDataLoader (ODL) was initially released under the Mozilla Public License 2.0 (MPL-2.0).&lt;br&gt;
The license change is not just a legal update. It is a conscious move to strengthen the brand through technological openness.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By adopting one of the most business-friendly permissive licenses available, Hancom has significantly reduced friction for external developers and global enterprises looking to build on the platform. This is expected to foster a diverse ecosystem of business models, including web apps and SaaS solutions built on OpenDataLoader PDF.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Comparison table: Apache License 2.0 vs. MIT License&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsq0g4uyc7ni3d4kn08gf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsq0g4uyc7ni3d4kn08gf.png" alt=" " width="800" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Apache License 2.0 provides a strong and permissive framework that has significantly influenced the evolution of open-source software. Its main advantages are legal clarity, flexibility, and support for dual licensing, making it well suited for a wide range of projects, from big data platforms to modern web technologies such as OpenDataLoader.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Principles of community trust and transparency&lt;/strong&gt;&lt;br&gt;
After a comparative analysis of PDF document-processing products, the ODL team concluded that most are distributed under restrictive or proprietary licenses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;By choosing Apache 2.0, OpenDataLoader sends a clear and open message to partners and clients:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Our technology is open.&lt;/li&gt;
&lt;li&gt;Our roadmap is transparent.&lt;/li&gt;
&lt;li&gt;Our community is welcome.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Apache 2.0 is widely recognized as a permissive, business-friendly open-source license. It allows commercial use, modification, and integration into proprietary systems. These factors lower adoption barriers and build confidence among users. At the same time, Apache 2.0 provides clarity around intellectual property and an explicit patent grant, giving contributors legal safety.&lt;/p&gt;

&lt;p&gt;In modern software markets, brand trust is built on transparency and collaboration. Open-source licensing is no longer just a development model; it is a brand statement.&lt;/p&gt;

&lt;p&gt;Openness strengthens credibility. Credibility strengthens adoption. Adoption strengthens the brand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Driving ecosystem growth&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Openness speeds up innovation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;By choosing Apache 2.0 for OpenDataLoader, the team encourages:&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Community contributions:&lt;/em&gt; you can open an issue in the &lt;a href="https://github.com/opendataloader-project/opendataloader-pdf/issues" rel="noopener noreferrer"&gt;GitHub Issues&lt;/a&gt; of opendataloader-project/opendataloader-pdf.&lt;br&gt;
&lt;em&gt;Benchmark transparency&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This creates a stronger technical ecosystem around OpenDataLoader.&lt;br&gt;
By removing licensing barriers, OpenDataLoader enables broader integration and faster innovation.&lt;br&gt;
&lt;strong&gt;Open technology builds stronger ecosystems and stronger brands.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frequently Asked Questions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Q: &lt;strong&gt;Why did OpenDataLoader switch from MPL-2.0 to Apache 2.0?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: &lt;em&gt;MPL-2.0's file-level copyleft requirement created integration friction for enterprise users combining OpenDataLoader with proprietary systems. Apache 2.0 removes this barrier while still providing contributor protections and explicit patent grants.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Q: &lt;strong&gt;Does this license change affect existing users?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: &lt;em&gt;No. Apache 2.0 is more permissive than MPL-2.0, so all existing use cases remain fully supported with fewer restrictions.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Q: &lt;strong&gt;Can I use OpenDataLoader PDF in a commercial product?&lt;/strong&gt;&lt;/p&gt;

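&lt;p&gt;A: &lt;em&gt;Yes. Apache 2.0 explicitly allows commercial use, modification, and redistribution. When redistributing, you need to include a copy of the license, keep the existing copyright and attribution notices (including any NOTICE file), and mark the files you have modified.&lt;/em&gt;&lt;/p&gt;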

&lt;p&gt;Q: &lt;strong&gt;How does Apache 2.0 compare to MIT for enterprise adoption?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: &lt;em&gt;Both are permissive, but Apache 2.0 adds an explicit patent grant and clear terms for submitted contributions, which are critical protections for enterprise legal teams evaluating open-source dependencies.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Q: &lt;strong&gt;How can I contribute to OpenDataLoader?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: &lt;em&gt;You can open issues or submit pull requests on GitHub (opendataloader-project/opendataloader-pdf). Community contributions are welcome under the terms of the Apache License 2.0.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Homepage:&lt;/strong&gt; &lt;a href="https://opendataloader.org?utm_source=medium&amp;amp;utm_medium=blog&amp;amp;utm_campaign=apache2_license_change" rel="noopener noreferrer"&gt;https://opendataloader.org?utm_source=medium&amp;amp;utm_medium=blog&amp;amp;utm_campaign=apache2_license_change&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/opendataloader-project/opendataloader-pdf?utm_source=medium&amp;amp;utm_medium=blog&amp;amp;utm_campaign=apache2_license_change" rel="noopener noreferrer"&gt;https://github.com/opendataloader-project/opendataloader-pdf?utm_source=medium&amp;amp;utm_medium=blog&amp;amp;utm_campaign=apache2_license_change&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>development</category>
    </item>
    <item>
      <title>OpenDataLoader: THE #1 OPEN SOURCE PARSER IN TRANSPARENT BENCHMARKS</title>
      <dc:creator>Julia</dc:creator>
      <pubDate>Thu, 09 Apr 2026 09:08:57 +0000</pubDate>
      <link>https://dev.to/katash/opendataloader-the-1-open-source-parser-in-real-benchmarks-17kk</link>
      <guid>https://dev.to/katash/opendataloader-the-1-open-source-parser-in-real-benchmarks-17kk</guid>
      <description>&lt;p&gt;&lt;strong&gt;OpenDataLoader&lt;/strong&gt; team  published the full benchmark results on &lt;a href="http://opendataloader.org" rel="noopener noreferrer"&gt;http://opendataloader.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvdldz8dyxw3zcpsj38ll.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvdldz8dyxw3zcpsj38ll.jpg" alt=" " width="800" height="677"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transparent methodology, 200 real-world PDFs, all scores reproducible.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenDataLoader PDF offers two modes!&lt;/strong&gt;&lt;br&gt;
⚙️ &lt;strong&gt;Rule-based mode&lt;/strong&gt;&lt;br&gt;
No AI model. Runs locally, no GPU required. 0.015s/page — the fastest in benchmarks.&lt;br&gt;
🧠 &lt;strong&gt;Hybrid mode&lt;/strong&gt;&lt;br&gt;
Rule-based engine + AI model combined. Significant quality improvements in tables, reading order, and image recognition.&lt;br&gt;
&lt;strong&gt;Hybrid mode results&lt;/strong&gt;&lt;br&gt;
📊 Overall: 0.907 (#1)&lt;br&gt;
📖 Reading Order: 0.934 (#1)&lt;br&gt;
📋 Table Extraction: 0.928 (#1)&lt;br&gt;
⚡ Speed (rule-based mode): 0.015s/page (#1)&lt;br&gt;
🏷️ Heading Detection: 0.821 (#2)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key highlights&lt;/strong&gt;&lt;br&gt;
📋 Table extraction #1 (0.928), a 0.041 lead over 2nd place.&lt;br&gt;
Table structure drives answer quality in RAG pipelines. This gap matters.&lt;br&gt;
📖 Reading order #1 (0.934).&lt;br&gt;
Multi-column layouts are extracted in the order humans actually read.&lt;br&gt;
⚡ &lt;strong&gt;Speed and quality at the same time.&lt;/strong&gt;&lt;br&gt;
Rule-based mode for speed, hybrid mode for accuracy.&lt;br&gt;
Choose based on your use case.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Compared against 12 parsers, including docling, marker, unstructured, mineru, and pymupdf4llm.&lt;/strong&gt;&lt;br&gt;
All results are per-document means: no cherry-picking, no synthetic data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The benchmark repo is open.&lt;/strong&gt;&lt;br&gt;
Run it yourself, add your own parser.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://opendataloader.org/?utm_source=x&amp;amp;utm_medium=social&amp;amp;utm_campaign=benchmark_release" rel="noopener noreferrer"&gt;Benchmark&lt;/a&gt;&lt;/strong&gt; → &lt;a href="https://opendataloader.org/?utm_source=x&amp;amp;utm_medium=social&amp;amp;utm_campaign=benchmark_release" rel="noopener noreferrer"&gt;https://opendataloader.org/?utm_source=x&amp;amp;utm_medium=social&amp;amp;utm_campaign=benchmark_release&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📂 &lt;strong&gt;&lt;a href="https://github.com/opendataloader-project/opendataloader-bench?utm_source=x&amp;amp;utm_medium=social&amp;amp;utm_campaign=benchmark_release" rel="noopener noreferrer"&gt;Methodology&lt;/a&gt;&lt;/strong&gt; → &lt;a href="https://github.com/opendataloader-project/opendataloader-bench?utm_source=x&amp;amp;utm_medium=social&amp;amp;utm_campaign=benchmark_release" rel="noopener noreferrer"&gt;https://github.com/opendataloader-project/opendataloader-bench?utm_source=x&amp;amp;utm_medium=social&amp;amp;utm_campaign=benchmark_release&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⭐ &lt;strong&gt;&lt;a href="https://github.com/opendataloader" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt; → &lt;a href="https://github.com/opendataloader" rel="noopener noreferrer"&gt;https://github.com/opendataloader&lt;/a&gt;&lt;/p&gt;

</description>
      <category>development</category>
      <category>opensource</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>OpenDataLoader PDF: the fastest non-VLM parser that preserves document structure (tables, headings, lists)</title>
      <dc:creator>Julia</dc:creator>
      <pubDate>Wed, 01 Apr 2026 11:30:45 +0000</pubDate>
      <link>https://dev.to/katash/the-fastest-non-vlm-parser-that-preserves-document-structure-tables-headings-lists-is-2opk</link>
      <guid>https://dev.to/katash/the-fastest-non-vlm-parser-that-preserves-document-structure-tables-headings-lists-is-2opk</guid>
      <description>&lt;p&gt;🚀 The developers found room to improve on latency, so we profiled. We initially expected the sorting algorithm &lt;strong&gt;(XY-Cut++)&lt;/strong&gt; to be the bottleneck, but it turned out to be less than **1% **of the total time. The real cost was hiding in content filtering (55%) and preprocessing (25%).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4onpaz8frmx0idprwfr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4onpaz8frmx0idprwfr.png" alt="Benchmarks" width="800" height="348"&gt;&lt;/a&gt;&lt;br&gt;
🖇️&lt;strong&gt;3 fixes applied&lt;/strong&gt;&lt;br&gt;
💥Page-level parallel processing&lt;br&gt;
💥Hidden text detection → opt-in&lt;br&gt;
💥Text-only fast path&lt;br&gt;
💢Output is byte-for-byte identical before and after optimization. Only the speed changed; the results stay the same.&lt;/p&gt;

&lt;p&gt;🖇️&lt;strong&gt;OpenDataLoader PDF highlights&lt;/strong&gt;&lt;br&gt;
🚀#1 in latency 🥇(585 pages in 1.10s)&lt;br&gt;
🗃️#1 in memory efficiency 🥇(7.4MB)&lt;br&gt;
💢Java · Python · Node.js SDK&lt;br&gt;
💢Multiple output formats (text, markdown, HTML, JSON, PDF)&lt;/p&gt;

&lt;p&gt;Check out the benchmark below for latency and memory usage results. See the PR for full details on what changed and how we got here. We'd love your feedback if you try it out!&lt;/p&gt;




&lt;p&gt;GitHub: &lt;a href="http://github.com/opendataloader-project/opendataloader-pdf?utm_source=x&amp;amp;utm_medium=social&amp;amp;utm_campaign=perf_update" rel="noopener noreferrer"&gt;http://github.com/opendataloader-project/opendataloader-pdf?utm_source=x&amp;amp;utm_medium=social&amp;amp;utm_campaign=perf_update&lt;/a&gt;&lt;br&gt;
Benchmark: &lt;a href="http://github.com/opendataloader-project/opendataloader-bench?utm_source=x&amp;amp;utm_medium=social&amp;amp;utm_campaign=perf_update" rel="noopener noreferrer"&gt;http://github.com/opendataloader-project/opendataloader-bench?utm_source=x&amp;amp;utm_medium=social&amp;amp;utm_campaign=perf_update&lt;/a&gt;&lt;br&gt;
PR: &lt;a href="https://github.com/opendataloader-project/opendataloader-pdf/pull/362?utm_source=x&amp;amp;utm_medium=social&amp;amp;utm_campaign=perf_update" rel="noopener noreferrer"&gt;https://github.com/opendataloader-project/opendataloader-pdf/pull/362?utm_source=x&amp;amp;utm_medium=social&amp;amp;utm_campaign=perf_update&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Dual Lab launches reports on PDF Accessibility Trends</title>
      <dc:creator>Julia</dc:creator>
      <pubDate>Thu, 26 Mar 2026 12:03:38 +0000</pubDate>
      <link>https://dev.to/katash/dual-lab-launches-reports-on-pdf-accessibility-trends-3h7f</link>
      <guid>https://dev.to/katash/dual-lab-launches-reports-on-pdf-accessibility-trends-3h7f</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;a href="https://duallab.com/dual-lab-launches-quarterly-reports-on-pdf-accessibility-trends-based-on-common-crawl-data/" rel="noopener noreferrer"&gt;Dual Lab&lt;/a&gt; Launches Quarterly Reports on PDF Accessibility Trends based on &lt;a href="https://commoncrawl.org/" rel="noopener noreferrer"&gt;Common Crawl&lt;/a&gt; data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dual Lab&lt;/strong&gt; announces the upcoming publication of a new analytical report on PDF accessibility trends in the Common Crawl dataset. These in-depth reports will be released quarterly and will provide data-driven insights into global PDF trends. The first report analyzes &lt;strong&gt;15 million PDF documents from the CC-MAIN-2026-04 Common Crawl archive&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mild growth in the share of Tagged PDFs&lt;/strong&gt;&lt;br&gt;
As a preview, we present a sample report showing the share of Tagged PDFs among all PDFs in the Common Crawl dataset, grouped by document creation month.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Our analysis shows a mild increase in the proportion of tagged PDFs over the past three years. The share has been growing by approximately 1.5 percentage points per year, surpassing the significant milestone of 50% in mid-2025.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This means that more than half of newly created PDF documents appearing in the Common Crawl archives now include a structure tree with semantic information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Tagged PDFs Matter&lt;/strong&gt;&lt;br&gt;
Tagged PDFs contain a structure tree that defines headings, paragraphs, tables, figures, and other semantic elements. This structure is essential for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Screen readers' ability to interpret the document&lt;/li&gt;
&lt;li&gt;Logical reading order&lt;/li&gt;
&lt;li&gt;Compliance with accessibility standards such as PDF/UA&lt;/li&gt;
&lt;li&gt;Alignment with WCAG requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The growth in tagged documents indicates a positive global shift toward better-structured and potentially more accessible PDF publishing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trend in the Share of Tagged PDFs Among All PDFs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnvb9nl4m7kf5ptrqzfwu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnvb9nl4m7kf5ptrqzfwu.png" alt=" " width="736" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dual Lab&lt;/strong&gt; analyzed &lt;strong&gt;15&lt;/strong&gt; million PDF documents from the Common Crawl dataset &lt;strong&gt;CC-MAIN-2026-04&lt;/strong&gt; to examine how the share of tagged PDFs has changed over time.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The results show a clear rising trend over the past three years. The proportion of tagged PDFs (documents containing a structural tag tree) has increased steadily, by approximately 1.5 percentage points per year.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;A key milestone&lt;/strong&gt; was reached in &lt;strong&gt;mid-2025 (July)&lt;/strong&gt;, when the share exceeded 50% for the first time. This indicates that more than half of newly created PDF documents indexed in Common Crawl now include structural tagging.&lt;/p&gt;

&lt;p&gt;The growth reflects broader adoption of structured document generation tools and increasing awareness of accessibility and machine-readability requirements. While the trend is positive, continued monitoring is essential to evaluate not only the presence of tags but also their structural quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reports by Dual Lab&lt;/strong&gt;&lt;br&gt;
Dual Lab aims to provide objective data that supports users, accessibility experts, and organizations working toward more inclusive digital content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The first full report will be published soon.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Reports will be available on the Dual Lab website, the &lt;a href="https://pdf4wcag.com/" rel="noopener noreferrer"&gt;PDF4WCAG&lt;/a&gt; website (the PDF accessibility validation tool developed by Dual Lab), the Google Group &lt;a href="https://groups.google.com/g/duallab" rel="noopener noreferrer"&gt;Dual Lab Reports on PDF Accessibility Trends&lt;/a&gt;, and our channels on &lt;a href="https://x.com/PDF4WCAG" rel="noopener noreferrer"&gt;X&lt;/a&gt; and &lt;a href="https://www.linkedin.com/company/3658503/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;#pdf #pdf4wcag #accessibility #duallab&lt;/p&gt;

</description>
      <category>a11y</category>
      <category>pdf</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>OpenDataLoader PDF v2.0 by Hancom has claimed the #1 spot on GitHub's overall open-source trending chart within just one week of its release</title>
      <dc:creator>Julia</dc:creator>
      <pubDate>Tue, 24 Mar 2026 11:47:59 +0000</pubDate>
      <link>https://dev.to/katash/opendataloader-pdf-v20-by-hancom-has-claimed-the-1-spot-on-githubs-overall-open-source-trending-40eg</link>
      <guid>https://dev.to/katash/opendataloader-pdf-v20-by-hancom-has-claimed-the-1-spot-on-githubs-overall-open-source-trending-40eg</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/katash/opendataloader-pdf-v20-hits-1-on-github-trending-globally--1ffa" class="crayons-story__hidden-navigation-link"&gt;OpenDataLoader PDF v2.0 Hits #1 on GitHub Trending Globally !&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/katash" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3647888%2F1438cec9-6a18-460d-ae5f-d68ccd021403.jpg" alt="katash profile" class="crayons-avatar__image" width="384" height="526"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/katash" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Julia
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Julia
                
              
              &lt;div id="story-author-preview-content-3391237" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/katash" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3647888%2F1438cec9-6a18-460d-ae5f-d68ccd021403.jpg" class="crayons-avatar__image" alt="" width="384" height="526"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Julia&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/katash/opendataloader-pdf-v20-hits-1-on-github-trending-globally--1ffa" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Mar 23&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/katash/opendataloader-pdf-v20-hits-1-on-github-trending-globally--1ffa" id="article-link-3391237"&gt;
          OpenDataLoader PDF v2.0 Hits #1 on GitHub Trending Globally!
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/webdev"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;webdev&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/programming"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;programming&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/python"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;python&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            2 min read
          &lt;/small&gt;
            
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>OpenDataLoader PDF v2.0 Hits #1 on GitHub Trending Globally!</title>
      <dc:creator>Julia</dc:creator>
      <pubDate>Mon, 23 Mar 2026 18:49:14 +0000</pubDate>
      <link>https://dev.to/katash/opendataloader-pdf-v20-hits-1-on-github-trending-globally--1ffa</link>
      <guid>https://dev.to/katash/opendataloader-pdf-v20-hits-1-on-github-trending-globally--1ffa</guid>
      <description>&lt;p&gt;&lt;a href="https://opendataloader.org/" rel="noopener noreferrer"&gt;OpenDataLoader PDF&lt;/a&gt; v2.0 Hits #1 on &lt;a href="https://github.com/opendataloader-project/opendataloader-pdf" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; &lt;strong&gt;Trending Globally — Just One Week After Launch!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkie24dees2odi7i1grx4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkie24dees2odi7i1grx4.png" alt=" " width="727" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenDataLoader PDF v2.0&lt;/strong&gt;  has claimed the #1 spot on GitHub’s overall open-source trending chart within just one week of its release, earning the GitHub Trending badge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfex63nqcivdowgk9w2m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfex63nqcivdowgk9w2m.png" alt=" " width="800" height="577"&gt;&lt;/a&gt;&lt;br&gt;
GitHub Trending is a real-time index tracking the open-source projects attracting the most attention from developers worldwide. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Reaching #1 is widely recognized as a clear signal of interest and trust from the global developer community.&lt;br&gt;
&lt;em&gt;On March 21 alone, OpenDataLoader PDF v2.0 gained more than 1,800 new GitHub stars, bringing its totals past 7,000 stars and 500 forks.&lt;/em&gt; This growth outpaces that of a typical open-source project and is comparable to the world's top-tier repositories. Because stars and forks come directly from developers, these numbers are a strong indicator of the technology's visibility, usefulness, and credibility.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;OpenDataLoader PDF is a technology that breaks down complex PDF documents into text, tables, images, and other elements, converting them into formats that AI can process directly. While PDF is the most widely used document format for AI training, its complex internal structure has long made data extraction a significant bottleneck in AI development. Hancom signed an MOU with &lt;a href="https://duallab.com/" rel="noopener noreferrer"&gt;Dual Lab&lt;/a&gt;, a global PDF technology specialist, in July 2025 and began co-development shortly thereafter. An initial version was released in September of the same year, followed by the launch of v2.0 on March 12.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;v2.0 features&lt;/strong&gt; a hybrid engine that combines AI-based and direct extraction methods, operating entirely in a local environment without transmitting data to external servers.&lt;br&gt;
&lt;strong&gt;The release includes four built-in AI add-ons (OCR, Table Extraction, Formula Extraction, and Chart Analysis) and is technically compatible with other open-source AI projects such as Docling.&lt;/strong&gt; In Hancom's own benchmark tests, v2.0 achieved the highest accuracy among comparable open-source solutions in every evaluated category, including reading order, table extraction, and heading detection. The test data and reproducible code are publicly available in the official &lt;a href="https://github.com/opendataloader-project/opendataloader-pdf" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; repository for full transparency.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In 2025, &lt;strong&gt;OpenDataLoader PDF&lt;/strong&gt; was officially integrated into LangChain, the widely used framework for building LLM applications. In 2026, the team plans to expand integrations with major AI frameworks including Langflow, LlamaIndex, and Gemini CLI, while also preparing MCP (Model Context Protocol) support for AI agents. v2.0 also adopts the Apache 2.0 license, one of the most permissive open-source licenses for commercial use, significantly lowering the barrier to adoption for enterprises and developers alike.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kim Yeon-su, CEO of Hancom,&lt;/strong&gt; stated, &lt;em&gt;"Through the transition to the Apache 2.0 license, we will continue to evolve OpenDataLoader PDF into an open PDF data platform that companies and developers around the world can freely utilize and build upon."&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
    </item>
    <item>
      <title>#1 Repository of the Day: OpenDataLoader PDF parser for AI data extraction</title>
      <dc:creator>Julia</dc:creator>
      <pubDate>Thu, 19 Mar 2026 06:21:54 +0000</pubDate>
      <link>https://dev.to/katash/1-repository-of-the-day-118m</link>
      <guid>https://dev.to/katash/1-repository-of-the-day-118m</guid>
      <description>&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://github.com/opendataloader-project/opendataloader-pdf?tab=readme-ov-file#readme" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2Fdeda5de829a4a1b0c1aa393b41140357c49e95200ebe4d754471431dcf161e6c%2Fopendataloader-project%2Fopendataloader-pdf" height="600" class="m-0" width="1200"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://github.com/opendataloader-project/opendataloader-pdf?tab=readme-ov-file#readme" rel="noopener noreferrer" class="c-link"&gt;
            GitHub - opendataloader-project/opendataloader-pdf: PDF Parser for AI-ready data. Automate PDF accessibility. Open-source. · GitHub
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            PDF Parser for AI-ready data. Automate PDF accessibility. Open-source. - opendataloader-project/opendataloader-pdf
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.githubassets.com%2Ffavicons%2Ffavicon.svg" width="32" height="32"&gt;
          github.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


</description>
    </item>
    <item>
      <title>#1 Repository of the Day: OpenDataLoader PDF parser for AI data extraction. Extract Markdown, JSON (with bounding boxes), and HTML from any PDF. #1 in benchmarks (0.90</title>
      <dc:creator>Julia</dc:creator>
      <pubDate>Thu, 19 Mar 2026 06:21:29 +0000</pubDate>
      <link>https://dev.to/katash/1-repository-of-the-day-31ek</link>
      <guid>https://dev.to/katash/1-repository-of-the-day-31ek</guid>
      <description>&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://github.com/opendataloader-project/opendataloader-pdf?tab=readme-ov-file#readme" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2Fdeda5de829a4a1b0c1aa393b41140357c49e95200ebe4d754471431dcf161e6c%2Fopendataloader-project%2Fopendataloader-pdf" height="600" class="m-0" width="1200"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://github.com/opendataloader-project/opendataloader-pdf?tab=readme-ov-file#readme" rel="noopener noreferrer" class="c-link"&gt;
            GitHub - opendataloader-project/opendataloader-pdf: PDF Parser for AI-ready data. Automate PDF accessibility. Open-source. · GitHub
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            PDF Parser for AI-ready data. Automate PDF accessibility. Open-source. - opendataloader-project/opendataloader-pdf
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.githubassets.com%2Ffavicons%2Ffavicon.svg" width="32" height="32"&gt;
          github.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


</description>
    </item>
    <item>
      <title>#1 Repository of the Day</title>
      <dc:creator>Julia</dc:creator>
      <pubDate>Thu, 19 Mar 2026 06:20:31 +0000</pubDate>
      <link>https://dev.to/katash/1-repository-of-the-day-47jk</link>
      <guid>https://dev.to/katash/1-repository-of-the-day-47jk</guid>
      <description>&lt;h1&gt;
  
  
  #1 Repository of the Day: OpenDataLoader
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi29g8b7wdkakljjtss09.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi29g8b7wdkakljjtss09.png" alt=" " width="800" height="577"&gt;&lt;/a&gt;&lt;br&gt;
🔍 &lt;a href="https://github.com/opendataloader-project/opendataloader-pdf?tab=readme-ov-file#readme" rel="noopener noreferrer"&gt;PDF parser for AI data extraction&lt;/a&gt; — Extract Markdown, JSON (with bounding boxes), and HTML from any PDF. #1 in benchmarks (0.90 overall). Deterministic local mode + AI hybrid mode for complex pages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How accurate is it?&lt;/strong&gt; It ranks #1 in benchmarks, with a 0.90 overall score and 0.93 table accuracy across 200 real-world PDFs, including multi-column layouts and scientific papers.&lt;br&gt;
&lt;strong&gt;Does it support scanned PDFs and OCR?&lt;/strong&gt; Yes. Hybrid mode includes built-in OCR for 80+ languages and works with poor-quality scans at 300 DPI or higher.&lt;br&gt;
&lt;strong&gt;Does it handle tables, formulas, images, and charts?&lt;/strong&gt; Yes. Hybrid mode handles complex and borderless tables, extracts formulas as LaTeX, and generates AI descriptions of pictures and charts.&lt;br&gt;
&lt;strong&gt;How do I use this for RAG?&lt;/strong&gt; Install it with &lt;code&gt;pip install opendataloader-pdf&lt;/code&gt; and convert a PDF in three lines of code. It outputs structured Markdown for chunking, JSON with bounding boxes for source citations, and HTML. A LangChain integration is available, along with Python, Node.js, and Java SDKs.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Hancom Unveils 'OpenDataLoader PDF v2.0' "Ranked No. 1 in Open Source PDF Data Extraction Benchmarks"</title>
      <dc:creator>Julia</dc:creator>
      <pubDate>Fri, 13 Mar 2026 06:39:11 +0000</pubDate>
      <link>https://dev.to/katash/hancom-unveils-opendataloader-pdf-v20-ranked-no-1-in-open-source-pdf-data-extraction-3eak</link>
      <guid>https://dev.to/katash/hancom-unveils-opendataloader-pdf-v20-ranked-no-1-in-open-source-pdf-data-extraction-3eak</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhf6z8zr924f4kblv4dlt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhf6z8zr924f4kblv4dlt.png" alt=" " width="800" height="670"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hancom unveiled 'OpenDataLoader PDF v2.0'&lt;/strong&gt;, which ranks No. 1 in open-source PDF data extraction benchmarks. The benchmark test data and detailed reproducible code are available in the official &lt;a href="https://github.com/opendataloader-project/opendataloader-pdf" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; repository, underscoring Hancom's commitment to transparency, a foundational principle of open source.&lt;/p&gt;

&lt;p&gt;This version features a hybrid engine that combines AI-based and direct extraction methods and runs entirely in a local environment. As a result, businesses and developers can use high-performance PDF data extraction free of charge in a fully air-gapped setup, with no document data sent to external servers.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In addition, this version comes with four free AI add-ons designed to extract complex elements within documents. 'Optical Character Recognition (OCR)' improves text recognition accuracy for image-based PDFs and scanned documents. 'Table Extraction' uses an ultra-lightweight AI model to precisely analyze complex table structures, including merged cells. 'Formula Extraction' recognizes complex mathematical and scientific equations in academic papers, entirely in a local environment. 'Chart Analysis' interprets the meaning of charts in context and presents the insights as human-readable narrative text.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These four add-ons are designed to be compatible with third-party open-source AI projects such as Docling. Although there is no official partnership or sponsorship with any specific entity, the add-ons are technically compatible with these tools, so users can integrate them seamlessly into their existing technology environments. The flexible add-on architecture also allows additional AI models to be integrated in the future.&lt;/p&gt;

&lt;p&gt;With this release, the open-source license &lt;strong&gt;has been changed from MPL 2.0 (Mozilla Public License 2.0) to Apache 2.0 (Apache License 2.0).&lt;/strong&gt; By adopting one of the most permissive licenses for commercial use, Hancom has significantly reduced friction for external developers and global enterprises looking to build on the platform. This is expected to foster a diverse ecosystem of business models, including web apps and SaaS solutions built on OpenDataLoader PDF.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/company/%ED%95%9C%EA%B8%80%EA%B3%BC%EC%BB%B4%ED%93%A8%ED%84%B0/posts/?feedView=all" rel="noopener noreferrer"&gt;Hancom&lt;/a&gt; is also expanding its ecosystem for the era of autonomous AI agents. Having completed LangChain integration in 2025, the company will broaden connectivity with a range of AI frameworks in 2026, including Langflow, LlamaIndex, and Gemini CLI. Hancom is also preparing MCP (Model Context Protocol) functionality to support AI agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In the second half of 2026, Hancom plans to release a commercial AI add-on that consolidates its proprietary document AI technologies.&lt;/strong&gt; The company also aims to offer the first open-source solution that automatically generates accessibility tags through AI-driven document structure analysis. With the enforcement of the European Accessibility Act (EAA) and the strengthening of Korea's Act on the Prohibition of Discrimination Against Persons with Disabilities, companies worldwide face growing pressure to comply with digital document accessibility standards. Against this backdrop, Hancom plans to expand into a PDF AI accessibility solution that meets the global PDF accessibility standard (PDF/UA) and to establish a new open-source-based business model.&lt;/p&gt;

&lt;p&gt;Jeong Ji-hwan, Chief Technology Officer (CTO) of Hancom, stated, "OpenDataLoader PDF v2.0 has evolved into an open PDF data platform that anyone can freely utilize and build upon, made possible through the AI hybrid engine and the transition to the Apache 2.0 license." He added, "Going forward, through our commercial AI add-ons and accessibility solutions, we will not only enable PDF documents worldwide to be leveraged by AI, but also shape a global ecosystem where documents are truly accessible to all."&lt;/p&gt;

&lt;p&gt;#opendataloader #pdf #AI&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
