Krzysztof Nowicki

DocWire SDK 2025.05.22 Released – A Big Leap for PDF, OCR, and Core Architecture

Hey everyone!

It’s been a while since our last post — and that’s because we’ve been deep in the engine room, rewriting, refactoring, and rebuilding. And now, we’re thrilled to share the results:

DocWire SDK 2025.05.22 is now on GitHub:

https://github.com/docwire/docwire/releases/tag/2025.05.22

This is not just a regular update. It’s one of the biggest internal shifts DocWire has seen to date — and it lays the foundation for where we’re headed next.

What’s New in 2025.05.22

PDF Image Extraction + OCR

You can now extract images directly from PDF files and process them through downstream chain elements, including content type detection and OCR for embedded text. It’s a powerful step toward full-stack document intelligence.

HTML & Plain Text Writers Improved

  • Image tags are now supported.
  • Data URLs and OCR-derived text are embedded where applicable.
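To illustrate what embedding an image as a data URL involves, here is a minimal sketch. The helper names (`base64_encode`, `to_data_url`, `img_tag`) are hypothetical and are not taken from DocWire; the sketch only shows the general `data:<mime>;base64,<payload>` shape that an HTML writer could emit for an extracted image.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard base64 (RFC 4648 alphabet) with '=' padding.
std::string base64_encode(const std::vector<std::uint8_t>& data)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((data.size() + 2) / 3 * 4);

    std::size_t i = 0;
    // Full 3-byte groups become 4 output characters.
    for (; i + 2 < data.size(); i += 3) {
        std::uint32_t n = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2];
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += alphabet[(n >> 6) & 63];
        out += alphabet[n & 63];
    }

    // A trailing group of 1 or 2 bytes is padded with '='.
    const std::size_t rest = data.size() - i;
    if (rest == 1) {
        std::uint32_t n = data[i] << 16;
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        std::uint32_t n = (data[i] << 16) | (data[i + 1] << 8);
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += alphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Hypothetical helper: wraps raw bytes in a data URL for the given MIME type.
std::string to_data_url(const std::string& mime,
                        const std::vector<std::uint8_t>& bytes)
{
    return "data:" + mime + ";base64," + base64_encode(bytes);
}

// Hypothetical helper: an <img> tag whose source is the embedded image.
std::string img_tag(const std::string& mime,
                    const std::vector<std::uint8_t>& bytes)
{
    return "<img src=\"" + to_data_url(mime, bytes) + "\">";
}
```

The upside of a data URL is that the HTML output stays a single self-contained file; the cost is roughly a one-third size increase from base64 encoding.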

Smarter, Stronger Testing

  • New automatic tests for image extraction & OCR.
  • Windows-specific test discovery fixes (no more silent ctest misses).
  • Improved CI structure: better reporting, tighter test execution.

Internal Overhaul - Chain & Parser Refactors

A major internal modernization took place under the hood:

  • Core Chain Elements were refactored for clarity and flexibility. Processing stages can now clearly continue, skip, or stop — making the flow easier to follow and debug.
  • Parsers were unified under a cleaner structure by directly implementing ChainElement and improving data_source checks.
  • Cleaner Tag Emission Logic: chain elements can now reprocess data more easily, enabling more advanced flows in the future.
  • HTML components were reorganized for clarity and modularity.
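The continue / skip / stop idea can be sketched in a few lines of plain C++. This is a conceptual illustration only: `Continuation`, `Stage`, and `run_chain` are hypothetical names invented for this example and are not DocWire's actual types or signatures.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical result a processing stage reports back to the chain.
enum class Continuation { proceed, skip, stop };

// A stage may modify the item in place and decides what happens next.
using Stage = std::function<Continuation(std::string&)>;

// Pushes each item through the stages in order.
//   proceed: hand the item to the next stage
//   skip:    drop this item and move on to the next one
//   stop:    end processing; remaining items are not visited
std::vector<std::string> run_chain(const std::vector<std::string>& items,
                                   const std::vector<Stage>& stages)
{
    std::vector<std::string> out;
    for (std::string item : items) {  // copy, so stages can modify it
        Continuation c = Continuation::proceed;
        for (const Stage& stage : stages) {
            c = stage(item);
            if (c != Continuation::proceed) break;
        }
        if (c == Continuation::stop) break;
        if (c == Continuation::proceed) out.push_back(item);
    }
    return out;
}
```

With an explicit result per stage, the control flow is visible at the call site: a filter stage returns `skip` for items it does not want, and a stage that hits a terminating condition returns `stop` instead of relying on exceptions or side channels.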

Fixes

  • Thread-safe parser MIME initialization
  • Test discovery now works on Windows with a custom main()
  • Clean linking of docwire_html and mailio (v0.25.1 via vcpkg)
  • PSTParser threading fix
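The release notes do not describe how the MIME initialization was made thread-safe. As a general illustration, one common C++ pattern for this kind of problem is a function-local static, which the language guarantees is initialized exactly once even when several threads reach it at the same time. All names below (`MimeTable`, `mime_table`, `mime_for_extension`) are hypothetical and not taken from DocWire.

```cpp
#include <string>
#include <unordered_map>

using MimeTable = std::unordered_map<std::string, std::string>;

// Counts how often the table is built; present only to make the
// once-only behaviour observable in this sketch.
inline int& build_count()
{
    static int n = 0;
    return n;
}

// Builds the (illustrative) extension-to-MIME lookup table.
MimeTable build_mime_table()
{
    ++build_count();
    return MimeTable{
        {"pdf", "application/pdf"},
        {"png", "image/png"},
        {"txt", "text/plain"},
    };
}

// Since C++11, initialization of a function-local static is thread-safe:
// concurrent first callers block until a single initialization finishes.
const MimeTable& mime_table()
{
    static const MimeTable table = build_mime_table();
    return table;
}

// Read-only lookups on the const table are safe from multiple threads.
std::string mime_for_extension(const std::string& ext)
{
    const auto it = mime_table().find(ext);
    return it != mime_table().end() ? it->second
                                    : "application/octet-stream";
}
```

The alternative of filling a shared table lazily from inside each parser needs explicit locking; moving the work into a once-only initializer removes that race without a mutex on every lookup.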

Developer Summary (TL;DR)

  • From PDFs, images now take flight,

  • Through refined chains, data flows bright.

  • With steadier tests and safer threads,

  • DocWire advances, new paths it treads.

This release brings visual data, smoother architecture, and tighter CI discipline — all while staying true to our mission: giving C++ developers modern, robust tools for complex data extraction and AI-ready pipelines.

Try It Out

Install the latest release and check out the new features:

https://github.com/docwire/docwire/releases/tag/2025.05.22

Use case demos, test coverage, and upgrade guidance are included.

We Want to Hear From You

  • Found it useful? Drop a ⭐ on GitHub.
  • See a bug? Open an issue.
  • Have a wild idea? Let’s talk.

Stay tuned: more frequent updates are coming your way.

Happy hacking,
The DocWire Team
