Hey everyone!
It’s been a while since our last post — and that’s because we’ve been deep in the engine room, rewriting, refactoring, and rebuilding. And now, we’re thrilled to share the results:
DocWire SDK 2025.05.22 is now on GitHub
https://github.com/docwire/docwire/releases/tag/2025.05.22
This is not just a regular update. It’s one of the biggest internal shifts DocWire has seen to date — and it lays the foundation for where we’re headed next.
What’s New in 2025.05.22
PDF Image Extraction + OCR
You can now extract images directly from PDF files and process them through downstream chain elements, including content type detection and OCR for embedded text. It’s a powerful step toward full-stack document intelligence.
HTML & Plain Text Writers Improved
- Image tags are now supported.
- Data URLs and OCR-derived text are embedded where applicable.
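To make the data URL idea concrete, here is a minimal, self-contained sketch of how image bytes and OCR text could end up in a single HTML image tag. It is illustrative only and is not the DocWire writer code: the helper names `base64_encode`, `escape_attribute`, and `make_img_tag` are hypothetical, and placing the OCR text in the `alt` attribute is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not DocWire API): standard Base64 with '=' padding.
std::string base64_encode(const std::vector<std::uint8_t>& bytes)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((bytes.size() + 2) / 3) * 4);
    std::size_t i = 0;
    // Each full 3-byte group becomes 4 output characters.
    for (; i + 2 < bytes.size(); i += 3)
    {
        std::uint32_t n = (bytes[i] << 16) | (bytes[i + 1] << 8) | bytes[i + 2];
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += alphabet[(n >> 6) & 63];
        out += alphabet[n & 63];
    }
    // One or two leftover bytes are padded with '='.
    if (i + 1 == bytes.size())
    {
        std::uint32_t n = bytes[i] << 16;
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += "==";
    }
    else if (i + 2 == bytes.size())
    {
        std::uint32_t n = (bytes[i] << 16) | (bytes[i + 1] << 8);
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += alphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Hypothetical helper: escape characters that are unsafe in an HTML attribute.
std::string escape_attribute(const std::string& text)
{
    std::string out;
    for (char c : text)
    {
        switch (c)
        {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default: out += c;
        }
    }
    return out;
}

// Builds an <img> tag whose src is a data URL and whose alt carries OCR text
// (the alt placement is an assumption made for this sketch).
std::string make_img_tag(const std::vector<std::uint8_t>& image,
                         const std::string& mime_type,
                         const std::string& ocr_text)
{
    return "<img src=\"data:" + mime_type + ";base64," + base64_encode(image) +
           "\" alt=\"" + escape_attribute(ocr_text) + "\">";
}
```

Calling `make_img_tag` with the three bytes of "Man", the MIME type `image/png`, and the text `Hi` yields `<img src="data:image/png;base64,TWFu" alt="Hi">`, so the image travels inside the HTML with no external file.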
Smarter, Stronger Testing
- New automatic tests for image extraction & OCR.
- Windows-specific test discovery fixes (no more silent ctest misses).
- Improved CI structure: better reporting, tighter test execution.
Internal Overhaul - Chain & Parser Refactors
A major internal modernization took place under the hood:
- Core Chain Elements were refactored for clarity and flexibility. Processing stages can now clearly continue, skip, or stop — making the flow easier to follow and debug.
- Parsers were unified under a cleaner structure by directly implementing ChainElement and improving data_source checks.
- Cleaner Tag Emission Logic: chain elements can now reprocess data more easily, enabling more advanced flows in the future.
- HTML components were reorganized for clarity and modularity.
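The continue, skip, or stop idea can be sketched in a few lines of standalone C++. This is not the DocWire implementation: the names `Continuation`, `Stage`, and `run_chain` are hypothetical, and the semantics are an assumption for illustration (skip drops the current item, stop ends the whole run).

```cpp
#include <cctype>
#include <functional>
#include <string>
#include <vector>

// Hypothetical result type: what a stage tells the chain to do next.
enum class Continuation { proceed, skip, stop };

// A stage may modify the item and decides how processing continues.
using Stage = std::function<Continuation(std::string&)>;

// Runs every item through all stages and returns the items that reached the end.
std::vector<std::string> run_chain(const std::vector<std::string>& items,
                                   const std::vector<Stage>& stages)
{
    std::vector<std::string> emitted;
    for (std::string item : items)  // copy on purpose: stages mutate the item
    {
        Continuation result = Continuation::proceed;
        for (const Stage& stage : stages)
        {
            result = stage(item);
            if (result != Continuation::proceed)
                break;  // skip or stop: later stages never see this item
        }
        if (result == Continuation::stop)
            break;  // stop: abandon all remaining items
        if (result == Continuation::proceed)
            emitted.push_back(item);
    }
    return emitted;
}

// Example stages for the sketch.
Continuation drop_empty(std::string& item)
{
    return item.empty() ? Continuation::skip : Continuation::proceed;
}

Continuation stop_at_end_marker(std::string& item)
{
    return item == "<end>" ? Continuation::stop : Continuation::proceed;
}

Continuation to_upper(std::string& item)
{
    for (char& c : item)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return Continuation::proceed;
}
```

With the stages `drop_empty`, `stop_at_end_marker`, and `to_upper` applied to the items `"a"`, `""`, `"b"`, `"<end>"`, `"c"`, the chain emits `"A"` and `"B"`: the empty item is skipped, and the end marker stops the run before `"c"` is processed. Returning an explicit value from every stage is what makes the flow easy to trace in a debugger.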
Fixes
- Thread-safe parser MIME initialization
- Test discovery working on Windows with a custom main()
- Clean linking of docwire_html and mailio (v0.25.1 via vcpkg)
- PSTParser threading fix
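For readers curious what thread-safe one-time initialization can look like in C++, here is a generic sketch. It does not reproduce the actual DocWire fix: the function names and the table contents are hypothetical, and it simply shows one common technique, a function-local static, which C++11 and later guarantee is initialized exactly once even under concurrent calls.

```cpp
#include <map>
#include <string>

// Hypothetical lookup table; DocWire's real MIME data and names may differ.
const std::map<std::string, std::string>& mime_types_by_extension()
{
    // Since C++11, initialization of a function-local static is thread-safe:
    // concurrent callers block until the first one finishes constructing it.
    static const std::map<std::string, std::string> table = {
        {"pdf", "application/pdf"},
        {"html", "text/html"},
        {"txt", "text/plain"},
    };
    return table;
}

// Read-only lookups on the const table are safe from multiple threads.
std::string mime_for_extension(const std::string& extension)
{
    const auto& table = mime_types_by_extension();
    const auto it = table.find(extension);
    return it != table.end() ? it->second : "application/octet-stream";
}
```

Because the table is built inside the accessor rather than in a separately guarded global, no caller can observe it half-constructed, and no explicit mutex is needed for read-only use.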
Developer Summary (TL;DR)
From PDFs, images now take flight,
Through refined chains, data flows bright.
With steadier tests and safer threads,
DocWire advances, new paths it treads.
This release brings visual data, smoother architecture, and tighter CI discipline — all while staying true to our mission: giving C++ developers modern, robust tools for complex data extraction and AI-ready pipelines.
Try It Out
Install the latest release and check out the new features:
https://github.com/docwire/docwire/releases/tag/2025.05.22
Use case demos, test coverage, and upgrade guidance are included.
!!! We Want to Hear From You !!!
- Found it useful? Drop a ⭐ on GitHub.
- See a bug? Open an issue.
- Have a wild idea? Let’s talk.
Stay tuned — more frequent updates coming your way.
Happy hacking,
The DocWire Team