In 2025, most of the work on DocWire SDK focused on architecture, correctness, and long-term maintainability, rather than surface-level features.
To document this work, we published a technical recap video summarizing the most important engineering changes introduced during the year.
DocWire SDK – 2025 Technical Summary
What changed in 2025
Among other topics, the video covers:
- migration from a std::variant-based data model to a polymorphic, message-driven pipeline architecture
- exposing DocWire pipelines as HTTP/HTTPS microservices
- integration of local, offline AI embeddings (multilingual E5 models)
- expanded OpenAI support (GPT-4o, GPT-5, embeddings, transcription, TTS)
- replacement of PoDoFo with Google PDFium
- high-precision OCR and PDF positional metadata
- image-aware PDF processing and OCR
- modern HTML parsing and robust charset conversion
- zero-cost logging, structured error diagnostics, and CI/CD modernization
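To make the first item more concrete, here is a minimal, generic sketch of the difference between a closed std::variant-based data model and an open, polymorphic, message-driven pipeline. It is not DocWire's actual API: every name in it (Element, Message, Stage, DropImages, Collect, run_pipeline) is hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// NOTE: all types below are hypothetical illustrations, not DocWire SDK types.

// --- Style 1: closed data model based on std::variant ---
struct Text  { std::string value; };
struct Image { std::size_t width; std::size_t height; };

// Adding a new element type means editing this alias and every visitor.
using Element = std::variant<Text, Image>;

struct Describe {
    std::string operator()(const Text& t) const { return "text:" + t.value; }
    std::string operator()(const Image& i) const {
        return "image:" + std::to_string(i.width) + "x" + std::to_string(i.height);
    }
};

std::string describe_variant(const Element& e) {
    return std::visit(Describe{}, e);
}

// --- Style 2: open, polymorphic messages flowing through stages ---
struct Message {
    virtual ~Message() = default;
    virtual std::string describe() const = 0;
};

struct TextMessage : Message {
    explicit TextMessage(std::string v) : value(std::move(v)) {}
    std::string describe() const override { return "text:" + value; }
    std::string value;
};

struct ImageMessage : Message {
    ImageMessage(std::size_t w, std::size_t h) : width(w), height(h) {}
    std::string describe() const override {
        return "image:" + std::to_string(width) + "x" + std::to_string(height);
    }
    std::size_t width;
    std::size_t height;
};

using MessagePtr = std::unique_ptr<Message>;

// A stage receives a message and either forwards it or consumes it (nullptr).
struct Stage {
    virtual ~Stage() = default;
    virtual MessagePtr process(MessagePtr msg) = 0;
};

// Example stage: consumes image messages, forwards everything else.
struct DropImages : Stage {
    MessagePtr process(MessagePtr msg) override {
        if (dynamic_cast<ImageMessage*>(msg.get()) != nullptr) return nullptr;
        return msg;
    }
};

// Example stage: records a description of every message that reaches it.
struct Collect : Stage {
    MessagePtr process(MessagePtr msg) override {
        seen.push_back(msg->describe());
        return msg;
    }
    std::vector<std::string> seen;
};

// Pushes one message through the stages in order; stops once it is consumed.
void run_pipeline(const std::vector<Stage*>& stages, MessagePtr msg) {
    for (Stage* stage : stages) {
        if (!msg) return;
        msg = stage->process(std::move(msg));
    }
}
```

In this sketch, a caller would build a stage list such as `{&drop, &collect}` and call `run_pipeline` once per message. The trade-off it illustrates is general: the variant keeps the set of types closed and checked at compile time, while the polymorphic version lets new message types and stages be added without touching existing code, at the cost of dynamic dispatch and heap allocation.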
The video is purely technical: it covers architecture, APIs, performance, and engineering decisions.
Project links
- GitHub: https://github.com/docwire/docwire
- Website: https://www.docwire.io
If you're working on document processing, backend systems, or AI-assisted pipelines in modern C++, feedback and discussion are welcome.