
Krzysztof Nowicki


DocWire SDK in 2025 – Architecture, AI Pipelines, and Document Processing in Modern C++

In 2025, most of the work on DocWire SDK focused on architecture, correctness, and long-term maintainability rather than surface-level features.

To document this work, we published a technical recap video summarizing the most important engineering changes introduced during the year.

DocWire SDK – 2025 Technical Summary

What changed in 2025

Among other topics, the video covers:

  • migration from a std::variant-based data model to a polymorphic, message-driven pipeline architecture
  • exposing DocWire pipelines as HTTP/HTTPS microservices
  • integration of local, offline AI embeddings (multilingual E5 models)
  • expanded OpenAI support (GPT-4o, GPT-5, embeddings, transcription, TTS)
  • replacement of PoDoFo with Google PDFium
  • high-precision OCR and PDF positional metadata
  • image-aware PDF processing and OCR
  • modern HTML parsing and robust charset conversion
  • zero-cost logging, structured error diagnostics, and CI/CD modernization
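
To make the first item in the list more concrete, here is a minimal sketch of what a polymorphic, message-driven pipeline can look like in modern C++. It is not the DocWire SDK API: every name in it (`Message`, `Text`, `PageBreak`, `Stage`, `run`, `collect_text`) is hypothetical and chosen only for illustration. The general idea is that a `std::variant` fixes the set of element types in one central type list, while a polymorphic base class lets new message types be added without touching existing stages, which simply forward anything they do not recognize.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is an illustration, not the DocWire API.

// Base class for everything that flows through the pipeline.
struct Message {
    virtual ~Message() = default;
};

struct Text : Message {
    explicit Text(std::string v) : value(std::move(v)) {}
    std::string value;
};

struct PageBreak : Message {};

using MessagePtr = std::shared_ptr<const Message>;
using Emit = std::function<void(MessagePtr)>;

// A stage receives one message and forwards zero or more messages downstream.
using Stage = std::function<void(const MessagePtr&, const Emit&)>;

// Pushes `msg` through stages[index..] and hands whatever survives to `sink`.
void run(const std::vector<Stage>& stages, std::size_t index,
         const MessagePtr& msg, const Emit& sink) {
    assert(msg && "pipeline messages must be non-null");
    if (index == stages.size()) {
        sink(msg);
        return;
    }
    stages[index](msg, [&](MessagePtr next) {
        run(stages, index + 1, next, sink);
    });
}

// Example stage: upper-cases Text and passes every other message through.
void uppercase(const MessagePtr& msg, const Emit& emit) {
    if (auto text = std::dynamic_pointer_cast<const Text>(msg)) {
        std::string s = text->value;
        for (char& c : s) {
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }
        emit(std::make_shared<Text>(std::move(s)));
    } else {
        emit(msg);
    }
}

// Example stage: drops PageBreak messages and forwards everything else.
void drop_page_breaks(const MessagePtr& msg, const Emit& emit) {
    if (!std::dynamic_pointer_cast<const PageBreak>(msg)) {
        emit(msg);
    }
}

// Feeds `input` through `stages`; Text is appended to the result and any
// other message is rendered as "|".
std::string collect_text(const std::vector<Stage>& stages,
                         const std::vector<MessagePtr>& input) {
    std::string out;
    Emit sink = [&](MessagePtr m) {
        if (auto text = std::dynamic_pointer_cast<const Text>(m)) {
            out += text->value;
        } else {
            out += "|";
        }
    };
    for (const auto& m : input) {
        run(stages, 0, m, sink);
    }
    return out;
}
```

In this sketch, a pipeline is just a `std::vector<Stage>` such as `{drop_page_breaks, uppercase}`, and calling `collect_text` with a few `Text` and `PageBreak` messages shows how each stage transforms, forwards, or drops what it receives. A real SDK would likely add error handling, backpressure, and richer message types on top of this shape.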

The intent of this video is purely technical: architecture, APIs, performance, and engineering decisions.

Project links

If you're working on document processing, backend systems, or AI-assisted pipelines in modern C++, feedback and discussion are welcome.
