DEV Community

Krzysztof Nowicki


Modern C++ vs. The World: How Would Your Language Parse a Word Document?

Modern C++ isn’t dying. It’s eating file formats for breakfast.
Here’s what parsing an MS Word document looks like today:

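#include "docwire.h"   // DocWire SDK
#include <filesystem>
#include <sstream>

int main() {
    using namespace docwire;
    std::stringstream out_stream;  // collects the exported plain text
    std::filesystem::path("data_processing_definition.doc")
        | content_type::detector{}  // identify the file format
        | office_formats_parser{}   // parse the legacy .doc file
        | PlainTextExporter()       // render the parsed content as plain text
        | out_stream;

    ensure(out_stream.str()) ==  // expected text is abridged ("...") in this post
        "Data processing refers to the activities performed on raw data...";
}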

No COM, no Windows-only hacks, no XML archaeology: just a clean, composable pipeline in modern C++.
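How can a single `|` chain a path, a detector, a parser, an exporter and a stream? One common C++ technique is to overload `operator|` so that `value | stage` simply evaluates `stage(value)`. Here is a minimal sketch of that technique, assuming toy stages invented for illustration (`detector`, `parser`, `plain_text_exporter`, `run_pipeline` are hypothetical). It is not DocWire's implementation, and its "detector" only looks at the file extension.

```cpp
#include <cassert>
#include <filesystem>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical toy stages: they mimic the shape of the pipeline, not DocWire's code.
struct Detected {
    std::filesystem::path path;
    std::string type;  // label chosen by the toy detector
};

struct Parsed {
    std::string text;
};

// Toy detector: decides by file extension only (a real detector inspects the bytes).
struct detector {
    Detected operator()(const std::filesystem::path& p) const {
        const bool is_doc = p.extension().string() == ".doc";
        return {p, is_doc ? "msword" : "unknown"};
    }
};

// Toy parser: a real one would decode the file; this builds a placeholder string.
struct parser {
    Parsed operator()(const Detected& d) const {
        return {"parsed " + d.path.filename().string() + " as " + d.type};
    }
};

// Toy exporter: turns the parsed result into plain text.
struct plain_text_exporter {
    std::string operator()(const Parsed& p) const { return p.text + "\n"; }
};

// Core of the technique: `value | stage` evaluates `stage(value)`.
// The trailing decltype removes this overload (SFINAE) when `stage` is not callable with `value`.
template <typename In, typename Stage>
auto operator|(In&& in, const Stage& stage) -> decltype(stage(std::forward<In>(in))) {
    return stage(std::forward<In>(in));
}

// Sink overload: piping text into any std::ostream writes it there.
std::ostream& operator|(const std::string& text, std::ostream& out) {
    return out << text;
}

// Runs the whole chain and returns what reached the stream.
std::string run_pipeline(const std::filesystem::path& file) {
    std::ostringstream out_stream;
    file | detector{} | parser{} | plain_text_exporter{} | out_stream;
    return out_stream.str();
}
```

Because each stage only has to be callable with the previous stage's output, a new format or exporter can be slotted in without touching the other stages, which is the kind of composability the snippet above relies on.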
So now I’m genuinely curious: If this is what parsing looks like in modern C++, what does it look like in your favorite language?
Drop your snippet below.

Top comments (2)

Harris French

Love how clean this pipeline looks; it feels almost “functional C++.” The fact that you can avoid COM and XML spelunking and still get a neat stream-based API is impressive. I’m curious which libraries and tooling you’re using under the hood for content_type::detector and office_formats_parser.

Krzysztof Nowicki

Hey Harris. Detection is implemented with libmagic plus some additions, and parsing is handled by a range of different libraries depending on the format. What’s great is that you can use solid C libraries on the backend and still expose a nice API. But if you want a ready-made solution, there’s DocWire. I wanted to find out what “modern” means to users, and whether it even matters for production use. We’re looking for inspiration for our SDK roadmap, so any feedback is valuable to us.
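The reply above describes two steps: detect the format, then hand the data to a format-specific library. Below is a minimal sketch of that shape that makes no assumptions about DocWire's internals. Instead of calling libmagic it checks just two well-known file signatures (the OLE2/Compound File header of legacy .doc files and the ZIP header used by .docx), and the backends are stub lambdas standing in for real C libraries. `ContentType`, `detect`, `parse_any` and the backend labels are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical content categories; libmagic would report MIME types instead.
enum class ContentType { ole2_compound, zip_container, unknown };

// True if `data` begins with the byte sequence `sig`.
bool starts_with(std::string_view data, std::string_view sig) {
    return data.substr(0, sig.size()) == sig;
}

// Simplified stand-in for libmagic: checks two well-known file signatures only.
ContentType detect(std::string_view head) {
    // Compound File Binary (OLE2) header, used by legacy .doc/.xls/.ppt files.
    constexpr std::string_view ole2{"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8};
    // ZIP local file header, used by .docx/.xlsx/.odt and many non-office formats.
    constexpr std::string_view zip{"PK\x03\x04", 4};
    if (starts_with(head, ole2)) return ContentType::ole2_compound;
    if (starts_with(head, zip)) return ContentType::zip_container;
    return ContentType::unknown;
}

// Hypothetical backend signature: raw bytes in, plain text out.
using Parser = std::function<std::string(std::string_view)>;

// Dispatches to a format-specific backend; the stubs stand in for real libraries.
std::string parse_any(std::string_view bytes) {
    static const std::map<ContentType, Parser> backends{
        {ContentType::ole2_compound,
         [](std::string_view) { return std::string{"[OLE2 backend]"}; }},
        {ContentType::zip_container,
         [](std::string_view) { return std::string{"[ZIP/XML backend]"}; }},
    };
    const auto it = backends.find(detect(bytes));
    if (it == backends.end()) {
        throw std::runtime_error("unsupported content type");
    }
    return it->second(bytes);
}
```

A ZIP signature alone cannot tell a .docx from an .xlsx or an .odt; a real detector has to look inside the archive, which is one reason production detection needs more than a couple of byte comparisons.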