Modern C++ isn’t dying. It’s eating file formats for breakfast.
Here’s what MS Word document parsing looks like today:
std::filesystem::path("data_processing_definition.doc")
| content_type::detector{}
| office_formats_parser{}
| PlainTextExporter()
| out_stream;
ensure(out_stream.str()) ==
"Data processing refers to the activities performed on raw data...";
No COM, no Windows-only hacks, no XML archaeology: just a clean, composable pipeline in modern C++.
So now I’m genuinely curious: If this is what parsing looks like in modern C++, what does it look like in your favorite language?
Drop your snippet below.
Top comments
Love how clean this pipeline looks; it feels almost “functional C++.” The fact that you can avoid COM and XML spelunking and still get a neat stream-based API is impressive. Curious what libraries/tooling you're using under the hood for content_type::detector and office_formats_parser.
Hey Harris. Detection is implemented using libmagic with some additions, and parsing is performed by many different libraries depending on the format. It's great that you can, for example, use good C libraries, keep them on the backend, and still expose a nice API. But if you want a ready-made solution, there's DocWire. I wanted to see what "modern" means to users and whether it's even relevant for production use. We're looking for inspiration for our SDK roadmap, so any feedback is valuable to us.