Schiff Heimlich

Batch-converting documents to markdown with Microsoft's markitdown

Here's a quick tool that landed in my queue recently: microsoft/markitdown

It's a Python CLI that converts PDFs, Word docs, PowerPoint, and Excel files to Markdown. Not groundbreaking, but if you've ever had to process a folder of legacy documentation for a static site, you know the value of not doing it manually.

Two things I found useful:

Batch conversion with piping

markitdown document.docx -o converted/document.md

As far as I can tell, the CLI takes one input file per invocation rather than a directory, so batch runs lean on standard Unix tools:

mkdir -p ./md
find ./legacy-docs -name '*.docx' -print0 | xargs -0 -I{} sh -c 'markitdown "$1" -o "./md/$(basename "$1" .docx).md"' _ {}

stdout output for scripting

markitdown document.pdf

With no output flag, it writes the Markdown to stdout, which makes it easy to pipe into other text-processing tools or redirect to a filename derived from the input.
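Here's a rough sketch of the "redirect to a filename based on the input" idea. The `md_name` and `convert_pdfs` helpers are names I made up for illustration, not part of markitdown, and the only markitdown behavior assumed is the stdout output shown above.

```shell
# Hypothetical helper: map an input path to a .md path inside an output directory.
# $1 = input file, $2 = output directory
md_name() {
  base=$(basename "$1")
  # Strip the last extension and append .md
  printf '%s/%s.md\n' "$2" "${base%.*}"
}

# Hypothetical helper: convert every PDF under a directory by redirecting
# markitdown's stdout to the derived filename.
# $1 = source directory, $2 = output directory
convert_pdfs() {
  mkdir -p "$2"
  find "$1" -name '*.pdf' | while IFS= read -r f; do
    markitdown "$f" > "$(md_name "$f" "$2")"
  done
}
```

Calling `convert_pdfs ./legacy-docs ./md` would flatten everything into one directory, so two inputs with the same base name in different subfolders would overwrite each other. If that matters, the mapping in `md_name` is the place to change.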

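It's on PyPI (pip install 'markitdown[all]' pulls in the converters for every supported format; a plain pip install markitdown may leave some optional ones out), so it'll drop into a CI pipeline without much friction. If you've got a documentation migration on your plate and you're tired of manual conversions, it's worth a look.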
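For the CI case, a minimal sketch of a conversion step that reports failures instead of silently skipping them. `convert_all` is a hypothetical name, and this assumes markitdown exits non-zero when a conversion fails, which I haven't verified for every format.

```shell
# Hypothetical CI helper: convert every .docx in a directory and
# return non-zero if any single conversion failed.
# $1 = source directory, $2 = output directory
convert_all() {
  src=$1
  out=$2
  failed=0
  mkdir -p "$out"
  for f in "$src"/*.docx; do
    # Skip the unexpanded glob when the directory has no .docx files
    [ -e "$f" ] || continue
    name=$(basename "$f" .docx)
    if ! markitdown "$f" > "$out/$name.md"; then
      echo "conversion failed: $f" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

In a pipeline script, `convert_all ./legacy-docs ./md || exit 1` would fail the job after attempting every file, so one bad document doesn't hide the others.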

https://github.com/microsoft/markitdown
