DEV Community

Ryan Thomas

Word2Text CLI on OpenBSD and Linux

I regularly need to search (grep) MS Word files, and until now I have used either a script running on Windows or, more recently, a Docker container packed with OpenOffice and a pile of dependencies. I am a big fan of the Go programming language because it compiles to a standalone executable, which lets geeks like me build minimal Docker images with next to no dependencies. That led me to the UniDoc UniOffice and UniPDF projects, which handle Word and PDF documents in pure Go, so a tool built on them can ship as a standalone binary. With them I could very quickly write a CLI application of only a few lines that converts an MS Word document to text, which I can then feed to the standard Unix grep and other utilities for further processing. The tool is available at https://github.com/rthomascloud/word2text. I have also packaged it in a minimal Alpine-based Docker container, which I use as well.

In the coming weeks I will be creating more tools like this one to convert PDF to Word, PowerPoint to text and Excel to text, replacing the hacked-up OpenOffice setup I have put together. Stay tuned :).
