DEV Community

yaugika-amit

Best way to convert Word docx to AsciiDoc with Pandoc without losing styles

Hello Folks,

I have a huge set of content in docx format that I need to move to AsciiDoc (adoc) format. I followed the instructions at https://docs.asciidoctor.org/asciidoctor/latest/migrate/ms-word/ to convert a few documents.
Many custom styles defined in the original Word docx template (such as code blocks, fonts, and other styles) end up broken after conversion (lost, misaligned, or messed up). Moreover, searching for such style elements and content, and comparing the adoc and docx files side by side for every converted element, is far too painful. (Imagine manuals of 300-400 pages! :( ) I did try modifying the custom styles to match the AsciiDoc format, but saw little improvement.

What customizations can I use with Pandoc in such a conversion (docx to AsciiDoc) to minimize the manual work of fixing styles, in particular for code blocks, inline code, etc.?
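For reference, here is a minimal sketch of the kind of invocation in question, wrapped in Python so it can be run over many files. It assumes Pandoc is installed and on the PATH. The `docx+styles` reader extension asks Pandoc to keep Word paragraph and character styles as `custom-style` attributes instead of dropping them. The file names and the `extracted-media` directory are placeholders, and how the retained styles appear in the adoc output is worth checking on a small sample first.

```python
import subprocess


def build_pandoc_command(docx_path, adoc_path, media_dir="extracted-media"):
    """Build the argument list for a docx -> AsciiDoc conversion.

    media_dir is a placeholder directory name for extracted images.
    """
    return [
        "pandoc",
        "--from=docx+styles",   # keep Word styles as custom-style attributes
        "--to=asciidoc",
        "--wrap=none",          # do not hard-wrap paragraphs in the output
        f"--extract-media={media_dir}",
        f"--output={adoc_path}",
        docx_path,
    ]


def convert(docx_path, adoc_path):
    """Run Pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pandoc_command(docx_path, adoc_path), check=True)
```

Calling `convert("manual.docx", "manual.adoc")` (hypothetical file names) would run the conversion for one document.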

Is there a way to use a custom variable or style (highlighting, maybe?) that could be tagged to the different original Word styles, so that they are marked distinctly from other styles and elements after conversion and can be searched for and fixed manually?
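To make the idea concrete, below is a rough sketch of a Pandoc JSON filter, assuming the document was read with `docx+styles` so that Word styles arrive as `custom-style` attributes on Span and Div elements. The style names in `CODE_STYLES` and the `@@STYLE:...@@` marker format are made up for illustration. Character styles listed in `CODE_STYLES` become inline code, and every other custom style gets a searchable marker placed in front of its content.

```python
import json
import sys

# Hypothetical Word character-style names; replace with those in your template.
CODE_STYLES = {"Inline Code", "Code Char"}


def custom_style(attr):
    """Return the custom-style value from a pandoc Attr, or None."""
    _identifier, _classes, key_values = attr
    for key, value in key_values:
        if key == "custom-style":
            return value
    return None


def plain_text(inlines):
    """Flatten Str/Space/SoftBreak inlines to text; other inlines are skipped."""
    parts = []
    for inline in inlines:
        if inline["t"] == "Str":
            parts.append(inline["c"])
        elif inline["t"] in ("Space", "SoftBreak"):
            parts.append(" ")
    return "".join(parts)


def marker(style):
    """Build a searchable marker (made-up format) naming the Word style."""
    return {"t": "Str", "c": "@@STYLE:" + style.replace(" ", "-") + "@@"}


def transform(node):
    """Recursively rewrite a pandoc JSON AST node."""
    if isinstance(node, list):
        return [transform(item) for item in node]
    if not isinstance(node, dict):
        return node
    tag = node.get("t")
    if tag == "Span":
        attr, inlines = node["c"]
        style = custom_style(attr)
        if style in CODE_STYLES:
            # Replace the styled span with an inline code element.
            return {"t": "Code", "c": [["", [], []], plain_text(inlines)]}
        inlines = transform(inlines)
        if style is not None:
            inlines = [marker(style), {"t": "Space"}] + inlines
        return {"t": "Span", "c": [attr, inlines]}
    if tag == "Div":
        attr, blocks = node["c"]
        style = custom_style(attr)
        blocks = transform(blocks)
        if style is not None:
            # Put the marker in its own paragraph before the styled blocks.
            blocks = [{"t": "Para", "c": [marker(style)]}] + blocks
        return {"t": "Div", "c": [attr, blocks]}
    return {key: transform(value) for key, value in node.items()}


def run_as_filter():
    """Read a pandoc JSON document from stdin and write the result to stdout."""
    json.dump(transform(json.load(sys.stdin)), sys.stdout)
```

Saved as an executable script that calls `run_as_filter()` at the bottom, this could be passed to Pandoc with `--filter`. Searching the converted file for `@@STYLE:` would then list every place where a custom Word style was applied. Pandoc also supports Lua filters, which may be a better fit for long-term use.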

I dug up the post at https://learnbyexample.github.io/customizing-pandoc/ and found some useful pointers. I am planning to test these and see how the converted adoc file turns out. Is converting Markdown to PDF similar to converting Word docx to AsciiDoc, and do the same customizations hold in both cases? Is there any particular command or syntax I should use or look out for?

I would be really thankful if the community could provide useful pointers and thoughts.

PS: Pardon my lack of knowledge, and feel free to ask for any particular details you may need. I am a noob to the tech world and to programming.

Appreciate the help!
Thanks
