DEV Community

Saurabh Shah

Why Word to LaTeX Conversion Breaks (And What to Do About It)

A technical look at why automated tools fail on academic papers — and what actually works.

If you've ever tried to convert a Word document to LaTeX using Pandoc or an online converter, you already know how this ends. The text comes through fine. Everything else doesn't.

We've handled hundreds of Word to LaTeX conversion projects at The LaTeX Lab — IEEE papers, Springer submissions, Elsevier articles, conference proceedings. The same failure modes show up every time. Here's what's actually going wrong under the hood.

Why Pandoc Fails on Academic Papers

Pandoc is genuinely impressive for what it does. But the way it handles equations reveals a core limitation: it converts Word's equation markup (OMML, the math part of the OOXML format) to LaTeX by mapping elements mechanically, not by understanding mathematical structure.

The result is that simple inline equations often convert correctly. Anything structurally complex — nested fractions, matrix environments, multi-line align blocks, custom operators — gets either mangled or exported as an image fallback. You end up with this:

% What Pandoc gives you for a matrix equation:
\includegraphics{eq_img_01.png}

% What it should be (bmatrix requires \usepackage{amsmath}):
\begin{equation}
\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\end{equation}

Image fallbacks technically compile, but they tend to fail journal submission checks: publishers generally expect equations typeset in math mode, not embedded as images.
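A quick way to catch image fallbacks before a submission system does is to scan the converted .tex for \includegraphics calls where equations used to be. Below is a minimal Python sketch. It assumes the fallback images share a recognizable file name. The eq_img pattern is borrowed from the example above purely for illustration, because real converters name extracted media differently.

```python
import re

# Hypothetical naming pattern for equation images left by a converter;
# adjust it to whatever file names your conversion actually produced.
EQ_IMAGE = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]*eq_img[^}]*)\}")


def find_equation_images(tex_source: str) -> list[tuple[int, str]]:
    """Return (line number, image path) for each suspected equation image."""
    hits = []
    for lineno, line in enumerate(tex_source.splitlines(), start=1):
        for match in EQ_IMAGE.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = r"""
\section{Model}
\includegraphics{eq_img_01.png}
\includegraphics[width=\linewidth]{figures/results.pdf}
"""
    for lineno, path in find_equation_images(sample):
        print(f"line {lineno}: {path} should be rewritten in math mode")
```

Ordinary figures pass through untouched. Anything the script flags still has to be retyped as real math by hand.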

The Table Problem Is Worse

Word tables store alignment, merged cells, and cell padding as per-cell properties in the document's XML (merges, for instance, are recorded with w:gridSpan and w:vMerge). That model has no clean LaTeX equivalent. Pandoc's output for a moderately complex table typically looks like this:

% Pandoc table output — broken alignment:
\begin{longtable}[]{@{}lll@{}}
Name & & Value \\ % ← empty cell where merged cell was
\end{longtable}

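A properly formatted academic table should use booktabs with \toprule, \midrule, and \bottomrule. Pandoc's LaTeX writer does emit those rules, but that part is cosmetic. Merged cells still tend to come through as empty cells, and column alignment usually needs fixing by hand. The --no-highlight flag won't change any of this; it only turns off syntax highlighting for code blocks.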
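When you rebuild a table by hand, every merge has to be spelled out again. Here's a rough Python sketch of that idea. to_booktabs is a hypothetical helper, not part of Pandoc or any library. A cell is either plain text or a (text, span) pair, and horizontal merges become \multicolumn. Vertical merges (\multirow, from the multirow package) are left out to keep it short. The output still needs \usepackage{booktabs} in the preamble.

```python
from typing import Sequence, Union

# A cell is either plain text or (text, number_of_columns_it_spans).
Cell = Union[str, tuple[str, int]]


def render_cell(cell: Cell) -> tuple[str, int]:
    """Return the LaTeX for one cell and how many columns it occupies."""
    if isinstance(cell, tuple):
        text, span = cell
        return rf"\multicolumn{{{span}}}{{c}}{{{text}}}", span
    return cell, 1


def to_booktabs(header: Sequence[Cell], rows: Sequence[Sequence[Cell]], align: str) -> str:
    """Emit a booktabs tabular; `align` is one letter per column, e.g. 'lrr'."""
    def render_row(row: Sequence[Cell]) -> str:
        rendered = [render_cell(c) for c in row]
        width = sum(span for _, span in rendered)
        if width != len(align):
            raise ValueError(f"row spans {width} columns, expected {len(align)}")
        return " & ".join(text for text, _ in rendered) + r" \\"

    lines = [rf"\begin{{tabular}}{{{align}}}", r"\toprule", render_row(header), r"\midrule"]
    lines += [render_row(r) for r in rows]
    lines += [r"\bottomrule", r"\end{tabular}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # "Value" spans two columns, the kind of merge that ends up as an empty cell.
    print(to_booktabs(
        header=["Name", ("Value", 2)],
        rows=[["alpha", "1.0", "0.2"], ["beta", "2.5", "0.4"]],
        align="lrr",
    ))
```

The width check is deliberate. A row that doesn't add up to the column count is exactly the silent misalignment Pandoc leaves behind, so the helper fails loudly instead.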

Bibliography: The Silent Failure

This is the one that costs the most time. In a typical Pandoc conversion, citations come through as plain text and no .bib file is produced. If your Word document uses Zotero or Mendeley for citations, those usually arrive as formatted strings, not structured BibTeX entries.

Which means you're left manually recreating every entry:

% What you need:
@article{Smith2023,
author = {Smith, John and Doe, Jane},
title = {A Study of X},
journal = {Nature},
year = {2023},
volume = {12},
pages = {45--67},
doi = {10.1038/s41586-023-00001-x}
}

% What Pandoc gives you in the .tex file:
Smith, J., \& Doe, J. (2023). A Study of X. \textit{Nature}, 12, 45–67.

For a paper with 30–50 references, rebuilding the bibliography from scratch is a 2–3 hour job on its own.
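Some of that retyping can be scripted when every reference follows one consistent style. A minimal Python sketch follows. It assumes APA-like journal references shaped exactly like the plain-text example above. The regex and the citation-key rule (first author's surname plus year) are illustrative choices, not a general parser. It can't recover full first names from initials, and fields like the DOI still have to be added by hand.

```python
import re
from typing import Optional

# Matches one assumed, APA-like journal layout, e.g.
#   Smith, J., & Doe, J. (2023). A Study of X. Nature, 12, 45–67.
# Anything with a different shape returns None instead of a guess.
APA_ARTICLE = re.compile(
    r"^(?P<authors>.+?) \((?P<year>\d{4})\)\. (?P<title>.+?)\. "
    r"(?P<journal>[^,]+), (?P<volume>\d+), (?P<first>\d+)[–-]+(?P<last>\d+)\.?$"
)


def parse_authors(raw: str) -> list[str]:
    """Turn 'Smith, J., & Doe, J.' into ['Smith, J.', 'Doe, J.']."""
    raw = raw.replace("\\&", "&").replace("&", "")
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    # Parts alternate surname / initials, so pair them back up.
    return [f"{parts[i]}, {parts[i + 1]}" for i in range(0, len(parts) - 1, 2)]


def apa_to_bibtex(reference: str) -> Optional[str]:
    """Convert one APA-like reference string to a BibTeX @article entry."""
    # Strip \textit{...} wrappers such as the one around the journal name in the .tex output.
    reference = re.sub(r"\\textit\{([^}]*)\}", r"\1", reference.strip())
    match = APA_ARTICLE.match(reference)
    if match is None:
        return None
    m = match.groupdict()
    authors = parse_authors(m["authors"])
    if not authors:
        return None
    key = authors[0].split(",")[0] + m["year"]  # e.g. Smith2023
    fields = {
        "author": " and ".join(authors),
        "title": m["title"],
        "journal": m["journal"],
        "year": m["year"],
        "volume": m["volume"],
        "pages": f"{m['first']}--{m['last']}",
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}"
```

References that don't fit the pattern (books, proceedings, titles containing ". ") come back as None. That leaves a short list for manual work instead of a .bib file full of wrong guesses.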

What a Clean Word to LaTeX Conversion Actually Looks Like

For a properly converted academic paper, you need:

  • Every equation typeset in LaTeX math mode — $...$ for inline, \begin{equation} or align for display
  • Tables rebuilt in booktabs format with correct column alignment
  • Bibliography as a complete .bib file with proper entry types (@article, @inproceedings, @book) and all required fields
  • Journal template applied and verified — IEEEtran, acmart, elsarticle, llncs, etc.
  • The whole thing compiled and tested before you touch it

Automated tools handle the first bullet partially on a good day, and the rest barely at all.
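A few items on that checklist can be spot-checked mechanically before anyone reads the paper. Here's a rough Python sketch that runs text heuristics over a single .tex file. The template names come from the list above. The checks are plain string and regex matches, not a real LaTeX parse, so they will miss content pulled in with \input and can raise false alarms.

```python
import re

# Template class names from the checklist above; extend as needed.
JOURNAL_CLASSES = {"IEEEtran", "acmart", "elsarticle", "llncs"}


def check_conversion(tex: str) -> list[str]:
    """Flag checklist problems using plain text heuristics on one .tex file."""
    problems = []

    # 1. A journal template should be the document class.
    doc_class = re.search(r"\\documentclass(?:\[[^\]]*\])?\{([^}]+)\}", tex)
    if doc_class is None:
        problems.append("no \\documentclass found")
    elif doc_class.group(1) not in JOURNAL_CLASSES:
        problems.append(f"class '{doc_class.group(1)}' is not a journal template")

    # 2. Tables should be set with booktabs, so the package must be loaded.
    has_table = "\\begin{tabular" in tex or "\\begin{longtable" in tex
    loads_booktabs = re.search(r"\\usepackage(?:\[[^\]]*\])?\{[^}]*\bbooktabs\b", tex)
    if has_table and not loads_booktabs:
        problems.append("tables present but booktabs is not loaded")

    # 3. Citations should come from a .bib file (BibTeX or biblatex).
    if not re.search(r"\\bibliography\{|\\addbibresource\{", tex):
        problems.append("no .bib file referenced (\\bibliography or \\addbibresource)")

    return problems
```

None of this confirms the last bullet. Only an actual compile shows whether the paper builds cleanly.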

The Actual Time Cost of DIY Conversion

Here's what a realistic Word to LaTeX conversion looks like if you do it yourself on a 15-page IEEE paper with 25 equations, 4 tables, and 30 citations:

  • Run Pandoc, fix encoding issues - 30 min
  • Re-typeset broken equations by hand - 3–4 hrs
  • Rebuild tables in booktabs - 1–2 hrs
  • Rebuild bibliography as BibTeX - 2–3 hrs
  • Apply IEEEtran template, fix conflicts - 1–2 hrs
  • Debug compilation errors - 1–2 hrs
  • Total - 8–14 hrs

That's not an edge case — that's a typical paper. For researchers on submission deadlines, it's a significant hidden cost.
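The total is just the sum of the per-step ranges. This quick Python check of the arithmetic copies the step estimates from the list above and writes 30 min as 0.5 h.

```python
# Per-step estimates in hours (low, high), copied from the list above.
STEPS = {
    "Run Pandoc, fix encoding issues": (0.5, 0.5),
    "Re-typeset broken equations by hand": (3, 4),
    "Rebuild tables in booktabs": (1, 2),
    "Rebuild bibliography as BibTeX": (2, 3),
    "Apply IEEEtran template, fix conflicts": (1, 2),
    "Debug compilation errors": (1, 2),
}

low = sum(lo for lo, _ in STEPS.values())
high = sum(hi for _, hi in STEPS.values())
print(f"Total: {low}–{high} hrs")  # Total: 8.5–13.5 hrs
```

The exact sum is 8.5 to 13.5 hours, which the list rounds outward to 8–14.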

Further Reading

The LaTeX Lab offers professional Word to LaTeX conversion for academic papers — equations in proper math mode, tables rebuilt, bibliography in clean BibTeX, journal template applied and tested in Overleaf. Standard delivery in 72 hours. Get a quote here.
