DEV Community

Furkan Kalkan


Quick Hack: Converting MathML to LaTeX

Recently, I needed to convert some MathML code in article metadata from SCOAP3 to LaTeX. Most institutional repositories escape XML entities, so the MathML doesn't render correctly. I tried the Wiris API, but it was very slow and returned errors for most long formulas.
Finally, I found Yaroshevich's XSL stylesheet (mmltex), which works without problems.
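To give a feel for what a MathML-to-LaTeX stylesheet such as mmltex does, here is a toy, stdlib-only sketch that walks the MathML tree and emits LaTeX for a handful of elements (`mi`, `mn`, `mo`, `mtext`, `mrow`, `mfrac`, `msup`, `msub`, `msqrt`). The function names are hypothetical, attributes are ignored, and wrapping the result in `$...$` as inline math is an assumption. It is not a substitute for mmltex, which covers far more of MathML.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    # ElementTree writes namespaced tags as "{uri}name"; keep only "name".
    return tag.split("}", 1)[-1]


def mml_to_tex(el):
    """Toy converter (hypothetical helper) for a small MathML subset."""
    name = _local(el.tag)
    kids = list(el)
    if name in ("mi", "mn", "mo", "mtext"):
        # Token elements: use their text content.
        return (el.text or "").strip()
    if name in ("math", "mrow"):
        # Containers: concatenate the converted children.
        return "".join(mml_to_tex(k) for k in kids)
    if name == "mfrac":
        return r"\frac{%s}{%s}" % (mml_to_tex(kids[0]), mml_to_tex(kids[1]))
    if name == "msup":
        return "{%s}^{%s}" % (mml_to_tex(kids[0]), mml_to_tex(kids[1]))
    if name == "msub":
        return "{%s}_{%s}" % (mml_to_tex(kids[0]), mml_to_tex(kids[1]))
    if name == "msqrt":
        # msqrt wraps an implicit row of children.
        return r"\sqrt{%s}" % "".join(mml_to_tex(k) for k in kids)
    raise ValueError("unsupported MathML element: " + name)


def mathml_to_inline_tex(mathml):
    """Parse a MathML string and return it as inline LaTeX ($...$)."""
    return "$" + mml_to_tex(ET.fromstring(mathml)) + "$"


src = ('<math xmlns="http://www.w3.org/1998/Math/MathML">'
       '<mfrac><mn>1</mn><msup><mi>x</mi><mn>2</mn></msup></mfrac></math>')
print(mathml_to_inline_tex(src))  # $\frac{1}{{x}^{2}}$
```

Anything outside this small subset raises `ValueError`, which is exactly where a complete stylesheet earns its keep.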

Example Python code:

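```python
import re
import lxml.etree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

def to_latex(text):
    # Remove existing TeX blocks ($$...$$) from the text.
    text = re.sub(r"\$\$.*?\$\$", " ", text, flags=re.DOTALL)

    # Load the mmltex stylesheet once, not once per formula.
    transform = ET.XSLT(ET.parse("mmltex/mmltex.xsl"))

    # Find MathML fragments and replace each with its LaTeX representation.
    mml_codes = re.findall(r"<math.*?</math>", text, flags=re.DOTALL)
    for mml_code in mml_codes:
        # Required: mmltex.xsl matches elements in the MathML namespace,
        # so declare it on bare <math> tags.
        mml_ns = mml_code.replace("<math>", '<math xmlns="%s">' % MATHML_NS)
        mml_dom = ET.fromstring(mml_ns)
        mmldom = transform(mml_dom)
        latex_code = str(mmldom)  # stringify the XSLT result, not the input DOM
        text = text.replace(mml_code, latex_code)
    return text
```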
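The `#Required.` line is about XML namespaces. Assuming the stylesheet's templates match `m:math` with `m` bound to `http://www.w3.org/1998/Math/MathML` (as mmltex.xsl's do, to my knowledge), a bare `<math>` element sits in no namespace and will not be matched. The stdlib-only sketch below (the helper `add_mathml_namespace` is hypothetical) declares the MathML namespace on the root tag when it is missing and shows how that changes the parsed tag name:

```python
import xml.etree.ElementTree as ET

MATHML_URI = "http://www.w3.org/1998/Math/MathML"


def add_mathml_namespace(fragment):
    """Declare the MathML namespace on a <math> root tag if it lacks xmlns."""
    # Split off the root start tag (everything before the first ">").
    head, sep, rest = fragment.partition(">")
    if head.startswith("<math") and "xmlns=" not in head:
        head = head.replace("<math", '<math xmlns="%s"' % MATHML_URI, 1)
    return head + sep + rest


bare = "<math><mi>x</mi></math>"
print(ET.fromstring(bare).tag)                        # math
print(ET.fromstring(add_mathml_namespace(bare)).tag)  # {http://www.w3.org/1998/Math/MathML}math
```

Unlike a plain `.replace('<math>', ...)`, this also handles root tags that carry attributes, such as `<math display="block">`, and leaves fragments that already declare a namespace untouched.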

Top comments (2)

Gael Ruta Gatera

Hello Furkan, ignore my previous comment; the code works fine apart from one small mistake. Where you transform with mmldom = transform(mml_dom), you then pass mml_dom instead of mmldom to the string conversion: latex_code = str(mml_dom).

Gael Ruta Gatera
```python
mmldom = transform(mml_dom)
latex_code = str(mmldom)
```