DEV Community

Pacharapol Withayasakpunt

Posted on • Originally published at polv.cc

Markdown to PDF: missing pieces from various approaches, and beyond HTML

Let me say this first: the best way to create a PDF from Markdown is via web technology (Chrome / Puppeteer), because it is the closest to WYSIWYG (What You See Is What You Get). Still, it is not perfect.

It currently lacks at least one PDF-specific feature (possibly more): a Table of Contents / Bookmarks.

feature request: add option to generate TOC for pdf output #1778

Since headers and footers with page numbers now work, I desperately miss an option to generate a Table of Contents (TOC) from the h1-h6 headers when generating a PDF file (i.e., like wkhtmltopdf does). The TOC should be at the start of the PDF, and it should not only be clickable (jumping to the page) but also generate the PDF outline element, so that the TOC is displayed in the contents view of any viewer. Although this may sound complicated, if this functionality is implemented in the right place it is not that complicated (take a look at how wkhtmltopdf implements it).

Before posting here I tried a couple of workarounds to achieve this. Some dead ends:

  • CSS3 target-counter() -> proposed a long time ago, and only some specialised tools support it; to be honest, I've given up thinking it will be implemented in Chrome some day (the reference issue is now: https://bugs.chromium.org/p/chromium/issues/detail?id=368053 )
  • Find a tool or tools to extract the table of contents from the generated PDF and generate a preface.pdf with the TOC, which would then be merged in front of the original PDF ... With https://github.com/qpdf/qpdf I was able to generate a readable and searchable "pdf text-file", so that theoretically it was possible to find the header in the text file and, via reverse search and the added comments, find out which page it is on, etc. etc...

Any chance to get this soon? Thanks, Ognian
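The second workaround in that issue (find each header in the PDF's text and work out which page it lands on) can be sketched in plain Python. This is only a minimal sketch under a few assumptions: headings are ATX-style (# ...) Markdown headings, the per-page text has already been extracted by some tool of your choice, and a heading's text survives extraction unchanged. The function names are my own, not from any library.

```python
from __future__ import annotations

import re

# ATX headings: 1-6 '#', whitespace, the text, and an optional closing '#' run.
HEADING_RE = re.compile(r"^ {0,3}(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")
# Code fences (backticks or tildes); whether open/close fences match is not checked.
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def extract_headings(markdown: str) -> list[tuple[int, str]]:
    """Return (level, text) for every ATX heading outside fenced code."""
    headings = []
    in_fence = False
    for line in markdown.splitlines():
        if FENCE_RE.match(line):
            in_fence = not in_fence
            continue
        match = None if in_fence else HEADING_RE.match(line)
        if match and match.group(2):
            headings.append((len(match.group(1)), match.group(2)))
    return headings


def locate_pages(headings, pages):
    """Pair each heading with the first page (1-based) whose text contains it,
    searching forward from the previous heading's page; None if not found."""
    located = []
    start = 0
    for level, text in headings:
        page_no = None
        for i in range(start, len(pages)):
            if text in pages[i]:
                page_no, start = i + 1, i
                break
        located.append((level, text, page_no))
    return located


def format_toc(located) -> str:
    """Render an indented plain-text TOC, e.g. for a preface page."""
    return "\n".join(
        f"{'  ' * (level - 1)}{text} ... {page if page is not None else '?'}"
        for level, text, page in located
    )


if __name__ == "__main__":
    md = "# Intro\n\n## Setup\n\n~~~\n# not a heading\n~~~\n\n# Usage\n"
    pages = ["Intro\nSetup\nsome text", "more text", "Usage\nmore"]
    print(format_toc(locate_pages(extract_headings(md), pages)))
    # Intro ... 1
    #   Setup ... 1
    # Usage ... 3
```

Searching forward from the previous match is a cheap guard against a heading's words also appearing earlier in body text; it is still a heuristic, exactly as the issue describes.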

One of the best tools for creating PDFs is Visual Studio Code, if you know how to use Markdown Preview Enhanced properly. (I've just noticed that I can use it in Atom as well.)

The trick is: when previewing Markdown, right-click on the preview pane to see these options:

  • Open in browser, to tweak the output using Inspect Element
  • Chrome (Puppeteer) >> PDF, a shortcut for exporting to PDF (you will also need Puppeteer)

(Screenshot: the MPE context menu)
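If you would rather skip the editor for the final export, Chrome can also print to PDF by itself in headless mode. This is a different route from MPE's Puppeteer export, sketched here under assumptions: the HTML has already been exported from MPE, and the Chrome executable is called google-chrome or chromium on your PATH (this varies by OS). The helper names are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def chrome_print_cmd(chrome: str, html_file: Path, pdf_file: Path) -> list[str]:
    """Command line for headless Chrome to print a local HTML file to PDF."""
    return [
        chrome,
        "--headless",
        f"--print-to-pdf={pdf_file.resolve()}",
        html_file.resolve().as_uri(),  # Chrome wants a URL such as file:///...
    ]


def print_to_pdf(html_file: Path, pdf_file: Path) -> None:
    # Assumption: the executable name; it differs by OS and installation.
    chrome = shutil.which("google-chrome") or shutil.which("chromium")
    if chrome is None:
        raise FileNotFoundError("Chrome/Chromium not found on PATH")
    subprocess.run(chrome_print_cmd(chrome, html_file, pdf_file), check=True)
```

Usage would be print_to_pdf(Path("notes.html"), Path("notes.pdf")). You get far fewer knobs than with Puppeteer's page.pdf() options, so the page layout has to come mostly from your print CSS.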

You can use custom CSS.

Indeed, some CSS rules are specific to printing, and you can customize them for Markdown Preview Enhanced (MPE).

I currently recommend this LESS:

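// Let the root elements fill the page (the viewport on screen)
html, body {
  box-sizing: border-box;
  height: 100%;
  width: 100%;
}

.markdown-preview {
  box-sizing: border-box;
  position: relative;

  // Same rules on screen and in print, so the preview matches the PDF
  @media print, screen {
    // Lay out each <section> as a vertical flex column
    section {
      display: flex;
      flex-direction: column;

      // <section vertical-center>: center content vertically
      &[vertical-center] {
        min-height: 100%;
        justify-content: center;
      }

      // <section horizontal-center>: center content horizontally
      &[horizontal-center] {
        align-items: center;
        text-align: center;
      }
    }

    // Start a new page after each <section> and before every h1
    section + *, h1 {
      page-break-before: always;
    }

    // Keep headings on the same page as the content that follows them
    h1, h2, h3, h4, h5, h6 {
      page-break-after: avoid;
    }

    // Avoid splitting an <article> across pages
    article {
      page-break-inside: avoid;
    }
  }
}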

Also, you can @import your own LESS files, and importing other file types is possible too.

Markdown inside HTML, in order to use CSS

This works natively with markdown-it, as long as you leave a blank line (i.e. at least two newlines) after the opening div.

<div class="center">

## Hello World

</div>
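The blank line matters because of the CommonMark rule that markdown-it follows: a line starting with a block-level tag such as div opens a raw HTML block that only ends at the next blank line, so Markdown on the lines directly after it is passed through unrendered. Here is a hypothetical lint helper for catching that mistake, as a sketch: it only knows a handful of the block tags the spec lists, and only checks the line right after each tag.

```python
from __future__ import annotations

import re

# A few of the block-level tag names that open a CommonMark "type 6" HTML block
# (the spec's full list is much longer). The tag name must be followed by
# a space or tab, ">", "/>", or the end of the line.
BLOCK_TAG_RE = re.compile(
    r"^ {0,3}</?(?:div|section|article|center|details)(?:[ \t]|/?>|$)",
    re.IGNORECASE,
)


def find_unrendered_lines(text: str) -> list[int]:
    """Return 1-based numbers of lines that directly follow a block-level HTML
    tag line with no blank line in between. Under CommonMark such lines stay
    inside the raw HTML block, so Markdown on them is not rendered. Lines that
    start with "<" are skipped, since they are most likely more HTML."""
    lines = text.splitlines()
    flagged = []
    for i in range(len(lines) - 1):
        nxt = lines[i + 1].strip()
        if BLOCK_TAG_RE.match(lines[i]) and nxt and not nxt.startswith("<"):
            flagged.append(i + 2)
    return flagged
```

Run on the example above it reports nothing; remove the blank line after the opening div and it flags line 2, the heading that would otherwise come out as literal "## Hello World".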

Customizing Puppeteer, with YAML frontmatter

https://shd101wyy.github.io/markdown-preview-enhanced/#/puppeteer?id=configure-puppeteer

So, I made it like this.

---
id: print
class: 'title'
puppeteer:
  margin:
    top:    2cm
    bottom: 2cm
    left:   2cm
    right:  2cm
---

@import "/_styles/print.less"
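To get a feel for what those margins leave for the actual content, here is a small worked calculation. It assumes standard paper sizes (A4 is 210 × 297 mm, US Letter is 215.9 × 279.4 mm) and only handles mm, cm and in; which paper size you actually get depends on your settings (Puppeteer's own default format is Letter). The function names are illustrative.

```python
from __future__ import annotations

import re

# Standard portrait paper sizes in millimetres: (width, height).
PAPER_MM = {"A4": (210.0, 297.0), "Letter": (215.9, 279.4)}
# Millimetres per unit, for the few CSS-style units this sketch accepts.
UNIT_MM = {"mm": 1.0, "cm": 10.0, "in": 25.4}


def to_mm(length: str) -> float:
    """Convert a length such as '2cm', '15mm' or '0.5in' to millimetres."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(mm|cm|in)\s*", length)
    if match is None:
        raise ValueError(f"unsupported length: {length!r}")
    return float(match.group(1)) * UNIT_MM[match.group(2)]


def content_area(paper: str, margin: dict[str, str]) -> tuple[float, float]:
    """Width and height (mm) left for content once the margins are removed."""
    width, height = PAPER_MM[paper]
    w = width - to_mm(margin["left"]) - to_mm(margin["right"])
    h = height - to_mm(margin["top"]) - to_mm(margin["bottom"])
    return round(w, 2), round(h, 2)


if __name__ == "__main__":
    # Same values as the front matter above
    margin = {"top": "2cm", "bottom": "2cm", "left": "2cm", "right": "2cm"}
    print(content_area("A4", margin))      # (170.0, 257.0)
    print(content_area("Letter", margin))  # (175.9, 239.4)
```

So on A4, 2 cm margins leave 210 - 2 × 20 = 170 mm of width and 297 - 2 × 20 = 257 mm of height, which is the box that full-height rules like min-height: 100% in the LESS above end up filling.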

Going beyond HTML

Actually, I have already figured out ways to go beyond HTML, including:

  • Extending Markdown with template engines. EJS has nice syntax highlighting inside Markdown in VSCode
  • Non-Markdown/HTML formats (LaTeX or ConTeXt), via pandoc or natively
  • PDF manipulation libraries. Some of my recommendations are
    • pdfbox - Java; can also run without a JDK (a Java runtime is enough) via java -jar pdfbox-app-2.y.z.jar
    • pdf-lib - Node.js
    • PyPDF2 - Python (there now seem to be versions up to PyPDF4)
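Whichever library you pick, adding bookmarks means building a tree: the PDF outline nests items under parents, while Markdown headings come as a flat list of levels. Here is a minimal sketch of that conversion, assuming you already have (level, title, page) triples, for example from the find-the-page workaround in the feature request quoted earlier. The call that actually adds each item differs per library (pdfbox, pdf-lib, PyPDF2) and is left out; OutlineItem, build_outline and render are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OutlineItem:
    title: str
    page: int  # 1-based; convert if your library expects 0-based page indices
    children: list[OutlineItem] = field(default_factory=list)


def build_outline(entries: list[tuple[int, str, int]]) -> list[OutlineItem]:
    """Nest flat (level, title, page) entries: each heading becomes a child
    of the closest preceding heading with a lower level."""
    roots: list[OutlineItem] = []
    stack: list[tuple[int, OutlineItem]] = []  # open ancestors as (level, item)
    for level, title, page in entries:
        item = OutlineItem(title, page)
        while stack and stack[-1][0] >= level:
            stack.pop()  # close previous siblings and deeper headings
        (stack[-1][1].children if stack else roots).append(item)
        stack.append((level, item))
    return roots


def render(items: list[OutlineItem], depth: int = 0) -> list[str]:
    """Flatten the tree back to indented lines, for checking the result."""
    lines = []
    for item in items:
        lines.append(f"{'  ' * depth}{item.title} (p. {item.page})")
        lines.extend(render(item.children, depth + 1))
    return lines
```

The stack holds only the currently open ancestors, so the whole conversion is a single pass, and a skipped level (an h3 right after an h1) simply nests under the h1.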

Top comments (2)

Ian Turton

I use pandoc (and go through pdflatex) to convert from markdown to PDF.

Pacharapol Withayasakpunt

You can use -t context as well, and you might get more features.

You will need to install texlive-full first, though.
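For reference, a minimal Python wrapper around that command, assuming pandoc and a TeX distribution that includes ConTeXt are installed. According to the pandoc manual, either -t context with a .pdf output file or --pdf-engine=context makes pandoc go through ConTeXt; this sketch uses the explicit --pdf-engine flag, and passing "pdflatex" instead gives the route from the first comment. The function names are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess


def pandoc_pdf_cmd(src: str, out: str, engine: str = "context") -> list[str]:
    """pandoc command line rendering Markdown to PDF through a TeX engine
    ("context" for ConTeXt, "pdflatex" for the LaTeX route)."""
    return ["pandoc", src, "-o", out, f"--pdf-engine={engine}"]


def markdown_to_pdf(src: str, out: str, engine: str = "context") -> None:
    if shutil.which("pandoc") is None:
        raise FileNotFoundError("pandoc not found on PATH")
    subprocess.run(pandoc_pdf_cmd(src, out, engine), check=True)
```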