A reliable way to create PDFs from HTML/Markdown, with PDF-specific features

Pacharapol Withayasakpunt ・1 min read

A reliable approach has to account for the following:

  • Don't simply convert an HTML file to PDF one-to-one; otherwise, you can never control page breaks.
  • HTML rendering is web-browser dependent. (Therefore, I'm not sure about Pandoc.)
  • CSS is powerful, but are there exceptions?

Therefore, I suggest using a web driver plus a PDF library that can READ and MODIFY PDFs.

The best web driver is currently either Puppeteer or the Chrome DevTools Protocol.
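
To make the web driver half concrete, here is a minimal sketch of rendering pages to PDF bytes. The `PageLike`, `BrowserLike` and `LauncherLike` types and the `renderUrlsToPdf` name are hypothetical; they only mirror the Puppeteer methods `launch`, `newPage`, `goto`, `pdf` and `close`, so the function can be exercised without a real browser. Treat it as an illustration under those assumptions, not as the one correct setup.

```typescript
// Hypothetical narrow types covering only the Puppeteer methods used below.
interface PageLike {
  goto(url: string, options?: { waitUntil?: "load" | "networkidle0" }): Promise<unknown>;
  pdf(options?: { format?: string; printBackground?: boolean }): Promise<Uint8Array>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

interface LauncherLike {
  launch(): Promise<BrowserLike>;
}

// Render each URL to its own PDF, returned as raw bytes in input order.
async function renderUrlsToPdf(
  launcher: LauncherLike,
  urls: string[],
): Promise<Uint8Array[]> {
  const browser = await launcher.launch();
  try {
    const page = await browser.newPage();
    const pdfs: Uint8Array[] = [];
    for (const url of urls) {
      // Wait until the network is idle so late-loading assets are included.
      await page.goto(url, { waitUntil: "networkidle0" });
      pdfs.push(await page.pdf({ format: "A4", printBackground: true }));
    }
    return pdfs;
  } finally {
    // Always close the browser, even if a page fails to render.
    await browser.close();
  }
}
```

With Puppeteer installed, passing the imported `puppeteer` object as `launcher` should work, since its methods match these shapes structurally. The resulting byte arrays can then be handed to a PDF library for merging.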

Additionally, it might be possible to distribute the PDF generator via Electron + Puppeteer-in-Electron.

How to use puppeteer-core with electron?

I got this code from another Stack Overflow question:

import electron from "electron";
import puppeteer from "puppeteer-core";

const delay = (ms: number) =>
  new Promise<void>(resolve => {
    setTimeout(resolve, ms);
  });

(async () => {
  try {
    const app = await puppeteer.launch({
      executablePath: electron,
      args: ["."],
      headless: false,
…

The PDF manager, which can read and merge PDFs, has traditionally been either PDFtk (binary) or PDFBox (Java), I think; but I recently found pdf-lib:

Hopding / pdf-lib

Create and modify PDF documents in any JavaScript environment
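
As a rough sketch of the read-and-merge step with pdf-lib, the function below loads several PDFs from bytes and copies all their pages into one new document. The `mergePdfs` name is made up for illustration, and the `pdf-lib` package is assumed to be installed. The calls used (`PDFDocument.create`, `PDFDocument.load`, `copyPages`, `getPageIndices`, `addPage`, `save`) are part of pdf-lib's public API.

```typescript
import { PDFDocument } from "pdf-lib";

// Merge several PDFs (as raw bytes) into one, keeping the input order.
async function mergePdfs(inputs: Uint8Array[]): Promise<Uint8Array> {
  const merged = await PDFDocument.create();
  for (const bytes of inputs) {
    // READ: parse an existing PDF.
    const source = await PDFDocument.load(bytes);
    // MODIFY: copy every page of the source into the merged document.
    const pages = await merged.copyPages(source, source.getPageIndices());
    for (const page of pages) {
      merged.addPage(page);
    }
  }
  return merged.save();
}
```

The input bytes could come from files on disk (`fs.promises.readFile`) or directly from whatever the web driver rendered.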

As for CSS: yes, CSS can also detect page margins. For example, this centers the body's content on the page:

  body {
    position: fixed;
    width: 100vw;
    height: 100vh;
    display: flex;
    align-items: center;
    justify-content: center;
  }

This is my attempt so far.

patarapolw / make-pdf

Beautifully make a pdf from couples of image files

So, the answer to the question is no: do not convert a single HTML or Markdown file to one PDF file; instead, combine the files within a folder. Also:

  • Running a web server might be better than using the file:// protocol and relative paths.
  • The choice of web browser might affect the result.
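
For the web server point, a minimal sketch of a static file server using only Node's built-in modules could look like the following. The names `resolveInsideRoot`, `createStaticServer` and the small `MIME_TYPES` table are made up for illustration; a real project might prefer an existing static-server package.

```typescript
import * as fs from "node:fs";
import * as http from "node:http";
import * as path from "node:path";

// Small, hypothetical MIME table; extend it for the assets you actually serve.
const MIME_TYPES: Record<string, string> = {
  ".html": "text/html; charset=utf-8",
  ".css": "text/css; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".svg": "image/svg+xml",
};

// Map a request URL to a file under `root`, refusing anything outside it.
function resolveInsideRoot(root: string, requestUrl: string): string {
  // Drop the query string and fragment, then decode percent-escapes.
  const pathname = decodeURIComponent(requestUrl.split(/[?#]/)[0] ?? "/");
  // Normalizing against "/" collapses ".." segments before joining.
  const normalized = path.posix.normalize("/" + pathname);
  const relative = normalized.endsWith("/") ? normalized + "index.html" : normalized;
  const rootAbs = path.resolve(root);
  const full = path.resolve(rootAbs, "." + relative);
  // Extra guard, e.g. for backslash separators on Windows.
  if (full !== rootAbs && !full.startsWith(rootAbs + path.sep)) {
    throw new Error("Path escapes the served folder");
  }
  return full;
}

// Create (but do not start) a server that serves files from `root`.
function createStaticServer(root: string): http.Server {
  return http.createServer((req, res) => {
    try {
      const file = resolveInsideRoot(root, req.url ?? "/");
      fs.readFile(file, (err, data) => {
        if (err) {
          res.writeHead(404).end("Not found");
          return;
        }
        const type = MIME_TYPES[path.extname(file).toLowerCase()] ?? "application/octet-stream";
        res.writeHead(200, { "Content-Type": type }).end(data);
      });
    } catch {
      // Malformed percent-encoding or a path outside the root.
      res.writeHead(400).end("Bad request");
    }
  });
}
```

Calling `createStaticServer("./pages").listen(8080)` (folder name and port are placeholders) would let the web driver load `http://localhost:8080/...` URLs, so root-relative links to stylesheets and images resolve the same way they would on a real site.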

Also, consider alternatives to PDF that allow easy editing, perhaps ODT or DOCX?

Discussion

Our team has been working on a document generation project, and we convert HTML to PDF using wkhtmltopdf. For example, we generate documents like this using only HTML and CSS. wkhtmltopdf has great CSS support. Regarding page breaks, we can control them using the page-break-before and page-break-after properties.
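
As a rough illustration of the page-break properties mentioned above, the sketch below wraps each HTML fragment in its own section and asks for a page break after every section except the last. The `buildPagedHtml` name and the `page` class are hypothetical, and the fragments are assumed to be trusted HTML (nothing is escaped).

```typescript
// Build one printable HTML document in which every section ends its page.
function buildPagedHtml(sections: string[]): string {
  const css = [
    // Force a break after each section...
    ".page { page-break-after: always; }",
    // ...except the last one, to avoid a trailing blank page.
    ".page:last-child { page-break-after: auto; }",
  ].join("\n");
  const body = sections
    .map((html) => `<section class="page">${html}</section>`)
    .join("\n");
  return [
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    '<meta charset="utf-8">',
    `<style>\n${css}\n</style>`,
    "</head>",
    "<body>",
    body,
    "</body>",
    "</html>",
    "",
  ].join("\n");
}
```

The resulting string can be written to a file and passed to whichever HTML-to-PDF converter is in use.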

As for alternatives, we recently started using docx templates, processing them with docxtemplater and converting to PDF with headless LibreOffice.

 

Apparently, Pandoc alone can be powerful enough.

A new page is as easy as \newpage. (I know, LaTeX syntax in Markdown.)

Also, margins can be set with geometry: margin=1cm in the YAML front matter.

Also, LaTeX can be used to include and join PDFs.
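
The \newpage and geometry points can be sketched as a small helper that joins several Markdown chapters into one Pandoc input file. The `buildPandocMarkdown` name and its parameters are hypothetical. The \newpage command is raw LaTeX, so it should only take effect when Pandoc produces the PDF through a LaTeX engine.

```typescript
// Join Markdown chapters into one Pandoc document:
// YAML front matter for the margins, \newpage between chapters.
function buildPandocMarkdown(chapters: string[], margin = "1cm"): string {
  const frontMatter = ["---", `geometry: margin=${margin}`, "---"].join("\n");
  const body = chapters
    .map((chapter) => chapter.trim())
    // Blank lines around \newpage keep it a separate raw LaTeX block.
    .join("\n\n\\newpage\n\n");
  return `${frontMatter}\n\n${body}\n`;
}
```

Saving the result as, say, `book.md` and running `pandoc book.md -o book.pdf` should then give one chapter per page run, assuming a LaTeX installation is available to Pandoc.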

But is there one best tool that can easily do all of this?

BTW, I found Puppeteer unreliable.