We’ve just published a new Perl School book: Design Patterns in Modern Perl by Mohammad Sajid Anwar.
It’s been a while since we last released a new title, and in the meantime, the world of eBooks has moved on – Amazon don’t use .mobi any more, tools have changed, and my old “it mostly works if you squint” build pipeline was starting to creak.
On top of that, we had a hard deadline: we wanted the book ready in time for the London Perl Workshop. As the date loomed, last-minute fixes and manual tweaks became more and more terrifying. We really needed a reliable, reproducible way to go from manuscript to “good quality PDF + EPUB” every time.
So over the last couple of weeks, I’ve been rebuilding the Perl School book pipeline from the ground up. This post is the story of that process, the tools I ended up using, and how you can steal it for your own books.
The old world, and why it wasn’t good enough
The original Perl School pipeline dates back to a very different era:
Amazon wanted .mobi files.
EPUB support was patchy.
I was happy to glue things together with shell scripts and hope for the best.
It worked… until it didn’t. Each book had slightly different scripts, slightly different assumptions, and a slightly different set of last-minute manual tweaks. It certainly wasn’t something I’d hand to a new author and say, “trust this”.
Coming back to it for Design Patterns in Modern Perl made that painfully obvious. The book itself is modern and well-structured; the pipeline that produced it shouldn’t feel like a relic.
Choosing tools: Pandoc and wkhtmltopdf (and no LaTeX, thanks)
The new pipeline is built around two main tools:
Pandoc – the Swiss Army knife of document conversion. It can take Markdown/Markua plus metadata and produce HTML, EPUB, and much, much more.
wkhtmltopdf – which turns HTML into a print-ready PDF using a headless browser engine.
Why not LaTeX? Because I’m allergic. LaTeX is enormously powerful, but every time I’ve tried to use it seriously, I end up debugging page breaks in a language I don’t enjoy. HTML + CSS I can live with; browsers I can reason about. So the PDF route is:
- Markdown → HTML (via Pandoc) → PDF (via wkhtmltopdf)
And the EPUB route is:
- Markdown → EPUB (via Pandoc) → validated with epubcheck
The front matter (cover page, title page, copyright, etc.) is generated with Template Toolkit from a simple book-metadata.yml file, and then stitched together with the chapters to produce a nice, consistent book.
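To make those two routes concrete, here is a minimal Python sketch of the commands involved. It is not the actual make_book code (which is Perl): the function names and file names are invented for illustration, and it assumes pandoc, wkhtmltopdf and an epubcheck wrapper are all on the PATH. A real build would also pass in CSS, metadata and the generated front matter.

```python
def html_command(chapters, out_html):
    """Pandoc command for the Markdown -> standalone HTML step."""
    return ["pandoc", "--from", "markdown", "--to", "html5", "--standalone",
            "--output", str(out_html), *map(str, chapters)]


def pdf_command(in_html, out_pdf):
    """wkhtmltopdf takes an input HTML file followed by the output PDF path."""
    return ["wkhtmltopdf", str(in_html), str(out_pdf)]


def epub_command(chapters, out_epub):
    """Pandoc command for the Markdown -> EPUB 3 step."""
    return ["pandoc", "--from", "markdown", "--to", "epub3",
            "--output", str(out_epub), *map(str, chapters)]


def check_command(epub):
    """Assumes an `epubcheck` launcher script is installed (hypothetical setup)."""
    return ["epubcheck", str(epub)]
```

Each function returns an argument list that could be handed to `subprocess.run`, which keeps file names with spaces safe from shell quoting problems.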
That got us a long way… but then a reader found a bug.
The iBooks bug report
Shortly after publication, I got an email from a reader who’d bought the Leanpub EPUB and was reading it in Apple Books (iBooks). Instead of happily flipping through Design Patterns in Modern Perl, they were greeted with a big pink error box.
Apple’s error message boiled down to:
There’s something wrong with the XHTML in this EPUB.
That was slightly worrying. But, hey, every day is a learning opportunity. And, after a bit of digging, this is what I found out.
EPUB 3 files are essentially a ZIP containing:
XHTML content files
a bit of XML metadata
CSS, images, and so on
Apple Books is quite strict about the “X” in XHTML: it expects well-formed XML, not just “kind of valid HTML”. So when working with EPUB, you need to forget all of that nice HTML5 flexibility that you’ve got used to over the last decade or so.
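Because an EPUB is just a ZIP of XHTML files, a rough well-formedness check needs nothing more than a ZIP reader and an XML parser. The sketch below uses only the Python standard library; the function name is my own invention, and it is in no way a substitute for a proper EPUB validator, since it only asks whether each content file parses as XML.

```python
import zipfile
import xml.etree.ElementTree as ET


def malformed_xhtml(epub_file):
    """Return (member name, parser message) for every XHTML file that is not well-formed XML."""
    problems = []
    with zipfile.ZipFile(epub_file) as epub:
        for name in epub.namelist():
            if not name.endswith((".xhtml", ".html")):
                continue  # skip CSS, images, metadata and so on
            try:
                ET.fromstring(epub.read(name))
            except ET.ParseError as err:
                problems.append((name, str(err)))
    return problems
```

An empty list means every content file at least parses as XML; anything else names the file an XML-strict reader is likely to reject.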
The first job was to see if we could reproduce the error and work out where it was coming from.
Discovering epubcheck
Enter epubcheck.
epubcheck is the reference validator for EPUB files. Point it at an .epub and it will unpack it, parse all the XML/XHTML, check the metadata and manifest, and tell you exactly what’s wrong.
Running it on the book immediately produced this:
Fatal Error while parsing file: The element type br must be terminated by the matching end-tag </br>.
That’s the XML parser’s way of saying:
In HTML, <br> is fine.
In XHTML (which is XML), you must use <br /> (self-closing) or <br></br>.
And there were a number of these scattered across a few chapters.
In other words: perfectly reasonable raw HTML in the manuscript had been passed straight through by Pandoc into the EPUB, but that HTML was not strictly valid XHTML, so Apple Books rejected it. I should note at this point that the Pandoc documentation explicitly says that it won’t touch HTML fragments it finds in a Markdown file when converting it to EPUB. It’s down to the author to ensure they’re using valid XHTML.
A quick (but not scalable) fix
Under time pressure, the quickest way to confirm the diagnosis was:
Unzip the generated EPUB.
Open the offending XHTML file.
Manually turn <br> into <br /> in a couple of places.
Re-zip the EPUB.
Run epubcheck again.
Try it in Apple Books.
That worked. The errors vanished, epubcheck was happy, and the reader confirmed that the fixed file opened fine in iBooks.
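For anyone repeating that experiment, the re-zip step has one trap: the EPUB container format expects the mimetype entry to come first in the archive and to be stored uncompressed. Here is a hedged Python sketch of the same unzip, patch, re-zip dance. The function name is hypothetical, the blanket replacement of <br> is deliberately crude, and it is meant for confirming a diagnosis, not for publishing.

```python
import zipfile


def patch_epub(src, dst):
    """Copy an EPUB from src to dst, rewriting <br> as <br /> in its XHTML files."""
    with zipfile.ZipFile(src) as old, zipfile.ZipFile(dst, "w") as new:
        # The container format wants `mimetype` first and uncompressed.
        new.writestr("mimetype", old.read("mimetype"),
                     compress_type=zipfile.ZIP_STORED)
        for info in old.infolist():
            if info.filename == "mimetype" or info.is_dir():
                continue
            data = old.read(info.filename)
            if info.filename.endswith(".xhtml"):
                # Crude byte-level fix: only handles the bare <br> form.
                data = data.replace(b"<br>", b"<br />")
            new.writestr(info.filename, data,
                         compress_type=zipfile.ZIP_DEFLATED)
```

Both arguments can be file paths or file-like objects, so the patched copy never has to overwrite the original.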
But clearly:
Open the EPUB in a text editor and fix the XHTML by hand
is not a sustainable publishing strategy.
So the next step was to move from “hacky manual fix” to “the pipeline prevents this from happening again”.
HTML vs XHTML, and why linters matter
The underlying issue is straightforward once you remember it:
HTML is very forgiving. Browsers will happily fix up all kinds of broken markup.
XHTML is XML, so it’s not forgiving: every element has to be explicitly closed, or the parser gives up.
EPUB 3 content files are XHTML. If you feed them sloppy HTML, some readers (like Apple Books) will just refuse to load the chapter.
So I added a manuscript HTML linter to the toolchain, before we ever get to Pandoc or epubcheck.
Roughly, the linter:
Reads the manuscript (ignoring fenced code blocks so it doesn’t complain about < in Perl examples).
Extracts any raw HTML chunks.
Wraps those chunks in a temporary root element.
Uses XML::LibXML to check they’re well-formed XML.
Reports any errors with file and line number.
It’s not trying to be a full HTML validator; it’s just checking: “If this HTML ends up in an EPUB, will the XML parser choke?”
That would have caught the <br> problem before the book ever left my machine.
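The steps above can be sketched in a few dozen lines. This is a Python approximation, not the real check_ms_html (which is Perl and uses XML::LibXML): the function names are invented, the standard library's ElementTree stands in for LibXML, and a "chunk" is assumed to be all the tags found in one blank-line-separated block, reported at the line of its first tag. Nested fences and HTML blocks containing blank lines would confuse it.

```python
import re
import xml.etree.ElementTree as ET

FENCE = re.compile(r"^\s*(```|~~~)")
INLINE_CODE = re.compile(r"`[^`\n]*`")
# A tag name followed by optional attributes; skips autolinks like <https://...>.
TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9-]*(?:\s[^<>]*)?/?>")


def html_chunks(text):
    """Yield (line of first tag, concatenated tags) for each block outside code fences."""
    in_fence = False
    start, tags = None, []
    # The trailing "" acts as a sentinel so the last block is flushed.
    for lineno, line in enumerate(text.splitlines() + [""], start=1):
        is_fence = bool(FENCE.match(line))
        if is_fence:
            in_fence = not in_fence
        if is_fence or in_fence or not line.strip():
            # Block boundary: emit whatever tags were collected so far.
            if tags:
                yield start, "".join(tags)
            start, tags = None, []
            continue
        found = TAG.findall(INLINE_CODE.sub("", line))
        if found and start is None:
            start = lineno
        tags.extend(found)


def lint_manuscript(text, filename="manuscript.md"):
    """Return 'file:line: ...' messages for chunks that are not well-formed XML."""
    errors = []
    for lineno, tags in html_chunks(text):
        try:
            # Wrap the chunk in a temporary root element before parsing.
            ET.fromstring(f"<root>{tags}</root>")
        except ET.ParseError as err:
            errors.append(f"{filename}:{lineno}: {tags!r} is not well-formed XML ({err})")
    return errors
```

Only the tags are parsed, not the surrounding prose, so an ampersand or a stray < in ordinary text does not trigger a false alarm.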
Hardening the pipeline: epubcheck in the loop
The linter catches the obvious issues in the manuscript; epubcheck is still the final authority on the finished EPUB.
So the pipeline now looks like this:
Lint the manuscript HTML: catch broken raw HTML/XHTML before conversion.
Build PDF + EPUB via make_book.
Run epubcheck on the EPUB: ensure the final file is standards-compliant.
Only then do we upload it to Leanpub and Amazon, making it available to eager readers.
The nice side-effect of this is that any future changes (new CSS, new template, different metadata) still go through the same gauntlet. If something breaks, the pipeline shouts at me long before a reader has to.
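The important property of that gauntlet is that it stops at the first failure, so nothing broken ever reaches the upload step. A minimal Python sketch of the idea follows. The script names come from this post, but the way they are invoked here (no arguments, and an EPUB called built/book.epub) is purely an assumption for illustration.

```python
import subprocess

# Hypothetical invocations: the real scripts may take different arguments.
STEPS = [
    ("lint", ["check_ms_html"]),
    ("build", ["make_book"]),
    ("validate", ["epubcheck", "built/book.epub"]),
]


def run_pipeline(steps, run=subprocess.run):
    """Run each (label, command) in order and stop at the first failure.

    Returns the label of the failing step, or None if every step passed.
    """
    for label, command in steps:
        result = run(command)
        if result.returncode != 0:
            return label
    return None
```

The `run` parameter exists only so the gating logic can be exercised with a stand-in runner instead of the real tools.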
Docker and GitHub Actions: making it reproducible
Having a nice Perl script and a list of tools installed on my laptop is fine for a solo project; it’s not great if:
other authors might want to build their own drafts, or
I want the build to happen automatically in CI.
So the next step was to package everything into a Docker image and wire it into GitHub Actions.
The Docker image is based on a slim Ubuntu and includes:
Perl + cpanm + all CPAN modules from the repo’s cpanfile
pandoc
wkhtmltopdf
Java + epubcheck
The Perl School utility scripts themselves (make_book, check_ms_html, etc.)
The workflow in a book repo is simple:
Mount the book’s Git repo into /work.
Run check_ms_html to lint the manuscript.
Run make_book to build built/*.pdf and built/*.epub.
Run epubcheck on the EPUB.
Upload the built/ artefacts.
GitHub Actions then uses that same image as a container for the job, so every push or pull request can build the book in a clean, consistent environment, without needing each author to install Pandoc, wkhtmltopdf, Java, and a large chunk of CPAN locally.
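Running the same steps locally comes down to a `docker run` with the repo mounted at /work. The sketch below only builds the command line. The image name in the usage example is a placeholder, not the real one, and it assumes the image puts the tools on the PATH without an entrypoint that gets in the way.

```python
def docker_command(repo_dir, image, tool, *args):
    """Build a `docker run` command that mounts the book repo at /work and runs one tool there."""
    return ["docker", "run", "--rm",
            "--volume", f"{repo_dir}:/work",
            "--workdir", "/work",
            image, tool, *args]


# Placeholder image name and paths, for illustration only.
lint = docker_command("/home/me/mybook", "example/book-builder", "check_ms_html")
```

Because every step goes through the same image, a build that passes locally should behave the same way in CI.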
Why I’m making this public
At this point, the pipeline feels:
modern (Pandoc, HTML/CSS layout, EPUB 3),
robust (lint + epubcheck),
reproducible (Docker + Actions),
and not tied to Perl in any deep way.
Yes, Design Patterns in Modern Perl is a Perl book, and the utilities live under the “Perl School” banner, but nothing is stopping you from using the same setup for your own book on whatever topic you care about.
So I’ve made the utilities available in a public repository (the perlschool-util repo on GitHub). There you’ll find:
the build scripts,
the Dockerfile and helper script,
example GitHub Actions configuration,
and notes on how to structure a book repo.
If you’ve ever thought:
I’d like to write a small technical book, but I don’t want to fight with LaTeX or invent a build system from scratch…
then you’re very much the person I had in mind.
eBook publishing really is pretty easy once you’ve got a solid pipeline. If these tools help you get your ideas out into the world, that’s a win.
And, of course, if you’d like to write a book for Perl School, I’m still very interested in talking to potential authors – especially if you’re doing interesting modern Perl in the real world.
The post Behind the scenes at Perl School Publishing first appeared on Perl Hacks.