SEN LLC

Posted on Apr 15

PHP as a Microservice: A Markdown-to-HTML API in 200 Lines of Slim 4

#php #slim #markdown #tutorial


A tiny HTTP service that renders Markdown to HTML. PHP 8.2, Slim 4, league/commonmark. Safe-by-default, GFM-capable, with heading extraction thrown in as a freebie. Under 200 lines of application code, 52 MB Docker image, PHP's built-in web server as the runtime.

📦 GitHub: https://github.com/sen-ltd/markdown-api

Every team I've worked with that has any content — docs, forum posts, AI chat rendering, CMS previews, PR comments, release notes — eventually embeds a Markdown library directly into every app that needs it. The Laravel app has its copy. The Node preview service has its copy. The internal wiki has its copy. Three months later, they've diverged: GFM tables here, not there; safe mode on here, off there; different anchor slugs in each.

There is a cleaner separation: a tiny HTTP service with one endpoint. Give it Markdown, get HTML back. One library, one set of rules, one place to upgrade when there's a CommonMark spec change.

This article is about building that service in PHP. Not because PHP is fashionable for microservices — it isn't — but because when your production stack is already PHP, spinning up a sidecar renderer in anything else is needlessly annoying. You want the same language, the same composer install, the same deployment pipeline. The total cost of "add a new microservice" should be zero incremental tooling.

The constraint: PHP, but not Laravel

I wanted this service to feel like a Go binary in shape: install nothing extra, copy a folder, run a single process, watch it serve HTTP. The temptation, working in PHP, is to reach for Laravel by reflex. Laravel is excellent for applications. It is massive overkill for a single-endpoint renderer: you'll pull in Eloquent, the service container, facades, the full HTTP kernel, route caching, and about 30 MB of vendor dependencies. None of that is doing anything for a service whose entire job fits on one screen.

So this is built on Slim 4, which is the minimal PSR-7 glue that exists for exactly this case. You get routing, middleware, request/response objects, and nothing else. No ORM. No config system. No container you need to learn. Four files do the entire thing:

src/
├── Renderer.php              # league/commonmark wrapper, pure
├── HeadingExtractor.php      # extracts <h1>..<h6> from rendered HTML
└── Middleware/
    └── JsonRequestLogger.php # one JSON log line per request
public/
└── index.php                 # app factory + four routes

Composer dependencies are slim/slim, slim/psr7, and league/commonmark. In production that pulls down roughly 20 MB of vendor code, transitive packages included, and league/commonmark alone covers all of CommonMark plus the full GitHub Flavored Markdown spec. There are no other direct runtime dependencies.

The renderer

league/commonmark is a highly spec-compliant PHP implementation of CommonMark, originally based on the commonmark.js reference implementation. You'll see erusev/parsedown or cebe/markdown in older PHP codebases, but league/commonmark is the choice now because (a) it tracks the CommonMark spec precisely, and (b) its GFM support is a first-class extension, not a bolt-on regex patch. Tables, task lists, strikethrough, and bare-URL autolinks are one addExtension() call away.

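Here's the core of the renderer, including its imports (the small static wordCount() helper that the route calls is omitted):

use League\CommonMark\ConverterInterface;
use League\CommonMark\Environment\Environment;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\Extension\GithubFlavoredMarkdown\GithubFlavoredMarkdownExtension;
use League\CommonMark\MarkdownConverter;

final class Renderer
{
    public const FLAVOR_COMMONMARK = 'commonmark';
    public const FLAVOR_GFM        = 'gfm';

    private ConverterInterface $converter;

    public function __construct(string $flavor = self::FLAVOR_GFM, bool $safe = true)
    {
        $config = [
            // Safe mode escapes raw HTML instead of passing it through...
            'html_input'         => $safe ? 'escape' : 'allow',
            // ...and drops javascript:, vbscript:, file: and non-image data: URLs.
            'allow_unsafe_links' => !$safe,
        ];

        $env = new Environment($config);
        $env->addExtension(new CommonMarkCoreExtension());

        // Only 'gfm' adds tables, task lists, strikethrough and autolinks;
        // any other value renders as plain CommonMark.
        if ($flavor === self::FLAVOR_GFM) {
            $env->addExtension(new GithubFlavoredMarkdownExtension());
        }

        // Built once here and never reconfigured afterwards.
        $this->converter = new MarkdownConverter($env);
    }

    public function render(string $markdown): string
    {
        return $this->converter->convert($markdown)->getContent();
    }
}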

Two things worth pointing out:

Safe mode is a constructor flag, not a render-time flag. You choose once, when you build the Renderer, and the converter is then immutable for the rest of its life. This matters because a request handler that juggles per-request renderer configuration is a great place for subtle bugs. If you want both modes, you build two Renderer instances.

html_input => escape plus allow_unsafe_links => false is the safe default. It is unusually easy to get XSS via raw HTML embedded in Markdown. The CommonMark spec says raw HTML blocks are legal; a naive renderer passes them through verbatim, meaning <script>alert(1)</script> in user-submitted Markdown becomes executing JavaScript in whoever's browser views that post. With html_input => escape, that same input renders as &lt;script&gt;alert(1)&lt;/script&gt;, which the browser displays as text instead of executing. allow_unsafe_links => false closes the other door: it removes javascript:, vbscript:, file:, and non-image data: URLs from links and images, so [click](javascript:alert(1)) can't sneak through either. This is the behavior you want 95% of the time, and this service makes it the default.

The tradeoff: if you do want to allow raw HTML (trusted authors writing internal docs, for instance), you have to pass "safe": false explicitly. You can't leave it switched on by accident.
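To make the allow_unsafe_links rule concrete, here is a small sketch of the kind of scheme check it implies. isPotentiallyUnsafeUrl() is a hypothetical helper written for illustration, not league/commonmark's internal code, and a prefix regex like this ignores some of the URL normalization browsers perform, so treat it as an explanation of the rule rather than a sanitizer to copy.

```php
<?php
// Hypothetical helper illustrating the rule behind allow_unsafe_links => false.
// Not league/commonmark's implementation: browsers normalize URLs in more ways
// (e.g. removing tabs inside the scheme) than this simple prefix check handles.
function isPotentiallyUnsafeUrl(string $url): bool
{
    // Case-insensitive scheme match after trimming leading whitespace.
    // data: URLs pass only when they are png/gif/jpeg/webp images.
    $pattern = '/^(?:javascript:|vbscript:|file:|data:(?!image\/(?:png|gif|jpeg|webp)))/i';

    return preg_match($pattern, ltrim($url)) === 1;
}

var_dump(isPotentiallyUnsafeUrl('JavaScript:alert(1)'));          // bool(true)
var_dump(isPotentiallyUnsafeUrl('data:image/png;base64,iVBOR'));  // bool(false)
var_dump(isPotentiallyUnsafeUrl('https://example.com/'));         // bool(false)
```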

The Slim route

Slim 4 gives you routes as closures or PSR-15 handlers. For something this small, closures are cleaner — you see the whole pipeline in one place. The core render route is:

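// Excerpt from public/index.php. Request and Response are the PSR-7
// ServerRequestInterface and ResponseInterface. getParsedBody() only returns
// decoded JSON if the app registers $app->addBodyParsingMiddleware().
// json() is a small local helper (not shown) that writes a JSON response.
$app->post('/render', function (Request $request, Response $response) use ($maxLen) {
    $body     = $request->getParsedBody() ?: [];
    $markdown = (string) ($body['markdown'] ?? '');
    $flavor   = (string) ($body['flavor'] ?? 'gfm');
    $safe     = (bool) ($body['safe'] ?? true);

    if ($markdown === '') {
        return json($response, ['error' => 'empty'], 422);
    }
    // strlen() counts bytes, so $maxLen is a byte limit, not a character limit.
    if (strlen($markdown) > $maxLen) {
        return json($response, ['error' => 'too_large', 'limit' => $maxLen], 413);
    }
    // Reject unknown flavors with 400 instead of silently rendering CommonMark
    // (the 'invalid_flavor' error key is illustrative).
    if (!in_array($flavor, [Renderer::FLAVOR_COMMONMARK, Renderer::FLAVOR_GFM], true)) {
        return json($response, ['error' => 'invalid_flavor'], 400);
    }

    $renderer = new Renderer($flavor, $safe);
    $html     = $renderer->render($markdown);
    $headings = HeadingExtractor::extract($html);
    $words    = Renderer::wordCount($markdown);

    return json($response, [
        'html'       => $html,
        'word_count' => $words,
        'headings'   => $headings,
    ]);
});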

Everything the handler does is visible in that one closure. Apart from JSON body parsing, nothing is hidden behind DI, middleware, or an interceptor. If you find a bug, the fix is in a place you can point to with one finger. For a service whose job is "render Markdown," this is the right level of abstraction; anything more framework-y is a tax you pay forever for expressiveness you never use.

The service exposes four routes:

POST /render — JSON in, JSON out (HTML + word count + headings).
POST /render/html — same input, but returns raw HTML with Content-Type: text/html. No JSON envelope. This one exists because the most common caller pattern is fetch('/render/html', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ markdown }) }).then(r => r.text()).then(html => el.innerHTML = html), and forcing everyone to unwrap a JSON object first is silly.
GET /render?text=... — a GET version for short inputs you can curl by hand when debugging.
GET /health — liveness + version probe, returns the league/commonmark version so you can verify deploys.
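One practical gotcha with the GET variant: the Markdown has to be percent-encoded. An unencoded # starts the URL fragment, so everything after it never reaches the server, and raw newlines aren't valid in a URL at all. A minimal PHP sketch, assuming the service listens on localhost:8000 as in the docker run command and that text is the only query parameter you need; renderGetUrl() is a hypothetical helper name.

```php
<?php
// Build a GET /render URL for a short Markdown snippet.
// http_build_query() percent-encodes '#', newlines and '*' (spaces become '+'),
// so the whole snippet arrives intact in the 'text' parameter.
function renderGetUrl(string $markdown, string $base = 'http://localhost:8000'): string
{
    return $base . '/render?' . http_build_query(['text' => $markdown]);
}

echo renderGetUrl("# Hi\n\n*x*"), PHP_EOL;
// http://localhost:8000/render?text=%23+Hi%0A%0A%2Ax%2A
```

From the shell, curl -G --data-urlencode 'text=...' does the same encoding for you.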

Plus a GET / that serves a minimal demo page — a textarea on the left, live rendered HTML on the right, connected by a single fetch() call. That page is nothing fancy, but it's an incredibly effective smoke test for "is the service actually up and rendering," and it doubles as documentation you can hand to a skeptical colleague. Give them the URL, they paste their Markdown in, they see it render: conversation over.

Heading extraction as a free side dish

Here's the payload bonus that I didn't anticipate finding useful, but that every caller ended up using: an extracted list of headings, with levels and anchor slugs.

final class HeadingExtractor
{
    /** @return list<array{level: int, text: string, anchor: string}> */
    public static function extract(string $html): array
    {
        $out = [];
        // Match <h1>..<h6> with or without attributes; \1 pairs the closing tag's level.
        if (!preg_match_all('/<h([1-6])(?:\s[^>]*)?>(.*?)<\/h\1>/is', $html, $m, PREG_SET_ORDER)) {
            return [];
        }

        $seen = [];
        foreach ($m as $match) {
            $level = (int) $match[1];
            $text  = self::plain($match[2]);
            if ($text === '') {
                continue;
            }

            // Deduplicate anchors: setup, setup-1, setup-2, ...
            $anchor  = self::slugify($text);
            $base    = $anchor;
            $counter = 1;
            while (isset($seen[$anchor])) {
                $anchor = "{$base}-{$counter}";
                $counter++;
            }
            $seen[$anchor] = true;

            $out[] = ['level' => $level, 'text' => $text, 'anchor' => $anchor];
        }
        return $out;
    }

    // The private helpers plain() and slugify() are omitted from this excerpt.
}

Three design notes here:

It runs on the rendered HTML, not the Markdown source. ATX headings, Setext headings (Foo\n===), raw HTML heading tags (only when safe mode is off, since safe mode escapes them), and anything else that ends up as an <h1>..<h6> all get picked up uniformly. If we parsed the Markdown source, we'd need a second tokenizer, and it would disagree with the rendered output on edge cases.

Duplicate anchors get -1, -2, ... suffixes. GitHub and Docusaurus number duplicates exactly this way, and most other TOC generators use the same numbered-suffix idea (MkDocs, via Python-Markdown, appends _1, _2 instead). If a user writes two "Setup" headings in the same document, the second one becomes #setup-1. Skip this detail and TOC links collide: the second link silently jumps to the wrong section.

The slugifier preserves Unicode. # Café ☕ becomes café (the emoji drops out because it's not a letter or a digit, but the é stays). If you replace \p{L}\p{N} with a-zA-Z0-9 you'll mangle every non-English doc, and you won't find out until someone on your team writes a Japanese-language README.
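The extractor calls two private helpers, plain() and slugify(), that the post doesn't show. Here is a minimal sketch of what they could look like, written as free functions with hypothetical names (headingPlainText(), headingSlug()); the repo's versions may differ in detail. It needs the mbstring extension for mb_strtolower(), which is enabled in most PHP builds.

```php
<?php
// Hypothetical stand-in for HeadingExtractor::plain():
// strip inline tags (<code>, <em>, <a>...), decode entities, collapse whitespace.
function headingPlainText(string $innerHtml): string
{
    $text = html_entity_decode(strip_tags($innerHtml), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    return trim(preg_replace('/\s+/u', ' ', $text) ?? '');
}

// Hypothetical stand-in for HeadingExtractor::slugify():
// lowercase, then turn every run of non-letter, non-digit characters into '-'.
// \p{L} and \p{N} are Unicode classes, so "é" and "手順" survive; "☕" does not.
function headingSlug(string $text): string
{
    $slug = preg_replace('/[^\p{L}\p{N}]+/u', '-', mb_strtolower($text, 'UTF-8'));

    return trim($slug ?? '', '-');
}

echo headingSlug(headingPlainText('Café ☕')), PHP_EOL;                                // café
echo headingSlug(headingPlainText('Install <code>composer</code> &amp; go')), PHP_EOL; // install-composer-go
```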

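The payload shape {level, text, anchor} is the minimum viable TOC data: the caller can render a <ul><li><a href="#anchor">text</a> list directly, or build a nested tree if they want. Every downstream consumer had this need within two hours of seeing the service go live; it would have been annoying to add it later. One caveat: the rendered HTML itself carries no id attributes (note the bare <h1>Hello</h1> in the example output), so a caller that links to #anchor also has to put matching ids on the headings it inserts.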
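For callers that want the nested tree, the flat list converts in one recursive pass. A sketch with hypothetical function names (buildTocTree(), tocChildren()), not part of the service's API: each heading becomes the parent of every following heading with a deeper level, until one at the same or a shallower level appears, so an h3 placed directly under an h1 still nests under it.

```php
<?php
// Convert the flat [{level, text, anchor}, ...] payload into a nested tree.
// Each node gains a 'children' key. Hypothetical helper, not part of the API.
function buildTocTree(array $headings): array
{
    $headings = array_values($headings);
    $i = 0;

    return tocChildren($headings, $i, 0);
}

// Consume headings deeper than $parentLevel, starting at index $i.
// $i is passed by reference so every recursion level advances one shared cursor.
function tocChildren(array $headings, int &$i, int $parentLevel): array
{
    $nodes = [];
    while ($i < count($headings) && $headings[$i]['level'] > $parentLevel) {
        $node = $headings[$i] + ['children' => []];
        $i++;
        $node['children'] = tocChildren($headings, $i, $node['level']);
        $nodes[] = $node;
    }

    return $nodes;
}

$tree = buildTocTree([
    ['level' => 1, 'text' => 'Guide',  'anchor' => 'guide'],
    ['level' => 2, 'text' => 'Setup',  'anchor' => 'setup'],
    ['level' => 3, 'text' => 'Docker', 'anchor' => 'docker'],
    ['level' => 2, 'text' => 'Setup',  'anchor' => 'setup-1'],
]);
// $tree has one root ("Guide") whose children are both "Setup" nodes;
// "Docker" sits under the first one.
```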

Tests: exercise the real Slim app, not a mock

PHPUnit suites for web apps often mock the request/response, and their authors usually regret it. Slim 4 lets you build a real ServerRequest, hand it to App::handle(), and assert on the real response, with no network and no server process. That is the right level to test at, because it catches routing bugs, middleware-ordering bugs, and content-type bugs that a unit test on Renderer::render() would miss.

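// At the top of the test file:
use Slim\Psr7\Factory\ServerRequestFactory;
use Slim\Psr7\Factory\StreamFactory;

// Inside the test class; self::$app is assumed to hold the real Slim App,
// created once (e.g. in setUpBeforeClass()) from the same factory index.php uses.
public function testSafeModeEscapesByDefault(): void
{
    $req = (new ServerRequestFactory())->createServerRequest('POST', '/render')
        ->withBody((new StreamFactory())->createStream('{"markdown":"<script>alert(1)</script>"}'))
        ->withHeader('Content-Type', 'application/json');

    // Runs the full middleware stack and router in-process: no socket, no server.
    $res = self::$app->handle($req);
    $this->assertSame(200, $res->getStatusCode());

    $data = json_decode((string) $res->getBody(), true);
    $this->assertStringNotContainsString('<script>', $data['html']);
    $this->assertStringContainsString('&lt;script&gt;', $data['html']);
}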

The 29-test suite covers: plain CommonMark, GFM tables, task lists, strikethrough, autolinks, safe mode escaping, unsafe mode allowing raw HTML, heading extraction with duplicates and Unicode, word count, empty-input 422, oversize 413, invalid flavor 400, each of the four routes, and the JSON-logging middleware in isolation. All of it runs in about 50 ms.

Tradeoffs and where I wouldn't use this

Three real tradeoffs to acknowledge:

Unsafe mode is genuinely dangerous. If you ever set "safe": false on user-submitted content, you have shipped an XSS vulnerability. The safe default is a mitigation, not a guarantee. Reviewing your caller code for stray "safe": false passes should be part of your PR checklist. For the public web, I'd argue you should delete the unsafe-mode branch from your deploy entirely.
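One way to act on that advice without maintaining two code paths is a deploy-time switch: the service honours "safe": false only when the operator has opted in. This is a sketch with a hypothetical function and environment variable name (MARKDOWN_API_ALLOW_UNSAFE is not something the repo defines). It also parses the flag with filter_var() instead of a (bool) cast, because (bool) "false" is true while (bool) "0" is false, which makes string input, such as a query parameter, inconsistent.

```php
<?php
// Decide whether to render in safe mode.
// $requested:   raw "safe" value from the request (bool, string, or null if absent).
// $allowUnsafe: deploy-level opt-in; when false, safe mode is always forced.
function effectiveSafeMode(mixed $requested, bool $allowUnsafe): bool
{
    if (!$allowUnsafe) {
        return true; // unsafe rendering is disabled for this deployment
    }
    if ($requested === null) {
        return true; // absent => safe default
    }
    // Accepts true/false, "true"/"false", "1"/"0", "yes"/"no", "on"/"off";
    // anything unrecognised yields null and falls back to safe.
    $parsed = filter_var($requested, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);

    return $parsed ?? true;
}

// Hypothetical variable name; getenv() returns false when it is unset.
$allowUnsafe = getenv('MARKDOWN_API_ALLOW_UNSAFE') === '1';
```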

PHP's built-in web server isn't preforking by default. The php -S command that this image uses runs as a single process that handles one request at a time, so concurrent requests queue up. (Since PHP 7.4 the PHP_CLI_SERVER_WORKERS environment variable can fork several workers on non-Windows systems, but the PHP manual still describes the built-in server as a development tool.) For a side service rendering a few requests per second behind a CMS preview route that's fine, and the operational simplicity (no nginx, no FPM socket tuning, no .htaccess) is worth a lot. For anything hotter than that, put a caching reverse proxy in front (for a given flavor, safe flag, and library version, the same Markdown always renders to the same HTML, so responses are safe to cache), or switch to PHP-FPM behind nginx. Neither option needs a code change: the Slim app runs unchanged under any SAPI.
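Whatever layer does the caching (reverse proxy, APCu, Redis), the cache key has to cover every input that changes the output, not just the Markdown text. A sketch with a hypothetical function name: including the library version (for example the one Composer\InstalledVersions::getVersion('league/commonmark') returns, the same kind of version string GET /health reports) means an upgrade naturally stops reusing HTML rendered by the old version.

```php
<?php
// Build a cache key from everything that affects the rendered output.
// A safe=false render must never be served for a safe=true request, and a
// library upgrade should not reuse HTML produced by the previous version.
function renderCacheKey(string $markdown, string $flavor, bool $safe, string $libVersion): string
{
    // NUL separators keep the fields unambiguous ("ab"+"c" vs "a"+"bc");
    // $libVersion and $flavor are assumed never to contain NUL bytes.
    $material = implode("\0", [$libVersion, $flavor, $safe ? '1' : '0', $markdown]);

    return 'md:' . hash('sha256', $material);
}
```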

GFM is not the only flavor. If you need AsciiDoc, reStructuredText, or Pandoc-style academic extensions, this isn't the service. Pandoc's feature set is in another league, but it's also a different operational shape: from PHP it's a CLI, not a library, so you end up shelling out per request. For 95% of the "render user content" use cases in web apps, GFM is enough, and league/commonmark is the right choice.

Try it in 30 seconds

docker build -t markdown-api .
docker run --rm -p 8000:8000 markdown-api

Then:

curl -X POST http://localhost:8000/render \
  -H "Content-Type: application/json" \
  -d '{"markdown": "# Hello\n\nThis is **bold**.\n\n- [x] done\n- [ ] todo"}'

{
  "html": "<h1>Hello</h1>\n<p>This is <strong>bold</strong>.</p>\n<ul>\n<li><input checked disabled type=\"checkbox\"> done</li>\n<li><input disabled type=\"checkbox\"> todo</li>\n</ul>\n",
  "word_count": 6,
  "headings": [{"level": 1, "text": "Hello", "anchor": "hello"}]
}

Or curl -sS http://localhost:8000/health for a liveness probe, or open http://localhost:8000/ in a browser for the live-preview demo.

The full source, Dockerfile, test suite, and OpenAPI spec are on GitHub. If you're a PHP team with Markdown rendering scattered across three services, this is the shape of thing I'd recommend extracting. Not because extracting services is always the answer, but because Markdown rendering in particular benefits disproportionately from living in one place you can upgrade atomically.

📦 GitHub: https://github.com/sen-ltd/markdown-api

This is post #133 in SEN's 100+ portfolio project series. We build small, focused, shippable dev tools in every language and framework we can, and write up the design decisions each time. If you'd like us to build something for your team, we're for hire.