Prathamesh Deshmukh
How I Load Test a PDF Generation API with k6, Docker, and GitHub Actions

The Problem with "It Works on My Machine"

PDF generation is one of those deceptively expensive operations. You fire off a request, Puppeteer spins up a headless Chromium, renders a full HTML page, and exports it to bytes. Works great in dev. Works great in staging with one user. Then someone puts it in production and a dozen concurrent requests land at once — and you discover your server is quietly crying.

That was the situation with Templify, a PDF generation platform I built. The core API — POST /convert/{templateId} — compiles a Handlebars template and delegates to a job-runner service (Express + Puppeteer) to render the PDF. Each request is CPU-bound and takes 1–4 seconds depending on template complexity.

Before confidently telling users the API handles concurrent load, I needed proof. Enter k6.


Why k6

I've used Locust, JMeter, and Artillery. k6 wins on developer ergonomics:

  • Test scripts are JavaScript — no YAML configs, no XML, no DSL to learn
  • Built-in thresholds — define pass/fail criteria in the script itself
  • Docker-first — the official grafana/k6 image just works
  • CLI output is readable — colored, structured, and tells you exactly what you need

The only gotcha: k6 uses a custom JavaScript runtime (Goja, not Node.js), so you can't import arbitrary npm packages or Node built-ins. For plain HTTP API testing, that limitation rarely matters.


The Infrastructure (Brief Context)

The job-runner service runs in Docker on a Hetzner CX11 — 2 vCPU, 4GB RAM, ~$4.15/month. It hosts both production (port 3000) and staging (port 3001) environments as separate containers on the same box.

The load test runs on the Hetzner server itself, not from the GitHub Actions runner. This is intentional: it removes the network-latency variability of CI runners and measures throughput from as close to the service as the setup allows — a much truer picture of what the hardware can sustain.


Step 1: Write the k6 Test Script

The test script lives at job-runner/k6/load-test.js.

Load Shape

k6 reads its test configuration from an exported options object. The shape I chose is a classic ramp-up → steady state → ramp-down pattern:

export const options = {
  stages: [
    { duration: '30s', target: 5 },  // ramp up to 5 VUs
    { duration: '1m',  target: 5 },  // hold for 1 minute
    { duration: '30s', target: 0 },  // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<3500'], // 95th percentile under 3.5s
    http_req_failed:   ['rate<0.1'],   // error rate under 10%
  },
};

Why 5 virtual users? This isn't an arbitrary number. Each VU issues one request, waits 1 second (sleep(1)), then issues another. At 5 concurrent VUs with ~2–3s response times, the server is handling ~2–3 simultaneous Puppeteer renders at any given moment — right at the limit of what a 2-vCPU box can sustain without queuing.
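That estimate can be sanity-checked with Little's law (concurrency ≈ arrival rate × time in system). A standalone sketch in plain Node.js, not k6 — the response and sleep times are the rough figures from above:

```javascript
// Each VU loops: one in-flight request (responseTime), then sleep(1).
// The fraction of wall-clock time a VU has a request in flight is
// responseTime / (responseTime + sleepTime), so expected concurrency is:
function expectedConcurrentRenders(vus, responseTimeSec, sleepSec) {
  return vus * (responseTimeSec / (responseTimeSec + sleepSec));
}

// With 5 VUs and 2-3s responses, roughly 3 renders are in flight at once:
console.log(expectedConcurrentRenders(5, 2, 1).toFixed(2)); // "3.33"
console.log(expectedConcurrentRenders(5, 3, 1).toFixed(2)); // "3.75"
```

In other words, the 1-second sleep is what keeps 5 VUs from meaning 5 simultaneous Chromium renders.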

Why p95 instead of average? Averages lie. A test where 94% of requests complete in 200ms and 6% time out at 30 seconds shows a "fine" average. p95 tells you what the worst realistic experience looks like.
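That claim is easy to demonstrate with the exact distribution from the example — 94% of requests at 200ms, 6% timing out at 30 seconds. Plain Node.js, independent of k6:

```javascript
// 94 requests at 200 ms, 6 timeouts at 30,000 ms
const latencies = [...Array(94).fill(200), ...Array(6).fill(30000)];

const mean = latencies.reduce((sum, ms) => sum + ms, 0) / latencies.length;

// Nearest-rank percentile: sort ascending, take the value at ceil(p/100 * n) - 1
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil((p / 100) * sorted.length) - 1];
}

console.log(mean);                      // 1988  — under 2s, looks tolerable
console.log(percentile(latencies, 95)); // 30000 — the tail your users actually hit
```

A sub-2-second mean and a 30-second p95 describe the same test run — which is why the threshold targets the percentile.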

The Payload

The test uses a realistic Handlebars template payload — a multi-section marketing brochure for a fictional company called TechInnovate. This matters: testing with {"name": "test"} would render in 300ms; testing with the actual production payload shape catches real performance characteristics.

const PAYLOAD = {
  templateData: {
    hero: {
      title: "Transforming Business Through Technology",
      subtitle: "Innovative solutions that drive growth, efficiency, and competitive advantage",
      ctaButton: { url: "#products", text: "Discover Our Solutions" }
    },
    about: {
      title: "About TechInnovate",
      description: "Founded in 2015, TechInnovate has been at the forefront...",
      bulletPoints: [
        "10+ years of industry experience",
        "200+ successful projects delivered",
        "98% client satisfaction rate",
        "Global team of certified experts"
      ]
    },
    products: { items: [ /* 3 products with prices, descriptions */ ] },
    features: { items: [ /* 4 feature cards */ ] },
    testimonial: {
      quote: "We have seen a 40% increase in efficiency...",
      author: "Jennifer Martinez",
      company: "CEO, Global Enterprises"
    }
    // ... contact, footer, social links
  }
};

The Request Function

import http from 'k6/http';
import { check, sleep } from 'k6';

const TEMPLATE_ID = "c07deb00-bb22-4e5f-b48e-1b1c17f7c969";
const CLIENT_ID = __ENV.CLIENT_ID;
const CLIENT_SECRET = __ENV.CLIENT_SECRET;
const baseUrl = "https://api.templify.cloud";

export default function () {
  const headers = {
    'Content-Type': 'application/json',
    'client_secret': CLIENT_SECRET,
    'client_id': CLIENT_ID,
  };

  const pdfResponse = http.post(
    `${baseUrl}/convert/${TEMPLATE_ID}`,
    JSON.stringify(PAYLOAD),
    { headers }
  );

  check(pdfResponse, {
    'PDF generation status is 200': (r) => r.status === 200,
    'PDF generation response time < 5s': (r) => r.timings.duration < 5000,
  });

  sleep(1);
}

Credentials come from environment variables via k6's __ENV — never hardcoded. The check() function records pass/fail metrics per assertion without stopping the test.
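One extra assertion worth considering (my addition, not part of the original script): if the API ever returned a 200 with an error body, the two checks above would still pass. Verifying the response starts with the PDF magic bytes %PDF- closes that gap. The helper is plain JavaScript, so it runs unchanged in k6's Goja runtime:

```javascript
// Every valid PDF begins with the ASCII signature "%PDF-".
// In the k6 script you'd add a check like:
//   'response body is a PDF': (r) => looksLikePdf(r.body)
function looksLikePdf(body) {
  return typeof body === 'string' && body.startsWith('%PDF-');
}

console.log(looksLikePdf('%PDF-1.7\n'));                 // true
console.log(looksLikePdf('{"error":"render timeout"}')); // false
```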


Step 2: Containerize k6

The Dockerfile is deliberately minimal:

FROM grafana/k6:latest

COPY *.js .

CMD ["run", "load-test.js"]

That's it. The official grafana/k6 image is minimal and Alpine-based: no node_modules, no build step, no complexity. One caveat: :latest means the k6 version can drift between runs, so pin an explicit version tag if benchmark numbers need to stay comparable over time. The *.js glob future-proofs the COPY — add more test files and they're automatically available.


Step 3: The Deploy Script — Running Tests Remotely

The run-load-test.sh script orchestrates the full flow: sync the k6 files to the server, build the Docker image there, run it.

#!/bin/bash
set -e

ENVIRONMENT=${1:-production}  # passed in by the workflow (production|staging)
TARGET_HOST=${HETZNER_HOST}
HETZNER_USER=${HETZNER_USER}
CLIENT_ID=${CLIENT_ID}
CLIENT_SECRET=${CLIENT_SECRET}

echo "Running load test for ${ENVIRONMENT} environment..."

# Clean previous run
ssh -o StrictHostKeyChecking=no -i ~/.ssh/id_rsa \
  $HETZNER_USER@$TARGET_HOST "rm -rf ~/load-test/k6"

# Sync k6 folder to server
scp -o StrictHostKeyChecking=no -i ~/.ssh/id_rsa \
  -r k6 $HETZNER_USER@$TARGET_HOST:~/load-test/

# Build and run on the server
ssh -o StrictHostKeyChecking=no -i ~/.ssh/id_rsa \
  $HETZNER_USER@$TARGET_HOST << EOF
    cd ~/load-test/k6
    docker stop k6-load-test 2>/dev/null || true
    docker rm k6-load-test 2>/dev/null || true
    docker build --no-cache -f Dockerfile.k6 -t k6-load-test .
    docker run --rm \
      --name k6-load-test \
      -e CLIENT_ID=$CLIENT_ID \
      -e CLIENT_SECRET=$CLIENT_SECRET \
      -e K6_WEB_DASHBOARD=true \
      -e K6_WEB_DASHBOARD_PORT=-1 \
      k6-load-test
EOF

echo "Load test completed."

Why build on the server instead of pulling from a registry?

Because the test script changes frequently during development. Pushing to a registry on every iteration adds friction. scp + docker build --no-cache is fast (< 30 seconds) and guarantees you're running exactly the code you just edited — no cache surprises.

The K6_WEB_DASHBOARD=true flag

k6 ships with a built-in real-time web dashboard. Setting K6_WEB_DASHBOARD_PORT=-1 disables the HTTP server (since we're in a non-interactive SSH session) but still enables the dashboard's internal metrics aggregation and summary report output at the end. If you're running interactively, set a port like 8089 and open the dashboard in your browser during the test run.


Step 4: GitHub Actions Workflow — Manual Trigger

Load tests are not run on every push. Running a 2-minute load test on every PR would be slow, expensive (in credits), and noisy. Instead, it's a workflow_dispatch — triggered manually, on demand:

name: Load Test

on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment to test'
        required: true
        default: 'production'
        type: choice
        options:
          - production
          - staging

jobs:
  load-test:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup SSH
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.HETZNER_SSH_KEY }}" > ~/.ssh/id_rsa
          chmod 600 ~/.ssh/id_rsa
          ssh-keyscan -H ${{ secrets.HETZNER_HOST }} >> ~/.ssh/known_hosts

      - name: Run load test
        env:
          CLIENT_ID: ${{ secrets.CLIENT_ID }}
          CLIENT_SECRET: ${{ secrets.CLIENT_SECRET }}
          HETZNER_HOST: ${{ secrets.HETZNER_HOST }}
          HETZNER_USER: ${{ secrets.HETZNER_USER }}
        run: |
          chmod +x scripts/run-load-test.sh
          ./scripts/run-load-test.sh ${{ github.event.inputs.environment }}

Required GitHub secrets:
| Secret | Value |
|---|---|
| HETZNER_SSH_KEY | Contents of the ED25519 private key |
| HETZNER_HOST | Server IP or hostname |
| HETZNER_USER | SSH user (e.g. root) |
| CLIENT_ID | Templify API client ID |
| CLIENT_SECRET | Templify API client secret |

The ssh-keyscan step adds the server's host key to known_hosts, preventing the interactive "are you sure?" prompt that would hang the CI runner.


What the Output Looks Like

When k6 finishes, it prints a summary to stdout — which GitHub Actions captures and displays in the workflow logs:

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: load-test.js
     output: -

  scenarios: (100.00%) 1 scenario, 5 max VUs, 2m30s max duration (incl. graceful stop):
           * default: Up to 5 looping VUs for 2m0s over 3 stages (gracefulRampDown: 30s, ...)

✓ PDF generation status is 200
✓ PDF generation response time < 5s

     checks.........................: 100.00% ✓ 142  ✗ 0
     data_received..................: 14 MB   117 kB/s
     data_sent......................: 87 kB   727 B/s
     http_req_blocked...............: avg=18.4ms   min=2µs    med=5µs    max=1.04s    p(90)=10µs   p(95)=14µs
     http_req_duration..............: avg=1.91s    min=892ms  med=1.79s  max=4.12s    p(90)=2.94s  p(95)=3.21s
   ✓ { expected_response:true }....: avg=1.91s    min=892ms  med=1.79s  max=4.12s    p(90)=2.94s  p(95)=3.21s
     http_req_failed................: 0.00%   ✓ 0    ✗ 142
     http_req_receiving.............: avg=143.2ms  min=4.98ms med=79.5ms max=731ms    p(90)=381ms  p(95)=477ms
     http_req_sending...............: avg=258µs    min=97µs   med=213µs  max=1.45ms   p(90)=435µs  p(95)=509µs
     http_req_tls_handshaking.......: avg=18.3ms   min=0s     med=0s     max=1.04s    p(90)=0s     p(95)=0s
     http_req_waiting...............: avg=1.77s    min=858ms  med=1.66s  max=3.86s    p(90)=2.72s  p(95)=3.02s
     http_reqs......................: 142     1.183333/s
     iteration_duration.............: avg=2.91s    min=1.9s   med=2.79s  max=5.15s    p(90)=3.94s  p(95)=4.21s
     iterations.....................: 142     1.183333/s
     vus............................: 1       min=1  max=5
     vus_max........................: 5       min=5  max=5


running (2m00.0s), 0/5 VUs, 142 complete and 0 interrupted iterations
default ✓ [==============================] 0/5 VUs  2m0s

✓ http_req_duration............: p(95)=3.21s < 3.5s  ✓ PASS
✓ http_req_failed..............: rate=0.00% < 10%     ✓ PASS

The thresholds section at the bottom is the pass/fail verdict. k6 exits with a non-zero code if any threshold is breached — which means GitHub Actions marks the workflow run as failed. No manual inspection required.


The Full File Structure

job-runner/
├── k6/
│   ├── load-test.js       # k6 test script
│   └── Dockerfile.k6      # Minimal k6 Docker image
├── scripts/
│   └── run-load-test.sh   # SSH + SCP + docker run orchestration
└── .github/
    └── workflows/
        └── load-test.yml  # Manual GitHub Actions workflow

Four files. That's the entire setup.


Key Design Decisions, Explained

Run k6 on the server, not from CI

Running from GitHub's Ubuntu runners introduces network hops: CI runner → Cloudflare/CDN → Vercel (API gateway) → Hetzner. That's fine for integration testing but adds noise to performance benchmarking. Running on Hetzner itself tests the raw capacity of the PDF service without network jitter.

workflow_dispatch not push

Load tests are deliberately not automated on push. They consume API credits (each PDF generation deducts 1 credit), generate real load, and take 2+ minutes. The right time to run them is before a deploy to production or when investigating a performance regression — not on every feature branch commit.

Credentials via __ENV, never hardcoded

k6's __ENV object reads environment variables passed at runtime. This means the same script works in local dev (k6 run --env CLIENT_ID=xxx load-test.js), in Docker (-e CLIENT_ID=xxx), and in CI — without any code changes.

--no-cache on Docker build

The test script changes often. Docker's layer cache would happily serve a stale load-test.js if you forget to invalidate it. --no-cache is a small penalty (~5 seconds) that guarantees correctness.


Running Locally

If you want to run this without GitHub Actions:

# Install k6 (macOS)
brew install k6

# Run directly
k6 run \
  --env CLIENT_ID=your_client_id \
  --env CLIENT_SECRET=your_client_secret \
  k6/load-test.js

# Or via Docker
docker build -f k6/Dockerfile.k6 -t k6-load-test k6/
docker run --rm \
  -e CLIENT_ID=your_client_id \
  -e CLIENT_SECRET=your_client_secret \
  k6-load-test

To open the live dashboard while running:

k6 run \
  --env CLIENT_ID=xxx \
  --env CLIENT_SECRET=xxx \
  --out web-dashboard=open \
  k6/load-test.js

This opens a browser tab with real-time charts of VU count, request rate, response times, and threshold status.


What I Learned

1. PDF generation doesn't scale linearly. At 1 VU, p95 is ~1.2s. At 5 VUs, p95 climbs to ~3.2s. The bottleneck is Chromium — each instance is single-threaded and memory-hungry. Beyond 5–6 concurrent renders on a 2-vCPU box, response times spike and errors appear.

2. The ramp-up stage matters. Starting at full concurrency immediately causes a thundering herd. Ramping over 30 seconds gives the server time to warm up connection pools and stabilize before the steady-state measurement begins.

3. sleep(1) is realistic pacing. Without a sleep, each VU would hammer the API as fast as possible — useful for finding the absolute breaking point, but not representative of real user behavior. A 1-second pause between requests models a user who just submitted a form and is waiting.

4. Thresholds are commitments. Defining p(95)<3500 in the script makes the performance budget explicit and machine-enforceable. When it breaks, you know exactly why — and you can't ship until it passes.
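One refinement on that last point: k6 thresholds can also abort a run the moment the budget is blown, rather than spending the full two minutes hammering a server that's already failing. A sketch using k6's abortOnFail threshold option — a config fragment, with values mirroring the thresholds above:

```javascript
export const options = {
  thresholds: {
    http_req_failed: [
      // Abort the whole test once the error rate exceeds 10%,
      // after a 10s grace period so one early failure can't kill the run.
      { threshold: 'rate<0.1', abortOnFail: true, delayAbortEval: '10s' },
    ],
    http_req_duration: ['p(95)<3500'],
  },
};
```

An aborted run still exits non-zero, so the CI verdict is unchanged — you just get it sooner.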


What's Next

The current setup is a solid baseline. Natural next steps would be:

  • Export metrics to InfluxDB + Grafana for historical trend tracking across deploys
  • Add a spike test stage — a sudden jump to 20 VUs for 10 seconds — to test recovery behavior
  • Test the async endpoint separately — async PDF generation has different characteristics (202 immediate response, webhook delivery latency)
  • Parameterize the template ID and payload to test multiple templates in a single run using k6's SharedArray for test data

But for a $4/month box generating PDFs for real customers, knowing it handles 5 concurrent requests within SLA — proved by automated tests triggered from GitHub — is exactly the confidence level needed to sleep well at night.


Built with k6 by Grafana Labs, deployed on Hetzner Cloud, automated with GitHub Actions.
