The Problem with "It Works on My Machine"
PDF generation is one of those deceptively expensive operations. You fire off a request, Puppeteer spins up a headless Chromium, renders a full HTML page, and exports it to bytes. Works great in dev. Works great in staging with one user. Then someone puts it in production and a dozen concurrent requests land at once — and you discover your server is quietly crying.
That was the situation with Templify, a PDF generation platform I built. The core API — POST /convert/{templateId} — compiles a Handlebars template and delegates to a job-runner service (Express + Puppeteer) to render the PDF. Each request is CPU-bound and takes 1–4 seconds depending on template complexity.
Before confidently telling users the API handles concurrent load, I needed proof. Enter k6.
Why k6
I've used Locust, JMeter, and Artillery. k6 wins on developer ergonomics:
- Test scripts are JavaScript — no YAML configs, no XML, no DSL to learn
- Built-in thresholds — define pass/fail criteria in the script itself
- Docker-first — the official grafana/k6 image just works
- CLI output is readable — colored, structured, and tells you exactly what you need
The only gotcha: k6 uses a custom JS runtime (Goja, not Node.js), so you can't import arbitrary npm packages. For plain HTTP API testing, that limitation rarely matters.
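In practice the limitation just means any helper code must be dependency-free JavaScript. A pure function like this hypothetical payload-variant picker (pickVariant and VARIANTS are my illustration, not part of the real test script) runs unchanged under Goja and Node:

```javascript
// Dependency-free helper: pick a deterministic variant per iteration.
// No npm packages, no Node APIs — so it works in k6's Goja runtime too.
const VARIANTS = ['brochure', 'invoice', 'report'];

function pickVariant(iteration) {
  // Plain modulo arithmetic cycles through the variants.
  return VARIANTS[iteration % VARIANTS.length];
}

// In k6 you would call this from the default function with the __ITER global;
// here it's exercised directly.
console.log(pickVariant(0)); // brochure
console.log(pickVariant(4)); // invoice
```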
The Infrastructure (Brief Context)
The job-runner service runs in Docker on a Hetzner CX11 — 2 vCPU, 4GB RAM, ~$4.15/month. It hosts both production (port 3000) and staging (port 3001) environments as separate containers on the same box.
The load test runs on the Hetzner server itself, not from the GitHub Actions runner. This is intentional: it removes the network latency variability of CI runners and measures the server's throughput with the load generator sitting right next to it — about as close to the hardware's raw capacity as you can get.
Step 1: Write the k6 Test Script
The test script lives at job-runner/k6/load-test.js.
Load Shape
k6 reads its test configuration from an exported object named options. The shape I chose is a classic ramp-up → steady state → ramp-down pattern:
export const options = {
stages: [
{ duration: '30s', target: 5 }, // ramp up to 5 VUs
{ duration: '1m', target: 5 }, // hold for 1 minute
{ duration: '30s', target: 0 }, // ramp down
],
thresholds: {
http_req_duration: ['p(95)<3500'], // 95th percentile under 3.5s
http_req_failed: ['rate<0.1'], // error rate under 10%
},
};
Why 5 virtual users? This isn't an arbitrary number. Each VU issues one request, waits 1 second (sleep(1)), then issues another. At 5 concurrent VUs with ~2–3s response times, the server is handling ~2–3 simultaneous Puppeteer renders at any given moment — right at the limit of what a 2-vCPU box can sustain without queuing.
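A back-of-envelope check on that claim: each VU spends respTime / (respTime + sleep) of its loop waiting on the server, so in-flight concurrency is roughly the VU count times that fraction. The response times below are illustrative, not measured:

```javascript
// Back-of-envelope: how many requests are in flight at once?
// Each VU loop = one request (respTimeSec) + sleep(sleepSec), so a VU
// spends respTime / (respTime + sleep) of its life waiting on the server.
function inFlight(vus, respTimeSec, sleepSec) {
  return vus * (respTimeSec / (respTimeSec + sleepSec));
}

console.log(inFlight(5, 2.0, 1).toFixed(2)); // "3.33"
console.log(inFlight(5, 2.5, 1).toFixed(2)); // "3.57"
```

At ~2s responses that's roughly 3 renders in flight at steady state — a shade above the 2–3 estimate, since ramp-up and ramp-down pull the whole-run average down.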
Why p95 instead of average? Averages lie. A test where 94% of requests complete in 200ms and 6% time out at 30 seconds shows a "fine" average. p95 tells you what the worst realistic experience looks like.
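That exact scenario is easy to reproduce numerically. The numbers here are the hypothetical ones from the sentence above, not from the real run, and the percentile uses the simple nearest-rank method rather than k6's own interpolation:

```javascript
// The scenario from the text: 94 fast requests, 6 catastrophic ones.
const samples = [
  ...Array(94).fill(200),   // 94 requests at 200 ms
  ...Array(6).fill(30000),  // 6 timeouts at 30 s
];

const mean = samples.reduce((a, b) => a + b, 0) / samples.length;

// Nearest-rank percentile: the value at rank ceil(p% of n) in sorted order.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

console.log(mean);                    // 1988 — looks survivable
console.log(percentile(samples, 95)); // 30000 — the real story
```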
The Payload
The test uses a realistic Handlebars template payload — a multi-section marketing brochure for a fictional company called TechInnovate. This matters: testing with {"name": "test"} would render in 300ms; testing with the actual production payload shape catches real performance characteristics.
const PAYLOAD = {
templateData: {
hero: {
title: "Transforming Business Through Technology",
subtitle: "Innovative solutions that drive growth, efficiency, and competitive advantage",
ctaButton: { url: "#products", text: "Discover Our Solutions" }
},
about: {
title: "About TechInnovate",
description: "Founded in 2015, TechInnovate has been at the forefront...",
bulletPoints: [
"10+ years of industry experience",
"200+ successful projects delivered",
"98% client satisfaction rate",
"Global team of certified experts"
]
},
products: { items: [ /* 3 products with prices, descriptions */ ] },
features: { items: [ /* 4 feature cards */ ] },
testimonial: {
quote: "We have seen a 40% increase in efficiency...",
author: "Jennifer Martinez",
company: "CEO, Global Enterprises"
}
// ... contact, footer, social links
}
};
The Request Function
import http from 'k6/http';
import { check, sleep } from 'k6';
const TEMPLATE_ID = "c07deb00-bb22-4e5f-b48e-1b1c17f7c969";
const CLIENT_ID = __ENV.CLIENT_ID;
const CLIENT_SECRET = __ENV.CLIENT_SECRET;
const baseUrl = "https://api.templify.cloud";
export default function () {
const headers = {
'Content-Type': 'application/json',
'client_secret': CLIENT_SECRET,
'client_id': CLIENT_ID,
};
const pdfResponse = http.post(
`${baseUrl}/convert/${TEMPLATE_ID}`,
JSON.stringify(PAYLOAD),
{ headers }
);
check(pdfResponse, {
'PDF generation status is 200': (r) => r.status === 200,
'PDF generation response time < 5s': (r) => r.timings.duration < 5000,
});
sleep(1);
}
Credentials come from environment variables via k6's __ENV — never hardcoded. The check() function records pass/fail metrics per assertion without stopping the test.
Step 2: Containerize k6
The Dockerfile (Dockerfile.k6) is deliberately minimal:
FROM grafana/k6:latest
COPY *.js .
CMD ["run", "load-test.js"]
That's it. The official grafana/k6 image ships k6 at a known version in a minimal Alpine-based image. No node_modules, no build step, no complexity. The *.js glob future-proofs it — add more test files and they're automatically available.
Step 3: The Deploy Script — Running Tests Remotely
The run-load-test.sh script orchestrates the full flow: sync the k6 files to the server, build the Docker image there, run it.
#!/bin/bash
set -e
ENVIRONMENT=${1:-production}
TARGET_HOST=${HETZNER_HOST}
HETZNER_USER=${HETZNER_USER}
CLIENT_ID=${CLIENT_ID}
CLIENT_SECRET=${CLIENT_SECRET}
echo "Running load test for the ${ENVIRONMENT} environment..."
# Clean previous run
ssh -o StrictHostKeyChecking=no -i ~/.ssh/id_rsa \
$HETZNER_USER@$TARGET_HOST "rm -rf ~/load-test/k6"
# Sync k6 folder to server
scp -o StrictHostKeyChecking=no -i ~/.ssh/id_rsa \
-r k6 $HETZNER_USER@$TARGET_HOST:~/load-test/
# Build and run on the server
ssh -o StrictHostKeyChecking=no -i ~/.ssh/id_rsa \
$HETZNER_USER@$TARGET_HOST << EOF
cd ~/load-test/k6
docker stop k6-load-test 2>/dev/null || true
docker rm k6-load-test 2>/dev/null || true
docker build --no-cache -f Dockerfile.k6 -t k6-load-test .
docker run --rm \
--name k6-load-test \
-e CLIENT_ID=$CLIENT_ID \
-e CLIENT_SECRET=$CLIENT_SECRET \
-e K6_WEB_DASHBOARD=true \
-e K6_WEB_DASHBOARD_PORT=-1 \
k6-load-test
EOF
echo "Load test completed."
Why build on the server instead of pulling from a registry?
Because the test script changes frequently during development. Pushing to a registry on every iteration adds friction. scp + docker build --no-cache is fast (< 30 seconds) and guarantees you're running exactly the code you just edited — no cache surprises.
The K6_WEB_DASHBOARD=true flag
k6 ships with a built-in real-time web dashboard. Setting K6_WEB_DASHBOARD_PORT=-1 disables the HTTP server (since we're in a non-interactive SSH session) but still enables the dashboard's internal metrics aggregation and summary report output at the end. If you're running interactively, set a port like 8089 and open the dashboard in your browser during the test run.
Step 4: GitHub Actions Workflow — Manual Trigger
Load tests are not run on every push. Running a 2-minute load test on every PR would be slow, expensive (in credits), and noisy. Instead, it's a workflow_dispatch — triggered manually, on demand:
name: Load Test
on:
workflow_dispatch:
inputs:
environment:
description: 'Environment to test'
required: true
default: 'production'
type: choice
options:
- production
- staging
jobs:
load-test:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Setup SSH
run: |
mkdir -p ~/.ssh
echo "${{ secrets.HETZNER_SSH_KEY }}" > ~/.ssh/id_rsa
chmod 600 ~/.ssh/id_rsa
ssh-keyscan -H ${{ secrets.HETZNER_HOST }} >> ~/.ssh/known_hosts
- name: Run load test
env:
CLIENT_ID: ${{ secrets.CLIENT_ID }}
CLIENT_SECRET: ${{ secrets.CLIENT_SECRET }}
HETZNER_HOST: ${{ secrets.HETZNER_HOST }}
HETZNER_USER: ${{ secrets.HETZNER_USER }}
run: |
chmod +x scripts/run-load-test.sh
./scripts/run-load-test.sh ${{ github.event.inputs.environment }}
Required GitHub secrets:
| Secret | Value |
|---|---|
| HETZNER_SSH_KEY | Contents of the ED25519 private key |
| HETZNER_HOST | Server IP or hostname |
| HETZNER_USER | SSH user (e.g. root) |
| CLIENT_ID | Templify API client ID |
| CLIENT_SECRET | Templify API client secret |
The ssh-keyscan step adds the server's host key to known_hosts, preventing the interactive "are you sure?" prompt that would hang the CI runner.
What the Output Looks Like
When k6 finishes, it prints a summary to stdout — which GitHub Actions captures and displays in the workflow logs:
/\ |‾‾| /‾‾/ /‾‾/
/\ / \ | |/ / / /
/ \/ \ | ( / ‾‾\
/ \ | |\ \ | (‾) |
/ __________ \ |__| \__\ \_____/ .io
execution: local
script: load-test.js
output: -
scenarios: (100.00%) 1 scenario, 5 max VUs, 2m30s max duration (incl. graceful stop):
* default: Up to 5 looping VUs for 2m0s over 3 stages (gracefulRampDown: 30s, ...)
✓ PDF generation status is 200
✓ PDF generation response time < 5s
checks.........................: 100.00% ✓ 142 ✗ 0
data_received..................: 14 MB 117 kB/s
data_sent......................: 87 kB 727 B/s
http_req_blocked...............: avg=18.4ms min=2µs med=5µs max=1.04s p(90)=10µs p(95)=14µs
http_req_duration..............: avg=1.91s min=892ms med=1.79s max=4.12s p(90)=2.94s p(95)=3.21s
✓ { expected_response:true }....: avg=1.91s min=892ms med=1.79s max=4.12s p(90)=2.94s p(95)=3.21s
http_req_failed................: 0.00% ✓ 0 ✗ 142
http_req_receiving.............: avg=143.2ms min=4.98ms med=79.5ms max=731ms p(90)=381ms p(95)=477ms
http_req_sending...............: avg=258µs min=97µs med=213µs max=1.45ms p(90)=435µs p(95)=509µs
http_req_tls_handshaking.......: avg=18.3ms min=0s med=0s max=1.04s p(90)=0s p(95)=0s
http_req_waiting...............: avg=1.77s min=858ms med=1.66s max=3.86s p(90)=2.72s p(95)=3.02s
http_reqs......................: 142 1.183333/s
iteration_duration.............: avg=2.91s min=1.9s med=2.79s max=5.15s p(90)=3.94s p(95)=4.21s
iterations.....................: 142 1.183333/s
vus............................: 1 min=1 max=5
vus_max........................: 5 min=5 max=5
running (2m00.0s), 0/5 VUs, 142 complete and 0 interrupted iterations
default ✓ [==============================] 0/5 VUs 2m0s
✓ http_req_duration............: p(95)=3.21s < 3.5s ✓ PASS
✓ http_req_failed..............: rate=0.00% < 10% ✓ PASS
The thresholds section at the bottom is the pass/fail verdict. k6 exits with a non-zero code if any threshold is breached — which means GitHub Actions marks the workflow run as failed. No manual inspection required.
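A quick Little's law sanity check ties the summary numbers back to the earlier concurrency estimate: in-flight requests = arrival rate × average latency.

```javascript
// Little's law: L = λ · W
// Numbers taken from the k6 summary above.
const throughput = 142 / 120; // http_reqs over the 2-minute run, ≈1.183/s
const avgLatency = 1.91;      // avg http_req_duration in seconds

const inFlight = throughput * avgLatency;
console.log(inFlight.toFixed(2)); // "2.26" — consistent with ~2–3 concurrent renders
```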
The Full File Structure
job-runner/
├── k6/
│ ├── load-test.js # k6 test script
│ └── Dockerfile.k6 # Minimal k6 Docker image
├── scripts/
│ └── run-load-test.sh # SSH + SCP + docker run orchestration
└── .github/
└── workflows/
└── load-test.yml # Manual GitHub Actions workflow
Four files. That's the entire setup.
Key Design Decisions, Explained
Run k6 on the server, not from CI
Running from GitHub's Ubuntu runners introduces network hops: CI runner → Cloudflare/CDN → Vercel (API gateway) → Hetzner. That's fine for integration testing but adds noise to performance benchmarking. Running on Hetzner itself tests the raw capacity of the PDF service without network jitter.
workflow_dispatch not push
Load tests are deliberately not automated on push. They consume API credits (each PDF generation deducts 1 credit), generate real load, and take 2+ minutes. The right time to run them is before a deploy to production or when investigating a performance regression — not on every feature branch commit.
Credentials via __ENV, never hardcoded
k6's __ENV object reads environment variables passed at runtime. This means the same script works in local dev (k6 run --env CLIENT_ID=xxx load-test.js), in Docker (-e CLIENT_ID=xxx), and in CI — without any code changes.
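One portability trick worth noting (my addition, not from the original script): __ENV exists only inside k6's runtime, so helper code that should also run under plain Node can fall back to process.env:

```javascript
// __ENV is a global only inside k6; process.env exists only in Node.
// typeof on an undeclared identifier is safe, so this works in both.
// (A convenience sketch, not from the original test script.)
const env = typeof __ENV !== 'undefined' ? __ENV : process.env;

const CLIENT_ID = env.CLIENT_ID || '';
if (!CLIENT_ID) {
  // Fail fast with a clear message instead of a cryptic 401 mid-test.
  console.error('CLIENT_ID is not set');
}
```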
--no-cache on Docker build
The test script changes often. Docker's layer cache would happily serve a stale load-test.js if you forget to invalidate it. --no-cache is a small penalty (~5 seconds) that guarantees correctness.
Running Locally
If you want to run this without GitHub Actions:
# Install k6 (macOS)
brew install k6
# Run directly
k6 run \
--env CLIENT_ID=your_client_id \
--env CLIENT_SECRET=your_client_secret \
k6/load-test.js
# Or via Docker
docker build -f k6/Dockerfile.k6 -t k6-load-test k6/
docker run --rm \
-e CLIENT_ID=your_client_id \
-e CLIENT_SECRET=your_client_secret \
k6-load-test
To open the live dashboard while running:
k6 run \
--env CLIENT_ID=xxx \
--env CLIENT_SECRET=xxx \
--out web-dashboard=open \
k6/load-test.js
This opens a browser tab with real-time charts of VU count, request rate, response times, and threshold status.
What I Learned
1. PDF generation doesn't scale linearly. At 1 VU, p95 is ~1.2s. At 5 VUs, p95 climbs to ~3.2s. The bottleneck is Chromium — each instance is single-threaded and memory-hungry. Beyond 5–6 concurrent renders on a 2-vCPU box, response times spike and errors appear.
2. The ramp-up stage matters. Starting at full concurrency immediately causes a thundering herd. Ramping over 30 seconds gives the server time to warm up connection pools and stabilize before the steady-state measurement begins.
3. sleep(1) is realistic pacing. Without a sleep, each VU would hammer the API as fast as possible — useful for finding the absolute breaking point, but not representative of real user behavior. A 1-second pause between requests models a user who just submitted a form and is waiting.
4. Thresholds are commitments. Defining p(95)<3500 in the script makes the performance budget explicit and machine-enforceable. When it breaks, you know exactly why — and you can't ship until it passes.
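Point 3 invites one refinement: adding jitter to the pause so VUs drift out of lockstep instead of firing in sync. A sketch of the pacing calculation as a pure function — pacing is hypothetical, not in the original script; in k6 you'd pass its result to sleep():

```javascript
// Jittered pacing: base pause ± up to `jitterSec` seconds, clamped at zero.
// The random source is injectable so the function is testable.
function pacing(baseSec, jitterSec, rand = Math.random) {
  const delta = (rand() * 2 - 1) * jitterSec; // uniform in [-jitter, +jitter]
  return Math.max(0, baseSec + delta);
}

// With rand pinned, the result is deterministic:
console.log(pacing(1, 0.5, () => 0.5)); // 1 — midpoint rand means zero jitter
console.log(pacing(1, 0.5, () => 1));   // 1.5 — maximum positive jitter
```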
What's Next
The current setup is a solid baseline. Natural next steps would be:
- Export metrics to InfluxDB + Grafana for historical trend tracking across deploys
- Add a spike test stage — a sudden jump to 20 VUs for 10 seconds — to test recovery behavior
- Test the async endpoint separately — async PDF generation has different characteristics (202 immediate response, webhook delivery latency)
- Parameterize the template ID and payload to test multiple templates in a single run, using k6's SharedArray for test data
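The spike-test idea above would only touch the stages array — a sketch with illustrative numbers, keeping the existing thresholds:

```javascript
export const options = {
  stages: [
    { duration: '30s', target: 5 },  // normal ramp
    { duration: '1m', target: 5 },   // steady state
    { duration: '10s', target: 20 }, // sudden spike
    { duration: '30s', target: 5 },  // recovery — does p95 come back down?
    { duration: '30s', target: 0 },  // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<3500'],
    http_req_failed: ['rate<0.1'],
  },
};
```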
But for a $4/month box generating PDFs for real customers, knowing it handles 5 concurrent requests within SLA — proved by automated tests triggered from GitHub — is exactly the confidence level needed to sleep well at night.
Built with k6 by Grafana Labs, deployed on Hetzner Cloud, automated with GitHub Actions.