Part of The Coercion Saga — making AI write quality code.
E2E tests prove it works end-to-end. The full flow succeeds. But users complain it's slow.
Performance regressions are invisible. No test fails. No error appears. Just gradually slower response times until someone notices. By then, you don't know which commit caused it. Was it the ORM change? The new middleware? That "harmless" refactor?
Measure on every merge request. Catch regressions before they ship.
k6: Load Testing
Single-request tests pass. But under load? Connection pool exhausted. Memory leak. Database locks up. That N+1 query that takes 50ms with 10 rows takes 5 seconds with 10,000.
k6 simulates multiple users hitting your API simultaneously.
// tests/performance/smoke-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 5,           // 5 virtual users
  duration: '30s',  // 30 seconds
  thresholds: {
    http_req_duration: ['p(95)<500'],  // 95th percentile under 500ms
    http_req_failed: ['rate<0.01'],    // Less than 1% errors
  },
};

export default function () {
  const res = http.get(`${__ENV.API_URL}/health`);
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time OK': (r) => r.timings.duration < 200,
  });
  sleep(1);
}
p(95)<500 is the constraint that matters. Not the average, the 95th percentile. Averages lie: a crowd of fast requests hides the slow tail. p95 tells the truth: "95% of your users get this experience or better."
✗ http_req_duration: p(95)<500
↳ 97% — ✗ 612ms (threshold exceeded)
Endpoint got slow. Developer investigates before merge. Problem found before production. Before the angry email from the client.
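Why the percentile and not the average? A quick back-of-the-envelope check (plain Node, made-up numbers):

// Hypothetical latencies (ms): a fast majority and a slow tail
const latencies = [80, 90, 85, 95, 100, 88, 92, 110, 1200, 1400];

const avg = latencies.reduce((a, b) => a + b, 0) / latencies.length;

// Rough p95: sort ascending, take the value at the 95% mark
const sorted = [...latencies].sort((a, b) => a - b);
const p95 = sorted[Math.ceil(0.95 * sorted.length) - 1];

console.log(`avg = ${Math.round(avg)}ms, p95 = ${p95}ms`);
// avg = 334ms looks healthy; p95 = 1400ms is what 1 in 20 users actually gets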
The Ramp Pattern: Finding the Breaking Point
Smoke tests check "does it work under light load." Ramp tests find "where does it break."
// tests/performance/stress-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 10 },   // Warm up
    { duration: '2m', target: 50 },   // Push it
    { duration: '2m', target: 100 },  // Stress it
    { duration: '1m', target: 0 },    // Cool down
  ],
  thresholds: {
    http_req_duration: ['p(99)<2000'],  // Even p99 under 2s
    http_req_failed: ['rate<0.1'],
  },
};

export default function () {
  const res = http.get(`${__ENV.API_URL}/api/items`);
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
  sleep(0.5);
}
The ramp reveals:
- Where response times spike — "50 users is fine, 75 users and latency triples"
- Connection pool limits — "At 80 concurrent, we start timing out on DB connections"
- Memory leaks — "Memory grows linearly with users, never releases"
Run this weekly, not on every MR. It's slow. But it finds the ceiling before production hits it.
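In GitLab CI that can be a separate job keyed off a scheduled pipeline. A sketch, assuming a weekly schedule is configured in the project settings and reusing the perf:k6 job defined in the gate below:

# Hypothetical weekly stress-test job: same setup as perf:k6, different script and trigger
perf:k6-stress:
  extends: perf:k6
  script:
    - k6 run --out json=results.json tests/performance/stress-test.js
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"  # only runs from a scheduled pipeline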
Realistic Scenarios: Test the Actual User Journey
A health check endpoint under load means nothing. Test what users actually do.
// tests/performance/user-flow.js
import http from 'k6/http';
import { check, group, sleep } from 'k6';
import { SharedArray } from 'k6/data';

// Pre-generated test users (created in setup)
const users = new SharedArray('users', function () {
  return JSON.parse(open('./test-users.json'));
});

export const options = {
  stages: [
    { duration: '30s', target: 10 },
    { duration: '1m', target: 10 },
    { duration: '30s', target: 0 },
  ],
  thresholds: {
    'group_duration{group:::auth}': ['p(95)<1000'],
    'group_duration{group:::browse}': ['p(95)<500'],
    'group_duration{group:::create}': ['p(95)<2000'],
  },
};

export default function () {
  const user = users[Math.floor(Math.random() * users.length)];
  let auth;

  // Keep the groups siblings: nested groups get hierarchical names
  // (like ::auth::browse) that wouldn't match the thresholds above.
  group('auth', () => {
    const login = http.post(
      `${__ENV.API_URL}/auth/login`,
      JSON.stringify({ email: user.email, password: user.password }),
      { headers: { 'Content-Type': 'application/json' } }
    );
    check(login, { 'logged in': (r) => r.status === 200 });
    if (login.status !== 200) return;
    const token = JSON.parse(login.body).access_token;
    auth = { headers: { Authorization: `Bearer ${token}` } };
  });
  if (!auth) return; // login failed, skip the rest of this iteration

  group('browse', () => {
    const items = http.get(`${__ENV.API_URL}/api/items`, auth);
    check(items, { 'items loaded': (r) => r.status === 200 });
    const item = http.get(`${__ENV.API_URL}/api/items/1`, auth);
    check(item, { 'item loaded': (r) => r.status === 200 });
  });

  group('create', () => {
    const newItem = http.post(
      `${__ENV.API_URL}/api/items`,
      JSON.stringify({ title: `Test ${Date.now()}`, description: 'Load test' }),
      { ...auth, headers: { ...auth.headers, 'Content-Type': 'application/json' } }
    );
    check(newItem, { 'item created': (r) => r.status === 201 });
  });

  sleep(1);
}
The group_duration thresholds are per-operation. Auth can be slower (token generation is expensive). Browsing should be fast (it's cached, right?). Creating can be slowest (writes are expensive).
Different thresholds for different operations. Because "the API is slow" is useless. "Item creation is slow" is actionable.
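One assumption hiding in that script: test-users.json has to exist before the run. A minimal sketch of a seed step (the script name and account-creation details are hypothetical; adapt to however your backend registers users):

// scripts/seed-test-users.js (hypothetical): write credentials where the k6 script expects them
const fs = require('fs');

const users = Array.from({ length: 50 }, (_, i) => ({
  email: `loadtest-user-${i}@example.com`,
  password: 'load-test-password',
}));

// Register these accounts against the API here if they don't already exist,
// then dump the credentials for SharedArray to load.
fs.writeFileSync('tests/performance/test-users.json', JSON.stringify(users, null, 2));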
Lighthouse: Frontend Performance
Bundle size creeps up. Images aren't optimized. JavaScript blocks rendering. Lighthouse score drops from 90 to 60. Users bounce before the page loads.
// lighthouserc.js
module.exports = {
  ci: {
    collect: {
      url: ['http://localhost:4173'],
      numberOfRuns: 3, // Average of 3 runs, reduces variance
      settings: {
        preset: 'desktop', // or 'mobile' for mobile-first
      },
    },
    assert: {
      assertions: {
        'categories:performance': ['warn', { minScore: 0.8 }],
        'categories:accessibility': ['error', { minScore: 0.9 }], // A11y is not optional
        'categories:best-practices': ['warn', { minScore: 0.9 }],
        // Core Web Vitals - the metrics Google cares about
        'first-contentful-paint': ['warn', { maxNumericValue: 2000 }],
        'largest-contentful-paint': ['warn', { maxNumericValue: 2500 }],
        'cumulative-layout-shift': ['warn', { maxNumericValue: 0.1 }],
        'total-blocking-time': ['warn', { maxNumericValue: 300 }],
        // Bundle size matters
        'total-byte-weight': ['warn', { maxNumericValue: 500000 }], // 500KB max
      },
    },
    upload: {
      target: 'temporary-public-storage', // Free, public reports
    },
  },
};
categories:accessibility is error, not warn. Accessibility bugs are bugs. They block merges.
total-byte-weight at 500KB is aggressive. But every KB is a user on slow 3G waiting longer. If your marketing page is 2MB, users leave before they see it.
The Graph That Saves You
Lighthouse CI can track scores over time. But even without that, trends matter.
// In your CI, after Lighthouse runs
const fs = require('fs');
const path = require('path');

// readFileSync doesn't expand globs, so locate the lhr-*.json report explicitly
const dir = '.lighthouseci';
const reportFile = fs.readdirSync(dir).find((f) => f.startsWith('lhr-') && f.endsWith('.json'));
const report = JSON.parse(fs.readFileSync(path.join(dir, reportFile), 'utf8'));

const metrics = {
  timestamp: new Date().toISOString(),
  commit: process.env.CI_COMMIT_SHA,
  performance: report.categories.performance.score,
  lcp: report.audits['largest-contentful-paint'].numericValue,
  cls: report.audits['cumulative-layout-shift'].numericValue,
  bundleSize: report.audits['total-byte-weight'].numericValue,
};

// Append to a metrics file in your repo (or send to a dashboard)
console.log(JSON.stringify(metrics));
Plot these over time. "Performance dropped 10 points in March" becomes visible. "Which commit?" becomes answerable.
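Even a crude script over that log answers "which commit?". A sketch, assuming each run's JSON line gets appended to a metrics.jsonl file kept as an artifact (the file name is made up):

// trend.js (hypothetical): one line per pipeline run, oldest first
const fs = require('fs');

const runs = fs.readFileSync('metrics.jsonl', 'utf8').trim().split('\n').map((line) => JSON.parse(line));
for (const m of runs) {
  console.log(`${m.timestamp}  ${m.commit.slice(0, 8)}  perf=${m.performance}  lcp=${Math.round(m.lcp)}ms`);
}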
The Gate
Two jobs. Backend load test with k6. Frontend audit with Lighthouse.
perf:k6:
  stage: performance
  image:
    name: grafana/k6:latest
    entrypoint: [''] # the image's default entrypoint is k6 itself, which breaks shell-based CI jobs
  services:
    - name: postgres:16
      alias: db
  variables:
    DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/test
    POSTGRES_DB: test # have the service create the database DATABASE_URL points at
    POSTGRES_HOST_AUTH_METHOD: trust
    API_URL: http://localhost:8000
  before_script:
    - apk add --no-cache python3 py3-pip
    - pip3 install uv --break-system-packages
    - cd backend
    - uv sync --frozen
    - uv run alembic upgrade head
    - uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 &
    - sleep 5
    # Verify backend is up
    - wget -q --spider http://localhost:8000/health || exit 1
  script:
    - k6 run --out json=results.json tests/performance/smoke-test.js
  artifacts:
    when: always
    paths:
      - results.json
    expire_in: 1 week
  allow_failure: false
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
perf:lighthouse:
  stage: performance
  image: node:lts-slim
  variables:
    CHROME_PATH: /usr/bin/chromium
    LHCI_BUILD_CONTEXT__COMMIT_MESSAGE: $CI_COMMIT_MESSAGE
  before_script:
    - apt-get update && apt-get install -y chromium --no-install-recommends
    - cd frontend
    - npm ci
    - npm run build
    - npm run preview -- --port 4173 &
    - npx wait-on http://localhost:4173 --timeout 30000
  script:
    - npx @lhci/cli autorun
  artifacts:
    when: always
    paths:
      - frontend/.lighthouseci/
    expire_in: 1 week
  allow_failure: true # Warn, don't block
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
--out json=results.json on k6 saves raw metrics. Parse them later. Trend them over time. Find the slow creep before it becomes a slow crisis.
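For instance, p95 can be recomputed offline from that artifact. k6's JSON output is one object per line, with http_req_duration samples showing up as "Point" entries; a sketch:

// parse-k6.js: rough p95 of http_req_duration from k6's JSON output
const fs = require('fs');

const durations = fs.readFileSync('results.json', 'utf8')
  .trim()
  .split('\n')
  .map((line) => JSON.parse(line))
  .filter((e) => e.type === 'Point' && e.metric === 'http_req_duration')
  .map((e) => e.data.value)
  .sort((a, b) => a - b);

const p95 = durations[Math.ceil(0.95 * durations.length) - 1];
console.log(`p95 over ${durations.length} samples: ${p95.toFixed(1)}ms`);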
allow_failure: true on Lighthouse means it warns but doesn't block. Start there. When you're confident in your thresholds, switch to false.
Copy, paste, adapt. It works.
The Point
Functional tests prove correctness. Performance tests prove speed. Both matter.
A correct but slow endpoint is a broken endpoint. Users don't wait. They leave. They tweet about it. They use your competitor.
Amazon found that every 100ms of latency cost them 1% of sales. Google found that an extra 0.5 seconds in search page load dropped traffic by 20%. Your app isn't Amazon. But your users are just as impatient.
k6 catches backend slowdowns before they ship. Lighthouse catches frontend bloat before it accumulates. Automated measurement finds what humans miss.
Track trends. Catch regressions. Ship fast code.
That's the deal.