Part 2 of 4: Building a Real k6 Test Suite Against a Live Kubernetes App
In part 1 I covered k6's philosophy and the anatomy of a first test. This post is where things get real: a production-grade test suite running against a live microservices app on a homelab Kubernetes cluster, including what went wrong on the first run and how I debugged it. All of the code is available at https://github.com/mwimpelberg28/k6-playground
The target: Online Boutique
Rather than testing against a mock or a toy API, I wanted something that resembles a real production system. Google's Online Boutique is a microservices demo app with 11 services covering a realistic e-commerce stack: frontend, cart, checkout, product catalog, currency conversion, recommendations, and more.
Deploying it took about two minutes:
kubectl create namespace boutique
kubectl apply -n boutique -f \
https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/main/release/kubernetes-manifests.yaml
My homelab runs a kubeadm cluster on Ubuntu with MetalLB for load balancing. Within 30 seconds MetalLB had assigned a real external IP and the app was serving traffic at http://10.4.20.2.
kubectl get svc -n boutique frontend-external
# NAME TYPE EXTERNAL-IP PORT(S)
# frontend-external LoadBalancer 10.4.20.2 80:xxxxx/TCP
The architecture decision that matters most
Before writing a single test I built a shared library. This is the difference between a test suite and a folder of scripts.
k6-boutique/
├── lib/
│   ├── client.js   ← all HTTP calls
│   └── checks.js   ← all assertions
└── tests/
    ├── smoke/
    ├── load/
    ├── stress/
    └── browser/
lib/client.js knows how to talk to the app — base URL, request helpers, product IDs, checkout payload. Every test imports from it. Change the target URL once, every test picks it up:
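import http from 'k6/http';

export const BASE_URL = __ENV.BASE_URL || 'http://10.4.20.2';

// defaultParams is defined elsewhere in client.js (not shown here).
export function addToCart(productId, quantity = 1, params = defaultParams) {
  // k6 sends an object body form-encoded, matching the curl call used later for debugging.
  return http.post(
    `${BASE_URL}/cart`,
    { product_id: productId, quantity: quantity.toString() },
    params
  );
}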
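The tests below call a randomProduct() helper that isn't shown in the excerpt above. Here is a minimal sketch of what it might look like, not the repo's actual code. Only 0PUK6V6EV0 appears in this post; the other two IDs are my assumption from the upstream demo catalog, so verify them against your deployment. The rng parameter is a hypothetical addition so the helper can be exercised deterministically.

```javascript
// Hypothetical product list for lib/client.js. 0PUK6V6EV0 appears in this post;
// the other IDs are assumed from the upstream demo catalog and should be verified.
const PRODUCT_IDS = ['0PUK6V6EV0', 'OLJCESPC7Z', '66VCHSJNUP'];

// Pick one product ID uniformly at random. The rng parameter exists only so the
// helper can be driven deterministically; k6 tests would call it with no arguments.
function randomProduct(rng = Math.random) {
  const index = Math.floor(rng() * PRODUCT_IDS.length);
  return PRODUCT_IDS[index];
}

console.log(randomProduct(() => 0)); // 0PUK6V6EV0
```

In lib/client.js both names would be exported so every test imports the same list. Pointing the suite at another environment is then just a matter of passing -e BASE_URL=... to k6 run.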
lib/checks.js knows what a good response looks like for each page:
import { check } from 'k6';

export function checkHome(res) {
  // Returns true only if every predicate passes for this response.
  return check(res, {
    'status 200': (r) => r.status === 200,
    'shows products': (r) => r.body.includes('Hot Products'),
    'response < 2s': (r) => r.timings.duration < 2000,
  });
}
Define it once, use it everywhere. When the app changes, you fix it in one place.
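The same pattern extends to the other pages. As a sketch, here is what the predicate map behind checkProductPage might look like. The check names are my own, the 'Add To Cart' text comes from the curl output later in this post, and failedChecks is a hypothetical local stand-in for inspecting predicates outside k6, not k6's check().

```javascript
// Predicates for the product page, in the same shape check() accepts:
// a map from check name to a function of the response.
const productPageChecks = {
  'status 200': (r) => r.status === 200,
  'has add to cart': (r) => r.body.includes('Add To Cart'),
  'response < 2s': (r) => r.timings.duration < 2000,
};

// Evaluate every predicate against a response-shaped object and return the
// names that failed. Local stand-in for inspection only, not k6's check().
function failedChecks(res, predicates) {
  return Object.keys(predicates).filter((name) => !predicates[name](res));
}

// Fake response carrying only the fields the predicates read.
const sample = {
  status: 200,
  body: '<button>Add To Cart</button>',
  timings: { duration: 87 },
};
console.log(failedChecks(sample, productPageChecks)); // []
```

Inside lib/checks.js the map would simply be handed to check(res, productPageChecks). Keeping the predicates as plain functions makes it easy to poke at them with a saved response when a check starts failing.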
Smoke test first
The smoke test is 1 VU, 60 seconds, four steps: homepage → product page → add to cart → view cart. Its only job is to confirm the app is up and critical paths respond correctly. If smoke fails, nothing else runs.
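import { group, sleep } from 'k6';
// Import paths assume the k6-boutique/ layout shown earlier.
import { getHome, getProduct, addToCart, randomProduct } from '../../lib/client.js';
import { checkHome, checkProductPage, checkCartAction } from '../../lib/checks.js';

export const options = {
  vus: 1,
  duration: '60s',
  thresholds: {
    http_req_failed: ['rate<0.05'],
    http_req_duration: ['p(95)<2000'],
    checks: ['rate>0.90'],
  },
};

export function setup() {
  getHome(); // warm the connection before VUs start
  sleep(2);
}

export default function () {
  group('homepage', () => { checkHome(getHome()); });
  sleep(1);
  group('product page', () => { checkProductPage(getProduct(randomProduct())); });
  sleep(1);
  group('add to cart', () => { checkCartAction(addToCart(randomProduct(), 1)); });
  sleep(2);
}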
What the first run caught
First smoke run: 10% error rate, two thresholds crossed. Response times were excellent (p95 of 87ms), so this wasn't a performance problem. Something was functionally wrong.
Debugging step 1 — verify the text the check was looking for:
curl -s http://10.4.20.2/product/0PUK6V6EV0 | grep -i "add to cart"
# <button type="submit" class="cymbal-button-primary">Add To Cart</button>
Text matched exactly. So the check wasn't wrong — some requests were returning non-200 responses before the check even ran.
Debugging step 2 — check what the cart POST actually returns:
curl -v -X POST http://10.4.20.2/cart \
-d "product_id=0PUK6V6EV0&quantity=1" \
-H "Content-Type: application/x-www-form-urlencoded"
< HTTP/1.1 302 Found
< Location: /cart
< Set-Cookie: shop_session-id=51779754-8ac6-4ac9-bbd9-1f062a8dc1b4
The cart POST returns a 302 and sets a session cookie. With only 30 seconds and 6 total iterations, cold-start noise before sessions were established was dominating the results. The fix: double the duration, add a setup() warmup function, and slightly relax thresholds — smoke should catch catastrophic failure, not enforce strict SLOs.
This is the value of testing against a real app rather than a mock: you discover actual system behaviour.
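To see why such a short run is so noisy, it helps to work out how many failed requests a rate threshold tolerates at a given sample size. The sketch below uses the current rate<0.05 threshold. The figure of 4 HTTP requests per iteration is my assumption (home, product, cart POST, plus the GET that follows the 302), and maxAllowedFailures is a hypothetical helper, not part of k6.

```javascript
// Largest number of failed requests that still satisfies a 'rate<limit' threshold.
function maxAllowedFailures(totalRequests, limit) {
  // The threshold is strict: failures / totalRequests must stay below limit.
  let allowed = 0;
  while ((allowed + 1) / totalRequests < limit) {
    allowed += 1;
  }
  return allowed;
}

// Assumed: 4 HTTP requests per iteration (home, product, cart POST, redirect GET).
const requestsPerIteration = 4;

console.log(maxAllowedFailures(6 * requestsPerIteration, 0.05));  // 1
console.log(maxAllowedFailures(12 * requestsPerIteration, 0.05)); // 2
```

Under those assumptions a 6-iteration run can absorb a single failed request before the threshold trips, so two cold-start hiccups are enough to fail the whole run. Doubling the duration doesn't make the app any healthier; it just makes the sample large enough for the rate to mean something.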
Two bugs found during the load test
Running the full suite surfaced two more issues.
Bug 1: Checkout success was 0%. All 79 checkout attempts completed and returned 200, but none matched the expected text. The confirmation page says "Your order is complete!", not "Your order is placed" as the check assumed. One curl command revealed it:
curl -s [checkout flow with cookies] | grep -i "order\|confirm\|thank"
# Your order is complete!
Fix in lib/checks.js:
export function checkCheckout(res) {
  return check(res, {
    'order placed': (r) => r.status === 200 && r.body.includes('Your order is complete!'),
  });
}
Bug 2: Browser "page title present" failed all 41 iterations. In k6's browser API, page.title() returns a Promise and needs to be awaited:
// broken
'page title present': () => page.title().length > 0,
// fixed
'page title present': async () => (await page.title()).length > 0,
Both fixes are a good reminder that checks are only as good as the assumptions baked into them. The test framework did its job. It surfaced the mismatches immediately.
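Bug 2 is easy to reproduce outside k6, and doing so exposes a second trap. My understanding is that k6's built-in check evaluates predicates synchronously, so an async predicate only behaves correctly with an async-aware check (the k6-utils jslib ships one). Which check your script imports therefore matters. The sketch below is plain JavaScript with hypothetical stand-ins: fakeTitle plays the role of page.title(), and syncCheck mimics a checker that does not await.

```javascript
// Stand-in for page.title(): resolves to the title string, like an async browser call.
function fakeTitle(title) {
  return Promise.resolve(title);
}

// Evaluate a predicate the way a non-awaiting checker would: whatever the
// predicate returns is coerced to a boolean immediately.
function syncCheck(predicate) {
  return Boolean(predicate());
}

// 1. Broken: a Promise has no length, so this is `undefined > 0`, always false.
const broken = syncCheck(() => fakeTitle('Online Boutique').length > 0);

// 2. Async predicate under a sync checker: the returned Promise is truthy,
//    so this "passes" even though the title is empty.
const asyncUnderSync = syncCheck(async () => (await fakeTitle('')).length > 0);

// 3. Resolve the value first, then run a plain synchronous predicate on the string.
async function titlePresent(title) {
  const resolved = await fakeTitle(title);
  return syncCheck(() => resolved.length > 0);
}

console.log(broken, asyncUnderSync); // false true
```

The third form is the one I'd reach for when unsure: await page.title() into a variable, then pass a synchronous predicate to check. It works with either flavour of check and can't silently pass on an empty title.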
User journeys: three concurrent scenarios
With smoke passing it was time for the load test. Rather than hitting one endpoint in a loop, I modelled three distinct user types running simultaneously as k6 scenarios.
Browsers — casual visitors, read-only, up to 20 VUs:
export function browserJourney() {
  let pagesViewed = 0; // homepage plus each product page viewed
  group('homepage', () => { checkHome(getHome()); });
  pagesViewed += 1;
  sleep(randSleep(2, 5));
  const numProducts = Math.floor(Math.random() * 3) + 2; // 2 to 4 products
  for (let i = 0; i < numProducts; i++) {
    group('browse product', () => { checkProductPage(getProduct(randomProduct())); });
    pagesViewed += 1;
    sleep(randSleep(1, 4));
  }
  browseDepth.add(pagesViewed);
}
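The journey relies on a randSleep helper for think time. It isn't shown in this post, so here is a minimal sketch of one way to write it, assuming it returns seconds to hand to k6's sleep(). The rng parameter is a hypothetical addition for deterministic testing.

```javascript
// Hypothetical randSleep: returns a think time in seconds, uniformly distributed
// in [min, max). The result is meant to be passed to k6's sleep().
function randSleep(min, max, rng = Math.random) {
  return min + rng() * (max - min);
}

console.log(randSleep(2, 5, () => 0.5)); // 3.5
```

Randomised think time keeps VUs from marching in lockstep, which would otherwise produce artificial request bursts every few seconds.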
Shoppers — full checkout flow, up to 5 VUs:
export function shopperJourney() {
  // homepage → product → add to cart → maybe add second item → checkout
  group('checkout', () => {
    const start = Date.now();
    const res = checkout(checkoutPayload());
    checkoutDuration.add(Date.now() - start);
    checkoutSuccess.add(checkCheckout(res));
  });
}
Currency switchers: exercises the currency microservice at a constant arrival rate of 2 iterations per second:
currencyUsers: {
  executor: 'constant-arrival-rate',
  rate: 2,
  timeUnit: '1s',
  duration: '5m',
  preAllocatedVUs: 5,
},
The constant-arrival-rate executor is worth understanding. Unlike ramping-vus, which controls concurrency, arrival-rate controls throughput: k6 starts 2 iterations per second regardless of how long each one takes. That matches how production traffic behaves, because real users keep arriving whether or not the app is keeping up.
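One consequence is that the VU pool has to be sized for the rate. A rough way to reason about it is Little's law: VUs in flight ≈ arrival rate × iteration time. The sketch below applies it to the 2 iterations per second configured above. The iteration times are illustrative values I picked, not measurements, and vusNeeded is a hypothetical helper.

```javascript
// Little's law: average VUs in flight = arrival rate x average iteration time.
function vusNeeded(ratePerSecond, iterationSeconds) {
  return Math.ceil(ratePerSecond * iterationSeconds);
}

console.log(vusNeeded(2, 0.3)); // 1
console.log(vusNeeded(2, 2.5)); // 5
console.log(vusNeeded(2, 4));   // 8
```

So preAllocatedVUs: 5 is comfortable as long as a currency iteration averages under about 2.5 seconds. Past that point, and without a higher maxVUs, I'd expect k6 to start reporting dropped_iterations instead of holding the rate, which is itself a useful degradation signal.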
Custom metrics as business SLOs
The load test defines four custom metrics beyond what k6 tracks by default:
import { Trend, Rate, Counter } from 'k6/metrics';

const checkoutDuration = new Trend('boutique_checkout_duration', true); // true = values are times
const checkoutSuccess = new Rate('boutique_checkout_success');
const cartErrors = new Counter('boutique_cart_errors');
const browseDepth = new Trend('boutique_browse_depth');
With thresholds encoding real business requirements:
'boutique_checkout_duration': ['p(95)<5000'], // 95% of checkouts under 5s
'boutique_checkout_success': ['rate>0.80'], // 80%+ must complete successfully
This is the shift from infrastructure SLOs to business SLOs — codified, version-controlled, enforced automatically in CI.
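For orientation, here is a sketch of how the three scenarios and the SLO thresholds might sit together in one options object. The executor names, the exec property, the VU caps, the arrival-rate settings, and the two thresholds follow what's described in this post. The stage durations, the scenario keys browsers and shoppers, and the currencyJourney function name are my assumptions.

```javascript
// Sketch only: stage durations, the 'browsers'/'shoppers' keys, and
// 'currencyJourney' are assumptions; the rest mirrors the post.
const options = {
  scenarios: {
    browsers: {
      executor: 'ramping-vus',
      exec: 'browserJourney', // exported function to run for this scenario
      startVUs: 0,
      stages: [
        { duration: '1m', target: 20 }, // ramp up (assumed)
        { duration: '3m', target: 20 }, // hold (assumed)
        { duration: '1m', target: 0 },  // ramp down (assumed)
      ],
    },
    shoppers: {
      executor: 'ramping-vus',
      exec: 'shopperJourney',
      startVUs: 0,
      stages: [
        { duration: '1m', target: 5 }, // ramp up (assumed)
        { duration: '3m', target: 5 }, // hold (assumed)
        { duration: '1m', target: 0 }, // ramp down (assumed)
      ],
    },
    currencyUsers: {
      executor: 'constant-arrival-rate',
      exec: 'currencyJourney', // hypothetical function name
      rate: 2,
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 5,
    },
  },
  thresholds: {
    boutique_checkout_duration: ['p(95)<5000'],
    boutique_checkout_success: ['rate>0.80'],
  },
};

console.log(Object.keys(options.scenarios).length); // 3
```

In the actual test file this would be export const options, with each exec name matching an exported function. Because thresholds reference the custom metric names directly, a failed business SLO fails the run the same way a failed latency threshold does.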
Results across all four test types
After fixing both bugs and re-running the full suite, the results tell a clear story:
Response times are strong under normal load. Smoke p95 at 89ms and load p95 at 273ms show the app handles realistic traffic comfortably on homelab hardware.
Checkout: 0% → 100% after the fix. All 80 checkout attempts placed orders successfully, with a p95 of 224ms against a 5,000ms threshold. The bug was entirely in the check assertion, not the app.
Browser Web Vitals are healthy. LCP at 335ms and FCP at 255ms are well inside Core Web Vital targets. TTFB at 36ms is excellent. CLS at 0.117 just nudges over the 0.10 target — worth monitoring but not alarming.
Product page buckled first under stress. At 150 VUs the homepage held — 9,907 successful checks, zero 500 errors. The product page accumulated 2,037 failures. This makes architectural sense: the product page fans out to the product catalog, recommendation, and currency services simultaneously. Under load those downstream calls start queuing. The homepage is a simpler call graph and degrades later.
Browse depth averaged 4.0 pages per session — the random product browsing in the browser journey is working as intended, generating realistic read patterns.
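The 4.0 figure can be sanity-checked with quick arithmetic. This assumes each session counts one homepage view plus 2, 3, or 4 product views with equal probability, which is what Math.floor(Math.random() * 3) + 2 produces.

```javascript
// Expected pages per browser session, assuming one homepage view plus
// 2, 3, or 4 product views with equal probability.
const productCounts = [2, 3, 4]; // possible values of Math.floor(Math.random() * 3) + 2
const meanProducts = productCounts.reduce((sum, n) => sum + n, 0) / productCounts.length;
const expectedDepth = 1 + meanProducts;

console.log(expectedDepth); // 4
```

A measured average close to that expectation suggests the metric is recording what the journey actually does. A value drifting well away from it would point at a bug in the script rather than in the app.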
What's next
Post 3 covers the stress test in depth: reading degradation signals, understanding the product page failure pattern architecturally, and using the k6 Browser module for Web Vitals measurement. It also covers all four custom metric types and how to use them as CI-enforceable SLOs in Grafana Cloud.
#k6 #Grafana #LoadTesting #Kubernetes #Observability #SRE #PerformanceTesting
