DEV Community

Matthew Wimpelberg


Part 2 of 4: Building a Real k6 Test Suite Against a Live Kubernetes App


In part 1 I covered k6's philosophy and the anatomy of a first test. This post is where things get real: a production-grade test suite running against a live microservices app on a homelab Kubernetes cluster, including what went wrong on the first run and how I debugged it. All of the code is on GitHub: https://github.com/mwimpelberg28/k6-playground

The target: Online Boutique

Rather than testing against a mock or a toy API, I wanted something that resembles a real production system. Google's Online Boutique is a microservices demo app with 11 services covering a realistic e-commerce stack: frontend, cart, checkout, product catalog, currency conversion, recommendations, and more.

Deploying it took about two minutes:

kubectl create namespace boutique
kubectl apply -n boutique -f \
  https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/main/release/kubernetes-manifests.yaml

My homelab runs a kubeadm cluster on Ubuntu with MetalLB for load balancing. Within 30 seconds MetalLB had assigned a real external IP and the app was serving traffic at http://10.4.20.2.

kubectl get svc -n boutique frontend-external
# NAME                TYPE           EXTERNAL-IP   PORT(S)
# frontend-external   LoadBalancer   10.4.20.2     80:xxxxx/TCP

The architecture decision that matters most

Before writing a single test I built a shared library. This is the difference between a test suite and a folder of scripts.

k6-boutique/
├── lib/
│   ├── client.js     ← all HTTP calls
│   └── checks.js     ← all assertions
└── tests/
    ├── smoke/
    ├── load/
    ├── stress/
    └── browser/

lib/client.js knows how to talk to the app: base URL, request helpers, product IDs and the checkout payload. Every test imports from it, so changing the target URL in one place updates every test (or override it for a single run with k6 run -e BASE_URL=<url> <script>):

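import http from 'k6/http';

export const BASE_URL = __ENV.BASE_URL || 'http://10.4.20.2';

// Shared request params (headers, tags, timeouts); empty placeholder here
export const defaultParams = {};

// POST /cart with a form body. k6 form-encodes plain-object bodies and
// follows the 302 redirect back to /cart by default.
export function addToCart(productId, quantity = 1, params = defaultParams) {
  return http.post(
    `${BASE_URL}/cart`,
    { product_id: productId, quantity: quantity.toString() },
    params
  );
}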
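The smoke and load tests also call randomProduct(), randSleep() and checkoutPayload(), which the post doesn't show. Below is a minimal sketch of what they might look like, in plain JavaScript so it runs unchanged inside k6 (in lib/client.js each function would be exported). Everything here is an assumption except the 0PUK6V6EV0 product ID: the other IDs come from Online Boutique's default catalog as I understand it, and the checkout field names and test-card values are modelled on the demo's checkout form, so verify them against your own deployment.

```javascript
// Hypothetical helpers for lib/client.js (they would be exported there).
// Only 0PUK6V6EV0 is confirmed in this post; the other IDs are assumed
// from Online Boutique's default catalog.
const PRODUCT_IDS = [
  '0PUK6V6EV0',
  'OLJCESPC7Z',
  '66VCHSJNUP',
  '1YMWWN1N4O',
  'L9ECAV7KIM',
];

// Pick one product ID uniformly at random
function randomProduct() {
  return PRODUCT_IDS[Math.floor(Math.random() * PRODUCT_IDS.length)];
}

// Random think time in seconds, uniform in [min, max), for k6's sleep()
function randSleep(min, max) {
  return min + Math.random() * (max - min);
}

// Assumed form fields for the checkout POST, modelled on the demo's
// checkout form; verify the names against your deployment's HTML.
function checkoutPayload() {
  return {
    email: 'someone@example.com',
    street_address: '1600 Amphitheatre Parkway',
    zip_code: '94043',
    city: 'Mountain View',
    state: 'CA',
    country: 'United States',
    credit_card_number: '4432801561520454', // assumed demo test card
    credit_card_expiration_month: '1',
    credit_card_expiration_year: String(new Date().getFullYear() + 1), // always in the future
    credit_card_cvv: '672',
  };
}
```

Because randSleep() returns a float, a call like sleep(randSleep(1, 4)) pauses for a fractional number of seconds, which k6's sleep() accepts.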

lib/checks.js knows what a good response looks like for each page:

import { check } from 'k6';

// Homepage: loads, lists products, and responds within 2s
export function checkHome(res) {
  return check(res, {
    'status 200':     (r) => r.status === 200,
    'shows products': (r) => r.body.includes('Hot Products'),
    'response < 2s':  (r) => r.timings.duration < 2000,
  });
}

Define it once, use it everywhere. When the app changes, you fix it in one place.
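The smoke test also relies on checkProductPage() and checkCartAction(). A sketch of their assertions, written as plain predicate maps so they can be exercised outside k6; in checks.js each wrapper would simply be return check(res, productPageChecks) or return check(res, cartActionChecks). The map names are hypothetical. The "Add To Cart" string is the one verified with curl in the debugging section, and the cart check relies on k6 following the 302 from POST /cart, so res.url holds the final URL after the redirect.

```javascript
// Hypothetical predicate maps for lib/checks.js; each is meant to be
// passed to k6's check(res, ...) by the matching check* wrapper.
const productPageChecks = {
  'status 200':      (r) => r.status === 200,
  'has add-to-cart': (r) => typeof r.body === 'string' && r.body.includes('Add To Cart'),
  'response < 2s':   (r) => r.timings.duration < 2000,
};

// After k6 follows the 302, a successful add-to-cart ends on the cart
// page: status 200 and a final URL ending in /cart.
const cartActionChecks = {
  'status 200':      (r) => r.status === 200,
  'landed on /cart': (r) => typeof r.url === 'string' && r.url.endsWith('/cart'),
};
```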

Smoke test first

The smoke test is 1 VU, 60 seconds, four steps: homepage → product page → add to cart → view cart. Its only job is to confirm the app is up and critical paths respond correctly. If smoke fails, nothing else runs.

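import { group, sleep } from 'k6';
import { getHome, getProduct, addToCart, randomProduct } from '../../lib/client.js';
import { checkHome, checkProductPage, checkCartAction } from '../../lib/checks.js';

export const options = {
  vus: 1,
  duration: '60s',
  thresholds: {
    http_req_failed:   ['rate<0.05'],   // under 5% of requests may fail
    http_req_duration: ['p(95)<2000'],  // 95th percentile under 2s
    checks:            ['rate>0.90'],   // over 90% of checks must pass
  },
};

// setup() runs once, in its own VU, before the test VUs start;
// this request warms up the app, not the test VUs' connections
export function setup() {
  getHome();
  sleep(2);
}

export default function () {
  group('homepage', () => { checkHome(getHome()); });
  sleep(1);

  group('product page', () => { checkProductPage(getProduct(randomProduct())); });
  sleep(1);

  // k6 follows the 302 from POST /cart, so this response is the cart page itself
  group('add to cart', () => { checkCartAction(addToCart(randomProduct(), 1)); });
  sleep(2);
}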

What the first run caught

First smoke run: 10% error rate, two thresholds crossed. Response times were excellent (p95 of 87ms), so this wasn't a performance problem. Something was functionally wrong.

Debugging step 1 — verify the text the check was looking for:

curl -s http://10.4.20.2/product/0PUK6V6EV0 | grep -i "add to cart"
# <button type="submit" class="cymbal-button-primary">Add To Cart</button>

The text matched exactly, so the string the check looked for wasn't the problem. Some requests were returning non-200 responses before the body assertion ever had a chance to pass.

Debugging step 2 — check what the cart POST actually returns:

curl -v -X POST http://10.4.20.2/cart \
  -d "product_id=0PUK6V6EV0&quantity=1" \
  -H "Content-Type: application/x-www-form-urlencoded"
< HTTP/1.1 302 Found
< Location: /cart
< Set-Cookie: shop_session-id=51779754-8ac6-4ac9-bbd9-1f062a8dc1b4

The cart POST returns a 302 and sets a session cookie. The first version of the smoke test ran for only 30 seconds (6 iterations in total), so cold-start noise before sessions were established dominated the results. The fix: double the duration to 60 seconds, add a setup() warm-up function, and slightly relax the thresholds. A smoke test should catch catastrophic failure, not enforce strict SLOs.
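The arithmetic behind "noise dominating" is simple: with only a handful of requests, every failure is a large share of the total. A sketch with hypothetical counts (not the numbers from my run) showing how the same two failed requests breach or pass a rate<0.05 threshold depending on how many requests the run made:

```javascript
// Illustrative only: the http_req_failed rate is failed / total requests
function failedRate(failed, total) {
  return total === 0 ? 0 : failed / total;
}

// Mirrors a 'rate<0.05' threshold: passes only when the rate is below the limit
function passesRateBelow(rate, limit) {
  return rate < limit;
}

const shortRun = failedRate(2, 20);   // hypothetical short run: 10% failed
const longerRun = failedRate(2, 200); // same two failures, ten times the requests: 1%
```

Two warm-up failures in a 20-request run fail the threshold; the same two in a 200-request run pass it comfortably, which is why a longer run plus a warm-up helps.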

This is the value of testing against a real app rather than a mock: you discover actual system behaviour.

Two bugs found during the load test

Running the full suite surfaced two more issues.

Bug 1: checkout success was 0%. All 79 checkout attempts completed and returned 200, but none matched the expected text. The confirmation page says "Your order is complete!", not "Your order is placed" as the check assumed. One curl command revealed it:

curl -s [checkout flow with cookies] | grep -i "order\|confirm\|thank"
# Your order is complete!

Fix in lib/checks.js:

export function checkCheckout(res) {
  return check(res, {
    'order placed': (r) => r.status === 200 && r.body.includes('Your order is complete!'),
  });
}

Bug 2: the browser test's "page title present" check failed in all 41 iterations. In k6's browser API, page.title() returns a Promise, so its result has to be awaited before it can be checked:

// broken: page.title() returns a Promise, which has no .length
'page title present': () => page.title().length > 0,

// fixed: resolve the Promise first, then check the plain string
const title = await page.title();
check(title, { 'page title present': (t) => t.length > 0 });
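One caveat worth knowing about async predicates: as far as I know, k6's built-in check() evaluates predicates synchronously and does not await them (the check exported by the k6-utils jslib is the variant documented for async predicates). If that holds, an async predicate returns a Promise, and any Promise object is truthy, so the check "passes" whatever the title is. A plain-JavaScript sketch of the effect, using a simplified stand-in for check() and a fake page object (both hypothetical, not k6 code):

```javascript
// Simplified stand-in for a synchronous check(): calls each predicate and
// treats any truthy return value as a pass. NOT k6's implementation.
function syncCheck(value, predicates) {
  let passed = 0;
  for (const fn of Object.values(predicates)) {
    if (fn(value)) passed++;
  }
  return passed === Object.keys(predicates).length;
}

// Fake browser page whose title resolves to an empty string
const page = { title: async () => '' };

// The async predicate returns a Promise (truthy), so this reports a pass
// even though the title is empty.
const asyncResult = syncCheck(page, {
  'page title present': async (p) => (await p.title()).length > 0,
});

// Awaiting first and checking the resolved string gives the real answer.
async function awaitedResult() {
  const title = await page.title();
  return syncCheck(title, { 'page title present': (t) => t.length > 0 });
}
```

Awaiting outside the check works the same way whichever check() you import, which is why it is the safer pattern.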

Both fixes are a good reminder that checks are only as good as the assumptions baked into them. The test framework did its job. It surfaced the mismatches immediately.

User journeys: three concurrent scenarios

With smoke passing it was time for the load test. Rather than hitting one endpoint in a loop, I modelled three distinct user types running simultaneously as k6 scenarios.
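The post doesn't show how the three journeys are tied together. Here is a sketch of the options.scenarios object, assuming the ramping-vus executor for the two VU-capped groups. The stage durations and the currencyJourney function name are hypothetical, and the currencyUsers entry mirrors the config shown further down. In the k6 script this object would be export const options, and each exec names an exported function.

```javascript
// Hypothetical scenario wiring; in a k6 script: export const options = {...}
const options = {
  scenarios: {
    browsers: {
      executor: 'ramping-vus',
      exec: 'browserJourney',          // exported function each VU runs
      startVUs: 0,
      stages: [
        { duration: '1m', target: 20 }, // ramp up to 20 VUs (illustrative timing)
        { duration: '3m', target: 20 }, // hold
        { duration: '1m', target: 0 },  // ramp down
      ],
    },
    shoppers: {
      executor: 'ramping-vus',
      exec: 'shopperJourney',
      startVUs: 0,
      stages: [
        { duration: '1m', target: 5 },
        { duration: '3m', target: 5 },
        { duration: '1m', target: 0 },
      ],
    },
    currencyUsers: {
      executor: 'constant-arrival-rate',
      exec: 'currencyJourney',         // hypothetical function name
      rate: 2,
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 5,
    },
  },
};
```

All three scenarios run concurrently by default, which is what produces the mixed traffic.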

Browsers — casual visitors, read-only, up to 20 VUs:

export function browserJourney() {
  group('homepage', () => { checkHome(getHome()); });
  let pagesViewed = 1;  // the homepage counts as the first page
  sleep(randSleep(2, 5));

  // visit 2 to 4 random product pages
  const numProducts = Math.floor(Math.random() * 3) + 2;
  for (let i = 0; i < numProducts; i++) {
    group('browse product', () => { checkProductPage(getProduct(randomProduct())); });
    pagesViewed++;
    sleep(randSleep(1, 4));
  }
  browseDepth.add(pagesViewed);  // custom Trend: pages viewed per session
}

Shoppers — full checkout flow, up to 5 VUs:

export function shopperJourney() {
  // homepage → product → add to cart → maybe add second item → checkout
  group('checkout', () => {
    const start = Date.now();
    const res = checkout(checkoutPayload());
    checkoutDuration.add(Date.now() - start);
    checkoutSuccess.add(checkCheckout(res));
  });
}

Currency switchers: exercise the currency microservice at a constant arrival rate of 2 iterations per second (the executor counts iterations, not individual requests):

currencyUsers: {
  executor: 'constant-arrival-rate',
  rate: 2,              // start 2 iterations...
  timeUnit: '1s',       // ...every second
  duration: '5m',
  preAllocatedVUs: 5,   // VUs initialized up front to sustain that rate
},

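The constant-arrival-rate executor is worth understanding. Unlike ramping-vus, which controls concurrency, an arrival-rate executor controls throughput: k6 starts 2 iterations per second regardless of how long each one takes, provided it has an idle VU to run them on. With no maxVUs set, k6 cannot go beyond the 5 pre-allocated VUs; if they are all busy, the iteration is not started and is counted in the dropped_iterations metric. That's closer to how production traffic behaves: users keep arriving whether or not the app has slowed down.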
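Sizing preAllocatedVUs is a Little's-law calculation: concurrent VUs needed is roughly the arrival rate times the average iteration duration. At 2 iterations per second for 5 minutes, k6 aims to start 600 iterations, and 5 VUs keep up only while an iteration averages 2.5s or less. The currency journey's real iteration time isn't given in the post, so the 4s case below is hypothetical; the helper names are mine.

```javascript
// Back-of-the-envelope sizing for a constant-arrival-rate scenario.
// Little's law: concurrent VUs needed ~= arrival rate x iteration duration.
function totalIterations(ratePerSec, durationSec) {
  return ratePerSec * durationSec;
}

function vusNeeded(ratePerSec, iterationSec) {
  return Math.ceil(ratePerSec * iterationSec);
}

// Longest average iteration a fixed pool of VUs can sustain at this rate
function maxIterationSec(ratePerSec, vus) {
  return vus / ratePerSec;
}
```

If iterations slowed to a hypothetical 4s, vusNeeded(2, 4) returns 8, so with 5 VUs and no maxVUs headroom, k6 would start dropping iterations.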

Custom metrics as business SLOs

The load test defines four custom metrics beyond what k6 tracks by default:

import { Trend, Rate, Counter } from 'k6/metrics';

const checkoutDuration = new Trend('boutique_checkout_duration', true); // true = time values, shown as durations
const checkoutSuccess  = new Rate('boutique_checkout_success');         // share of truthy samples
const cartErrors       = new Counter('boutique_cart_errors');           // running total
const browseDepth      = new Trend('boutique_browse_depth');            // pages per session

With thresholds encoding real business requirements:

'boutique_checkout_duration': ['p(95)<5000'],  // 95% of checkouts under 5s
'boutique_checkout_success':  ['rate>0.80'],   // 80%+ must complete successfully

This is the shift from infrastructure SLOs to business SLOs: codified, version-controlled, and enforced automatically in CI, since k6 exits with a non-zero code when any threshold fails.
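To make the two thresholds concrete, here is a sketch that evaluates them on made-up samples: a Trend threshold compares a percentile of the recorded values, and a Rate threshold compares the share of truthy samples. The percentile uses linear interpolation between closest ranks; k6's exact method may differ slightly, so treat the numbers as an approximation. The sample data and helper names are hypothetical.

```javascript
// Percentile by linear interpolation between the closest ranks
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(idx);
  const hi = Math.ceil(idx);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (idx - lo);
}

// Rate metric: fraction of truthy samples
function rate(samples) {
  return samples.filter(Boolean).length / samples.length;
}

// Hypothetical checkout durations (ms) and success flags
const durations = [180, 190, 200, 205, 210, 215, 220, 224, 230, 4800];
const successes = [true, true, true, true, true, true, true, true, false, true];

const checkoutP95 = percentile(durations, 95);   // about 2743.5 ms
const checkoutRate = rate(successes);            // 0.9
const thresholdsPass = checkoutP95 < 5000 && checkoutRate > 0.80;
```

With only ten samples, one 4.8s outlier drags the interpolated p95 from the 200s to about 2.7s, so low-volume business flows need enough iterations before a p95 threshold means much.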

Results across all four test types

After fixing both bugs and re-running the full suite:

[Image: results across all four test types]

The results tell a clear story:

Response times are strong under normal load. Smoke p95 at 89ms and load p95 at 273ms show the app handles realistic traffic comfortably on homelab hardware.

Checkout: 0% → 100% after the fix. All 80 checkout attempts placed orders successfully, with a p95 of 224ms against a 5,000ms threshold. The bug was entirely in the check assertion, not the app.

Browser Web Vitals are healthy. LCP at 335ms and FCP at 255ms are well inside the "good" thresholds (2.5s and 1.8s respectively). TTFB at 36ms is excellent. CLS at 0.117 just nudges over the 0.1 "good" threshold into "needs improvement": worth monitoring, but not alarming.
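The ratings above can be reproduced from the boundaries published on web.dev: a value at or below the "good" boundary is good, above the "poor" boundary is poor, and anything in between needs improvement. A small sketch using this run's numbers; rateVital() is a hypothetical helper, not part of k6.

```javascript
// web.dev boundaries: times in ms, CLS is unitless
const VITALS = {
  LCP:  { good: 2500, poor: 4000 },
  FCP:  { good: 1800, poor: 3000 },
  TTFB: { good: 800,  poor: 1800 },
  CLS:  { good: 0.1,  poor: 0.25 },
};

function rateVital(name, value) {
  const t = VITALS[name];
  if (!t) throw new Error(`unknown metric: ${name}`);
  if (value <= t.good) return 'good';
  if (value <= t.poor) return 'needs-improvement';
  return 'poor';
}

// Values measured in this run
const measured = { LCP: 335, FCP: 255, TTFB: 36, CLS: 0.117 };
const ratings = Object.fromEntries(
  Object.entries(measured).map(([name, value]) => [name, rateVital(name, value)])
);
```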

Product page buckled first under stress. At 150 VUs the homepage held, with 9,907 successful checks and zero 500 errors, while the product page accumulated 2,037 failures. This makes architectural sense: the product page fans out to the product catalog, recommendation, and currency services on every request. Under load those downstream calls start queuing. The homepage has a simpler call graph and degrades later.
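A simple model shows why more downstream calls mean earlier degradation. Assume (a simplification, since real services share nodes and queues) that each downstream call independently blows its latency budget with probability p; a page needing n such calls is then slow whenever any one of them is, so P(slow page) = 1 - (1 - p)^n. That holds whether the calls run in parallel or in sequence. The probability and call counts below are hypothetical, not measured from Online Boutique.

```javascript
// Probability that at least one of `calls` independent downstream calls
// is slow, given each is slow with probability pSlowCall
function pSlowPage(pSlowCall, calls) {
  return 1 - Math.pow(1 - pSlowCall, calls);
}

const oneCall = pSlowPage(0.02, 1);   // about 2%
const fiveCalls = pSlowPage(0.02, 5); // about 9.6%
```

The same per-call degradation that barely shows on a one-dependency page makes nearly one in ten five-dependency page loads slow.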

Browse depth averaged 4.0 pages per session — the random product browsing in the browser journey is working as intended, generating realistic read patterns.
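The 4.0 average is exactly what the journey's random choice predicts, assuming pagesViewed counts the homepage plus each product page visited: Math.floor(Math.random() * 3) + 2 is uniform over {2, 3, 4}, so the mean is 1 + 3 = 4. A sketch of both the exact expectation and a quick simulation (helper names are hypothetical):

```javascript
// Exact expectation: homepage + mean of numProducts, uniform over {2, 3, 4}
function expectedBrowseDepth() {
  const productCounts = [0, 1, 2].map((r) => r + 2); // possible numProducts values
  const meanProducts =
    productCounts.reduce((sum, n) => sum + n, 0) / productCounts.length;
  return 1 + meanProducts; // +1 for the homepage
}

// Monte Carlo check using the same expression as browserJourney()
function simulatedBrowseDepth(sessions) {
  let total = 0;
  for (let i = 0; i < sessions; i++) {
    total += 1 + Math.floor(Math.random() * 3) + 2;
  }
  return total / sessions;
}
```

If the measured average drifted noticeably away from 4, that would point at a bug in the counter or the loop rather than at the app.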

What's next

Post 3 covers the stress test in depth: reading degradation signals, understanding the product page failure pattern architecturally, and using the k6 browser module for Web Vitals measurement. It also covers all four custom metric types and how to use them as CI-enforceable SLOs in Grafana Cloud.

#k6 #Grafana #LoadTesting #Kubernetes #Observability #SRE #PerformanceTesting
