Introduction
Most teams do not load test their applications until something breaks in production. A marketing campaign drives 10x traffic, the API starts returning 503s, and suddenly everyone is scrambling to figure out how many requests the system can actually handle.
k6 by Grafana Labs has become the go-to load testing tool for DevOps teams, and for good reason. It uses JavaScript for test scripts, runs efficiently on minimal hardware, has built-in support for protocols beyond HTTP (gRPC, WebSocket, browser), and integrates cleanly into CI/CD pipelines.
This guide takes you from writing your first k6 script to running automated performance tests in your CI pipeline with threshold-based pass/fail gates.
Writing Your First k6 Script
k6 scripts are JavaScript (ES6 modules) that define virtual user behavior. Each virtual user (VU) runs the default function repeatedly for the duration of the test.
```javascript
// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

// Test configuration
export const options = {
  vus: 50,        // 50 virtual users
  duration: '2m', // Run for 2 minutes
};

// Default function - each VU runs this in a loop
export default function () {
  // GET request
  const listRes = http.get('https://api.example.com/products');
  check(listRes, {
    'list status is 200': (r) => r.status === 200,
    'list response time < 500ms': (r) => r.timings.duration < 500,
    'list returns products': (r) => JSON.parse(r.body).length > 0,
  });

  // POST request with JSON body
  const payload = JSON.stringify({
    name: 'Test Product',
    price: 29.99,
    category: 'electronics',
  });
  const createRes = http.post('https://api.example.com/products', payload, {
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer ' + __ENV.API_TOKEN,
    },
  });
  check(createRes, {
    'create status is 201': (r) => r.status === 201,
    'create response time < 1000ms': (r) => r.timings.duration < 1000,
  });

  sleep(1); // Think time between iterations
}
```
Run it:
```bash
# Basic run
k6 run load-test.js

# Pass environment variables
k6 run -e API_TOKEN=your-token load-test.js

# Override options from CLI
k6 run --vus 100 --duration 5m load-test.js
```
The sleep(1) at the end simulates real user think time. Without it, each VU fires requests as fast as possible, which does not represent realistic traffic patterns and will skew your results.
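A fixed one-second sleep also means every VU iterates on the same cadence. One way to spread requests out more naturally is to randomize the think time. A small sketch (the helper name and the 1-4 second range are illustrative assumptions, not part of the k6 API):

```javascript
// Hypothetical helper: pick a think time between minSec and maxSec seconds.
// In a k6 script you would call sleep(randomThinkTime(1, 4)) instead of sleep(1).
function randomThinkTime(minSec, maxSec) {
  return minSec + Math.random() * (maxSec - minSec);
}

const t = randomThinkTime(1, 4);
console.log(t >= 1 && t < 4); // true
```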
Ramping Strategies for Realistic Load Profiles
Constant VU count is fine for quick smoke tests, but production load testing requires more sophisticated patterns. k6 supports multiple execution scenarios with different ramping strategies.
Ramp-up/ramp-down (most common for load tests):
```javascript
export const options = {
  stages: [
    { duration: '2m', target: 50 },  // Ramp up to 50 VUs over 2 minutes
    { duration: '5m', target: 50 },  // Stay at 50 VUs for 5 minutes
    { duration: '2m', target: 100 }, // Ramp up to 100 VUs
    { duration: '5m', target: 100 }, // Stay at 100 VUs for 5 minutes
    { duration: '2m', target: 0 },   // Ramp down to 0
  ],
};
```
Spike test (sudden traffic surge):
```javascript
export const options = {
  stages: [
    { duration: '1m', target: 10 },   // Normal load
    { duration: '30s', target: 500 }, // Spike to 500 VUs
    { duration: '2m', target: 500 },  // Sustain spike
    { duration: '30s', target: 10 },  // Return to normal
    { duration: '2m', target: 10 },   // Recovery period
  ],
};
```
Soak test (find memory leaks and degradation over time):
```javascript
export const options = {
  stages: [
    { duration: '5m', target: 50 }, // Ramp up
    { duration: '4h', target: 50 }, // Sustained load for 4 hours
    { duration: '5m', target: 0 },  // Ramp down
  ],
};
```
Multiple scenarios (simulate different user types simultaneously):
```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  scenarios: {
    browse: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 100 },
        { duration: '5m', target: 100 },
        { duration: '2m', target: 0 },
      ],
      exec: 'browseProducts',
    },
    purchase: {
      executor: 'constant-arrival-rate',
      rate: 10,            // 10 iterations per timeUnit
      timeUnit: '1s',      // = 10 iterations per second
      duration: '9m',
      preAllocatedVUs: 50,
      maxVUs: 200,
      exec: 'purchaseFlow',
    },
  },
};

export function browseProducts() {
  http.get('https://api.example.com/products');
  sleep(Math.random() * 3 + 1); // 1-4 second think time
}

export function purchaseFlow() {
  const products = http.get('https://api.example.com/products');
  const productId = JSON.parse(products.body)[0].id;
  http.post('https://api.example.com/cart', JSON.stringify({ productId }), {
    headers: { 'Content-Type': 'application/json' },
  });
  http.post('https://api.example.com/checkout', null, {
    headers: { 'Content-Type': 'application/json' },
  });
}
```
The constant-arrival-rate executor is worth understanding. Instead of controlling the number of VUs, it controls the iteration rate, modeling an open system: load keeps arriving at the same pace no matter how slowly the server responds. Closed-model executors like ramping-vus naturally back off as responses slow down, which can hide the breaking point.
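To actually locate the breaking point, the related ramping-arrival-rate executor can push the rate steadily upward until thresholds start failing. A sketch, with the rates, durations, and VU counts as assumptions you would tune for your system:

```javascript
export const options = {
  scenarios: {
    breakpoint: {
      executor: 'ramping-arrival-rate',
      startRate: 10,       // start at 10 iterations per second
      timeUnit: '1s',
      preAllocatedVUs: 100,
      maxVUs: 1000,        // headroom for when responses slow down
      stages: [
        { duration: '10m', target: 200 }, // ramp to 200 iterations/s
      ],
    },
  },
};
```

Watch where latency and error rate bend upward as the rate climbs; that knee is your capacity.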
Threshold-Based SLOs
Thresholds are k6's most powerful feature for CI integration. They define pass/fail criteria for your test. If any threshold is violated, k6 exits with a non-zero code, which fails your CI pipeline.
```javascript
export const options = {
  stages: [
    { duration: '2m', target: 50 },
    { duration: '5m', target: 50 },
    { duration: '2m', target: 0 },
  ],
  thresholds: {
    // 95th percentile response time must be under 500ms
    'http_req_duration': ['p(95)<500', 'p(99)<1000'],
    // Error rate must be under 1%
    'http_req_failed': ['rate<0.01'],
    // Custom metrics per endpoint
    'http_req_duration{name:list_products}': ['p(95)<300'],
    'http_req_duration{name:create_order}': ['p(95)<800'],
    // Check pass rate must be above 98%
    'checks': ['rate>0.98'],
    // Custom counter threshold
    'payment_errors': ['count<5'],
  },
};
```
Use tags to set different thresholds for different endpoints:
```javascript
import http from 'k6/http';
import { Counter } from 'k6/metrics';

const paymentErrors = new Counter('payment_errors');

export default function () {
  // Tag requests for per-endpoint thresholds
  const listRes = http.get('https://api.example.com/products', {
    tags: { name: 'list_products' },
  });

  const payload = JSON.stringify({ productId: 1, quantity: 2 }); // example order body
  const orderRes = http.post('https://api.example.com/orders', payload, {
    tags: { name: 'create_order' },
  });
  if (orderRes.status !== 201) {
    paymentErrors.add(1);
  }
}
```
Map your thresholds directly to your SLOs. If your SLO states that 99.9% of API requests must complete within 500ms, your k6 threshold should reflect that. This turns load testing from a manual exercise into an automated contract verification.
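One useful refinement (the numbers below are placeholders): thresholds also accept an object form with abortOnFail, which stops the test as soon as an SLO is irrecoverably blown instead of hammering a failing system for the remaining duration:

```javascript
export const options = {
  thresholds: {
    // SLO: 99.9% of requests complete within 500ms
    http_req_duration: [
      { threshold: 'p(99.9)<500', abortOnFail: true, delayAbortEval: '1m' },
    ],
    // Bail out early if more than 5% of requests fail
    http_req_failed: [
      { threshold: 'rate<0.05', abortOnFail: true },
    ],
  },
};
```

The delayAbortEval field gives the metric time to stabilize during ramp-up before k6 starts evaluating the abort condition.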
Integrating k6 into CI/CD Pipelines
The real value of k6 emerges when you run it on every deployment. Here is a complete GitHub Actions workflow:
```yaml
# .github/workflows/load-test.yml
name: Load Test

on:
  deployment_status:
  # Run after successful deployment to staging
  workflow_dispatch:
    inputs:
      target_url:
        description: 'Target URL for load test'
        required: true

jobs:
  load-test:
    runs-on: ubuntu-latest
    if: github.event.deployment_status.state == 'success' || github.event_name == 'workflow_dispatch'
    steps:
      - uses: actions/checkout@v4

      - name: Install k6
        run: |
          sudo gpg -k
          sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update
          sudo apt-get install k6

      - name: Run load test
        env:
          API_TOKEN: ${{ secrets.STAGING_API_TOKEN }}
          TARGET_URL: ${{ github.event.inputs.target_url || 'https://staging-api.example.com' }}
        run: |
          k6 run \
            --out json=results.json \
            --env TARGET_URL=$TARGET_URL \
            --env API_TOKEN=$API_TOKEN \
            tests/load/api-load-test.js

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: k6-results
          path: results.json

      - name: Post results to PR
        if: failure() && github.event.pull_request
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '## Load Test Failed\n\nPerformance thresholds were violated. Check the [workflow run](' + context.serverUrl + '/' + context.repo.owner + '/' + context.repo.repo + '/actions/runs/' + context.runId + ') for details.'
            });
```
For GitLab CI:
```yaml
load_test:
  stage: test
  image:
    name: grafana/k6:latest
    entrypoint: [''] # the image's default entrypoint is k6 itself; clear it so GitLab can run the script
  script:
    - k6 run --out json=results.json tests/load/api-load-test.js
  artifacts:
    when: always
    paths:
      - results.json
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
```
Interpreting Results and Finding Bottlenecks
k6 outputs a summary at the end of each run. Here is how to read it:
```
http_req_duration..............: avg=245.3ms min=12ms med=198ms max=4521ms p(90)=412ms p(95)=523ms
http_req_failed................: 0.34%  345 out of 100234
http_req_waiting...............: avg=240.1ms min=10ms med=193ms max=4518ms p(90)=407ms p(95)=518ms
http_reqs......................: 100234 557.96/s
iteration_duration.............: avg=1.25s min=1.01s med=1.2s max=5.52s p(90)=1.41s p(95)=1.53s
vus............................: 50     min=1 max=50
```
Key metrics to focus on:
- p(95) and p(99) duration: These matter more than average. A 200ms average with a 5-second p99 means 1% of your users are having a terrible experience.
- http_req_failed rate: Any non-zero failure rate under load deserves investigation.
- http_req_waiting vs http_req_duration: the difference is the time spent sending the request and receiving the response body. A large gap usually points to large response payloads or a slow network path.
- iteration_duration: If this is much higher than your request durations, check your think times and test logic.
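To see why the tail matters more than the mean, here is a quick back-of-the-envelope check (plain Node.js, not a k6 script; the durations are made up):

```javascript
// Nearest-rank percentile over a list of durations (milliseconds)
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}

// 98 fast requests plus two 5-second outliers
const durations = Array(98).fill(200).concat([5000, 5000]);
const avg = durations.reduce((sum, d) => sum + d, 0) / durations.length;

console.log(avg);                       // 296 -- looks healthy
console.log(percentile(durations, 95)); // 200
console.log(percentile(durations, 99)); // 5000 -- 1% of users wait 5 seconds
```

An average of 296ms hides the fact that one user in a hundred waits five seconds, which is exactly what a p(99) threshold catches.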
When performance degrades under load, correlate k6 results with server-side metrics:
- CPU saturation: If CPU hits 100%, you need horizontal scaling or code optimization.
- Memory pressure: Climbing memory with eventual OOM kills points to memory leaks.
- Database connections: Connection pool exhaustion causes sudden latency spikes.
- Network I/O: Check for bandwidth limits between your services and dependencies.
Export results to Grafana for visualization:
```bash
# Stream results to Prometheus via remote write.
# K6_PROMETHEUS_RW_SERVER_URL must be a process environment variable;
# --env only populates __ENV inside the script, not k6's own config.
K6_PROMETHEUS_RW_SERVER_URL=http://prometheus:9090/api/v1/write \
  k6 run --out experimental-prometheus-rw load-test.js

# Or output to InfluxDB
k6 run --out influxdb=http://influxdb:8086/k6 load-test.js
```
Advanced Patterns and Tips
Parameterize with CSV data for realistic test data:
```javascript
import http from 'k6/http';
import { SharedArray } from 'k6/data';
import papaparse from 'https://jslib.k6.io/papaparse/5.1.1/index.js';

// SharedArray loads the file once and shares it across VUs
const users = new SharedArray('users', function () {
  return papaparse.parse(open('./test-users.csv'), { header: true }).data;
});

export default function () {
  const user = users[Math.floor(Math.random() * users.length)];
  const loginRes = http.post('https://api.example.com/login', JSON.stringify({
    email: user.email,
    password: user.password,
  }), {
    headers: { 'Content-Type': 'application/json' },
  });
  const token = JSON.parse(loginRes.body).token;
  http.get('https://api.example.com/dashboard', {
    headers: { 'Authorization': 'Bearer ' + token },
  });
}
```
Group related requests for logical organization:
```javascript
import http from 'k6/http';
import { group } from 'k6';

// Placeholder credentials for illustration
const credentials = JSON.stringify({ email: 'user@example.com', password: 'secret' });

export default function () {
  group('User Login Flow', function () {
    http.get('https://app.example.com/login');
    http.post('https://api.example.com/auth/login', credentials, {
      headers: { 'Content-Type': 'application/json' },
    });
  });

  group('Dashboard Load', function () {
    http.get('https://api.example.com/dashboard/summary');
    http.get('https://api.example.com/dashboard/notifications');
    http.get('https://api.example.com/dashboard/recent-activity');
  });
}
```
Test WebSocket connections:
```javascript
import ws from 'k6/ws';
import { check } from 'k6';

export default function () {
  const url = 'wss://api.example.com/ws';
  ws.connect(url, {}, function (socket) {
    socket.on('open', () => {
      socket.send(JSON.stringify({ type: 'subscribe', channel: 'updates' }));
    });

    socket.on('message', (data) => {
      const msg = JSON.parse(data);
      check(msg, { 'received update': (m) => m.type === 'update' });
    });

    socket.setTimeout(function () {
      socket.close();
    }, 30000); // Close the connection after 30 seconds
  });
}
```
Need Help with Your DevOps?
Performance testing is most valuable when it is part of your development workflow, not an afterthought before a launch. At InstaDevOps, we help teams build performance testing pipelines that catch regressions before they reach production, including k6 setup, realistic test scenarios, and CI integration.
We offer fractional DevOps engineering starting at $2,999/month with no long-term contracts. Book a free 15-minute call to discuss your performance testing needs: https://calendly.com/instadevops/15min